CheckMyTap: Building a 1,000+ Page Water Quality System From Zero

Case Study

Every page earns its existence because the data is real.

CheckMyTap normalizes public water data from four sources (federal, nonprofit, and municipal) into city-level reports across 1,000+ US cities. This is the full story: how it was designed, what went wrong, and what I learned building it.

Public water quality data in the US is fragmented across four federal, nonprofit, and municipal systems, none of them designed for the person who actually needs the information. The EPA publishes compliance data in one place. EWG aggregates health-based guidelines in another. USGS tracks geological sources separately. And individual cities publish their own reports as PDFs with no standard format. A homeowner who just wants to know whether their water is hard, whether they need a filter or a softener, and how much it will cost has to piece together four different sources to get an answer.

CheckMyTap normalizes all four sources into a single structured format across 1,000+ US cities. Every city page surfaces hardness, PFAS levels, lead readings, water source, a letter grade, and treatment recommendations sized to that city’s specific data. The system generates each page from real data, which means every page contains information that is genuinely distinct from every other page on the site.

checkmytap.com (live site)
1,000+ City Pages
8 Content Layers
5 Languages
62+ Guides & Articles
4 Data Sources

Why water quality specifically?

Water quality sits at the intersection of public data availability and high personal stakes. The data exists. It is just scattered across systems that were designed for regulators, not homeowners. That gap between available data and actionable understanding is exactly where a well-structured site creates genuine information gain. The data is real, the questions are urgent, and nobody had assembled the answer in a format that respects both.

How the System Shows Up

That opportunity required a system to execute, not just a collection of pages. Every structural decision on CheckMyTap maps back to the same framework I use across all my work. This project is probably the clearest demonstration of how Get Found, Get Understood, Get Chosen operates as an actual build methodology rather than a positioning statement.

Eight Content Layers, Each Serving a Different Intent

Here is how that framework shaped what was actually built. The site is not just city pages. It is a layered system where each content type serves a different search intent and connects to the others through internal links. Someone can enter at any layer and navigate to the one that matches where they are in their decision.

Layer 01

City Pages

1,000+ pages at /water-quality/[state]/[city]/. Each surfaces hardness PPM, PFAS levels, lead readings, chlorine data, water source, utility name, and a letter grade. The answer is in the first 100 words. Below that: contaminant tables, treatment sizing, cost estimates, and city-specific softener recommendations. Some cities get additional personality. Portland’s page references the Bull Run Watershed and the $2.56B filtration project. Peoria’s reflects local context. These are not template filler. They are editorial decisions that show a human built this.

Layer 02

Problem Pages

Four core problems: /problems/hard-water/, /problems/pfas/, /problems/lead/, /problems/chlorine/. These explain what each contaminant is, why it matters, and what the symptoms look like. They serve informational intent: someone who knows something is wrong but does not know what.

Layer 03

Solution Pages

Eight treatment types: salt-based softeners, salt-free conditioners, whole-house filters, reverse osmosis, pitcher filters, under-sink filters, gravity filters, and shower filters. Each explains how the technology works, what it removes, what it does not, and who it is for.

Layer 04

Guides and Learn

62+ articles covering specific situations: hard water and hair, water quality when moving, baby formula and water, cooking and water quality, renters, water testing at home. Plus decision guides like softener vs filter, salt-based vs salt-free, sizing calculators. These serve the long tail of commercial intent queries.

Layer 05

Interactive Tools

A water quiz (/quiz/) that recommends a system type based on your situation. A product comparison tool (/compare/) with side-by-side specs. A system finder (/find/). These are decision-support tools that create engagement signals no static page can match.

Layer 06

City-Specific Treatment Guides

Pages like /water-quality/california/los-angeles/best-softener/ that size a system to the specific city’s water data. Not generic “best softener” content. Actual sizing calculations using that city’s PPM, with cost breakdowns and ROI timelines. This is where the data layer and the commercial layer intersect.
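
To make the sizing step concrete, here is a minimal sketch of how a city's PPM could translate into a softener capacity. The 17.1 PPM-per-grain conversion is standard; the 75 gallons per person per day, the 7-day regeneration interval, and the function names are illustrative assumptions, not CheckMyTap's published method.

```python
PPM_PER_GPG = 17.1  # 1 grain per gallon is roughly 17.1 ppm (mg/L as CaCO3)


def grains_per_gallon(hardness_ppm: float) -> float:
    """Convert hardness from PPM to grains per gallon."""
    return hardness_ppm / PPM_PER_GPG


def required_capacity_grains(
    hardness_ppm: float,
    people: int,
    gallons_per_person_per_day: float = 75,  # assumption, not from the site
    days_between_regen: int = 7,             # assumption, not from the site
) -> int:
    """Grains of hardness a softener must remove between regenerations."""
    daily_grains = grains_per_gallon(hardness_ppm) * people * gallons_per_person_per_day
    return round(daily_grains * days_between_regen)
```

Under these assumptions, a four-person household on 171 PPM water needs roughly 21,000 grains of capacity between regenerations. That is the figure a sizing page would compare against rated softener capacities.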

Layer 07

News and Investigations

Original reporting and data journalism: which US cities have the worst tap water, the Portland Bull Run filtration saga, microplastics research, hard water maps by state. This layer builds topical authority, earns links, and connects the data system to real events happening in the water quality space.

Layer 08

Water Sources and Watersheds

Pages at /water-sources/ and /contaminants/ that connect geographic water sources to quality outcomes. Water quality is not random. It is determined by geology, infrastructure, and treatment. Explaining that causal chain gives users and search systems a reason to trust the data.

Why the layers are the system

The city pages are the data backbone. But data without context is just a spreadsheet. The problem pages explain what the data means. The solution pages explain what to do about it. The guides address specific life situations. The tools help users make decisions. The news connects the data to real events. Each layer makes the others more useful, and together they form a complete information system that serves the full range of questions someone has about their water. That completeness is what builds authority with both users and search systems. It is also what makes the site genuinely useful rather than just large. For a deeper look at how layered content systems work, the Content Strategy pillar covers the principles behind this approach.

The Data Problem

Those layers only work if the underlying data is trustworthy. The core challenge was not building templates. It was sourcing, cleaning, and validating public water quality data from four different federal, nonprofit, and municipal sources, each with different formats, update frequencies, and coverage gaps.

Data Sources
  • EPA SDWIS: Federal compliance data. What was tested and whether it passed legal limits
  • EWG: Health-based guidelines often stricter than legal limits. Important distinction for users
  • USGS: Geological data on water sources. Explains why certain cities have hard water vs soft
  • Municipal CCRs: City-published Consumer Confidence Reports. Most granular data but published as PDFs with no standard format

Data Integrity Decisions
  • Legal vs health limits: I show both. EPA legal limits and health-optimal guidelines. Users deserve to know the difference
  • Incomplete cities: Some cities have partial data. I show what exists and flag what is missing rather than filling gaps with estimates
  • Grading system: A through F grades per city based on contaminants relative to health guidelines, not just legal compliance. Transparent methodology on /methodology/
  • Update cadence: Data is timestamped. Users see when it was last updated. No pretending old data is current
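
The decisions above could be encoded in a record shape like the following. This is a sketch under assumptions: the `Reading` class, its field names, and the output keys are hypothetical, and any numbers passed in are placeholders rather than real limits.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reading:
    contaminant: str
    value: Optional[float]             # None means the source had no data
    legal_limit: Optional[float]       # EPA legal limit, if one exists
    health_guideline: Optional[float]  # stricter health-based guideline, if any
    last_updated: Optional[str]        # ISO date shown to the user


def describe(reading: Reading) -> dict:
    """Report both comparisons, and flag missing data instead of estimating."""
    if reading.value is None:
        return {"contaminant": reading.contaminant, "status": "missing"}
    return {
        "contaminant": reading.contaminant,
        "status": "reported",
        "value": reading.value,
        "exceeds_legal": reading.legal_limit is not None
        and reading.value > reading.legal_limit,
        "exceeds_health": reading.health_guideline is not None
        and reading.value > reading.health_guideline,
        "last_updated": reading.last_updated,
    }
```

A reading can pass the legal limit and still exceed the health guideline, which is exactly the distinction the page needs to surface.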

The grading system is an editorial decision

Giving cities letter grades is not a neutral act. It requires defining what “good” means, which thresholds matter, and how to weigh competing factors. I chose health-based guidelines over legal compliance because many legal limits have not been updated in decades. The methodology is published. The data sources are cited. That transparency is what separates a grading system from an opinion. It is also what makes the grades citable by AI systems and journalists.
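
One way to picture a grade "relative to health guidelines" is to score each contaminant as a ratio of measured value to guideline and let the worst ratio set the grade. The cutoffs and the worst-ratio rule below are invented for illustration; the actual methodology is the one published on /methodology/.

```python
from typing import Mapping

# Hypothetical cutoffs: (maximum worst-case ratio, grade).
GRADE_CUTOFFS = [(0.5, "A"), (1.0, "B"), (2.0, "C"), (5.0, "D")]


def grade_city(ratios: Mapping[str, float]) -> str:
    """ratios maps contaminant name -> measured value / health guideline."""
    if not ratios:
        raise ValueError("cannot grade a city with no contaminant data")
    worst = max(ratios.values())
    for cutoff, grade in GRADE_CUTOFFS:
        if worst <= cutoff:
            return grade
    return "F"
```

Keeping the cutoffs in one table is what makes this kind of methodology publishable: the thresholds can be quoted verbatim and challenged.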

Design Decisions That Serve Users First

Good data in a bad experience still fails. The site’s navigation, content ordering, and product recommendation approach were all designed around one principle: help people understand their water before presenting any solutions.

Navigation Priorities

The menu is ordered deliberately: Check Your Water first, then Learn, then Solutions. Problems before solutions. Education before products. Many sites that include product recommendations lead with the products because that is where revenue comes from. CheckMyTap leads with the data because that is what users came for, and it matches how people actually make decisions about something as personal as their drinking water.

The homepage leads with “Know Your Water. Then Decide.” Not “Buy Our Top Picks.” The first interactive element is a city lookup, not a product.

Product Recommendations

The site includes product recommendations with affiliate links. That is worth being direct about because it creates a real tension: affiliate content can undermine trust if it pushes products people do not need. The way I handle this is by tying every recommendation to the user’s actual water data. A city with 270 PPM hardness gets sized for a specific system. A city with 15 PPM gets told they probably do not need one. The data guides the decision, not the other way around.
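
A minimal sketch of letting the data gate the recommendation. The hardness bands follow the USGS classification (soft up to 60 PPM, moderately hard to 120, hard to 180, very hard above that). Recommending a softener only for "hard" and above is an assumed cutoff for illustration, not necessarily the site's rule.

```python
def hardness_class(ppm: float) -> str:
    """Classify hardness using the USGS bands (mg/L as CaCO3)."""
    if ppm <= 60:
        return "soft"
    if ppm <= 120:
        return "moderately hard"
    if ppm <= 180:
        return "hard"
    return "very hard"


def recommends_softener(ppm: float) -> bool:
    """Assumed rule: only hard or very hard water triggers a softener recommendation."""
    return hardness_class(ppm) in ("hard", "very hard")
```

With this rule, the 270 PPM city is classed as very hard and gets a sized recommendation, while the 15 PPM city is classed as soft and gets none.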

Disclosure is upfront on every page that includes product links. A standalone guide on softener vs filter exists as a decision framework before any product page. The goal is that someone can use the entire site, make a fully informed decision, and never feel like they were steered toward a purchase. The affiliate links are there if the recommendation fits. They are absent when it does not.

Language Accessibility

Five languages beyond English: Spanish, Vietnamese, Chinese, Korean, and Tagalog. The architecture is localized with proper URL patterns (/es/calidad-del-agua/, /vi/chat-luong-nuoc/) and hreflang implementation so each version is independently crawlable and indexable.
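
For illustration, hreflang annotations for one set of equivalent pages could be generated like this. The helper name, the `https://checkmytap.com` origin, and the choice of English as `x-default` are assumptions; only the /es/ and /vi/ path patterns come from the site.

```python
from html import escape
from typing import Dict, List


def hreflang_tags(alternates: Dict[str, str], default_locale: str = "en") -> List[str]:
    """Build the full tag set that every page in the group should emit.

    alternates maps locale code -> absolute URL, including the page's own URL,
    so each version carries a self-reference and links to all the others.
    """
    tags = [
        f'<link rel="alternate" hreflang="{escape(locale)}" href="{escape(url)}" />'
        for locale, url in sorted(alternates.items())
    ]
    if default_locale in alternates:
        tags.append(
            '<link rel="alternate" hreflang="x-default" '
            f'href="{escape(alternates[default_locale])}" />'
        )
    return tags
```

Emitting the identical set on every language version is what keeps the annotations reciprocal.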

Language selection was based on US Census data for non-English-speaking households most likely to need water quality information and least likely to find it in their language.

Every Page Contains Distinct Data

Each city page is generated from that city’s actual water quality data. Phoenix shows different hardness, different PFAS readings, different lead numbers, different water source, different treatment sizing, and different cost estimates than Scottsdale. Two cities served by the same utility still get distinct pages because water quality varies by distribution point.

The template provides structure. The data provides uniqueness. That combination is what allows the system to generate 1,000+ pages where each one contains information you will not find on any other page on the site or anywhere else on the internet.

Technical Decisions

Everything described above only reaches users if the technical foundation supports it. The infrastructure had to handle 1,000+ pages without creating crawl waste, indexation confusion, or rendering dependencies.

Crawl

Hierarchical Internal Links

Hub pages link to state pages. State pages link to city pages. Every page is reachable within 3 clicks of the root. No page depends solely on the sitemap for discovery. Sitemaps are segmented by state.
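
The three-click rule is easy to verify mechanically. Here is a sketch of a breadth-first check over an internal link graph, assuming the graph is available as a mapping from each page to the pages it links to. The function names and the example paths are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List


def click_depths(links: Dict[str, List[str]], root: str = "/") -> Dict[str, int]:
    """Breadth-first search: fewest clicks from the root to each reachable page."""
    depths = {root: 0}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def pages_too_deep(
    links: Dict[str, List[str]],
    all_pages: Iterable[str],
    root: str = "/",
    max_depth: int = 3,
) -> List[str]:
    """Pages deeper than max_depth, including orphans no link reaches."""
    depths = click_depths(links, root)
    return sorted(p for p in all_pages if depths.get(p, float("inf")) > max_depth)
```

Any page returned by `pages_too_deep` either sits more than three clicks down or depends on the sitemap alone for discovery.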

Render

Server-Side HTML

Every page is fully rendered server-side. No client-side data fetching, no JavaScript-dependent content. Googlebot and AI crawlers see exactly what users see on first request.

Canonical

Template-Enforced Consistency

Self-referencing canonicals on every page. No trailing slash variations. No parameter-based duplicates. When one canonical rule applies to 1,000 pages, template-level enforcement is the only approach that scales.
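
A template-level canonical rule can be as small as one normalizing function. This sketch assumes every page uses a directory-style URL ending in a slash (as the paths in this case study do) and that no query parameter changes page content. Both are assumptions about the site, and the function names are illustrative.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Drop query string and fragment, lowercase the host, force a trailing slash."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if not path.endswith("/"):
        path += "/"
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, "", ""))


def canonical_tag(url: str) -> str:
    """Self-referencing canonical tag for the template's <head>."""
    return f'<link rel="canonical" href="{canonical_url(url)}" />'
```

Because the template calls the same function for every page, a slash or parameter variant cannot produce a second canonical form.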

Schema

Structured Data per Template

Schema markup on every city page declares what the content is and what location it covers. Consistent heading hierarchy means an AI system can pull the PFAS section without reading the entire page.
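
The case study does not name the schema types in use, so the following is only a guess at what per-template markup might look like: a `WebPage` whose `about` property points at a `City`. The function name and field choices are assumptions.

```python
import json


def city_jsonld(city: str, state: str, url: str, date_modified: str) -> str:
    """Render a JSON-LD script tag describing a city page and its location."""
    data = {
        "@context": "https://schema.org",
        "@type": "WebPage",
        "name": f"{city}, {state} Water Quality",
        "url": url,
        "dateModified": date_modified,  # mirrors the visible data timestamp
        "about": {
            "@type": "City",
            "name": city,
            "containedInPlace": {"@type": "State", "name": state},
        },
    }
    return '<script type="application/ld+json">' + json.dumps(data) + "</script>"
```

Generating the markup from the same record that fills the page keeps the structured data and the visible content from drifting apart.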

Template governance is what makes scale reliable

A single template change propagates to 1,000 pages simultaneously. That is the mechanism that makes the system maintainable. It is the same lever enterprise SEO teams use to manage sites with thousands of product or location pages. The governance challenge is identical whether you control the full stack or are coordinating across three internal teams. The principle is the same: rules enforced at the template level stay consistent. Rules enforced page by page drift. The Technical SEO pillar goes deeper on how crawl, render, and canonical systems interact at this kind of scale.

Built for AI Extraction

Those technical choices were not just about traditional search. When someone asks ChatGPT “is Phoenix water safe to drink,” the answer needs to come from somewhere. CheckMyTap is structured to be that source, and that is a deliberate design decision for modern search.

Answer-First Structure

Every city page leads with the key data: hardness level, PFAS status, lead readings, overall safety grade. The answer is in the first 100 words. An AI system extracting “Phoenix water quality” gets a clear, specific, citable answer immediately without parsing 2,000 words of background.

Information Gain Through Real Data

The water quality numbers are information that does not exist in this structured format anywhere else. EWG has contaminant data but not hardness or treatment sizing. Municipal CCRs have detail but only for their own city and only in PDF. Nobody else normalizes all four sources into a single, consistent, comparable format across 1,000 cities. That is genuine information gain, and it is the kind of signal that matters increasingly as AI systems decide which sources to cite. The Modern Search pillar explores how these retrieval systems evaluate and select content.

What I Would Do Differently

That is the system as it exists today. But building it was not a straight line. With the full picture in view, here is an honest look at which decisions I would make again and which ones I would approach differently.

Would Keep
  • Answer-first structure on every page. The single most important design decision
  • Problems before solutions in navigation. Builds trust that compounds
  • Grading system with published methodology. Makes data actionable and citable
  • City-specific treatment sizing. Separates the site from generic content
  • News section with original reporting. Creates authority that templates cannot

Would Change
  • Build the data validation pipeline before writing a single page, not after launch
  • Set up proper version control and deployment workflow on day one
  • Test hreflang implementation against a staging environment before going live
  • Establish an image pipeline with consistent formats and sizing from the start
  • Build a persistent project context system for AI-assisted development
  • Start with 100 cities, validate everything, then scale to 1,000

What Went Wrong (A Lot)

Now that you have seen the architecture, the data decisions, the UX strategy, and the technical foundation, here is the part that portfolios usually leave out. Every system described above was built through a process that included pushing an old repo live, shipping broken tracking, and managing a file system that looked like a crime scene.

There is so much I still need to learn that it hurts. But these reps are what is building my knowledge base, and everyone who builds real things goes through this. The difference is whether you document it and fix it or pretend it did not happen.

Pushed an Old Repo Live

I accidentally deployed a stale version of the site because my GitHub backup workflow was a mess at the start of the project. No deployment checklist, no version verification. The live site reverted to an earlier build and I did not catch it immediately. This is the kind of mistake that happens once and never happens again because it burns enough to remember.

Hreflang Was Broken

The five language versions launched with significant hreflang errors. Tags pointing to wrong URLs, missing self-referencing declarations, inconsistent locale codes. This meant search engines could not properly associate the Spanish, Vietnamese, Chinese, Korean, and Tagalog versions with their English counterparts. Crawl budget was wasted. The fix was straightforward but the damage to early indexation was real.
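
All three error types named here can be caught before launch with a small check run against staging. The sketch below assumes the declared annotations have already been collected into a mapping of page URL to its hreflang tags. The function name, the input shape, and the allowed-locale set are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed locale codes for English plus the five translated versions.
ALLOWED_LOCALES = {"en", "es", "vi", "zh", "ko", "tl", "x-default"}


def hreflang_errors(declared: Dict[str, Dict[str, str]]) -> List[Tuple[str, str]]:
    """declared maps page URL -> {hreflang code: target URL} found on that page."""
    errors = []
    for page, alternates in declared.items():
        if page not in alternates.values():
            errors.append((page, "missing self-reference"))
        for locale, target in alternates.items():
            if locale not in ALLOWED_LOCALES:
                errors.append((page, f"unexpected locale code {locale}"))
            if target == page:
                continue
            back = declared.get(target)
            if back is None:
                errors.append((page, f"{locale} points to unknown page {target}"))
            elif page not in back.values():
                errors.append((page, f"no return tag from {target}"))
    return errors
```

An empty result means every version references itself, uses an expected code, and is linked back by every page it points to.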

Tracking Was Broken

Product recommendation tracking was not correctly implemented across the site for weeks. The pages were working. Users were clicking through. But none of it was being measured or attributed because I did not verify the implementation end to end before launch. A five-minute QA check would have caught it.

V1 Data Had Flaws

The first version of the water quality data was not cross-checked thoroughly. Some cities had hardness readings from one source that conflicted with another, and I did not have a reconciliation process in place. Accuracy is the entire value proposition of the site. Publishing data without a validation pipeline is the one mistake that could undermine everything. I built the checks after launch instead of before it.
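
A reconciliation check of the kind that was missing can start very small. This sketch compares hardness readings across sources and flags a city for review when they disagree by more than a tolerance. The 10% tolerance, the function name, and the output shape are assumptions, and the check deliberately picks no winner.

```python
from typing import Dict, Optional


def check_hardness(readings: Dict[str, Optional[float]], tolerance: float = 0.10) -> dict:
    """readings maps source name -> hardness in PPM (None if the source has no value)."""
    values = {source: v for source, v in readings.items() if v is not None}
    if not values:
        return {"status": "missing", "sources": []}
    low, high = min(values.values()), max(values.values())
    # Relative spread between the lowest and highest reported value.
    spread = (high - low) / high if high > 0 else 0.0
    status = "conflict" if spread > tolerance else "consistent"
    return {"status": status, "spread": round(spread, 3), "sources": sorted(values)}
```

Running this across every city before publishing turns conflicting sources into a review queue instead of a published error.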

Creating Faster Than Organizing

I was building pages, writing guides, configuring templates, and generating data faster than I could organize any of it. Files everywhere. No consistent naming. No folder structure that made sense. I have cleaned this up recently, but for months the project backend was chaos. The site looked structured to users while the build environment was anything but.

AI Workflow Without Memory

A huge amount of time was burned prompting AI tools back and forth without persistent context. Re-explaining the same project parameters, re-establishing the same constraints, losing progress between sessions. My system for working at scale with AI assistance was not good enough. I was solving the same problems repeatedly because neither I nor the tools had a reliable way to carry forward what we had already figured out.

Images Are Still a Mess

As of writing, the image situation on CheckMyTap is not where it needs to be. Inconsistent formats, missing alt text on some pages, sizing that is not optimized for the templates. This is an active problem, not a resolved one. I am listing it here because honesty about current gaps matters more than pretending everything is polished.

GitHub Backup Was Fragile

The early project had no reliable version control workflow. Commits were inconsistent. Branching strategy was nonexistent. The repo was more of a dump than a source of truth. This is what led to the stale deployment. I have since rebuilt the backup process, but starting a 1,000-page project without proper version control was a mistake born from moving too fast.

Why this section exists

If I only showed the architecture diagrams and the content layer strategy, this page would be dishonest. The systems thinking is real. The execution gaps were also real. The learning happened in the space between the two. Every problem listed above became a process improvement that now applies to every project I touch, including the decision log I maintain on this site.

I am improving how I do things daily. That is the actual skill being demonstrated here: not that I built a perfect site, but that I built a real one, broke real things, and developed real systems to prevent those breaks from recurring.

Results (coming soon)

Performance metrics, indexation data, and visibility trends will be added here as they are compiled. The structure for this section exists. The data is being gathered.

Get Found. Get Understood. Get Chosen.

CheckMyTap demonstrates every stage of the framework operating on a live site. The architecture, the content layers, the technical foundation, and yes, the failures and fixes. This is what building SEO systems actually looks like.
