Building a Waterfall Enrichment Workflow in Clay
A waterfall enrichment workflow is not a Clay-specific concept — it is a cascading fallback chain where each enrichment provider only fires when the previous one returns null or low-confidence data. The goal is coverage at minimum cost: hit the cheapest provider first, fall back to more expensive ones only when necessary. Clay is the best tool for building this without code because its conditional column logic maps directly to waterfall semantics.
This playbook walks through a five-layer waterfall: Apollo (base enrichment), ZoomInfo (enterprise data depth), People Data Labs (alternative sourcing), Clay web search + Claygent (unstructured data), and Claude inference (gap-filling and normalization). Total credit cost per fully enriched row runs $0.08–$0.22 depending on how many layers fire. Compare that to a single ZoomInfo bulk export at $0.15–$0.40 per row with no fallback logic.
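The cascade described above can be sketched in a few lines. This is a hedged illustration, not Clay's internals: the provider names and per-call costs mirror the layers in this playbook, and the lookup functions are hypothetical stand-ins for Clay enrichment columns.

```python
# Waterfall semantics: try providers cheapest-first, stop at the first
# usable hit. Each fired layer costs money whether or not it matches.
def enrich(contact, providers):
    spent = 0.0
    for name, lookup, cost_usd in providers:
        spent += cost_usd                    # this layer fired, pay for it
        result = lookup(contact)
        if result and result.get("email"):   # "usable" = non-null email
            return {"provider": name, "cost": spent, **result}
    return {"provider": None, "cost": spent, "email": None}

# Stub lookups for illustration: Apollo misses, ZoomInfo hits.
apollo   = ("apollo",   lambda c: None,                       0.006)
zoominfo = ("zoominfo", lambda c: {"email": "jane@acme.com"}, 0.04)

row = enrich({"first_name": "Jane", "company": "Acme"}, [apollo, zoominfo])
# row["provider"] is "zoominfo"; row["cost"] reflects both fired layers
```

Note the design choice: cost accrues on every fired layer, which is exactly why the run conditions in the steps below matter.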
Setup: What You Need Before You Build
- Clay workspace: Pro plan or above ($149/month, 25,000 credits). The waterfall logic requires conditional columns, which are available on Pro.
- Provider integrations connected: Apollo (native Clay integration), ZoomInfo (native), People Data Labs (native). Each requires its own API key or OAuth connection inside Clay’s integrations panel.
- Claygent enabled: Included on Pro plan. No separate setup unless you are using a custom Claude API key.
- Input list: At minimum, first name + last name + company name. Email or LinkedIn URL significantly improves match rates at every layer. LinkedIn URL is the highest-confidence identifier across all providers.
Name your table clearly: enrichment_waterfall_[date]. Clay does not version tables, so naming discipline prevents the “which one is current” problem when you duplicate tables for testing.
Step-by-Step: Building the Column Chain
Layer 1: Apollo person match. Add a Clay enrichment column using the Apollo integration. Select “Find Person” action. Map input fields: first name, last name, organization name, LinkedIn URL (if available). Apollo returns: verified email, mobile phone (on paid tiers), job title, seniority, company domain, headcount range, industry, and technologies used.
Apollo cost: 1 credit per person match ($0.006 on Pro plan). Apollo match rate on a clean B2B list with full name + company is typically 60–75%. It drops to 40–55% on lists without LinkedIn URLs.
Layer 2: ZoomInfo person enrichment (conditional on Apollo miss). Add a second enrichment column. Set the run condition to fire only when the Apollo email column is empty or returns an error. Use ZoomInfo’s “Enrich Person” action. ZoomInfo’s match rate on Apollo misses is 20–30% incremental — it covers enterprise accounts and government/edu domains that Apollo underpopulates.
ZoomInfo cost: varies by contract. On Clay’s reseller pricing, expect 5–10 credits per ZoomInfo person enrichment ($0.03–$0.06). This is why you condition it: running ZoomInfo on every row regardless of Apollo coverage would cost 5x more per row.
Layer 3: People Data Labs (conditional on ZoomInfo miss). PDL covers a different slice of the contact universe: early-stage company employees, independent contractors, and technical roles that Apollo and ZoomInfo underpopulate. Add a PDL “Person Enrichment” column conditioned on ZoomInfo email being empty.
// Clay column condition logic (pseudo-syntax matching Clay's UI):
// Run this column when:
// [Apollo_Email] is empty
// AND [ZoomInfo_Email] is empty
// PDL input fields:
// first_name: {{first_name}}
// last_name: {{last_name}}
// company: {{company_name}}
// linkedin_url: {{linkedin_url}} // optional but improves match rate
// PDL output fields to map:
// email (primary work email)
// mobile_phone
// job_title
// job_company_size
// skills (array)
// location_country
PDL cost on Clay: 3–5 credits per enrichment. Incremental coverage on Apollo + ZoomInfo misses: 10–18%. At this point in the waterfall, you have covered approximately 85–90% of a typical B2B list.
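The run condition driving Layer 3 reduces to a simple predicate over upstream columns. A minimal sketch, with column names matching the pseudo-syntax above but otherwise illustrative:

```python
# PDL fires only when both upstream layers came back empty.
# Treats None, "", and a missing key all as "empty", which is how
# Clay's "is empty" condition behaves on unpopulated columns.
def should_run_pdl(row: dict) -> bool:
    return not row.get("Apollo_Email") and not row.get("ZoomInfo_Email")

should_run_pdl({"Apollo_Email": "", "ZoomInfo_Email": None})      # True
should_run_pdl({"Apollo_Email": "a@b.com", "ZoomInfo_Email": ""}) # False
```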
Layer 4: Clay web search + Claygent (conditional on PDL miss). For the remaining 10–15% that three structured providers could not match, use Claygent to run a targeted web search. The prompt matters enormously here — be specific.
// Claygent prompt template for email finding:
"Find the professional email address for {{first_name}} {{last_name}},
who works at {{company_name}} as {{job_title}}.
Search their company website ({{company_domain}}) contact or team page,
their LinkedIn profile ({{linkedin_url}}), and any press mentions.
Return ONLY a verified email address in the format name@domain.com.
If you cannot find a verified email with high confidence, return 'not found'.
Do not guess or construct email addresses from name patterns."
// Cost: 10 credits per Claygent run (~$0.10)
// Expected match rate on prior misses: 5-10%
// Latency: 15-40 seconds per row
The “do not guess” instruction is critical. Claygent will construct plausible email addresses (firstname.lastname@company.com) from name patterns if you do not explicitly prohibit it. Those addresses look valid but bounce at 40–60% rates and damage sending domain reputation.
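One way to enforce the no-guessing rule downstream is a post-check that flags any returned address that is exactly a name-pattern permutation of the contact's name. A flagged address is not necessarily fake, but it should go through verification before sending. The pattern list here is an illustrative sketch, not exhaustive:

```python
# Flag emails whose local part matches a common name-pattern permutation
# (jane.doe, jdoe, janedoe, ...). These are the shapes an LLM tends to
# construct when it guesses instead of finding.
def looks_pattern_constructed(email: str, first: str, last: str) -> bool:
    local = email.split("@")[0].lower()
    f, l = first.lower(), last.lower()
    patterns = {f, l, f + l, l + f, f + "." + l,
                f[0] + l, f[0] + "." + l, f + "_" + l}
    return local in patterns

looks_pattern_constructed("jane.doe@acme.com", "Jane", "Doe")  # True: flag it
looks_pattern_constructed("jdoe42@acme.com", "Jane", "Doe")    # False
```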
Layer 5: Claude inference for normalization and gap-filling. Even after four layers, you will have rows with partial data — email found but job title missing, headcount field inconsistent across sources, company name formatted differently than your CRM expects. Add a Claude AI column (not Claygent — the simpler Claude text generation column) for normalization.
// Clay Claude AI column prompt for data normalization:
"You are a data normalization assistant. Given the following contact data
from multiple enrichment sources, return a single JSON object with
clean, consistent values.
Input data:
- Apollo job title: {{Apollo_Job_Title}}
- ZoomInfo job title: {{ZoomInfo_Job_Title}}
- PDL job title: {{PDL_Job_Title}}
- Apollo headcount: {{Apollo_Headcount}}
- ZoomInfo headcount: {{ZoomInfo_Employee_Count}}
- Company name variants: {{Apollo_Company}}, {{ZoomInfo_Company}}
Rules:
1. For job title: prefer the most specific title. 'VP of Sales' beats 'Sales'.
2. For headcount: prefer an exact number over a range. If only ranges available,
use the midpoint.
3. For company name: use the legal/formal name, not a shortened version.
4. If a field has no data from any source, return null for that field.
Return JSON only: { job_title, headcount_estimate, company_name_normalized }"
// Cost: 2-3 credits per row (~$0.01-0.02)
// Run condition: always (normalization applies to all rows)
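Rules 1 and 2 of the prompt can be sanity-checked locally. This sketch uses title length as a crude proxy for specificity, which is an assumption on my part: in the workflow itself Claude judges specificity semantically, not by length.

```python
# Rule 1: prefer the most specific title (length as a rough proxy).
def merge_title(*titles):
    candidates = [t for t in titles if t]
    return max(candidates, key=len) if candidates else None

# Rule 2: prefer an exact headcount; otherwise take the range midpoint.
def headcount_estimate(exact=None, range_str=None):
    if exact is not None:
        return exact
    if range_str:
        lo, hi = (int(x) for x in range_str.replace(",", "").split("-"))
        return (lo + hi) // 2
    return None  # rule 4: no data from any source

merge_title("Sales", "VP of Sales", None)  # "VP of Sales"
headcount_estimate(range_str="51-200")     # 125
```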
Cost Calculation Per Row
The waterfall’s cost efficiency comes from conditional firing. Here is the math for a 1,000-row table on a Pro plan ($0.006/credit):
- Layer 1 (Apollo): 1,000 rows x 1 credit = 1,000 credits ($6.00). All rows fire.
- Layer 2 (ZoomInfo): 300 Apollo misses x 7 credits = 2,100 credits ($12.60). Fires on ~30% of rows.
- Layer 3 (PDL): 150 ZoomInfo misses x 4 credits = 600 credits ($3.60). Fires on ~15% of rows.
- Layer 4 (Claygent web search): 100 PDL misses x 10 credits = 1,000 credits ($6.00). Fires on ~10% of rows.
- Layer 5 (Claude normalization): 1,000 rows x 2.5 credits = 2,500 credits ($15.00). Fires on all rows.
- Total: 7,200 credits ($43.20) for 1,000 rows = $0.043/row average.
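The arithmetic above can be verified with a short script; layer volumes and credit counts are taken directly from the worked example:

```python
# Credit math for the 1,000-row example at Pro-plan pricing.
CREDIT_PRICE = 0.006  # dollars per credit

layers = [
    ("apollo",    1000, 1),    # all rows fire
    ("zoominfo",   300, 7),    # ~30% of rows (Apollo misses)
    ("pdl",        150, 4),    # ~15% of rows (ZoomInfo misses)
    ("claygent",   100, 10),   # ~10% of rows (PDL misses)
    ("claude",    1000, 2.5),  # normalization runs on all rows
]

total_credits = sum(rows * credits for _, rows, credits in layers)
total_cost = total_credits * CREDIT_PRICE
# total_credits is 7200; total_cost is $43.20, i.e. ~$0.043/row
```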
Expected final coverage: 88–92% email found, 95%+ normalized job title and headcount. The remaining 8–12% are genuinely unfindable via public data (private LinkedIn, no company web presence, very new employees).
Where each provider wins: Apollo dominates US SMB and mid-market (75% match rate). ZoomInfo wins on enterprise and Fortune 500 accounts (20–30% incremental on Apollo misses). PDL wins on technical roles, European contacts, and sub-50-person companies. Claygent wins on founders and executives who have minimal database footprint but active web presence.
Troubleshooting
- High Apollo miss rate (>40%): Your input list likely lacks LinkedIn URLs. Add a LinkedIn URL finder column before Layer 1 — Clay’s LinkedIn URL finder costs 3 credits but improves downstream match rates by 15–25 percentage points.
- Claygent returning guessed emails: Your prompt is not explicit enough. Add the prohibition on pattern-constructed addresses and require a source URL citation in the output. If Claygent cannot cite a source, the email is a guess.
- Claude normalization returning inconsistent JSON: Add a JSON schema validation column after the Claude column. Use Clay’s formula column to check JSON.parse({{claude_output}}) and flag rows that fail to parse.
- Table run taking 3+ hours for 1,000 rows: Claygent is the bottleneck (15–40s per row). Reduce Claygent concurrency in table settings or batch overnight. Clay’s parallel execution handles up to 10 concurrent rows by default — increasing this reduces total run time but increases rate limit exposure.
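In a Clay formula column the parse check is a one-liner. For validating exported rows outside Clay, the same check can be sketched in Python; the key names follow the normalization prompt earlier, and the flagging behavior is my assumption about how you would use the result:

```python
import json

# Parse the Claude column's output; flag rows that fail to parse
# or are missing the expected keys from the normalization prompt.
EXPECTED_KEYS = {"job_title", "headcount_estimate", "company_name_normalized"}

def validate_row(raw):
    """Return (parsed_dict_or_None, ok_flag)."""
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return None, False
    return data, isinstance(data, dict) and EXPECTED_KEYS <= data.keys()

validate_row('{"job_title": "VP of Sales", "headcount_estimate": 250, '
             '"company_name_normalized": "Acme Corp"}')  # ok flag is True
validate_row("not json")                                 # (None, False)
```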
What This Doesn’t Solve
Email deliverability. A waterfall enrichment workflow maximizes the quantity of emails found; it does not validate whether those emails are active and deliverable. Run every output email through a separate email verification step (NeverBounce or ZeroBounce, both have Clay integrations) before loading into your sequencer. Expect 5–15% of found emails to be stale, especially on contacts whose PDL or Apollo record is more than 12 months old.
Mobile phone enrichment. The waterfall above optimizes for work email. Mobile phone enrichment requires different provider prioritization (Cognism wins on EU mobile, Apollo is weaker) and has meaningfully lower coverage rates (30–50% for mobile vs. 85–92% for work email). Build a separate waterfall for mobile if your motion requires it.
Real-time enrichment. This waterfall is designed for batch processing — overnight runs on a static list. For sub-30-second enrichment triggered by a form fill, you need a different architecture: a single-provider API call (Apollo or Clearbit) with no fallback layers, accepting lower coverage in exchange for speed. Clay is not the right tool for real-time enrichment; use a direct API integration via n8n or Make.
Related reading: See the Clay vendor profile for a full breakdown of Clay’s credit pricing tiers, integration depth, and analyst take. For a comparison of Clay’s enrichment approach against building direct provider integrations, see Clay vs. Apollo. The People Data Labs vendor profile covers PDL’s data model and when it outperforms Apollo.