
Overview

Well-designed Pipelines are the foundation of reliable agent behavior and trustworthy HITL dashboards. This page covers the patterns that lead to clean, maintainable data infrastructure — and the mistakes to avoid.

Design Principles

1. Single Responsibility per Pipeline

Each Pipeline should extract and transform one logical dataset. Avoid trying to produce unrelated outputs from a single pipeline. Good:
  • HubSpot Deals Over Threshold — pulls deals, filters by amount, produces deals_over_10k
  • QuickBooks Expense Summary — pulls expenses, aggregates by category, produces expense_totals
Avoid:
  • CRM and Finance Data — pulls deals AND expenses AND contacts into one pipeline with mixed outputs
Keeping pipelines focused makes them easier to debug, schedule independently, and reuse across different agents and experiences.

2. Use Descriptive Outcome Names

Pipeline Outcomes are referenced by name across Agents and Experiences. Choose names that are specific and self-documenting.
  ❌ Avoid   ✅ Better
  output1    deals_over_10k
  data       aggregated_rd_expenses_by_project
  results    customer_support_tickets_open
The name should convey what the data is, not just that it exists.

3. Match Schedule to Data Change Frequency

Over-scheduling wastes connector quota and compute. Under-scheduling means agents work with stale data. Ask: How often does the underlying data actually change in a meaningful way?
  Data Type                          Recommended Frequency
  Financial totals, reporting data   Daily
  CRM records, deal pipelines        Hourly
  Support tickets, incident data     Hourly
  Weekly summaries, payroll data     Weekly
  Historical archives                Manual
If you’re unsure, start with Daily and adjust based on how stale the data looks in your agent outputs and experience dashboards.

4. Test Before Activating

Always run a Test Run on a Draft pipeline before clicking Activate. This catches:
  • Connector credential issues
  • Schema mismatches between expected and actual data
  • Empty results from misconfigured filters
  • Transformation errors
Check the Data tab after the test run to confirm records appear with the expected structure.
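The checks a test run should surface can be sketched in code. This is a minimal illustration using hypothetical record data; the `EXPECTED_COLUMNS` set and the `validate_test_run` helper are assumptions for the example, not part of the platform.

```python
# Hypothetical validation of a pipeline test run's output.
EXPECTED_COLUMNS = {"deal_name", "amount", "close_date"}

def validate_test_run(records: list[dict]) -> list[str]:
    """Return a list of problems found in a test run's records."""
    problems = []
    if not records:
        # Empty results usually mean a misconfigured filter or bad credentials
        problems.append("empty result set - check filters and credentials")
        return problems
    for i, rec in enumerate(records):
        # Schema mismatch: expected columns absent from the actual data
        missing = EXPECTED_COLUMNS - rec.keys()
        if missing:
            problems.append(f"record {i} missing columns: {sorted(missing)}")
    return problems
```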

5. Monitor Pipeline Health Regularly

Check the Pipelines Overview dashboard weekly (or set up alerting via Tools). Watch for:
  • Success rate dropping below 100% — usually indicates a connector issue or upstream schema change
  • Duration increasing significantly — may signal that the data volume has grown unexpectedly
  • Auto-pause — a pipeline that auto-paused due to 3 consecutive failures needs immediate attention
If a pipeline feeding a critical agent or dashboard auto-pauses, the agent will be working with stale data. Set the Max Consecutive Failures setting to match your tolerance — but don’t set it so high that silent failures go unnoticed for long.
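If you wire up alerting via Tools, the health signals above reduce to a few simple computations. A minimal sketch, assuming each run record carries an `ok` flag and a `duration_s` field (illustrative names, not a documented schema):

```python
def pipeline_health(runs: list[dict]) -> dict:
    """Summarize recent runs, oldest first."""
    total = len(runs)
    successes = sum(1 for r in runs if r["ok"])
    durations = [r["duration_s"] for r in runs]
    # Compare the latest duration against the average of earlier runs
    baseline = sum(durations[:-1]) / max(len(durations) - 1, 1)
    # Count failures since the most recent success
    consecutive_failures = next(
        (i for i, r in enumerate(reversed(runs)) if r["ok"]), total
    )
    return {
        "success_rate": successes / total,
        "duration_growing": durations[-1] > 1.5 * baseline,
        "consecutive_failures": consecutive_failures,
    }

recent_runs = [
    {"ok": True, "duration_s": 10},
    {"ok": True, "duration_s": 10},
    {"ok": False, "duration_s": 30},  # slower and failing: both flags trip
]
health = pipeline_health(recent_runs)
```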

6. Write Clear Pipeline Descriptions

The natural language description you enter in Step 3 of the wizard does two things: it generates the workflow AND serves as documentation for your team. Write descriptions that are:
  • Specific about data sources: “Pull all deals from HubSpot” not “Get deals”
  • Explicit about filters: “Filter to only include deals with amount > $10,000” not “Filter large deals”
  • Clear about transformations: “Aggregate by category, summing the amount field” not “Group the data”
  • Descriptive about outputs: “Store as a table named deals_over_10k with columns: deal_name, amount, close_date”
Good descriptions also make it easier to regenerate the workflow if you need to change the pipeline later.

7. Document Transformation Logic

When pipelines involve complex transforms (multi-step aggregations, conditional logic, joins), add notes to each workflow node explaining why the transform exists, not just what it does. Use the notes field in node configuration:
notes: "Filter to R&D-tagged expenses only. The 'tag' field uses
a free-text convention — check for both 'RD' and 'R&D' spellings."
This documentation is invaluable when you or a colleague needs to debug the pipeline months later.
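The filter that note documents might look like this in code. A sketch only: the record shape and helper name are assumptions, but it shows why the note matters, since both spellings must be accepted case-insensitively.

```python
# The 'tag' field is free text, so accept both "RD" and "R&D" spellings.
RD_SPELLINGS = {"RD", "R&D"}

def is_rd_expense(record: dict) -> bool:
    tag = record.get("tag", "").strip().upper()
    return tag in RD_SPELLINGS

expenses = [
    {"id": 1, "tag": "R&D"},
    {"id": 2, "tag": "rd"},        # lowercase variant, still counts
    {"id": 3, "tag": "Marketing"},
]
rd_only = [e for e in expenses if is_rd_expense(e)]
```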

8. Coordinate Pipeline and Agent Schedules

If an agent runs on a schedule (e.g., daily at 10 AM), ensure the Pipeline it depends on runs before the agent (e.g., daily at 8 AM). An agent that runs before its Pipeline has refreshed is working with yesterday’s data. Pattern:
07:00 AM — Pipeline: QuickBooks Expense Aggregation runs
08:00 AM — Pipeline: HubSpot Deal Sync runs
09:00 AM — Agent: Finance Review Agent runs (reads from both pipeline outcomes)
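The ordering above is easy to verify mechanically. A minimal sketch, assuming you track scheduled times yourself (the `contacts_sync` entry is a hypothetical mis-scheduled pipeline added to show detection):

```python
from datetime import time

pipeline_schedules = {
    "quickbooks_expense_aggregation": time(7, 0),
    "hubspot_deal_sync": time(8, 0),
    "contacts_sync": time(10, 0),  # hypothetical: runs AFTER the agent
}
agent_run_time = time(9, 0)

# Any pipeline scheduled at or after the agent leaves it reading stale data
stale_risk = sorted(
    name for name, t in pipeline_schedules.items() if t >= agent_run_time
)
```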

9. Don’t Over-Extract

Extract only the fields and records your agents and experiences actually need. Over-extracting creates larger outcome tables, slower runs, and more cognitive overhead when inspecting data. If your agent only needs deal_name and amount, don’t extract 40 fields from your CRM.
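Projecting down to the needed fields is a one-line transform. A sketch with hypothetical field names:

```python
NEEDED_FIELDS = ("deal_name", "amount")

def project(record: dict, fields=NEEDED_FIELDS) -> dict:
    """Keep only the fields downstream agents actually read."""
    return {f: record[f] for f in fields}

# A raw CRM record may carry dozens of fields; keep two.
raw = {"deal_name": "Acme", "amount": 12000, "owner": "jt", "stage": "open"}
slim = project(raw)
```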

10. Use the Regenerate Feature to Iterate

If the AI-generated workflow doesn’t match your intent exactly, don’t patch individual nodes by hand — instead, refine the pipeline description and click Regenerate. This produces a cleaner, more consistent workflow, and keeps the description in sync with what the pipeline actually does.

Common Patterns

Pattern 1: Extract → Filter → Aggregate → Write

The most common pipeline shape. Extract all records, filter to the relevant subset, aggregate to a summary, write to the outcome table.
READ DB: Pull all expense records

FILTER: Include only records where amount > 500

GROUP: Aggregate by category, sum amount

WRITE: Store as aggregated_expenses
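The filter and group steps of this pattern can be sketched as plain Python; only the WRITE step is platform-specific. The record shape and threshold here mirror the steps above but are illustrative:

```python
from collections import defaultdict

def run_pattern_1(records: list[dict]) -> dict[str, float]:
    # FILTER: include only records where amount > 500
    kept = [r for r in records if r["amount"] > 500]
    # GROUP: aggregate by category, summing the amount field
    totals: dict[str, float] = defaultdict(float)
    for r in kept:
        totals[r["category"]] += r["amount"]
    # WRITE: in the real pipeline this would land in aggregated_expenses
    return dict(totals)

expense_records = [
    {"category": "travel", "amount": 900},
    {"category": "travel", "amount": 200},   # below threshold, filtered out
    {"category": "software", "amount": 1500},
]
summary = run_pattern_1(expense_records)
```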

Pattern 2: Extract → Transform → Write (No Filter)

For pipelines where all records are relevant but need reshaping before agents can use them.
TOOL: Fetch all contacts from HubSpot CRM

TRANSFORM: Extract and normalize contact fields

TRANSFORM: Flatten nested address fields

WRITE: Store as normalized_contacts
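The two TRANSFORM steps, normalizing fields and flattening the nested address, might look like this. The contact schema is hypothetical:

```python
def normalize_contact(raw: dict) -> dict:
    """Normalize contact fields and flatten nested address fields."""
    addr = raw.get("address", {})
    return {
        "name": raw["name"].strip().title(),   # normalize casing
        "email": raw["email"].lower(),
        "city": addr.get("city", ""),          # flattened from address
        "country": addr.get("country", ""),
    }

contact = {
    "name": "ada lovelace",
    "email": "Ada@Example.com",
    "address": {"city": "London", "country": "UK"},
}
normalized = normalize_contact(contact)
```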

Pattern 3: Multi-Step Risk Scoring

For pipelines that compute derived values before writing.
READ DB: Pull all open deals

FILTER: Deals over $10K

TRANSFORM: Calculate days until close

TRANSFORM: Assign risk tier (High/Medium/Low) based on days and amount

WRITE: Store as deals_with_risk_score
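The two derivation steps can be sketched as a single function. The tier thresholds below are illustrative assumptions, not a prescribed scoring model:

```python
from datetime import date

def risk_tier(deal: dict, today: date) -> dict:
    # TRANSFORM: calculate days until close
    days = (deal["close_date"] - today).days
    # TRANSFORM: assign risk tier based on days and amount
    if days < 7 and deal["amount"] > 50_000:
        tier = "High"      # large deal closing imminently
    elif days < 30:
        tier = "Medium"
    else:
        tier = "Low"
    return {**deal, "days_to_close": days, "risk_tier": tier}

scored = risk_tier(
    {"deal_name": "Acme", "amount": 60_000, "close_date": date(2024, 1, 5)},
    today=date(2024, 1, 1),
)
```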