Bug Investigation Tutorial
Systematically investigate bugs using rp1's hypothesis-driven debugging workflow. This tutorial walks you through diagnosing a production issue from symptoms to root cause.
Time to complete: ~20-30 minutes
What You'll Learn
- How rp1's bug investigation command works
- The hypothesis-driven approach to debugging
- Evidence gathering and root cause analysis
- Using knowledge base context for codebase understanding
- Interpreting investigation reports
Prerequisites
Before You Begin
- rp1 installed (Installation)
- Knowledge base generated (/knowledge-build)
- A bug or issue you want to investigate
The Scenario
We'll investigate an API timeout issue in a web application. This example was chosen because:
- Timeout bugs are common in production systems
- They require understanding multiple layers (network, database, application)
- The root cause often isn't obvious from the symptoms
Symptoms: Users report that the /api/orders endpoint intermittently returns 504 Gateway Timeout errors, typically during peak hours.
The Investigation Workflow
flowchart LR
S[Symptoms] --> H[Hypotheses]
H --> E[Evidence]
E --> A[Analysis]
A --> R[Root Cause]
| Phase | Purpose | Output |
|---|---|---|
| Symptoms | Define what you're seeing | Problem description |
| Hypotheses | Generate possible causes | Ranked hypothesis list |
| Evidence | Gather supporting data | Code analysis, logs, patterns |
| Analysis | Evaluate each hypothesis | Confirmation or elimination |
| Root Cause | Identify the actual cause | Investigation report |
Step 1: Start the Investigation
Launch the bug investigator with /code-investigate and a description of the issue, for example "API timeout on /api/orders endpoint during peak hours".
What happens:
rp1's bug investigator agent:
- Loads your knowledge base context
- Analyzes the symptom description
- Searches your codebase for relevant code
- Generates initial hypotheses
What to expect:
Bug Investigation Started
Problem: API timeout on /api/orders endpoint during peak hours
Loading knowledge base context...
✓ Architecture understood
✓ Relevant modules identified
Searching codebase for relevant code...
✓ Found: src/api/orders.ts
✓ Found: src/services/orderService.ts
✓ Found: src/db/queries/orderQueries.ts
Generating hypotheses...
Checkpoint
The investigator should identify files related to the orders endpoint. If it's searching in the wrong locations, rebuild your knowledge base with /knowledge-build.
Step 2: Review Hypotheses
The investigator generates ranked hypotheses based on your codebase:
What to expect:
Hypotheses Generated (4)
H1 [HIGH]: Database query N+1 problem
- Evidence: orderQueries.ts uses sequential queries in a loop
- Confidence: 75%
- Would explain: Load-dependent timeouts
H2 [MEDIUM]: Missing database index
- Evidence: orders table queried by customer_id without index check
- Confidence: 60%
- Would explain: Slow queries during high volume
H3 [MEDIUM]: Connection pool exhaustion
- Evidence: Pool size set to 10, no queue configuration
- Confidence: 55%
- Would explain: Intermittent failures under load
H4 [LOW]: External API call blocking
- Evidence: orderService calls inventory API synchronously
- Confidence: 40%
- Would explain: Variable response times
Understanding confidence scores:
- 75%+: Strong evidence in codebase, likely cause
- 50-74%: Supporting evidence, needs verification
- Below 50%: Possible but less likely given available evidence
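The bands above can be expressed as a small lookup. This is only an illustrative sketch: `confidenceBand` is a hypothetical helper, not part of rp1, and the HIGH/MEDIUM/LOW labels are assumed to line up with the bands because they do so in the sample output (75% is HIGH, 60% and 55% are MEDIUM, 40% is LOW).

```typescript
type ConfidenceBand = "HIGH" | "MEDIUM" | "LOW";

// Hypothetical helper: maps a confidence percentage to the bands listed above.
function confidenceBand(percent: number): ConfidenceBand {
  if (percent < 0 || percent > 100) {
    throw new RangeError(`confidence must be between 0 and 100, got ${percent}`);
  }
  if (percent >= 75) return "HIGH"; // strong evidence, likely cause
  if (percent >= 50) return "MEDIUM"; // supporting evidence, needs verification
  return "LOW"; // possible but less likely
}

// Confidence values for H1-H4 from the sample output in this step.
const bands = [75, 60, 55, 40].map(confidenceBand);
console.log(bands.join(", ")); // prints "HIGH, MEDIUM, MEDIUM, LOW"
```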
Step 3: Evidence Gathering
The investigator gathers evidence for each hypothesis by examining your code:
What to expect:
Gathering Evidence for H1: Database query N+1 problem
Analyzing: src/db/queries/orderQueries.ts
Found pattern at lines 45-52:
```typescript
async function getOrdersWithItems(customerIds: string[]) {
  const orders = await db.query('SELECT * FROM orders WHERE customer_id IN (?)', [customerIds]);
  // N+1 pattern detected: loop with individual queries
  for (const order of orders) {
    order.items = await db.query('SELECT * FROM order_items WHERE order_id = ?', [order.id]);
  }
  return orders;
}
```
Evidence strength: STRONG
- Sequential queries inside loop (lines 48-50)
- No batch loading of order items
- Query count = 1 + N (where N = number of orders)
How Evidence is Gathered
The investigator uses your knowledge base to understand your architecture, then performs targeted code searches. It examines actual code patterns, not just file names.
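To check the "1 + N" query count yourself, one option is to wrap the database handle in a counter. The sketch below is a rough illustration under stated assumptions: the `Db` interface, `withQueryCounter`, and `fakeDb` are hypothetical stand-ins, and a real driver's API will differ. The loader mirrors the shape of the pattern found in orderQueries.ts.

```typescript
// Hypothetical minimal database interface; a real driver's API will differ.
interface Db {
  query(sql: string, params: unknown[]): Promise<Record<string, unknown>[]>;
}

// Wraps any Db and counts how many queries pass through it.
function withQueryCounter(db: Db): { db: Db; count: () => number } {
  let queries = 0;
  const counted: Db = {
    query(sql, params) {
      queries += 1;
      return db.query(sql, params);
    },
  };
  return { db: counted, count: () => queries };
}

// In-memory stand-in for a real database, for illustration only.
function fakeDb(orderCount: number): Db {
  return {
    async query(sql) {
      if (sql.includes("FROM orders")) {
        return Array.from({ length: orderCount }, (_, i) => ({ id: i + 1 }));
      }
      return []; // order_items lookups return no rows in this stub
    },
  };
}

// Same shape as the N+1 pattern: one orders query, then one items query per order.
async function getOrdersWithItemsNPlusOne(db: Db, customerIds: string[]) {
  const orders = await db.query("SELECT * FROM orders WHERE customer_id IN (?)", [customerIds]);
  for (const order of orders) {
    order.items = await db.query("SELECT * FROM order_items WHERE order_id = ?", [order.id]);
  }
  return orders;
}

async function countQueriesFor(orderCount: number): Promise<number> {
  const { db, count } = withQueryCounter(fakeDb(orderCount));
  await getOrdersWithItemsNPlusOne(db, ["c1"]);
  return count();
}

countQueriesFor(100).then((n) => console.log(n)); // prints 101
```

The same wrapper can be placed around a real connection in a test environment to confirm the count before and after a fix.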
Step 4: Hypothesis Testing
The investigator systematically tests each hypothesis:
What to expect:
Testing Hypotheses
H1 [Database N+1]: CONFIRMED
- Code shows sequential queries in loop
- 100 orders = 101 database queries
- Peak hours have ~500 orders/minute
- Estimated query load: 50,000 queries/minute at peak
H2 [Missing Index]: PARTIAL
- Index exists on customer_id (found in migrations)
- But compound index (customer_id, created_at) missing
- May contribute but not primary cause
H3 [Connection Pool]: UNLIKELY
- Pool size is 10, but connection time is fast
- Pool would reject, not time out
- Error pattern doesn't match pool exhaustion
H4 [External API]: ELIMINATED
- Inventory API call is after order fetch
- Timeout occurs before this code path
- Confirmed via error timestamps
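The load estimate for H1 can be reproduced with a few lines of arithmetic. Note the assumption: the sample figures only line up if "~500 orders/minute" is read as roughly 500 requests per minute, each loading about 100 orders. The sample output does not state this explicitly, so treat the numbers below as a sketch rather than a measurement.

```typescript
// Queries issued by one call to an N+1 loader: 1 for the orders, plus 1 per order.
function queriesPerRequest(ordersPerRequest: number): number {
  return 1 + ordersPerRequest;
}

// Assumption: ~500 requests/minute at peak, each returning ~100 orders.
const requestsPerMinute = 500;
const ordersPerRequest = 100;

const nPlusOneLoad = requestsPerMinute * queriesPerRequest(ordersPerRequest);
// With batching: one orders query plus one items query per request.
const batchedLoad = requestsPerMinute * 2;

console.log(nPlusOneLoad); // prints 50500, i.e. roughly 50,000 queries/minute
console.log(batchedLoad); // prints 1000
```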
Step 5: Root Cause Report
The investigation concludes with a comprehensive report:
What to expect:
Investigation Report
Problem: API timeout on /api/orders during peak hours
ROOT CAUSE: Database N+1 Query Pattern
Location: src/db/queries/orderQueries.ts:45-52
Summary:
The getOrdersWithItems function executes 1 query to fetch orders,
then N additional queries to fetch items for each order. During peak
hours (~500 orders/minute), this generates ~50,000 queries/minute,
overwhelming the database connection and causing timeouts.
Contributing Factor:
Missing compound index on (customer_id, created_at) causes table scans
on the initial orders query.
Recommended Fixes:
1. PRIMARY: Refactor to batch query
```sql
SELECT * FROM order_items WHERE order_id IN (?, ?, ...)
```
2. SECONDARY: Add compound index
```sql
CREATE INDEX idx_orders_customer_date ON orders(customer_id, created_at);
```
Verification Steps:
1. Add query logging to confirm query count
2. Test with production-like data volume
3. Monitor response times after fix
Files to Modify:
- src/db/queries/orderQueries.ts (primary fix)
- migrations/xxx_add_compound_index.sql (secondary fix)
Output: .rp1/work/investigations/api-timeout-orders/report.md
Checkpoint
The investigation report should identify a clear root cause with specific file locations and line numbers. If the report is vague, provide more context about your symptoms.
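The primary fix recommended in the sample report (batch loading order items) could look roughly like the sketch below. It is not rp1 output or the project's actual code: the `Db` interface and `getOrdersWithItemsBatched` are hypothetical, and the sketch assumes a driver that takes one `?` placeholder per value, so the `IN (...)` lists are expanded explicitly.

```typescript
// Hypothetical minimal database interface; a real driver's API will differ.
interface Db {
  query(sql: string, params: unknown[]): Promise<Record<string, unknown>[]>;
}

// Builds "?, ?, ?" for an IN (...) clause with n values.
function placeholders(n: number): string {
  return Array.from({ length: n }, () => "?").join(", ");
}

// Loads orders and their items in two queries, regardless of how many orders exist.
async function getOrdersWithItemsBatched(
  db: Db,
  customerIds: string[],
): Promise<Record<string, unknown>[]> {
  if (customerIds.length === 0) return [];

  const orders = await db.query(
    `SELECT * FROM orders WHERE customer_id IN (${placeholders(customerIds.length)})`,
    customerIds,
  );
  if (orders.length === 0) return orders;

  // One batched query for all items instead of one query per order.
  const orderIds = orders.map((order) => order.id);
  const items = await db.query(
    `SELECT * FROM order_items WHERE order_id IN (${placeholders(orderIds.length)})`,
    orderIds,
  );

  // Group items by order_id in memory, then attach them to their orders.
  const itemsByOrder = new Map<unknown, Record<string, unknown>[]>();
  for (const item of items) {
    const group = itemsByOrder.get(item.order_id) ?? [];
    group.push(item);
    itemsByOrder.set(item.order_id, group);
  }
  for (const order of orders) {
    order.items = itemsByOrder.get(order.id) ?? [];
  }
  return orders;
}
```

The query count drops from 1 + N to 2 per call. For very large order sets, the `IN (...)` list may need to be chunked, depending on the database's parameter limits.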
Acting on Findings
After the investigation, you have several options. Always reference the investigation report to give the agent full context.
Option A: Quick Fix
If the fix is straightforward, implement it directly with build-fast, referencing the investigation report.
Option B: Formal Feature
For larger fixes, use the Feature Development workflow, referencing the investigation report.
Option C: Further Investigation
If the root cause isn't clear, investigate specific areas:
/code-investigate "Deep dive into connection pool behavior" \
.rp1/work/investigations/api-timeout-orders/report.md
Option D: Deep Research
If the issue spans multiple repositories, external services, or requires broader research:
/deep-research "How does payment service interact with orders API?" \
.rp1/work/investigations/api-timeout-orders/report.md
This is useful for cross-service communication, third-party integrations, or understanding external systems. See Deep Research for more details.
Summary
You've learned the bug investigation workflow:
| Phase | What Happens | Key Insight |
|---|---|---|
| Start | Describe symptoms | Be specific about when/where |
| Hypotheses | Agent generates theories | Ranked by confidence |
| Evidence | Code analysis | Actual patterns found |
| Testing | Evaluate each hypothesis | Confirm or eliminate |
| Report | Root cause identified | Actionable fix paths |
Key Benefits
- Systematic - No guessing, evidence-based analysis
- Context-aware - Uses your KB to understand architecture
- Documented - Full report for team review
- Actionable - Specific files and line numbers
Next Steps
- Fix the bug: Use build-fast for simple fixes
- Larger fixes: Use the Feature Development workflow
- Review the PR: After fixing, use PR Review to verify the change
- Reference: See code-investigate for full command options
Troubleshooting
Investigation isn't finding relevant files
Your knowledge base may be outdated. Regenerate it with /knowledge-build.
Or provide more specific context in your problem description.
Hypotheses don't match my intuition
The investigator generates hypotheses based on code evidence. If you suspect something specific, include it in your problem description.
Report suggests multiple possible causes
This is common for complex bugs. The report ranks causes by confidence. Start with the highest-confidence fix and verify whether it resolves the issue.
Can I investigate without a knowledge base?
Yes, but results will be less accurate. The KB provides architectural context that helps the investigator understand your codebase patterns.