Overview
Moby, Triple Whale’s conversational AI and SQL co‑pilot, can dramatically speed up analysis and reporting. However, like all large‑language‑model (LLM) systems, it can occasionally return incomplete, outdated, or entirely fabricated information (a phenomenon known as hallucination). This article explains why those limitations exist, how to recognise them, and which best‑practice steps every team member must follow to validate Moby’s output before sharing it with customers or stakeholders.
Bottom line: Moby is a power tool, not an oracle. Pair its speed with your human judgment.
What Moby Can Do Reliably
Task | Why It Works Well |
Generate starter SQL queries for Triple Whale’s unified warehouse | Moby is fine‑tuned on our schema and naming conventions. |
Summarise known metrics (e.g., ROAS, CPA, MER) over common date ranges | These values are fetched directly from live data via validated connectors. |
Suggest prompt improvements for Agents | Moby’s language strengths excel at rewriting and structuring text. |
Explain Triple Whale concepts (Pixel, Attribution Models, Sonar, etc.) | We continuously feed the knowledge base with official documentation. |
Even in these areas, always scan for edge‑case errors before publication.
Known Limitations & Failure Modes
Hallucinated Facts – May invent metric definitions, feature names, or API endpoints that sound plausible.
Out‑of‑Date Knowledge – Model weights freeze at periodic checkpoints; anything launched in the last ~45 days may be missing or partially learned.
Context Window Overflows – Very long SQL queries or chat threads, or huge result sets, can push crucial details out of scope, leading to contradictory answers.
Math & Aggregation Errors – Percentages may not add up to 100%, division‑by‑zero cases may be handled with shortcuts, and rounding mistakes can occur when token limits clip digits.
Mis‑Applied Attribution Models – Defaults to Triple Attribution unless explicitly told otherwise; dashboards might use Linear or Total Impact and appear to “disagree.”
Ambiguous Pronouns / References – If several campaigns share similar names, Moby can merge or confuse them.
Tone Leakage – May unintentionally mimic customer language (including profanity) when analysing tickets.
Red Flag Words: “approximately,” “appears to,” “likely,” “as far as I can tell.” Treat these as prompts to double‑check.
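As a rough illustration, the red‑flag scan can be automated before a human review. The sketch below assumes Moby's answer is available as a plain string; the function name `find_red_flags` is hypothetical and not part of any Triple Whale tooling.

```python
from __future__ import annotations

import re

# Hedging phrases listed in this article; extend the tuple as your team sees fit.
RED_FLAG_PHRASES = ("approximately", "appears to", "likely", "as far as i can tell")


def find_red_flags(text: str) -> list[str]:
    """Return the red-flag phrases present in text, in the order listed above."""
    lowered = text.lower()
    return [
        phrase
        for phrase in RED_FLAG_PHRASES
        # Word boundaries keep "unlikely" from matching "likely".
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered)
    ]


if __name__ == "__main__":
    answer = "ROAS is approximately 3.2 and likely rising."
    print(find_red_flags(answer))
```

A non‑empty result is only a prompt to double‑check the claim; an empty result does not mean the answer is correct.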
The Moby Fact‑Checking Workflow
Follow this checklist every time you plan to share Moby‑generated content externally or use it to drive product decisions.
1. Surface the Sources
• Ask: “Cite every table or field you used.”
• Enable the Show SQL toggle in the Agent builder.
2. Replicate Key Metrics in the UI
• Open the corresponding Triple Whale dashboard with the same date range.
• Confirm ROAS, Spend, Revenue, etc., match within ±0.1%.
3. Cross‑Check With Secondary Queries
• Run a pared‑down SQL query (e.g., one metric, one channel) to spot discrepancies quickly.
4. Validate Business Logic
• Ensure attribution model, currency, and timezone align with stakeholder expectations.
• Ask Moby to explain its reasoning in plain English.
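The ±0.1% comparison against the dashboard can be sketched in a few lines. This is a minimal example that assumes you have copied the values from Moby and from the dashboard into two dictionaries by hand; the function names and sample figures are hypothetical, and the dashboard value is treated as the reference.

```python
from __future__ import annotations


def safe_ratio(numerator: float, denominator: float) -> float | None:
    """Return numerator / denominator, or None when the denominator is zero."""
    if denominator == 0:
        return None
    return numerator / denominator


def mismatches(
    moby: dict[str, float],
    dashboard: dict[str, float],
    rel_tol: float = 0.001,  # 0.1%, the tolerance named in the checklist
) -> dict[str, tuple[float | None, float]]:
    """Return metrics where Moby is missing or differs from the dashboard by more than rel_tol."""
    out: dict[str, tuple[float | None, float]] = {}
    for name, expected in dashboard.items():
        actual = moby.get(name)
        # Relative difference measured against the dashboard value.
        if actual is None or abs(actual - expected) > rel_tol * abs(expected):
            out[name] = (actual, expected)
    return out


if __name__ == "__main__":
    moby_values = {"spend": 1000.5, "revenue": 3050.0}
    dashboard_values = {"spend": 1000.0, "revenue": 3000.0}
    print(mismatches(moby_values, dashboard_values))
    print(safe_ratio(dashboard_values["revenue"], dashboard_values["spend"]))
```

Any metric returned by `mismatches` is a candidate for the secondary, pared‑down query. Recomputing ratios such as ROAS with `safe_ratio` also guards against the division‑by‑zero issue listed under Known Limitations.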
Prompt Like a Pro with M.O.B.Y.
Use our branded framework every time you speak to Moby. Each letter reminds you to shape the question so the model can land an accurate, actionable answer.
Letter | Principle | Ask Yourself | Quick Check |
M | Measurable | “Have I specified numerical fields or KPIs?” | Concrete metrics such as order revenue, CPA, MER, net profit. |
O | Obtainable | “Is this calculation possible with data Moby can access today?” | Keep horizons & aggregations within our warehouse’s current scope. |
B | Bounded | “Did I define a clear date range or other limit?” | Examples: Q4 2024, last 30 days, top 10 campaigns. |
Y | Yielding | “Will the answer directly inform a decision?” | Target slices like top 5 campaigns by spend or worst 10 SKUs by ROAS. |
‘Instead‑of / Try’ Cheat‑Sheet
Principle | Vague Prompt (✖) | Better Prompt (✔) |
M — Measurable | “Tell me about sales trends.” | “Analyse order revenue and net profit for Q4 2024 by region.” |
O — Obtainable | “Predict our total revenue for the next decade.” | “Using historical monthly revenue, forecast the next three months’ revenue trends.” |
B — Bounded | “Give me an overview of our performance.” | “Summarise customer‑retention and engagement metrics for the past six months.” |
Y — Yielding | “What’s going on with our website performance?” | “Evaluate website performance by comparing conversion rate, bounce rate, and average session duration between the first and last month of Q4 2024.” |
One‑Line Template
Using [dataset] between [start‑date] and [end‑date], calculate [measurable metric(s)] for [specific segment], and highlight [desired outcome].
Run your draft through the M.O.B.Y. checklist. If it satisfies all four letters, hit Enter.
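If your team builds prompts programmatically, the one‑line template can be filled by a small helper that also enforces the Measurable and Bounded checks. The sketch below is illustrative only: `build_prompt` and the sample dataset name `orders` are assumptions, not part of Moby.

```python
from __future__ import annotations

from datetime import date

TEMPLATE = (
    "Using {dataset} between {start} and {end}, calculate {metrics} "
    "for {segment}, and highlight {outcome}."
)


def build_prompt(
    dataset: str,
    start: date,
    end: date,
    metrics: list[str],
    segment: str,
    outcome: str,
) -> str:
    """Fill the one-line template, rejecting unmeasurable or unbounded requests."""
    if not metrics:
        raise ValueError("M: name at least one measurable metric")
    if start > end:
        raise ValueError("B: start date must not be after end date")
    return TEMPLATE.format(
        dataset=dataset,
        start=start.isoformat(),
        end=end.isoformat(),
        metrics=", ".join(metrics),
        segment=segment,
        outcome=outcome,
    )


if __name__ == "__main__":
    print(
        build_prompt(
            "orders",  # hypothetical dataset name
            date(2024, 10, 1),
            date(2024, 12, 31),
            ["order revenue", "net profit"],
            "the top 10 campaigns by spend",
            "any campaign with ROAS below 1",
        )
    )
```

The Obtainable and Yielding checks still need human judgment; the helper only catches the two that can be verified mechanically.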
Reporting Bugs or Inaccuracies
Capture a screenshot of the erroneous output and the underlying SQL (if applicable).
Email us at research-feedback@triplewhale.com.
Include: workspace URL, date range, exact prompt, expected vs. actual result.
Frequently Asked Questions
1. Does Moby ever write directly to production data?
Never. It operates read‑only via service accounts.
2. Can I rely on Moby for P&L calculations?
Only after running the fact‑checking workflow above, especially step 4 (Validate Business Logic). Pay particular attention to currency and COGS completeness.
3. How big a dataset can Moby handle?
Rough guideline: ≤12 000 rows or ≤80 K tokens per conversation. Beyond that, segment the query or narrow the date range.
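One way to narrow the date range is to split it into consecutive chunks and ask Moby about each chunk separately. The helper below is a sketch under that assumption; `split_date_range` is a hypothetical name, and the right chunk size depends on how many rows your data produces per day.

```python
from __future__ import annotations

from datetime import date, timedelta


def split_date_range(start: date, end: date, max_days: int) -> list[tuple[date, date]]:
    """Split the inclusive range [start, end] into consecutive chunks of at most max_days days."""
    if max_days < 1:
        raise ValueError("max_days must be at least 1")
    if start > end:
        raise ValueError("start must not be after end")
    chunks: list[tuple[date, date]] = []
    cursor = start
    while cursor <= end:
        # Each chunk covers max_days calendar days, clipped to the overall end date.
        chunk_end = min(cursor + timedelta(days=max_days - 1), end)
        chunks.append((cursor, chunk_end))
        cursor = chunk_end + timedelta(days=1)
    return chunks


if __name__ == "__main__":
    for chunk_start, chunk_end in split_date_range(date(2024, 10, 1), date(2024, 12, 31), 31):
        print(chunk_start.isoformat(), "to", chunk_end.isoformat())
```

Each chunk can then be used as the date range of a separate prompt, with the per‑chunk results combined and validated outside the chat.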
4. What should I do if Moby contradicts itself?
Clear chat context, restate the prompt with explicit parameters, and compare results. If inconsistency persists, escalate to the Data team.