How AI-Assisted Analytics Workflows Actually Work in 2026

How AI analytics copilots — Tableau Pulse, Power BI Copilot, ThoughtSpot Sage, Mode AI — change the analyst's workflow and where they fall short.

S
StackTower AI editorial team

How AI-Assisted Analytics Workflows Actually Work in 2026

The current generation of BI copilots follows a code-generation architecture: a language model translates a natural-language prompt into an executable SQL expression against a governed semantic layer (dbt MetricFlow, Cube.dev, Looker LookML, or AtScale), runs the statement against the connected warehouse, and renders both the generated code and the output together so the operator can inspect and accept or reject the generated logic. Analysts who also use a standalone LLM alongside these copilots for reasoning and concept review can weigh the two leading options — for a structured comparison of these tools see Claude vs. ChatGPT for Learners (2026 Rubric).

That architecture separates today’s copilots from the 2019-2022 natural-language-BI products, which bolted a search box onto a static keyword index and hard-failed on any question outside the index’s curated scope. The current generation compiles against a living semantic layer and expresses uncertainty when confidence is low, rather than returning confidently wrong numbers.

Semantic layer
A metadata model (dbt MetricFlow, Cube.dev, Looker LookML, or AtScale) that encodes business logic, aggregation rules, access controls, and grain definitions on top of raw warehouse tables.
Metric layer
A narrower semantic surface that exposes only pre-defined business metrics (revenue, churn, DAU) rather than raw table schemas, constraining what the AI can retrieve and how it can aggregate.
SpotIQ
ThoughtSpot’s combinatorial analysis engine that runs a large number of aggregation and segmentation passes across all available dimensions and measures, ranks the results by statistical significance, plus feeds output to Sage for natural-language presentation.1
VizQL
Tableau’s proprietary visual expression language, generated at digest time by Tableau Pulse against the subscribed metric’s definition (aggregation type, dimension list, time grain, data source connection).2

Inside Tableau Pulse: Metrics, VizQL, and Digest Generation

Tableau Pulse is a consumption surface, not an authoring surface.2 Before a user can subscribe to a metric, that metric must already exist in the Tableau metric layer with its aggregation type, dimension list, time grain, and data source connection fully specified. The AI generation layer enters only at digest time: it receives the metric name, current value, prior-period value, percentage change, and top contributing dimension breakdowns, then produces a natural-language headline summarizing what moved.

The generated headline is the entry point; clicking through surfaces the full VizQL visualization. What Pulse does not do is generate ad-hoc lookup statements. An operator who wants to investigate a dimension not present in the subscribed metric’s definition must exit Pulse and open a separate authoring surface (Tableau Explore or a connected Mode or Looker session).

This boundary matters for governance: every number that appears in a Pulse digest traces back to a metric definition that a human authored and committed to the metric layer. The AI’s role is translation (metric deltas into prose headlines) rather than statement origination.

Inside Mode AI Visual Explorer: Schema-Aware SQL Generation

Mode’s Visual Explorer operates at a different layer from Pulse.3 It takes a free-form natural-language question, sends the connected warehouse schema (table names, column names, sample row counts, and inferred relationships) as context to the underlying language model, and generates a SELECT statement the operator can review before execution.

The side-by-side code exposure (generated SQL visible in an editable panel alongside the rendered chart) is a deliberate product design choice: Mode’s user base is primarily technical practitioners who already read SQL and want to verify the JOIN conditions and WHERE clauses before trusting the output. When the generated SQL contains a wrong join key or an incorrect filter, the operator edits the statement directly and re-runs.

Three specific patterns where Mode’s SQL generation tends to degrade:

  1. Fan-out JOINs. When joining a many-to-many relationship without an explicit bridge table hint, the model sometimes generates a cross-product that inflates row counts. The generated expression executes without error and returns plausible-looking numbers.
  2. Timezone handling. The model frequently generates date truncations in the warehouse’s default timezone rather than the business timezone specified in the prompt. This produces off-by-one errors in daily or weekly aggregations.
  3. Null semantics in aggregations. Generated AVG() expressions sometimes wrap the column directly rather than applying NULLIF() or COALESCE(), which skews the result when nulls represent “zero” rather than “not applicable” in the source data.

Inside ThoughtSpot Sage and SpotIQ: Combinatorial Analysis as Input

ThoughtSpot Sage routes natural-language questions through SpotIQ, the company’s proprietary combinatorial-analysis engine.1 SpotIQ runs a large number of aggregation and segmentation passes across all dimensions and measures in the connected data model, ranks the findings by statistical significance, and returns the top results. Sage then synthesizes those results into a prose response that cites specific metric values and dimension breakdowns.

The architectural consequence is that Sage inherits SpotIQ’s coverage exactly: dimensions that are modeled appear in Sage’s answers; dimensions that are absent from the model do not. If a causally important attribute (say, sales rep tenure or deal-sourcing channel) is not present in the data model, SpotIQ will not surface it as a contributing factor, and neither will Sage. This is not a failure of the language model; it is a modeling gap that the data platform team must resolve upstream.

The operator-visible surface for reviewing Sage’s reasoning is the “drill” path: clicking on any cited finding in Sage’s response surfaces the underlying execution that produced it. This makes Sage more transparent than most copilots on the finding-provenance dimension.

Inside Power BI Copilot: Report Generation and DAX Synthesis

Power BI Copilot operates on two distinct surfaces with different accuracy profiles.4 On the report canvas, it accepts a natural-language description of a desired visualization and generates the visual, axis labels, title, plus a narrative card explaining what the chart shows. On the semantic model, it generates DAX measures in response to plain-language requests.

DAX is a complex expression language with non-obvious filter-context evaluation rules. Power BI Copilot generates syntactically valid DAX reliably on common patterns:

  1. Year-over-year growth (CALCULATE with SAMEPERIODLASTYEAR)
  2. Rolling averages (AVERAGEX over a DATESINPERIOD range)
  3. Simple ratio measures (DIVIDE with a specified alternate result)

Accuracy degrades on measures that require explicit CALCULATE or REMOVEFILTERS manipulations to override the default evaluation context. A plausible-looking DAX measure can produce systematically wrong numbers at specific slicer combinations without failing visibly at authoring time. The operator must test generated measures against known-good outputs before publishing reports that depend on them.

Operator Posture Changes When Copilots Enter the Stack

Three things shift in how the BI toolchain gets operated once these copilots are in the stack:

  1. SQL review replaces SQL authorship. The operator reads a generated JOIN and checks for double-counting; checks whether a WHERE clause handles nulls correctly; confirms the aggregation grain matches the stated question. Writing the statement is the machine’s job; auditing it is the operator’s job.

  2. Semantic-layer hygiene becomes directly tied to output quality. Stale grain definitions, ambiguous metric names, and missing NULL handling in dbt or Cube.dev metric definitions produce wrong machine-generated SQL that looks correct. Operators who previously could bypass the semantic layer by writing their own SELECT statements now have a direct stake in maintaining it.

  3. Failure mode character changes. A syntax error surfaces immediately. A generated expression that uses the right table but the wrong JOIN key returns plausible numbers that are systematically wrong. Detecting this class of error requires comparing AI output against an independent benchmark rather than relying on the absence of runtime errors.

Capability Gaps in Mid-2026

Four gaps are consistently present across the current generation of BI copilots:

  1. Audit-trail completeness. Most implementations surface the generated SQL expression but do not produce a lineage trail (statement to metric definition to warehouse transformation) that non-technical stakeholders can follow. Vendors have roadmap items for this; none have shipped a complete solution.

  2. Complex-pattern accuracy. Published vendor accuracy benchmarks (typically 80-90% on standard OLAP aggregations) do not represent performance on multi-step CTEs, retention analyses, funnel calculations, or lookup patterns requiring domain context to formulate correctly. The accuracy gap is material on the patterns that matter most for complex business investigations.

  3. Schema drift and stale semantic definitions. When a source-system schema changes (a column renamed, a NULL constraint removed, a new enum value added) the copilot continues compiling expressions against the pre-change schema until the semantic layer is updated. Unlike a hard compile error, the generated expression may execute and return subtly wrong results. Schema-drift monitoring is the semantic-layer operator’s responsibility, not the copilot’s.

  4. Cross-tool session continuity. An investigation that begins in Pulse, continues in Mode, and produces a summary in Power BI Copilot has no shared session state across products. The operator must manually carry context (metric definition verified, contributing factors identified, provenance confirmed) across tool boundaries. Investigation complexity grows the cost of this manual handoff linearly.

A Six-Step Trace: Investigating a Q1 Revenue Shortfall

A reporting professional at a 200-person SaaS company finds Q1 recognized revenue 8% below the operating plan:

Step 1: Verify metric provenance in Tableau Pulse. The operator opens the Pulse digest for the “Recognized Revenue” subscription and checks the metric card’s definition panel: aggregation type SUM, grain daily, filter excludes internal test accounts. The definition matches the plan model. The digest headline reads: “Recognized Revenue missed plan by $412K in Q1; Northeast region accounts for 100% of the variance.”

Step 2: Expand contributing-factor breakdown. The operator clicks the contributing-factor section in the Pulse card. Tableau’s analysis decomposes the miss across geography, product line, and acquisition segment: new-logo bookings in the Northeast are down 22%; expansion revenue in the same region is up 4%; the combined effect produces the net shortfall.

Step 3: Generate a drill-down statement in Mode Visual Explorer. The operator types “show me Northeast new logo bookings by sales rep for Q1 2026 vs. Q1 2025 using the bookings_recognized table.” Mode generates the SQL (GROUP BY on sales_rep_id with a date-range filter) and surfaces it in the side panel. The operator reviews the JOIN between bookings_recognized and dim_sales_rep and confirms it is a correct inner join on sales_rep_id. Output shows three reps with bookings down more than 50% year-over-year.

Step 4: Verify metric provenance in Cube.dev. The operator uses Cube.dev’s AI search: “is bookings_recognized the same source as the new_logo_arr metric in Cube?” The search returns two definitions referencing different source tables: the Pulse metric uses revenue_events with a recognition-date filter; the Mode statement ran against bookings_recognized with a booking-date filter. The operator corrects the Mode statement to use revenue_events and re-runs. The rep-level numbers change materially.

Step 5: Draft the stakeholder summary in Power BI Copilot. The operator asks Copilot to generate a summary visual and narrative. Copilot produces a bar chart with narrative: “Three Northeast sales reps account for $380K of the Q1 recognized-revenue miss, with booking dates concentrated in the final two weeks of the quarter.” The operator adds a manual annotation noting that one rep closed a $180K deal on April 3 (slipped past quarter-end) and revises the narrative framing.

Step 6: Log the data-quality finding. While reviewing the raw revenue_events output, the operator notices two rows with negative recognized amounts (a refund processed in January) that inflate the apparent shortfall by $14K. None of the copilot narratives flagged this. The operator files a data-quality ticket against the ETL refund-handling logic. The copilots produced roughly 70% of the investigation correctly; the operator caught the metric-provenance mismatch (Step 4) and the refund anomaly (Step 6) that required judgment about the data model’s semantics.

FAQ

Do BI copilots replace reporting professionals?

No. These copilots shift effort from SQL authorship and chart production toward semantic-layer governance, output review, and stakeholder communication. The parts of the job requiring business context (what does a 22% booking drop mean for the Q2 pipeline forecast?), data-model knowledge (which metric definition is authoritative?), and communication judgment remain human-performed. Per-person throughput increases; time spent on mechanical statement production drops.

Do machine-generated SQL expressions modify warehouse data?

No. The current generation of BI copilots produces read-only SELECT statements for exploration and reporting. Write-path operations (INSERT, UPDATE, DELETE, dbt model runs, transformation pipeline execution) are outside the scope of all four products covered here. The architectural boundary between read-only expression generation and write-path transformation is explicit in all vendor documentation.

What happens when the semantic layer contains wrong definitions?

The copilot produces an expression that compiles correctly against the layer’s current definition and returns numbers that are wrong relative to business reality. The generated SQL passes syntax validation. The results look plausible. The error surfaces only when the operator compares the output against an independent benchmark and notices a discrepancy. This silent semantic-wrong failure mode is the highest-risk pattern in the current generation of BI copilots.

Which copilot is most transparent about its generated logic?

Mode Visual Explorer exposes the generated SQL in an editable side panel by default, making it the most transparent for practitioners who read SQL. Power BI Copilot exposes generated DAX when creating semantic-model measures but hides expression logic behind report visuals. Tableau Pulse exposes the metric definition (grain, aggregation, filters) but does not surface the VizQL it generates at digest time. ThoughtSpot Sage exposes the underlying execution when the operator drills into a cited finding.


StackTower AI editorial team: BI tooling, copilot architectures, and data platform explainers. This article was written with AI assistance and reviewed by the StackTower AI editorial board.

Written with AI assistance. Content reviewed by the StackTower AI editorial team. Published 2026-05-11.

Footnotes

  1. ThoughtSpot Sage product documentation. https://docs.thoughtspot.com/cloud/latest/sage-overview 2

  2. Tableau Pulse product documentation, Salesforce/Tableau. https://help.tableau.com/current/online/en-us/pulse_create_metrics.htm 2

  3. Mode AI Visual Explorer product documentation. https://mode.com/blog/introducing-mode-ai/

  4. Microsoft Power BI Copilot documentation. https://learn.microsoft.com/en-us/power-bi/create-reports/copilot-introduction

Disclosure · Editorial policy