Beyond the Model: What Most AI Implementations Miss About Enterprise Intelligence

Richard Wilding
March 5, 2026
5-minute read

A system that looks intelligent and a system that is reliable are not the same thing — especially when you cannot tell them apart.

This difference rarely matters in casual use. It matters enormously when the output informs real business decisions.

Consider a simple example. A corporate innovation team runs a competitive analysis on a relatively new company they are considering partnering with. They delegate the task to an AI-powered workflow. Within minutes, it returns a polished report: messaging themes, positioning analysis, market perception, strategic recommendations. The output is fluent, well-structured, and confident.

It is also wrong from the very first step.

The company name had been used by a different organization about a decade earlier. That older company had been acquired but left behind a large digital footprint: press coverage, archived product pages, conference talks, and commentary. The newer company, by contrast, had a sparse website and minimal public presence. Faced with this imbalance, the system gravitated toward the richer dataset. There was simply more to find. From the system's perspective, this looked like success: results were plentiful, synthesis could proceed, nothing explicitly failed.

But every subsequent stage – data collection, messaging analysis, competitive positioning, conclusions – was now applied to a company that no longer existed in the relevant form. The result was coherent and professional-looking. It was also completely wrong. Because the error occurred at the level of identity selection, no amount of downstream quality could recover correctness.

A human analyst encountering the same situation would notice the mismatch in timelines, question why an apparently new company had a decade of historical content, and look for confirming signals: incorporation dates, current products, leadership continuity. They would recognize that abundance of data is not evidence of correctness. If uncertainty remained, they would pause and seek clarification before proceeding.

The AI system did none of this, not because the underlying model lacked capability, but because no one had designed the surrounding process to ask those questions. And this is just one of many silent failure modes where AI completes the task it was given, produces a professional-looking deliverable, and still misses the mark, because the methodology governing how it approached the work was flawed from the start.

The distinction that matters

The failure above is not a model failure. It is a system failure. Most of what makes AI useful — or unreliable — in professional contexts does not live in the model. It lives in the system around it.

When people reference tools like ChatGPT or Claude, they tend to attribute the observed intelligence to the model itself. A common assumption follows: as models improve, today’s shortcomings will simply disappear. This framing treats model capability as the binding constraint on every problem, when in practice most of the limitations organizations encounter have little to do with the model at all. In reality, what users interact with is a composite system: an orchestration layer responsible for planning, tool invocation, data retrieval, validation, formatting, and constraint enforcement, wrapped around a model that primarily performs language understanding and generation.

This distinction is not academic. It determines whether the output can be trusted.

When a system retrieves financial data, constructs tables, cites sources, or decides to branch into a follow-up search, these are not spontaneous properties of the model. They are the result of explicit design decisions — decisions that are invisible to the end user. Every chat-based AI tool makes choices about how to search, what to retrieve, how much context to retain, and how to structure its response. These choices shape the output as much as the model's own reasoning does, and the user typically has no visibility into any of them. The companies behind these tools have made their own design and process decisions – how to search, what to prioritize, and when to stop. Those decisions may or may not align with how your organization would approach the same task.

You are not just using a model.

You are deferring to someone else's methodology, whether you realize it or not.

Consider a basic constraint most users never think about. A chat system re-reads the entire conversation history every time you send a message, and it can only process a limited amount of text at once. When a conversation gets long enough, the system quietly drops earlier context to stay within that limit, discarding information it decides is no longer a priority. A human analyst does not forget the first half of the brief because it got too long. This is one example of an invisible system constraint that shapes output quality independent of model capability.
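To make the constraint concrete, here is a minimal sketch of one way a chat system could trim history to fit a fixed budget. It is an illustration only, not a description of how any particular product works: the function name `trim_history`, the word-count stand-in for a tokenizer, and the keep-the-newest policy are all assumptions.

```python
def trim_history(messages: list[str], budget: int) -> list[str]:
    """Keep the most recent messages whose combined size fits the budget.

    Size is approximated by word count, a crude stand-in for a real tokenizer.
    """
    kept: list[str] = []
    used = 0
    # Walk backwards from the newest message and stop at the first one
    # that would push the total over the budget.
    for message in reversed(messages):
        cost = len(message.split())
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    "Brief: focus on pricing only.",
    "Background notes on the market and its main players.",
    "Please summarize the findings.",
]
for line in trim_history(history, budget=14):
    print(line)
```

With this budget the opening brief no longer fits, so the instruction to focus on pricing is silently gone by the time the summary is requested.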

Models are genuinely improving. That is not the issue. The argument here is that visible successes mask invisible assumptions that matter enormously in professional contexts. When fluent outputs and successful demos are treated as evidence that the hard problems are solved, attention shifts away from system design and methodology. Complexity does not disappear; it simply moves out of view.

Where things quietly break

To see why this matters in practice, consider what happens when an organization uses AI for competitive analysis — not as a one-off question, but as part of a structured process that informs strategy, budgets, and priorities.

The wrong-target problem. The entity confusion described above is not an edge case. Any company with an ambiguous name, a recent rebrand, or a limited digital footprint is vulnerable. The system's default behavior is to follow the richest data trail, which often means the most visible entity rather than the correct one. The output reads like a thorough analysis of the right question — about the wrong company. This failure mode is especially dangerous precisely because it is invisible in the final deliverable.
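One of the confirming signals a human analyst would use, the mismatch between a company's age and the age of the content found, can be turned into an explicit gate before synthesis begins. The sketch below is a simplified assumption of what such a check might look like: `SourceDocument`, `identity_check`, the 20% threshold, and the example.com URLs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SourceDocument:
    url: str
    published_year: int


def identity_check(
    founded_year: int,
    documents: list[SourceDocument],
    max_predating_share: float = 0.2,  # assumed threshold, tune per methodology
) -> dict:
    """Flag a result set dominated by content older than the target company."""
    if not documents:
        return {"status": "needs_review", "reason": "no documents retrieved",
                "predating_share": 0.0}
    predating = [d for d in documents if d.published_year < founded_year]
    share = len(predating) / len(documents)
    if share > max_predating_share:
        return {"status": "needs_review",
                "reason": "most content predates the company's founding year",
                "predating_share": share}
    return {"status": "ok", "reason": "", "predating_share": share}


docs = [
    SourceDocument("https://example.com/press", 2014),
    SourceDocument("https://example.com/talk", 2015),
    SourceDocument("https://example.com/product", 2016),
    SourceDocument("https://example.com/about", 2024),
]
print(identity_check(founded_year=2023, documents=docs))
```

A "needs_review" result would pause the workflow and ask a person to confirm the target, rather than letting synthesis proceed on the richer but wrong data trail.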

Shallow collection disguised as research. Even when the correct entity is identified, speed bias shows up in what happens next. A system runs a search, retrieves a press release, a generic "About Us" page, and a two-year-old blog post. It declares data collection complete and moves to synthesis. The output reads well, but anyone with domain experience would recognize that the signal is weak: the material is shallow, dated, and reveals almost nothing about how the company actually positions its products or frames its strategy. A human analyst would not stop there. They would explore the company's website in depth, use targeted searches to surface relevant pages, actively judge relevance, discard superficial content, and continue until coverage felt sufficient, not until a default result limit was exhausted.

Representative sampling matters. A chatbot might collect a handful of Reddit posts or social media comments and summarize them as if they reflect the broader market. The narrative may sound authoritative, but the evidence is fragile. Twelve posts cannot represent the range of sentiment in a market, and a different twelve posts selected on a different day might tell a different story.

The gap is not about effort or volume; increasing sample size is easy. The issue is whether the system understands that data gathering is an evaluative act requiring judgment about sufficiency, not simply executing a retrieval step.

What counts as “sufficient” is specific to your organization — your standards, your stakeholders’ expectations, and the decisions the analysis is meant to inform. A threshold suitable for an internal briefing is not the same as one required for a board-level recommendation. No general-purpose tool can know that standard unless someone encodes it into the process.
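Encoding a sufficiency standard can be as simple as writing it down as data and checking collection results against it. The following is a minimal sketch under stated assumptions: the `SufficiencyPolicy` fields, the two audience names, and every threshold value are invented for illustration and would need to come from your own methodology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SufficiencyPolicy:
    min_documents: int
    min_distinct_sources: int
    max_age_days: int


# Hypothetical thresholds: a board-level recommendation demands more,
# and fresher, evidence than an internal briefing.
POLICIES = {
    "internal_briefing": SufficiencyPolicy(10, 3, 730),
    "board_recommendation": SufficiencyPolicy(40, 8, 365),
}


def is_sufficient(documents: list[dict], audience: str) -> tuple[bool, list[str]]:
    """Return whether collection meets the audience's policy, plus any gaps."""
    policy = POLICIES[audience]
    fresh = [d for d in documents if d["age_days"] <= policy.max_age_days]
    sources = {d["source"] for d in fresh}
    gaps: list[str] = []
    if len(fresh) < policy.min_documents:
        gaps.append(f"only {len(fresh)} recent documents, need {policy.min_documents}")
    if len(sources) < policy.min_distinct_sources:
        gaps.append(f"only {len(sources)} distinct sources, need {policy.min_distinct_sources}")
    return (not gaps, gaps)


collected = [{"source": f"site-{i % 4}", "age_days": 100} for i in range(12)]
print(is_sufficient(collected, "internal_briefing"))
print(is_sufficient(collected, "board_recommendation"))
```

The same twelve documents pass for one audience and fail for the other, which is the point: sufficiency is a property of the decision being supported, not of the retrieval step.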

Non-determinism across runs. Given the same input, an LLM-driven workflow may produce outputs that vary significantly from one execution to the next. One run might emphasize messaging and tone; another might focus on product features; a third might omit entire dimensions of analysis. For casual exploration, this variability feels like creativity. For an organization paying for a consistent methodology — a way of answering, not just an answer — it is a liability. A report that changes structure, emphasis, or analytical framing from run to run undermines trust and makes downstream consumption harder, not easier.

Asymmetric comparisons. Now imagine the system is comparing four competitors. It collects website content, social media data, and analyst commentary for three of them. The fourth has no social media presence and minimal coverage. A simplistic workflow proceeds regardless: it generates whatever output it can and presents the result alongside the others. Three competitors analyzed in depth, one barely sketched — but the system does not flag this asymmetry or adjust. A human analyst would stop and ask: does this company still belong in the comparison set? Can the missing data be sourced elsewhere? Should a replacement be identified? If so, does the analysis need to be rerun from the start to preserve like-for-like comparability? This is not a minor procedural point. It is the moment where analytical integrity is either maintained or quietly abandoned.
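The asymmetry described above can be surfaced mechanically before any comparison is written. This is a rough sketch, assuming collection results are available as per-competitor item counts; the function name `coverage_gaps`, the source type labels, and the minimum of five items are hypothetical.

```python
def coverage_gaps(
    collected: dict[str, dict[str, int]],
    required_sources: list[str],
    min_items: int = 5,  # assumed floor for "enough" items per source type
) -> dict[str, list[str]]:
    """Map each under-covered competitor to the source types it is missing."""
    gaps: dict[str, list[str]] = {}
    for competitor, counts in collected.items():
        missing = [s for s in required_sources if counts.get(s, 0) < min_items]
        if missing:
            gaps[competitor] = missing
    return gaps


collected = {
    "Competitor A": {"website": 20, "social": 15, "analyst": 6},
    "Competitor B": {"website": 18, "social": 30, "analyst": 9},
    "Competitor C": {"website": 25, "social": 12, "analyst": 7},
    "Competitor D": {"website": 7, "social": 0, "analyst": 1},
}
print(coverage_gaps(collected, ["website", "social", "analyst"]))
```

A non-empty result is a prompt for the human questions listed above (source the data elsewhere, replace the competitor, or rerun), not something the workflow should resolve on its own.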

These failure modes share a common trait: the output still looks good. The prose is fluent, the formatting is clean, the confidence is high. The errors are not in the language. They are in the invisible decisions that preceded it — what was collected, what was compared, what was assumed, and what was silently omitted.

Why structure is the answer

If the problem lives in the system, the solution does too. That solution is structure – and this is where most AI conversations go wrong.

The instinct many organizations have at this point is to improve their prompts: give the AI more detailed instructions, more context, a better brief. And this helps. More sophisticated prompting genuinely produces more consistent, higher-quality outputs. It is a meaningful step beyond naive usage.

But it does not solve the underlying problem. However well-crafted the prompt, the model is still probabilistic. The orchestration layer is still opaque. And the system still lacks the persistent state, evaluative judgment, and methodological discipline that professional analytical work requires.

Some teams go further and build their own automated workflows, chaining AI calls together in tools designed for that purpose. This represents real investment and real progress. But it also introduces a challenge that is easy to underestimate: building a workflow that works in a demo is not the same as building a system that works reliably in production. Edge cases, error handling, state management, and graceful degradation when data is missing are all software engineering problems. They require software engineering discipline. The fact that the building blocks are AI calls rather than traditional code does not make the surrounding system design any less critical. If anything, the non-determinism of the components makes it more so.

Organizations that extract durable value from AI-powered analysis impose structure at every layer. In practice, this means codifying the analytical methodologies that already exist implicitly in experienced teams. Over years of work, corporate teams, just like agencies and consultancies, develop standard approaches to competitive intelligence, market analysis, and strategic research. These approaches define not just what data to examine, but how to examine it: which questions to ask, how to structure outputs, how to communicate findings, and how to handle gaps and ambiguity.

Encoding these methodologies into software involves a mix of conventional programming and LLM-based components. Some steps are purely procedural: data collection, filtering, deduplication, categorization. Others rely on the model's ability to reason, summarize, or interpret. Crucially, the system constrains the model's role. The model does not decide whether a competitive analysis includes pricing, positioning, or tone. Those decisions are already embedded in the playbook. The model operates within a defined scope, producing outputs in specific formats, using consistent language, addressing specific analytical dimensions of the problem.
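One way to picture this constraint is a validation step that sits between the model and the rest of the pipeline. The sketch below assumes the model's answer has already been parsed into a dictionary; `REQUIRED_DIMENSIONS` and `validate_analysis` are hypothetical names, and the three dimensions are simply the ones mentioned above.

```python
# The playbook, not the model, decides which dimensions an analysis covers.
REQUIRED_DIMENSIONS = ("pricing", "positioning", "tone")


def validate_analysis(model_output: dict[str, str]) -> dict[str, str]:
    """Accept model output only if it covers exactly the playbook dimensions."""
    missing = [d for d in REQUIRED_DIMENSIONS
               if not model_output.get(d, "").strip()]
    unexpected = [k for k in model_output if k not in REQUIRED_DIMENSIONS]
    if missing or unexpected:
        raise ValueError(f"missing={missing}, unexpected={unexpected}")
    # Return the dimensions in playbook order so every run has the same shape.
    return {d: model_output[d].strip() for d in REQUIRED_DIMENSIONS}


draft = {"tone": " formal ", "pricing": "premium", "positioning": "enterprise"}
print(validate_analysis(draft))
```

A rejected output would typically trigger a retry or a human review; either way, a run that omits a dimension cannot quietly reach the final report.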

This is not a reduction of intelligence. It reflects how human expertise actually works. Experienced analysts do not reinvent their methodology from scratch for every project. Their skill lies in selecting the right approach for a given context and executing it consistently — not in arbitrarily varying their process each time. Structured AI systems operate the same way: agency at the level of planning and routing, discipline at the level of execution.

And structure does not eliminate the model's capacity for novel insight. It creates conditions where such insights can be reliably captured, evaluated, and acted upon — rather than appearing and disappearing randomly across runs.

What good looks like

The difference between a system that generates text and one that supports real decisions comes down to three capabilities.

Traceability. When an end recipient challenges a claim (e.g., "competitor messaging focuses on cost reduction"), a well-designed system can show the evidence trail: how many data points were collected, how they were categorized, what representative examples look like within each category, and how the conclusion was derived. A weak system can only restate the claim in different words. This is the difference between output and decision support. The value is not just accountability; it is that humans can interrogate conclusions, challenge them, and build on them with confidence.
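An evidence trail of this kind implies that claims are stored together with the data behind them, not just as finished sentences. Here is a minimal sketch of such a structure; the `Evidence` and `Claim` classes, their fields, and the sample excerpts are hypothetical illustrations.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Evidence:
    source: str
    excerpt: str
    category: str


@dataclass
class Claim:
    statement: str
    evidence: list[Evidence] = field(default_factory=list)

    def trail(self, examples_per_category: int = 2) -> dict:
        """Summarize how many data points back the claim, grouped by category."""
        by_category: dict[str, list[Evidence]] = defaultdict(list)
        for item in self.evidence:
            by_category[item.category].append(item)
        return {
            "statement": self.statement,
            "data_points": len(self.evidence),
            "categories": {
                name: {
                    "count": len(items),
                    "examples": [i.excerpt for i in items[:examples_per_category]],
                }
                for name, items in by_category.items()
            },
        }


claim = Claim("Competitor messaging focuses on cost reduction")
claim.evidence.append(Evidence("homepage", "Cut your spend in half", "cost"))
claim.evidence.append(Evidence("pricing page", "Lower total cost of ownership", "cost"))
claim.evidence.append(Evidence("blog", "Faster onboarding for teams", "speed"))
print(claim.trail())
```

When a reader challenges the claim, the system can return this trail instead of a reworded assertion.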

Regeneration from source, not surface editing. Professional contexts often require the same analysis to be reframed through a different lens, e.g. shifting from competitive positioning to technology evaluation, from strategic overview to investor briefing. A naive approach takes the final text and rewrites it, swapping vocabulary and adjusting emphasis. But conclusions that made sense in a positioning context may not hold in a technical one. A more rigorous approach propagates the reframing through the entire analytical pipeline, regenerating intermediate analyses where necessary and ensuring that the questions asked of the data change when the lens changes. The rewrite should be a genuine reanalysis, not a veneer over a misaligned foundation.
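The difference between rewriting the final text and regenerating from source can be sketched in a few lines. In this hedged illustration the lens selects the questions, and the questions are re-asked of the raw material. `LENS_QUESTIONS`, `regenerate`, and the `ask` callable (a stand-in for whatever model call or analysis step a real system would use) are all assumptions.

```python
from typing import Callable

# Hypothetical question sets: changing the lens changes what is asked.
LENS_QUESTIONS: dict[str, list[str]] = {
    "positioning": [
        "Which customer segments are targeted?",
        "What value claims lead the messaging?",
    ],
    "technology": [
        "Which technical capabilities are documented?",
        "What integration constraints are stated?",
    ],
}


def regenerate(
    raw_notes: list[str],
    lens: str,
    ask: Callable[[str, list[str]], str],
) -> dict[str, str]:
    """Re-run the analysis from the raw notes using the questions for `lens`."""
    return {question: ask(question, raw_notes)
            for question in LENS_QUESTIONS[lens]}


def stub_ask(question: str, notes: list[str]) -> str:
    # Placeholder for a real model call; it only records what was asked.
    return f"{len(notes)} notes reviewed for: {question}"


notes = ["Pricing page excerpt", "API documentation excerpt"]
for question, answer in regenerate(notes, "technology", stub_ask).items():
    print(question, "->", answer)
```

The input to `regenerate` is the collected material, never the previous report, so a change of lens cannot inherit conclusions that only made sense under the old one.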

Consistent presentation as part of the analytical contract. Even when AI systems produce well-formatted outputs, consistency across reports and across time still matters. A monthly competitive update should feel like a continuation, not a reinvention. End recipients should not have to re-learn the structure of a deliverable each time they receive one.

In professional settings, presentation is part of the analytical contract. When structure, framing, and layout change from run to run, it creates friction for the people consuming the work. It also makes it difficult to make comparisons and track patterns over time. Treating presentation as a first-class component of the system — not something left to generative variability — reinforces the broader point: reliability emerges from disciplined structure, not from unbounded generation.
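Treating presentation as a first-class component can mean that layout is produced by ordinary code rather than by the model. The sketch below assumes findings arrive as named sections; `SECTION_ORDER`, `render_report`, the section names, and the placeholder sentence are hypothetical.

```python
# Fixed layout: every report has the same sections in the same order.
SECTION_ORDER = ("Summary", "Messaging", "Product", "Open questions")


def render_report(title: str, sections: dict[str, str]) -> str:
    """Render findings into the fixed layout, making empty sections explicit."""
    unknown = set(sections) - set(SECTION_ORDER)
    if unknown:
        raise KeyError(f"unexpected sections: {sorted(unknown)}")
    lines = [title, "=" * len(title)]
    for name in SECTION_ORDER:
        body = sections.get(name, "No findings this period.")
        lines += ["", name, "-" * len(name), body]
    return "\n".join(lines)


print(render_report(
    "March update",
    {"Product": "New pricing tier launched.", "Summary": "Largely stable."},
))
```

Because the order and headings never vary, a reader can put two monthly updates side by side and compare them section by section.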

What this means for your organization

Most organizations are already using AI for research, analysis, or strategic input. The question is no longer whether AI is capable enough. The question is whether you understand the system shaping your outputs.

Every AI tool your team uses makes invisible decisions about how to search, what to retrieve, how much context to retain, and how to structure its response. These decisions constitute a methodology — one you did not choose, cannot inspect, and have no guarantee is aligned with how your organization thinks about the problem.

Most organizations build structured methodologies for research, innovation, and competitive analysis precisely to counteract bias, gut feel, and strategic inertia. These frameworks exist to enforce discipline in how evidence is gathered and interpreted. Introducing AI-generated outputs into these processes without holding them to the same standards creates a new vulnerability. An answer may sound well-reasoned yet be based on insufficient sampling, misidentified entities, or hidden methodological assumptions. When such outputs enter reports or decision-making forums unchallenged, they can quietly distort direction. The danger is not obvious error; it is gradual misalignment.

For internal strategy teams, this plays out in planning conversations. When an AI-generated analysis is presented and looks thorough and well-sourced, no one in the room has reason to question it. Budgets shift. Priorities adjust. The cost rarely appears as dramatic failure. It appears as wasted effort, flawed prioritization, and opportunity cost that only becomes visible later.

For consultancies and agencies, the implications are sharper still. Your clients are paying for judgment — contextualized, structured, and defensible. That judgment is embedded in your process. If analysis is delegated to a tool that does not reflect your methodology, you are substituting your intellectual framework with a commodity layer. Over time, this erodes differentiation. If the underlying method is indistinguishable from what anyone else can access, the insight becomes indistinguishable too.

The organizations that extract durable value from AI will not be those chasing the most fluent outputs. They will be those that design systems where structure, validation, and transparency are first-class concerns, and where intelligence is not just generated but disciplined.

And that leads to a deeper risk.

If you do not understand the method shaping an output, you are outsourcing judgment.

And in business, outsourced judgment always carries consequences.

*********

Co-Created is a venture studio that helps organizations design and build structured AI systems for research, competitive intelligence, and strategic analysis. We work with companies that have outgrown chatbot-level AI usage and need systems that reflect their own methodology, not someone else's.


A system that looks intelligent and a system that is reliable are not the same thing — especially when you cannot tell them apart. That matters a lot when the output informs real business decisions.
Richard Wilding
Richard Wilding
March 5, 2026
5 min read

A system that looks intelligent and a system that is reliable are not the same thing — especially when you cannot tell them apart.

This difference rarely matters in casual use. It matters enormously when the output informs real business decisions.

Consider a simple example. A corporate innovation team runs a competitive analysis on a relatively new company they are considering partnering with. They delegate the task to an AI-powered workflow. Within minutes, it returns a polished report: messaging themes, positioning analysis, market perception, strategic recommendations. The output is fluent, well-structured, and confident.

It is also wrong from the very first step.

The company name had been used by a different organization about a decade earlier. That older company had been acquired but left behind a large digital footprint: press coverage, archived product pages, conference talks, and commentary. The newer company, by contrast, had a sparse website and minimal public presence. Faced with this imbalance, the system gravitated toward the richer dataset. There was simply more to find. From the system's perspective, this looked like success: results were plentiful, synthesis could proceed, nothing explicitly failed.

But every subsequent stage – data collection, messaging analysis, competitive positioning, conclusions – was now applied to a company that no longer existed in the relevant form. The result was coherent and professional-looking. It was also completely wrong. Because the error occurred at the level of identity selection, no amount of downstream quality could recover correctness.

A human analyst encountering the same situation would notice the mismatch in timelines, question why an apparently new company had a decade of historical content, and look for confirming signals: incorporation dates, current products, leadership continuity. They would recognize that abundance of data is not evidence of correctness. If uncertainty remained, they would pause and seek clarification before proceeding.

The AI system did none of this, not because the underlying model lacked capability, but because no one had designed the surrounding process to ask those questions. And this is just one of many silent failure modes where AI completes the task it was given, produces a professional-looking deliverable, and still misses the mark, because the methodology governing how it approached the work was flawed from the start.

The distinction that matters

The failure above is not a model failure. It is a system failure. Most of what makes AI useful — or unreliable — in professional contexts does not live in the model. It lives in the system around it.

When people reference tools like ChatGPT or Claude, they tend to attribute the observed intelligence to the model itself. A common assumption follows: as models improve, today’s shortcomings will simply disappear. This framing treats model capability as the binding constraint on every problem, when in practice most of the limitations organizations encounter have little to do with the model at all. In reality, what users interact with is a composite system: an orchestration layer responsible for planning, tool invocation, data retrieval, validation, formatting, and constraint enforcement, wrapped around a model that primarily performs language understanding and generation.

This distinction is not academic. It determines whether the output can be trusted.

When a system retrieves financial data, constructs tables, cites sources, or decides to branch into a follow-up search, these are not spontaneous properties of the model. They are the result of explicit design decisions — decisions that are invisible to the end user. Every chat-based AI tool makes choices about how to search, what to retrieve, how much context to retain, and how to structure its response. These choices shape the output as much as the model's own reasoning does, and the user typically has no visibility into any of them. The companies behind these tools have made their own design and process decisions – how to search, what to prioritize, and when to stop. Those decisions may or may not align with how your organization would approach the same task.

You are not just using a model.

You are deferring to someone else's methodology, whether you realize it or not.

Consider a basic constraint most users never think about. When a chat conversation gets long enough, the system quietly drops earlier context to stay within its processing limits. It re-reads the entire conversation history every time you send a message, and at some point, begins discarding information it decides is no longer a priority. Your analyst doesn't forget the first half of the brief because it got too long. This is one example of an invisible system constraint that shapes output quality independent of model capability.

Models are genuinely improving. That is not the issue. What this paper is arguing is that the visible successes mask invisible assumptions that matter enormously in professional contexts. When fluent outputs and successful demos are treated as evidence that the hard problems are solved, attention shifts away from system design and methodology. Complexity does not disappear; it simply moves out of view.

Where things quietly break

To see why this matters in practice, consider what happens when an organization uses AI for competitive analysis — not as a one-off question, but as part of a structured process that informs strategy, budgets, and priorities.

The wrong-target problem. The entity confusion described above is not an edge case. Any company with an ambiguous name, a recent rebrand, or a limited digital footprint is vulnerable. The system's default behavior is to follow the richest data trail, which often means the most visible entity rather than the correct one. The output reads like a thorough analysis of the right question — about the wrong company. This failure mode is especially dangerous precisely because it is invisible in the final deliverable.

Shallow collection disguised as research. Even when the correct entity is identified, speed bias shows up in what happens next. A system runs a search, retrieves a press release, a generic "About Us" page, and a two-year-old blog post. It declares data collection complete and moves to synthesis. The output reads well, but anyone with domain experience would recognize that the signal is weak: the material is shallow, dated, and reveals almost nothing about how the company actually positions its products or frames its strategy. A human analyst would not stop there. They would explore the company's website in depth, use targeted searches to surface relevant pages, actively judge relevance, discard superficial content, and continue until coverage felt sufficient, not until a default result limit was exhausted.

Representative sampling matters. A chatbot might collect a handful of Reddit posts or social media comments and summarize them as if they reflect the broader market. The narrative may sound authoritative, but the evidence is fragile. Twelve posts cannot represent the range of sentiment in a market, and a different twelve posts selected on a different day might tell a different story.

The gap is not about effort or volume; increasing sample size is easy. The issue is whether the system understands that data gathering is an evaluative act requiring judgment about sufficiency, not simply executing a retrieval step.

What counts as “sufficient” is specific to your organization — your standards, your stakeholders’ expectations, and the decisions the analysis is meant to inform. A threshold suitable for an internal briefing is not the same as one required for a board-level recommendation. No general-purpose tool can know that standard unless someone encodes it into the process.

Non-determinism across runs. Given the same input, an LLM-driven workflow may produce outputs that vary significantly from one execution to the next. One run might emphasize messaging and tone; another might focus on product features; a third might omit entire dimensions of analysis. For casual exploration, this variability feels like creativity. For an organization paying for a consistent methodology — a way of answering, not just an answer — it is a liability. A report that changes structure, emphasis, or analytical framing from run to run undermines trust and makes downstream consumption harder, not easier.

Asymmetric comparisons. Now imagine the system is comparing four competitors. It collects website content, social media data, and analyst commentary for three of them. The fourth has no social media presence and minimal coverage. A simplistic workflow proceeds regardless: it generates whatever output it can and presents the result alongside the others. Three competitors analyzed in depth, one barely sketched — but the system does not flag this asymmetry or adjust. A human analyst would stop and ask: does this company still belong in the comparison set? Can the missing data be sourced elsewhere? Should a replacement be identified? If so, does the analysis need to be rerun from the start to preserve like-for-like comparability? This is not a minor procedural point. It is the moment where analytical integrity is either maintained or quietly abandoned.

These failure modes share a common trait: the output still looks good. The prose is fluent, the formatting is clean, the confidence is high. The errors are not in the language. They are in the invisible decisions that preceded it — what was collected, what was compared, what was assumed, and what was silently omitted.

Why structure is the answer

If the problem lives in the system, the solution does too. That solution is structure – and this is where most AI conversations go wrong.

The instinct many organizations have at this point is to improve their prompts: give the AI more detailed instructions, more context, a better brief. And this helps. More sophisticated prompting genuinely produces more consistent, higher-quality outputs. It is a meaningful step beyond naive usage.

But it does not solve the underlying problem. However well-crafted the prompt, the model is still probabilistic. The orchestration layer is still opaque. And the system still lacks the persistent state, evaluative judgment, and methodological discipline that professional analytical work requires.

Some teams go further and build their own automated workflows, chaining AI calls together in tools designed for that purpose. This represents real investment and real progress. But it also introduces a challenge that is easy to underestimate: building a workflow that works in a demo is not the same as building a system that works reliably in production. Edge cases, error handling, state management, and graceful degradation when data is missing are all software engineering problems. They require software engineering discipline. The fact that the building blocks are AI calls rather than traditional code does not make the surrounding system design any less critical. If anything, the non-determinism of the components makes it more so.

Organizations that extract durable value from AI-powered analysis impose structure at every layer. In practice, this means codifying the analytical methodologies that already exist implicitly in experienced teams. Over years of work, corporate teams, just like agencies and consultancies, develop standard approaches to competitive intelligence, market analysis, and strategic research. These approaches define not just what data to examine, but how to examine it: which questions to ask, how to structure outputs, how to communicate findings, and how to handle gaps and ambiguity.

Encoding these methodologies into software involves a mix of conventional programming and LLM-based components. Some steps are purely procedural – data collection, filtering, deduplication, categorization. Others rely on the model's ability to reason, summarize, or interpret. Crucially, the system constrains the model's role. It does not decide whether a competitive analysis includes pricing, positioning, or tone. Those decisions are already embedded in the playbook. The model operates within a defined scope, producing outputs in specific formats, using consistent language, addressing specific analytical dimensions of the problem.

This is not a reduction of intelligence. It reflects how human expertise actually works. Experienced analysts do not reinvent their methodology from scratch for every project. Their skill lies in selecting the right approach for a given context and executing it consistently — not in arbitrarily varying their process each time. Structured AI systems operate the same way: agency at the level of planning and routing, discipline at the level of execution.

And structure does not eliminate the model's capacity for novel insight. It creates conditions where such insights can be reliably captured, evaluated, and acted upon — rather than appearing and disappearing randomly across runs.

What good looks like

The difference between a system that generates text and one that supports real decisions comes down to three capabilities.

Traceability. When an end recipient challenges a claim (e.g., "competitor messaging focuses on cost reduction"), a well-designed system can show the evidence trail: how many data points were collected, how they were categorized, what representative examples look like within each category, and how the conclusion was derived. A weak system can only restate the claim in different words. This is the difference between output and decision support. The value is not just accountability; it is that humans can interrogate conclusions, challenge them, and build on them with confidence.
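One way to picture an evidence trail is as a small summary computed from the categorized data points rather than from the finished prose. The sketch below is illustrative only: `DataPoint`, `evidence_trail`, and the sample records are hypothetical, and a real system would also record how each category label was assigned.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class DataPoint:
    source: str    # where the text was collected (hypothetical identifier)
    text: str      # the collected snippet
    category: str  # label assigned during categorization

def evidence_trail(points: list[DataPoint], examples_per_category: int = 2) -> dict:
    """Summarize what a claim rests on: counts, shares, and sample evidence."""
    by_category: dict[str, list[DataPoint]] = defaultdict(list)
    for point in points:
        by_category[point.category].append(point)
    total = len(points)
    return {
        "total_data_points": total,
        "categories": {
            name: {
                "count": len(items),
                "share": round(len(items) / total, 2),
                "examples": [(p.source, p.text) for p in items[:examples_per_category]],
            }
            for name, items in by_category.items()
        },
    }

points = [
    DataPoint("site/home", "Cut your costs by 30%", "cost reduction"),
    DataPoint("press/1", "Lower total cost of ownership", "cost reduction"),
    DataPoint("blog/2", "Built for security teams", "security"),
]
trail = evidence_trail(points)
```

With a structure like `trail` behind it, the claim about cost reduction can be answered with counts and examples instead of a rephrased assertion.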

Regeneration from source, not surface editing. Professional contexts often require the same analysis to be reframed through a different lens, e.g. shifting from competitive positioning to technology evaluation, from strategic overview to investor briefing. A naive approach takes the final text and rewrites it, swapping vocabulary and adjusting emphasis. But conclusions that made sense in a positioning context may not hold in a technical one. A more rigorous approach propagates the reframing through the entire analytical pipeline, regenerating intermediate analyses where necessary and ensuring that the questions asked of the data change when the lens changes. The rewrite should be a genuine reanalysis, not a veneer over a misaligned foundation.
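A rough sketch of the difference, under stated assumptions: each lens is modeled as a hypothetical set of questions, and `analyze` always starts again from the source material. The lens names, the questions, and `stub_answer` (a stand-in for whatever component actually answers a question from the sources) are invented for illustration.

```python
from typing import Callable

# Hypothetical lenses: each one changes the questions asked of the source data.
LENSES = {
    "competitive_positioning": ["Who is the target customer?", "What is the core claim?"],
    "technology_evaluation": ["What is the architecture?", "What are the known limits?"],
}

def analyze(sources: list[str], lens: str, answer: Callable[[str, list[str]], str]) -> dict:
    """Rebuild the analysis from the sources for the requested lens."""
    questions = LENSES[lens]
    findings = {question: answer(question, sources) for question in questions}
    return {"lens": lens, "findings": findings}

def stub_answer(question: str, sources: list[str]) -> str:
    """Stand-in for the real answering step; reports what it was given."""
    return f"answer to {question!r} from {len(sources)} sources"

sources = ["product page text", "conference talk transcript"]
positioning = analyze(sources, "competitive_positioning", stub_answer)
# The technical view is derived from the sources again, not from `positioning`.
technical = analyze(sources, "technology_evaluation", stub_answer)
```

The point of the design is that `technical` never sees the positioning report, so conclusions that only made sense under the first lens cannot leak into the second.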

Consistent presentation as part of the analytical contract. Even when AI systems produce well-formatted outputs, consistency across reports and across time still matters. A monthly competitive update should feel like a continuation, not a reinvention. End recipients should not have to re-learn the structure of a deliverable each time they receive one.

In professional settings, presentation is part of the analytical contract. When structure, framing, and layout change from run to run, it creates friction for the people consuming the work. It also makes it difficult to make comparisons and track patterns over time. Treating presentation as a first-class component of the system — not something left to generative variability — reinforces the broader point: reliability emerges from disciplined structure, not from unbounded generation.
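Treating presentation as part of the system can be as simple as rendering every report through one fixed layout. The following is a sketch, assuming a hypothetical four-section monthly update; `SECTIONS` and `render_report` are illustrative names.

```python
# Hypothetical fixed layout for a recurring deliverable.
SECTIONS = ("Summary", "Pricing", "Positioning", "Open questions")

def render_report(title: str, content: dict[str, str]) -> str:
    """Render content into the fixed section order, rejecting unknown sections."""
    unknown = set(content) - set(SECTIONS)
    if unknown:
        raise ValueError(f"Unexpected sections: {sorted(unknown)}")
    lines = [title, "=" * len(title)]
    for section in SECTIONS:
        body = content.get(section, "No findings this period.")
        lines += ["", section, "-" * len(section), body]
    return "\n".join(lines)

march = render_report("March update", {"Pricing": "Unchanged.", "Summary": "Quiet month."})
```

Whatever order the findings arrive in, `march` has the same sections in the same sequence as every other month, and a missing section is stated rather than silently dropped.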

What this means for your organization

Most organizations are already using AI for research, analysis, or strategic input. The question is no longer whether AI is capable enough. The question is whether you understand the system shaping your outputs.

Every AI tool your team uses makes invisible decisions about how to search, what to retrieve, how much context to retain, and how to structure its response. These decisions constitute a methodology — one you did not choose, cannot inspect, and have no guarantee is aligned with how your organization thinks about the problem.

Most organizations build structured methodologies for research, innovation, and competitive analysis precisely to counteract bias, gut feel, and strategic inertia. These frameworks exist to enforce discipline in how evidence is gathered and interpreted. Introducing AI-generated outputs into these processes without holding them to the same standards creates a new vulnerability. An answer may sound well-reasoned yet be based on insufficient sampling, misidentified entities, or hidden methodological assumptions. When such outputs enter reports or decision-making forums unchallenged, they can quietly distort direction. The danger is not obvious error but gradual misalignment.

For internal strategy teams, this plays out in planning conversations. When an AI-generated analysis is presented and looks thorough and well-sourced, no one in the room has reason to question it. Budgets shift. Priorities adjust. The cost rarely appears as dramatic failure. It appears as wasted effort, flawed prioritization, and opportunity cost that only becomes visible later.

For consultancies and agencies, the implications are sharper still. Your clients are paying for judgment — contextualized, structured, and defensible. That judgment is embedded in your process. If analysis is delegated to a tool that does not reflect your methodology, you are substituting your intellectual framework with a commodity layer. Over time, this erodes differentiation. If the underlying method is indistinguishable from what anyone else can access, the insight becomes indistinguishable too.

The organizations that extract durable value from AI will not be those chasing the most fluent outputs. They will be those that design systems where structure, validation, and transparency are first-class concerns, and where intelligence is not just generated but disciplined.

And that leads to a deeper risk.

If you do not understand the method shaping an output, you are outsourcing judgment.

And in business, outsourced judgment always carries consequences.

*********

Co-Created is a venture studio that helps organizations design and build structured AI systems for research, competitive intelligence, and strategic analysis. We work with companies that have outgrown chatbot-level AI usage and need systems that reflect their own methodology, not someone else's.

AI tools generate polished market research reports in seconds, but is the research real? As generative tools flood inboxes and meetings with confident "findings," how can we verify them?
Stacey Seltzer
May 7, 2025
5 min read

When AI Gets It Wrong

In the Age of Instant Insights, the Real Competitive Advantage Is Knowing What to Trust

In my first job out of college, I worked as a trombonist in a rock band. But when I finally made my parents happy and got a proper job later that year (rock and roll trombone is pretty niche, and I’m not that good), I worked in the economic research department at Brown Brothers Harriman. It was the kind of place where precision mattered a lot. I would spend hours combing through capital flows data from Japan, eventually picking up the phone to call someone at the Japanese Ministry of Finance because I wasn’t sure I was interpreting their reporting conventions correctly. That’s what it took to get the data right.

Fast forward to today, and the idea of getting a polished market research report in seconds, courtesy of generative AI, feels miraculous. With tools like ChatGPT, OpenAI’s Deep Research, and DeepSeek, plus the deep research model expected soon from Claude, anyone can produce a sleek document filled with insights, charts, and stats. But here’s the question: in the age of AI, the research is fast, but is it real?

That’s not a rhetorical concern. As generative tools flood inboxes and decision-making meetings with confident-sounding “findings,” we’re entering a strange new era—one where everyone can create an “insights report,” but few can verify it. And the consequences for business, policy, and public trust are significant.

✦ The Mirage of AI-Generated Research

Benedict Evans recently documented his experience with OpenAI’s Deep Research. The tool generated a slick analysis of smartphone adoption in Japan—complete with citations. The only problem? The data was wrong. Key statistics were pulled from outdated or misinterpreted sources like Statista and Statcounter. How wrong? It doesn’t really matter because the end result was a report that looked authoritative but couldn’t be trusted.

This is more than a footnote in AI’s evolution. It’s a cautionary tale. Most large language models (LLMs), including ChatGPT, aren’t retrieval systems; they’re probabilistic engines. They generate the next likely word based on patterns in training data. That can mean they’re pulling from outdated or irrelevant data sources. Or worse, they can misinterpret the data entirely, failing to understand the nuance of what a dataset actually represents. Yet the results are presented in polished prose, with an air of confidence that makes errors nearly invisible.

For consumers of information, this creates a strange asymmetry: the outputs feel credible, but the underlying logic is opaque. It’s a bit like getting stock advice from someone who sounds like Warren Buffett—until you realize they’re just guessing.

And here’s the real danger: unless you’re a subject matter expert, you won’t know what’s been misrepresented because you won’t even know what to question. The mistakes aren’t always obvious. They live in the assumptions, the framing, the fine print. If you don’t already understand the topic deeply, it’s easy to take the AI’s answer at face value—and that’s exactly when it’s most likely to mislead you. What you’re left with is research that sounds right, feels right, and might be right—but that you have no way of verifying without deep domain knowledge. That’s not just inefficient. It’s dangerous.

✦ Getting it Right

At Co-Created, we encountered this problem firsthand. We were using generative tools to speed up internal research, but we kept running into the same wall: we couldn’t trace anything. Outputs changed when we re-ran the same prompts. Citations disappeared. We couldn’t answer basic questions like, “Where did this data come from?” or “Why did the AI say this?”

The good news is that all the getting it wrong eventually led us to get it right. Instead of chasing sleek one-off outputs, we wanted something that could reliably support business decisions.

A better solution is an AI-powered research tool designed for structure, traceability, and auditability.  Here’s how it can work differently:

• Deterministic Outputs, Not Just Free-Form Text

Sense builds repeatable workflows with structured prompts and data scaffolding. That means it’s not just hoping the AI gets it right—it’s designing for correctness.

• Smart Data Objects, Not Blobs of Text

Sense extracts key primitives—like a problem definition, a customer need, or a competitive insight—and tracks them individually. This enables chaining insights together over time, rather than getting isolated soundbites.

• Full Context Reconstruction

Instead of dropping raw documents into a prompt, the tool should reconstruct and organize relevant content across multiple sources, ensuring the AI model sees the full picture before responding.

• Audit Trails and Source Provenance

Every insight must link back to its origin—whether it’s a public filing, a competitor website, or a user-uploaded artifact. That makes verification easy, and hallucinations much less likely.

• Multi-Model Optimization

ChatGPT relies on one model. A good tool needs to use many—selecting different models for natural language processing, embeddings, or specialized analysis depending on the task.

• Custom Outputs Built for Business

From investor memos to quarterly reports to spreadsheet data dumps, the tool needs to deliver structured, exportable formats that match how teams actually work.

That’s why we built Sense. It’s not just another AI tool—it’s a research system built for teams that need to move fast and get it right. Because in a world where everyone can generate insights, the real edge is knowing which ones to trust.
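To illustrate the ideas of tracked data objects and source provenance from the list above, here is a small sketch. It is not a description of how Sense is implemented; `Source`, `Insight`, and the sample records are hypothetical names and data chosen for the example.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Source:
    kind: str      # e.g. "public filing", "competitor website", "upload"
    locator: str   # hypothetical URL or file identifier
    excerpt: str   # the passage the insight rests on

@dataclass
class Insight:
    kind: str       # e.g. "customer need", "competitive insight"
    statement: str
    sources: list[Source] = field(default_factory=list)
    derived_from: list["Insight"] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Refuse insights that cannot be traced to evidence or prior insights.
        if not self.sources and not self.derived_from:
            raise ValueError("An insight needs a source or a parent insight.")

    def provenance(self) -> list[Source]:
        """Collect every source behind this insight, including inherited ones."""
        collected = list(self.sources)
        for parent in self.derived_from:
            collected.extend(parent.provenance())
        return collected

need = Insight(
    "customer need",
    "Buyers want faster onboarding.",
    sources=[Source("upload", "interviews.pdf", "Setup took us three weeks.")],
)
gap = Insight(
    "competitive insight",
    "No rival advertises onboarding speed.",
    sources=[Source("competitor website", "example.com/pricing", "Enterprise-grade security.")],
)
opportunity = Insight("opportunity", "Lead with onboarding speed.", derived_from=[need, gap])
```

Because each insight is its own object, `opportunity.provenance()` can walk back through `need` and `gap` to the original excerpts, and an insight with no evidence behind it cannot be created in the first place.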

✦ The Bigger Picture: What We Lose When We Trust Too Quickly

There’s a reason I remember that call to Japan’s Ministry of Finance. It wasn’t about one data point—it was about accountability. When you’re making decisions that affect people’s jobs, investments, or strategies, you need to know what’s real. And knowing means being able to trace back, challenge, and revise—not just consume and move on.

Generative AI isn’t going away. Nor should it. Tools like ChatGPT are invaluable for brainstorming, summarizing, and sparking ideas. But when it comes to research that informs action, businesses need to ask: What are we trusting, and why?

As the AI wave accelerates, the organizations that win won’t just be the ones who use it fastest. They’ll be the ones who build trust into the process—who can separate the insights worth acting on from the noise that just sounds good.

Reach out to start a conversation.

Too many companies are racing to define their “AI strategy,” as if it needs a dedicated lane. The smartest organizations are using AI to accelerate, enhance, and sharpen today's core strategies.
Daniel Shani
May 1, 2025
5 min read

Originally published by The AI Journal on April 26, 2025

Too many companies are racing to define their “AI strategy,” as if artificial intelligence is some new business function that needs a dedicated lane. But the real opportunity isn’t about what you can build for AI—it’s about what you can unlock with it. The smartest organizations aren’t rewriting their playbooks from scratch. They’re using AI to accelerate, enhance, and sharpen the strategies they already care about.

This isn’t about replacing fundamentals. It’s about getting more leverage on the things that already drive impact.

Here are four core pillars of business strategy that are being transformed—not replaced—by working with AI.

1. Keep a Live Pulse on the Market (and Make it Actionable)

Every company tries to track what’s happening around them—competitor moves, emerging customer needs, shifting cultural signals. The problem is, most of that happens sporadically, with a heavy reliance on manual analysis, anecdotal insight, or high-level macro indicators.

AI changes that. Today, intelligent systems can sift through thousands of unstructured sources—Reddit threads, local news, LinkedIn posts, investor decks, product reviews—and convert that chaos into structured, directional insights. You’re not just reading content or collecting data; you’re mapping the market in real time.

The added value? These insights aren’t buried in a quarterly report; they can be delivered to the right teams at the right time. Some companies are even building “living” models of their market environments: constantly refreshed, customized by audience, and embedded into everyday workflows. The outcome is a strategy that doesn’t just respond to change but contextualizes and anticipates it.

2. Elevate the Value Proposition (Not Just the Toolset)

AI can certainly enhance tools and automate tasks. But its real power shows up when it prompts a deeper rethink of how you create and deliver value.

Take, for instance, a healthcare brand that initially set out to build a product recommendation chatbot—something smart and lightweight that could guide customers to the right supplement or service. As the project progressed, the team realized the same underlying personalization engine could support onboarding, behavior change, educational nudges, and even care team handoffs. The chatbot didn’t just improve customer support—it became a doorway to reimagining the entire experience.

That kind of pivot isn’t about chasing the next tool. It’s about looking at your business through a different lens: now that we can personalize at scale, how else might we create a deeper, better relationship with the people we serve?

3. Experimentation is King (and Now You Can Do It Smarter, Faster)

One of the most powerful shifts AI brings is speed—not just in output, but in learning. Traditional experimentation takes time. You come up with a new message or offer, build the assets, run a test, wait for results… and often learn too little, too late.

AI changes the rhythm. With synthetic data and intelligent agents, you can prototype narratives, simulate reactions across segments, and generate tailored campaigns at a pace that was unimaginable a year ago. (Personal note: I’m old-school in some ways—I still love hearing directly from real people out in the world. But AI can help with that too: transcribing interviews, summarizing themes, even surfacing sentiment you might have missed.)

This shift is already reshaping creative and go-to-market teams. We’re seeing the rise of “vibe marketing,” a parallel to the “vibe coding” movement that gave us platforms like Replit, Bolt, and Lovable. Just as one developer with the right tools can now build and ship a new product in hours, one marketer with the right AI stack can 100X their output: spinning up landing pages, testing angles, generating collateral across channels, and automating end-to-end workflows with speed and precision.

Emerging tools like PhantomBuster, Jasper, and OpenChair are enabling highly specialized, niche automation for media testing, competitive tracking, and persona-driven messaging. The direction is clear: fast, lightweight, focused systems that do one thing really well. The agency of the future might be one smart person and a “room” full of purpose-built agents.

4. Execute Better, Faster (With Tools You Design)

Every organization wants to move faster and reduce friction. But it’s not just about automating more—it’s about customizing tools that work the way your teams do.

In some forward-leaning companies, teams are building internal libraries of GPT-style agents tuned to specific workflows—from customer service to product research to compliance. In one example, a growth-stage startup built over 100 internal agents, each supporting a specific business process. More importantly, the functional teams themselves drove the design—flagging tedious repetitive tasks, brainstorming better flows, iterating on what worked, and benefiting from the AI leverage.

The result? A culture of active optimization, where AI isn’t imposed top-down, but developed ground-up in service of the work that actually needs doing. Building smarter tooling became everyone’s job.

And the long-term effect? Less internal drag. Fewer handoffs. More time focused on creative and strategic thinking—the stuff humans are still uniquely good at.

Reality Check: You Still Have to Change (Just Not Everything)

Of course, working with AI doesn’t mean business as usual. Some shifts are non-negotiable:

  • Teams need to build new muscles—prompting, interpreting results, and course-correcting rapidly.

  • Strategy has to move from static planning to continuous, feedback-fed evolution.

  • Data systems must become more integrated, so insight and execution aren’t siloed.

  • Proprietary advantage will increasingly depend on how companies use, integrate, and learn from their own data. Closing the feedback loop—between what your AI outputs and what actually works—creates better results, better models, and better strategy.

In short: the fundamentals stay, but the game speeds up. The teams that win will be the ones that can adapt in-flight, not just in the offsite.

Conclusion: Build with AI, Not for AI

The companies getting ahead right now aren’t the ones spinning up isolated AI pilots or innovation labs off to the side. They’re the ones embedding AI into the heart of what they already do—understanding their market better, elevating the customer experience, iterating faster, and executing with less drag.

You don’t need an “AI strategy” that lives apart from the rest of your business. You need a strategy that uses AI to get sharper, faster, and more responsive. Don’t build something for AI. Build something better with it.
