When AI Gets It Wrong

Stacey Seltzer
May 7, 2025
5-minute read

In the Age of Instant Insights, the Real Competitive Advantage Is Knowing What to Trust

In my first job out of college, I worked as a trombonist in a rock band. But when I finally made my parents happy and got a proper job later that year (rock-and-roll trombone is pretty niche, and I’m not that good), I worked in the economic research department at Brown Brothers Harriman. It was the kind of place where precision mattered—a lot. I would spend hours combing through capital flows data from Japan, eventually picking up the phone to call someone at the Japanese Ministry of Finance because I wasn’t sure I was interpreting their reporting conventions correctly. That’s what it took to get the data right.

Fast forward to today, and the idea of getting a polished market research report in seconds—courtesy of generative AI—feels miraculous. With tools like ChatGPT, OpenAI’s Deep Research, DeepSeek, and Claude’s expected deep research release, anyone can produce a sleek document filled with insights, charts, and stats. But here’s the question: in the age of AI, the research is fast—but is it real?

That’s not a rhetorical concern. As generative tools flood inboxes and decision-making meetings with confident-sounding “findings,” we’re entering a strange new era—one where everyone can create an “insights report,” but few can verify it. And the consequences for business, policy, and public trust are significant.

✦ The Mirage of AI-Generated Research

Benedict Evans recently documented his experience with OpenAI’s Deep Research. The tool generated a slick analysis of smartphone adoption in Japan—complete with citations. The only problem? The data was wrong. Key statistics were pulled from outdated or misinterpreted sources like Statista and Statcounter. How wrong? It doesn’t really matter because the end result was a report that looked authoritative but couldn’t be trusted.

This is more than a footnote in AI’s evolution. It’s a cautionary tale. Most large language models (LLMs), including ChatGPT, aren’t retrieval systems—they’re probabilistic engines. They generate the next likely word based on patterns in training data. That can mean they’re pulling from outdated or irrelevant data sources. Or worse, misinterpreting the data entirely, failing to understand the nuance of what a dataset actually represents. Yet the results are presented in polished prose, with an air of confidence that makes errors nearly invisible.

For consumers of information, this creates a strange asymmetry: the outputs feel credible, but the underlying logic is opaque. It’s a bit like getting stock advice from someone who sounds like Warren Buffett—until you realize they’re just guessing.

And here’s the real danger: unless you’re a subject matter expert, you won’t know what’s been misrepresented because you won’t even know what to question. The mistakes aren’t always obvious. They live in the assumptions, the framing, the fine print. If you don’t already understand the topic deeply, it’s easy to take the AI’s answer at face value—and that’s exactly when it’s most likely to mislead you. What you’re left with is research that sounds right, feels right, and might be right—but that you have no way of verifying without deep domain knowledge. That’s not just inefficient. It’s dangerous.

✦ Getting It Right

At Co-Created, we encountered this problem firsthand. We were using generative tools to speed up internal research, but we kept running into the same wall: we couldn’t trace anything. Outputs changed when we re-ran the same prompts. Citations disappeared. We couldn’t answer basic questions like, “Where did this data come from?” or “Why did the AI say this?”

The good news is that all that getting it wrong eventually led us to get it right. Instead of chasing sleek one-off outputs, we wanted something that could reliably support business decisions.

A better solution is an AI-powered research tool designed for structure, traceability, and auditability. Here’s how it can work differently:

• Deterministic Outputs, Not Just Free-Form Text

Such a tool builds repeatable workflows with structured prompts and data scaffolding. That means it isn’t just hoping the AI gets it right—it’s designed for correctness.

• Smart Data Objects, Not Blobs of Text

It extracts key primitives—like a problem definition, a customer need, or a competitive insight—and tracks them individually. This enables chaining insights together over time, rather than collecting isolated soundbites (a minimal code sketch of this idea follows the list).

• Full Context Reconstruction

Instead of dropping raw documents into a prompt, the tool should reconstruct and organize relevant content across multiple sources, ensuring the AI model sees the full picture before responding.

• Audit Trails and Source Provenance

Every insight must link back to its origin—whether it’s a public filing, a competitor website, or a user-uploaded artifact. That makes verification easy, and hallucinations much less likely.

• Multi-Model Optimization

ChatGPT relies on one model. A good tool needs to use many—selecting different models for natural language processing, embeddings, or specialized analysis depending on the task.

• Custom Outputs Built for Business

From investor memos to quarterly reports to spreadsheet data dumps, the tool needs to deliver structured, exportable formats that match how teams actually work.
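
To make the ideas in this list concrete, here is a minimal sketch in Python of what a traceable research primitive, its source provenance, and a simple task-to-model routing table might look like. The names (Insight, Source, MODEL_ROUTES) and the example URL are illustrative assumptions, not a description of any particular product’s internals.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Source:
    """Where a piece of evidence came from, so any claim can be traced back."""
    url: str
    excerpt: str
    retrieved_at: datetime

@dataclass
class Insight:
    """A tracked research primitive, e.g. a customer need or competitive insight."""
    kind: str                      # "problem_definition" | "customer_need" | "competitive_insight"
    claim: str                     # the statement a report would actually use
    sources: list[Source] = field(default_factory=list)

    def is_verifiable(self) -> bool:
        # An insight with no provenance should never reach a report.
        return len(self.sources) > 0

# A simple task-to-model routing table: different models for different jobs,
# instead of sending everything to one general-purpose chat model.
MODEL_ROUTES = {
    "embed_documents": "embedding-model",
    "extract_entities": "small-fast-model",
    "draft_summary": "large-reasoning-model",
}

def build_context(insights: list[Insight]) -> str:
    """Reconstruct context for the model: each claim is paired with its evidence."""
    blocks = []
    for item in insights:
        refs = "; ".join(f"{s.url} ({s.retrieved_at:%Y-%m-%d})" for s in item.sources)
        blocks.append(f"[{item.kind}] {item.claim}\n  evidence: {refs}")
    return "\n\n".join(blocks)

# Example: a single insight with a traceable source.
need = Insight(
    kind="customer_need",
    claim="Mid-market teams want research outputs they can audit.",
    sources=[Source(
        url="https://example.com/interview-notes",
        excerpt="We can't act on numbers we can't trace.",
        retrieved_at=datetime.now(timezone.utc),
    )],
)
assert need.is_verifiable()
print(build_context([need]))
```

The point is structural: every claim carries its evidence, and the choice of model is an explicit, inspectable decision rather than an accident of whichever chat window happened to be open.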

That’s why we built Sense. It’s not just another AI tool—it’s a research system built for teams that need to move fast and get it right. Because in a world where everyone can generate insights, the real edge is knowing which ones to trust.

✦ The Bigger Picture: What We Lose When We Trust Too Quickly

There’s a reason I remember that call to Japan’s Ministry of Finance. It wasn’t about one data point—it was about accountability. When you’re making decisions that affect people’s jobs, investments, or strategies, you need to know what’s real. And knowing means being able to trace back, challenge, and revise—not just consume and move on.

Generative AI isn’t going away. Nor should it. Tools like ChatGPT are invaluable for brainstorming, summarizing, and sparking ideas. But when it comes to research that informs action, businesses need to ask: What are we trusting, and why?

As the AI wave accelerates, the organizations that win won’t just be the ones who use it fastest. They’ll be the ones who build trust into the process—who can separate the insights worth acting on from the noise that just sounds good.

Reach out to start a conversation.




Read more from the Co-Created team below.


Too many companies are racing to define their “AI strategy,” as if it needs a dedicated lane. The smartest organizations are using AI to accelerate, enhance, and sharpen today's core strategies.
Daniel Shani
May 1, 2025
5 min read

Originally published by The AI Journal on April 26, 2025

Too many companies are racing to define their “AI strategy,” as if artificial intelligence is some new business function that needs a dedicated lane. But the real opportunity isn’t about what you can build for AI—it’s about what you can unlock with it. The smartest organizations aren’t rewriting their playbooks from scratch. They’re using AI to accelerate, enhance, and sharpen the strategies they already care about.

This isn’t about replacing fundamentals. It’s about getting more leverage on the things that already drive impact.

Here are four core pillars of business strategy that are being transformed—not replaced—by working with AI.

1. Keep a Live Pulse on the Market (and Make it Actionable)

Every company tries to track what’s happening around them—competitor moves, emerging customer needs, shifting cultural signals. The problem is, most of that happens sporadically, with a heavy reliance on manual analysis, anecdotal insight, or high-level macro indicators.

AI changes that. Today, intelligent systems can sift through thousands of unstructured sources—Reddit threads, local news, LinkedIn posts, investor decks, product reviews—and convert that chaos into structured, directional insights. You’re not just reading content or collecting data; you’re mapping the market in real time.

The added value? These insights aren’t buried in a quarterly report—they can be delivered to the right teams at the right time. Some companies are even building “living” models of their market environments: constantly refreshed, customized by audience, and embedded into everyday workflows. The outcome is a strategy that doesn’t just respond to change—it contextualizes and anticipates it.
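
As a rough illustration of what converting that chaos into structured, directional insight can mean in practice, here is a deliberately simplified Python sketch. A keyword lookup stands in for whatever classifier or language model a real system would use, and the source names and categories are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Signal:
    """One structured, directional observation pulled from an unstructured source."""
    source: str          # e.g. "reddit", "local-news", "product-review"
    observed_on: date
    category: str        # e.g. "pricing", "competitor-move", "unmet-need"
    summary: str

# Stand-in for a real classifier; in practice a model would assign the category.
KEYWORD_CATEGORIES = {
    "price": "pricing",
    "switched": "competitor-move",
    "wish": "unmet-need",
}

def to_signal(source: str, text: str, observed_on: date) -> Signal:
    category = next(
        (cat for keyword, cat in KEYWORD_CATEGORIES.items() if keyword in text.lower()),
        "uncategorized",
    )
    return Signal(source=source, observed_on=observed_on, category=category, summary=text[:120])

raw_items = [
    ("product-review", "Great app, but the price jumped twice this year."),
    ("reddit", "Honestly I wish this tool exported straight to our CRM."),
]
signals = [to_signal(src, text, date.today()) for src, text in raw_items]
for s in signals:
    print(f"{s.category:<16} {s.source:<15} {s.summary}")
```

Once observations live in a structure like this, they can be filtered by category, routed to the right team, and refreshed continuously, which is what turns the idea of a living model of the market into something operational.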

2. Elevate the Value Proposition (Not Just the Toolset)

AI can certainly enhance tools and automate tasks. But its real power shows up when it prompts a deeper rethink of how you create and deliver value.

Take, for instance, a healthcare brand that initially set out to build a product recommendation chatbot—something smart and lightweight that could guide customers to the right supplement or service. As the project progressed, the team realized the same underlying personalization engine could support onboarding, behavior change, educational nudges, and even care team handoffs. The chatbot didn’t just improve customer support—it became a doorway to reimagining the entire experience.

That kind of pivot isn’t about chasing the next tool. It’s about looking at your business through a different lens: now that we can personalize at scale, how else might we create a deeper, better relationship with the people we serve?

3. Experimentation is King (and Now You Can Do It Smarter, Faster)

One of the most powerful shifts AI brings is speed—not just in output, but in learning. Traditional experimentation takes time. You come up with a new message or offer, build the assets, run a test, wait for results… and often learn too little, too late.

AI changes the rhythm. With synthetic data and intelligent agents, you can prototype narratives, simulate reactions across segments, and generate tailored campaigns at a pace that was unimaginable a year ago. (Personal note: I’m old-school in some ways—I still love hearing directly from real people out in the world. But AI can help with that too: transcribing interviews, summarizing themes, even surfacing sentiment you might have missed.)

This shift is already reshaping creative and go-to-market teams. We’re seeing the rise of “vibe marketing”—a parallel to the “vibe coding” movement that gave us platforms like Replit, Bolt, and Lovable. Just as one developer with the right tools can now build and ship a new product in hours, one marketer with the right AI stack can 100X their output: spinning up landing pages, testing angles, generating collateral across channels, and automating end-to-end workflows with speed and precision.

Emerging tools like PhantomBuster, Jasper, and OpenChair are enabling highly specialized, niche automation for media testing, competitive tracking, and persona-driven messaging. The direction is clear: fast, lightweight, focused systems that do one thing really well. The agency of the future might be one smart person and a “room” full of purpose-built agents.

4. Execute Better, Faster (With Tools You Design)

Every organization wants to move faster and reduce friction. But it’s not just about automating more—it’s about customizing tools that work the way your teams do.

In some forward-leaning companies, teams are building internal libraries of GPT-style agents tuned to specific workflows—from customer service to product research to compliance. In one example, a growth-stage startup built over 100 internal agents, each supporting a specific business process. More importantly, the functional teams themselves drove the design—flagging tedious, repetitive tasks, brainstorming better flows, iterating on what worked, and benefiting directly from the leverage AI provides.

The result? A culture of active optimization, where AI isn’t imposed top-down, but developed ground-up in service of the work that actually needs doing. Building smarter tooling became everyone’s job.

And the long-term effect? Less internal drag. Fewer handoffs. More time focused on creative and strategic thinking—the stuff humans are still uniquely good at.
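
For readers curious what an internal library of purpose-built agents can look like in code, here is a minimal, hypothetical sketch in Python. The workflow names are invented, and the generate callback is a placeholder for whichever model API a team actually plugs in.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Agent:
    """One narrowly scoped internal agent: a workflow, its instructions, and a model choice."""
    workflow: str
    instructions: str
    model: str

# A team-maintained registry; each entry is owned by the people who do the work it supports.
AGENT_REGISTRY = {
    "support-triage": Agent(
        "support-triage", "Classify the ticket and draft a first reply.", "small-fast-model"
    ),
    "compliance-check": Agent(
        "compliance-check", "Flag clauses that deviate from our standard terms.", "large-reasoning-model"
    ),
}

def run_agent(workflow: str, user_input: str,
              generate: Callable[[str, str, str], str]) -> str:
    """Look up the agent and delegate to whatever model-calling function is supplied."""
    agent = AGENT_REGISTRY[workflow]
    return generate(agent.model, agent.instructions, user_input)

# Example with a stub generator standing in for a real model call.
def fake_generate(model: str, instructions: str, text: str) -> str:
    return f"[{model}] {instructions} -> {text}"

print(run_agent("support-triage", "Export to CSV has been broken since Tuesday.", fake_generate))
```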

Reality Check: You Still Have to Change (Just Not Everything)

Of course, working with AI doesn’t mean business as usual. Some shifts are non-negotiable:

  • Teams need to build new muscles—prompting, interpreting results, and course-correcting rapidly.

  • Strategy has to move from static planning to continuous, feedback-fed evolution.

  • Data systems must become more integrated, so insight and execution aren’t siloed.

  • Proprietary advantage will increasingly depend on how companies use, integrate, and learn from their own data. Closing the feedback loop—between what your AI outputs and what actually works—creates better results, better models, and better strategy.

In short: the fundamentals stay, but the game speeds up. The teams that win will be the ones that can adapt in-flight, not just in the offsite.

Conclusion: Build with AI, Not for AI

The companies getting ahead right now aren’t the ones spinning up isolated AI pilots or innovation labs off to the side. They’re the ones embedding AI into the heart of what they already do—understanding their market better, elevating the customer experience, iterating faster, and executing with less drag.

You don’t need an “AI strategy” that lives apart from the rest of your business. You need a strategy that uses AI to get sharper, faster, and more responsive. Don’t build something for AI. Build something better with it.

The discourse around AI in education often lurches between panic and hype: Will it replace teachers? Is it the end of thinking? The question isn’t whether we should be using AI, but how to use it well.
Stacey Seltzer
April 28, 2025
5 min read

Originally published by The AI Journal on April 24, 2025

The discourse around AI in education often lurches between panic and hype: Will it replace teachers? Is it the end of thinking? Is it a revolution? But in classrooms like ours — at Hudson Lab School, a project-based K–8 program just outside New York City — the conversation is less dramatic and more iterative. The question isn’t whether we should be using AI. The question is: how do we use it well?

At HLS, we’ve spent the last year treating AI not as a separate curriculum or policy mandate, but as a tool integrated into the daily work of learning. We’ve tested it in capstone projects, used it to support differentiated instruction, and introduced it in teacher workflows. We’ve experimented with a range of tools — ChatGPT, NotebookLM, Runway, Inkwire — and collaborated with entrepreneurs from our studio, Co-Created, to bring emerging AI applications into the school environment.

This article is a field report of sorts: a look at what’s actually working, where the challenges lie, and what we’re learning about the practical role AI can play in a real school.

Prompting as the New Literacy

In 2024, AI is already in the hands of students. A recent international survey showed that 86% of students report using AI tools in their academic work, with nearly a quarter engaging daily. Among American high schoolers, the numbers are even more striking — especially in writing-heavy disciplines like English and social studies. 

Yet, the level of fluency with these tools varies widely. Most students know how to ask a chatbot for help. Far fewer know how to interrogate its answers, challenge its assumptions, or build a productive back-and-forth.

At HLS, we’re treating prompting as a new core literacy — a set of metacognitive practices that help students engage with generative systems effectively and responsibly. We’re not teaching (and definitely not allowing) students to use AI to write for them. We’re teaching them to use it to learn with them.

We begin by introducing basic prompt structures in low-stakes contexts — not to produce polished work, but to explore ideas. Students might ask ChatGPT to act as a thought partner while planning their writing assignments, to generate study questions and flashcards based on their own notes, or to offer alternative perspectives on a historical event. Some use it to role-play different scenarios or help them look at multiple sides of an issue. Others prompt it to quiz them on concepts they’ve been struggling with, or to explain the same topic in multiple ways. 

Because each student is engaging the model individually, with prompts tailored to their needs, the interaction becomes deeply personalized — a kind of one-on-one tutorial that adjusts in real time to the learner’s questions, interests, and level of understanding. These early interactions aren’t about getting the “right” answer. They’re about developing the habit of thinking with a tool that responds. Over time, students stop seeing AI as a vending machine and start treating it as a dynamic, imperfect collaborator — one that helps them test ideas, surface blind spots, and stretch their thinking.

When these skills are integrated into real projects, the results are both creative and rigorous.

Student Projects: AI as Amplifier and Provocation

Take, for example, the project one of our eighth graders developed as part of their capstone — actually building a working beta version of a service called “back in my day,” which allowed people to converse with individuals from their family tree. The idea emerged from a convergence of personal interests: genealogy, digital memory, and the fact that our school is co-located with a senior living facility. The student wondered: Could you build a system that allowed people to “talk to” deceased relatives by simulating their personalities, speech patterns, and stories?

He started with family documents and oral histories, then used a combination of tools — including ChatGPT for linguistic modeling, ElevenLabs for voice generation, and a custom prompt scaffold we co-developed — to create a beta version of a persona-simulating chatbot. What started as a technical experiment quickly turned into an ethical inquiry: Should we do this? What does it mean to simulate someone’s voice, or story, or opinions? He also explored what implications this could have for grieving people, and whether it would be positive or negative for them.

This wasn’t a sidebar project. It became a capstone: deeply personal, technically sophisticated, and intellectually provocative. And AI was at the center of it — not doing the thinking, but provoking more of it.

In another example, a sixth grader exploring the U.S. Constitution asked whether AI itself might gain personhood by 2075. Her culminating project was a simulated presidential election featuring AI candidates, designed and animated using Runway. She created original scripts, recorded performance footage, and prompted the tool to render campaign videos. What could have been a speculative gimmick became a lens for discussing democratic values, personhood, and rights — all refracted through the emerging reality of AI’s social presence.

These aren’t hypotheticals or case studies from a lab. These are middle schoolers using real tools to ask real questions about their world — and the one they’re inheriting.

Teachers as AI Practitioners

The shift we’ve seen in our teaching staff over the past year has been just as important — and in some ways more surprising — than the changes among students. When we first introduced generative AI in professional development sessions, the response was cautious. Some teachers saw the tools as gimmicks. Others viewed them as a threat to their professional identity, and many simply didn’t see how they could be relevant to their day-to-day work.

That changed when we moved from theory to practice. As soon as teachers were given space to experiment — with support, without pressure — attitudes began to shift. They started using generative AI tools not to replace their planning, but to extend it. One teacher used ChatGPT to create differentiated reading materials from a single anchor text, adjusting the prompt to produce versions for different reading levels. Others began using it as a thought partner — brainstorming project ideas, writing prompts, rubrics, and alternate ways to explain tricky concepts. The emphasis wasn’t on perfection; it was on getting started.

NotebookLM received a lot of early attention. Teachers uploaded their weekly notes and used the tool to generate podcast-style audio summaries to accompany classroom newsletters. It was a small experiment, but an impactful one — parents reported actually listening, and it helped deepen the sense of connection between home and school.

We’ve also started piloting Goblins, an AI math tutor developed by an entrepreneur in the Co-Created network, to explore how AI might support individualized instruction in more structured subjects. It’s still early, but we’re already seeing promising signs of how targeted practice and real-time feedback can supplement classroom instruction. Particularly interesting is how well AI adapts to students’ different learning approaches and needs, allowing teachers to be more differentiated and personal in how they teach.

And then there are the quiet surprises. I remember logging into an administrative view on one platform and seeing dozens of lesson plans that had been built out — not because we had mandated anything, but because teachers had simply started using the tools. They weren’t announcing it. They were just doing the work.

Platforms like Inkwire, which support the design of interdisciplinary, project-based units, have also made a noticeable impact. Teachers report spending less time searching for ideas and more time adapting and refining them — because the foundational materials are already generated. The result isn’t generic AI-driven curriculum. It’s curriculum that reflects the creativity of the teacher, accelerated by the scaffolding these tools provide.

What’s made the biggest difference, though, is targeted support on prompting. Not “how to use AI,” but how to ask better questions. How to engage in a productive dialogue. How to refine and reframe. In our sessions, we treat prompts not as commands, but as design tools — ways to push the model, and the teacher’s own thinking, into new territory. When used that way, generative AI becomes not just a productivity booster, but a source of professional inspiration.

What We’re Learning

The value of AI in the classroom, as we’re seeing it, is not about automation or efficiency. It’s about acceleration — of thought, of design, of iteration. When used well, AI tools help both students and teachers move more quickly from idea to prototype, from question to debate, from concept to execution. And, crucially, they help surface new questions that wouldn’t otherwise be asked.

But it only works when the culture supports it. At HLS, we’re fortunate to have a school structure — interdisciplinary, project-based, agile — that allows us to experiment in real time. We also benefit from our work at Co-Created, where we collaborate with entrepreneurs building the next wave of AI-powered tools for learners and educators. That cross-pollination is essential: it keeps our thinking fresh, and it ensures that our practice is informed by the frontier, not just tradition.

Final Thoughts

AI in schools is not a yes/no question. It’s a how/why/when set of questions — and the answers will vary. What’s clear from our experience is that meaningful integration doesn’t start with policy. It starts with practice. With students experimenting, with teachers testing, with school leaders asking, week by week: What worked? What didn’t? What’s next?

That’s how we’re approaching AI at Hudson Lab School — and so far, what’s actually working isn’t the tool itself. It’s the mindset that surrounds it, especially as we know that the AI we’re using today is the worst AI we’ll ever use. 
