Why Enterprise Needs Human-First AI

AI succeeds by augmenting people, not by replacing jobs

Three years after the release of ChatGPT, most companies are underdelivering on their AI ambitions, not because the technology is lacking but because they are reasoning about it with the wrong mental models. Those models lead many executives to treat large language models (LLMs) as synthetic humans that can replace workers wholesale; others dismiss them as clever software tricks incapable of anything meaningful. Both views are expensive mistakes.

The AI code smell problem — why outputs pass the test but fail in real-world use

The core problem is what software engineers call "code smell": an experienced engineer's sense that source code harbors subtle problems that could undermine it. Much AI output passes this smell test: it reads as sound even to experienced professionals, yet it falls apart under closer scrutiny or in real use.

This weakness shows up across many kinds of AI output. I used AI to generate hundreds of private equity research reports, and the sophisticated analysis and authoritative prose impressed seasoned investors. Yet executives at the target companies immediately spotted errors, such as an incorrect growth rate or a missing competitor. When Northwestern University researchers generated scientific abstracts with AI, reviewers correctly identified them as machine-generated in only 68% of cases, and they flagged 14% of the genuine abstracts they examined as AI-generated. We have seen lawyers file briefs with hallucinated citations, consultants deliver reports containing erroneous claims, and AI labs publish false citations in their own papers.

When AI outputs pass the smell test but fail in use, they trigger wild swings from executive overconfidence ("this will replace my analyst team") to dismissiveness ("it made mistakes, so it's useless").

What coding agents teach us about AI reliability

To understand where enterprise AI is headed, look at coding agents. These tools handle between five and 30 minutes of autonomous work, equivalent to two to five hours of expert human effort. The catch? Research from GitHub, the software development platform, found that while 90% of developers use AI suggestions, only 30% of those suggestions are accepted. AI generation is most valuable precisely when a human who is watching closely, and who stays firmly in the driving seat, can easily discard its bad outputs.

Three principles for implementing AI that expands human capacity

Augmentation through AI, therefore, aligns with what LLMs do well: enhancing human judgment rather than replacing it. Companies framing AI success as headcount reduction will lose out. Effective use of AI requires rethinking implementation from first principles:

Use AI for work, not decisions

The IBM adage remains valid: "A computer can never be held accountable. Therefore, a computer must never make a management decision." LLMs are fundamentally inconsistent: brilliant at unexpected things, weak where they might be expected to be strong. This jagged frontier makes them terrible substitutes but powerful assistants. Winners design for this reality and have AI handle cognitive grunt work. Expert humans review, decide and remain responsible. AI expands what is possible.

Frame AI as capability expansion

Companies that frame AI as cost reduction will deploy defensive systems built by employees polishing their CVs. Those that frame it as capability expansion will ship creative tools built by people trying to win. Several large firms that announced significant layoffs citing AI have already quietly reversed course, re-hiring into those same teams. The reason? Clients demand humans. At least one large financial institution foresees an expansion in headcount to take advantage of the new opportunities from AI.

Build systems around LLMs for increasing autonomy

Employees get the most leverage from working closely with LLMs when the context of their work is made accessible. If data and application programming interfaces (APIs) are messy and incomprehensible to humans, they are just as incomprehensible to AI agents attempting a first-pass analysis. In many cases, intuitive tools unlock AI potential just as effectively as newer models do.
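To make this concrete, here is a minimal, hypothetical sketch in Python; the function name, fields and units are illustrative rather than drawn from any particular system. The same clarity that lets a new analyst call an internal function without hand-holding, a descriptive name, typed parameters and an explicit unit, is exactly the context an LLM agent needs to use it correctly on a first pass.

```python
# Hypothetical sketch: the documentation that helps a human colleague is the
# same context an LLM agent needs to call a tool correctly.
from dataclasses import dataclass


@dataclass
class RevenueQuery:
    """Parameters for a quarterly revenue lookup (illustrative only)."""
    business_unit: str   # e.g. "EMEA Retail"
    fiscal_quarter: str  # e.g. "2024-Q3"


def get_quarterly_revenue(query: RevenueQuery) -> float:
    """Return reported revenue in EUR millions for one business unit and quarter.

    A descriptive name, typed inputs and an explicit unit make this usable by a
    junior analyst and by an AI agent alike. If the real endpoint were named
    proc_42(x, y) with undocumented units, both would stumble.
    """
    # Placeholder body: the real lookup would query the finance warehouse.
    raise NotImplementedError
```

An agent framework typically shows the model little more than this metadata: the name, the description and the parameter schema. If that metadata is muddled for people, it is muddled for the model too.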

Implementation makes or breaks foundation model applications

The early assumption was that the application layer built on foundation models would be commoditized as the models improved. The opposite has happened. The performance gap between good and bad applications has widened, because better models revealed how much success depends on structured context and integration into users' cognitive workflow. Give employees excellent models with poor implementation, and they will be frustrated. Give them a brilliant implementation of mediocre models, and they will hit capability ceilings. Give them both, and they'll sing.

Where we're headed — human-first AI collaboration drives growth, not job cuts

Successful adopters are not replacing jobs; they are running multiple AI collaborations simultaneously. Productivity gains come from redistributing cognitive load, not from like-for-like substitution. That means gradual evolution, not revolutionary replacement. Humans will collaborate with well-designed assistants and tools, and the best firms will use what they learn to improve both.

Companies that champion human experience with AI will capture top-line growth by betting on their people. Those measuring success by headcount reduction will watch their best people leave for competitors who've figured out that "human-first" isn't just better marketing. It's a better strategy.