AI Is Fast, Your Development Isn't

The bottleneck in AI isn't the model, but the human expert. The critical metric is their iteration speed. A leader's role is to systematically remove friction from the development loop—from infrastructure to AI interaction—to maximize this velocity.

"We are working, the AI model is thinking" (with apologies to Randall Munroe)

The Bottleneck

The prevailing narrative about AI is one of exceptional speed and scale. But this focus on machine performance overlooks a key business reality: the primary bottleneck in creating valuable AI-centric products is, and will remain, the human with domain expertise.

Your best engineers, product builders, and data scientists are the ones who ask "Why?". They form the hypotheses, design the experiments, and critically, interpret the often ambiguous outputs of the AI. The machine can generate an answer in seconds, but the value is only unlocked when your expert can validate it, refine the prompt, and ask the next, better question.

Therefore, the most important metric for your AI development team is not lines of code or model size. It is the speed of iteration. How many cycles of hypothesis, execution, and validation can your team complete per hour?

This isn't just a software development issue; it's a business constraint. Your expert's cognitive capacity is the most precious resource in your "AI factory." Every moment they are blocked, waiting, or context-switching is a moment that your factory's most critical assembly line is idle. The cost of this idle time is not linear. It compounds, leading to lost ideas, broken flow states, and a sense of frustration that drains the creative energy from your most valuable talent.

The goal is to create an environment of cognitive fluidity, where the path from question to answer is as frictionless as possible. This requires a conscious and deliberate engineering effort, not just of the technology, but of the entire development ecosystem.


From Craftsmen to System Builders: Shifting the Team's Mindset

To effectively combat friction, a mental shift is required. Your team must evolve from being users of AI to being engineers of an AI-powered system. This means treating the entire development and feedback loop as a product in itself—one that is continuously optimised for speed.

The Architect's Role: Decouple and Isolate

A tightly-coupled system, where your application logic is deeply intertwined with live AI model calls, can lead to stagnation. Any developer working on the user interface or business logic becomes dependent on the availability and latency of the AI.

Consider aggressive decoupling. Build your architecture so that the AI component is a replaceable module. In development, this module can be stubbed with a mock that returns instantaneous, pre-defined responses. This allows most of the development work to proceed at the rapid pace of traditional software engineering, entirely independent of the AI model's performance.

  • Real-World Example: A team building a customer support bot should develop the entire conversation flow, user interface, and integration points using a mock service that returns canned answers. Only when the core application is robust and tested do they integrate the live LLM. This parallelisation of workstreams is a classic engineering strategy that can be useful in the AI context.
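As a minimal sketch of this decoupling, the application can code against a small interface and swap in an instantaneous stub during development. All names here are hypothetical, not from any specific framework:

```python
from typing import Protocol


class CompletionClient(Protocol):
    """The minimal interface the rest of the application codes against."""
    def complete(self, prompt: str) -> str: ...


class MockCompletionClient:
    """Instantaneous stub used while building UI and business logic."""

    def __init__(self, canned: dict[str, str], default: str = "OK"):
        self.canned = canned
        self.default = default

    def complete(self, prompt: str) -> str:
        # Match on a keyword rather than the full prompt so tests stay robust.
        for key, response in self.canned.items():
            if key in prompt:
                return response
        return self.default


def answer_support_question(client: CompletionClient, question: str) -> str:
    # Application logic depends only on the interface, never on a live model.
    return client.complete(f"Customer asks: {question}")
```

In production, the same `answer_support_question` function receives a client that calls the live LLM; nothing else in the application changes.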

Enforce Conceptual Consistency

"The purpose of the architect is to create a design that a team of implementers can build." – Fred Brooks

A core element of a buildable design is conceptual consistency. This means the system's components share a coherent and predictable set of conventions. When a system lacks this integrity, developers are forced to constantly re-learn rules and manage exceptions, which is a significant source of cognitive friction. In an AI-centric system, this might manifest as different agents having incompatible interaction patterns, or data models that lack a unified structure. The architect’s role is to be the guardian of this consistency, ensuring the entire system feels like a coherent whole. This discipline directly reduces the mental overhead on the development team, allowing them to build faster and with greater confidence.
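One concrete way to enforce this consistency is a single message shape that every agent adapter in the system must emit. A minimal sketch, with hypothetical names:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentMessage:
    """The one message shape shared by every agent in the system."""
    role: str          # e.g. "user", "assistant", or "tool"
    content: str
    metadata: dict = field(default_factory=dict)


def normalize(role: str, content: str) -> AgentMessage:
    # Every adapter funnels its external format into the shared shape,
    # so downstream code never special-cases individual agents.
    return AgentMessage(role=role.lower().strip(), content=content)
```

The payoff is that downstream code handles one structure, not one per agent, which is exactly the reduction in cognitive overhead the architect is guarding.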

The Librarian's Role: Systematize Discovery

In many organisations, prompt engineering is treated as an individual craft. An engineer discovers a novel way to get the model to produce structured JSON or to adopt a specific persona, but that knowledge often remains in their personal notes. This is a significant waste.

High-velocity teams treat prompt patterns and agent interaction strategies as a shared asset.

  • Create a Prompt Library: Establish a centralised, version-controlled repository for effective prompts. This "cookbook" should be the first place a developer looks when tackling a new problem. Document not just the prompt itself, but the context, the "why" behind its structure, and the nature of the output it produces. While public repositories like Awesome ChatGPT Prompts are great, build your own too.
  • Share Interaction Transcripts: When a developer has a particularly successful or insightful interaction with an AI agent, that entire transcript is a learning artifact. Create a simple informal mechanism for sharing these with the rest of the team. This builds up a collective intelligence and accelerates everyone's learning.
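A prompt library can start very simply, for example as JSON files in a version-controlled directory. This sketch assumes a hypothetical entry format that captures the prompt alongside its context and the "why":

```python
import json
from pathlib import Path

# A hypothetical library entry: the prompt plus the reasoning behind it.
ENTRY = {
    "name": "structured-json-extraction",
    "prompt": "Extract the fields below as JSON. Respond with JSON only.",
    "context": "Use when downstream code parses the model output.",
    "why": "The 'JSON only' clause suppresses conversational preamble.",
    "expected_output": "A single JSON object, no surrounding prose.",
}


def save_entry(library_dir: Path, entry: dict) -> Path:
    """Write one entry into the version-controlled library directory."""
    path = library_dir / f"{entry['name']}.json"
    path.write_text(json.dumps(entry, indent=2))
    return path


def load_entry(library_dir: Path, name: str) -> dict:
    """Look up an entry by name, the first stop for a new problem."""
    return json.loads((library_dir / f"{name}.json").read_text())
```

Because the entries are plain files in the repository, they get the same review, history, and discoverability as code.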

Sources of Friction

In my work leading and advising technology teams, I've seen a recurring pattern. High-performing teams are not necessarily those with the biggest budgets, but those who are disciplined in identifying and eliminating friction from their development loop. In AI development, this friction manifests in several key areas, some familiar, some new to the AI world.

1. Infrastructure Friction: The Cost of Waiting

Every minute a developer waits for a cloud environment to provision, a dataset to load, or a container to build is a minute their cognitive momentum is lost. When the core task involves prompting an agent and waiting for a response, this "dead time" encourages distraction and shatters focus. The cost is not just wasted minutes; it is the disruption of deep work.

2. Model & Agent Friction: The Slow Oracle

Waiting for a large language model (LLM) to respond is the most obvious new bottleneck. This is compounded when the responses are unnecessarily verbose. An LLM that writes like a marketing intern when you need it to write like Hemingway is wasting your team's time. They must parse lengthy, low-density text to find the signal. This friction is a function of both latency and verbosity, and it directly impedes the rapid interaction required for problem-solving.

3. Process & Tooling Friction: The Cost of Getting Started

How many steps does it take for a developer to run a complete test of the system on their machine? If the answer is more than one, you have a process bottleneck. Clunky application development processes and complex startup scripts add a tax to every single iteration. When the cognitive overhead to simply begin a test is high, developers will naturally run fewer, larger, and less effective tests.

4. Cognitive & Cultural Friction: The Tax on Thinking

Beyond tangible delays lie more subtle forms of friction.

  • The Debugging Paradox: How do you debug a system that can give a different, yet plausible, answer every time you run it? The non-deterministic nature of many AI models introduces a significant cognitive load. Developers can't rely on traditional, repeatable debugging techniques. Instead, they must learn to interpret the model's behavior and coax it toward desired outcomes through prompt engineering. This ambiguity adds a tax on every iteration.
  • The Culture of Perfectionism: Traditional development often follows a linear path. AI development is inherently circular and experimental. A culture that penalises failed experiments or demands perfect specifications upfront will slow the velocity required for AI work. When failure is not an option, learning stops. Teams become hesitant, taking fewer risks and running fewer experiments.

Maximizing AI Development Velocity

As a technology leader or architect, your role is to engineer the system that allows your experts to work at the speed of their thoughts. This means treating the development feedback loop itself as a critical product you manage.

1. Engineer the Feedback Loop First

  • Consider Local-First Development: The goal is a near-zero-latency feedback loop. Empower developers with tools to run models locally (e.g., using Ollama with quantized models) for the bulk of their work. A developer who can run 100 local iterations in the time it takes to run five against a cloud API will solve the problem faster.
  • Consider Aggressive & Predictive Caching: Cache everything that can be cached—model responses for static queries, data transformations, and environment states. For LLM calls, where appropriate, set the API's temperature parameter to 0. This makes the output effectively deterministic, so the same prompt reliably yields the same response and is safe to serve from a cache.
  • Consider Radical Simplicity in Setup: The entire system—application, agent, and dependencies—should be runnable with a single command. The "time-to-first-test" for a new engineer or a new feature branch should be measured in minutes, not hours.
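The caching idea above can be sketched as a thin wrapper around any completion client. It assumes the wrapped client is invoked with temperature 0, so a repeated prompt is safe to replay from cache; class and method names are illustrative:

```python
import hashlib
import json


class CachedCompletionClient:
    """Exact-match response cache around any completion client.

    Assumes the underlying client runs with temperature=0, so a repeated
    prompt can safely be served from cache instead of a live call.
    """

    def __init__(self, client, cache=None):
        self.client = client
        self.cache = cache if cache is not None else {}
        self.hits = 0

    def _key(self, prompt: str, **params) -> str:
        # Hash the prompt together with all call parameters, so a change
        # to any parameter produces a different cache entry.
        payload = json.dumps({"prompt": prompt, **params}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def complete(self, prompt: str, **params) -> str:
        key = self._key(prompt, **params)
        if key not in self.cache:
            self.cache[key] = self.client.complete(prompt, **params)
        else:
            self.hits += 1
        return self.cache[key]
```

In a real setup the in-memory dict would typically be replaced by a shared store on disk or over the network, so the whole team benefits from each other's cache entries.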

2. Right-Size Your AI Interactions

  • Enforce the Hemingway Principle: Brevity is a feature. Engineer your prompts and system messages to favour concise, high-density responses. The goal is clarity and speed. This saves cognitive load and reduces token costs. Beyond prompt crafting, enforce this programmatically by setting a low max_tokens parameter in the API call, which acts as a hard ceiling on response length.
    • Example: Instead of just summarize this document, a better system prompt might include ...respond in the style of Ernest Hemingway: direct, declarative sentences. Omit all pleasantries and introductory phrases. Use Markdown for structure.
  • Avoid the LLM Hammer: Not every problem requires a state-of-the-art LLM. I've seen teams burn cycles using a powerful generative model for tasks that a simple regular expression, a keyword search, or a smaller, specialized model could handle instantly and more reliably. Architectural discipline is crucial.
    • This pattern is particularly apparent with AI coding assistants. While valuable for generating boilerplate, they can encourage developers to accept the first, often verbose, solution. A developer might prompt for complex logic and receive a 20-line function that works, failing to see that a more elegant 3-line solution using a standard library call exists. This introduces subtle technical debt. The goal is to foster a culture of critical engagement, where the AI is a junior pair-programmer whose suggestions require rigorous review.
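Putting the brevity prompt and the max_tokens ceiling together, a request might be assembled as below. The parameter names mirror common chat-completion APIs, but no specific vendor is assumed:

```python
# Brevity-enforcing system message, per the Hemingway principle above.
CONCISE_SYSTEM_PROMPT = (
    "Respond in the style of Ernest Hemingway: direct, declarative sentences. "
    "Omit all pleasantries and introductory phrases. Use Markdown for structure."
)


def build_request(user_prompt: str, max_tokens: int = 256) -> dict:
    """Combine the concise system message with a hard length ceiling."""
    return {
        "messages": [
            {"role": "system", "content": CONCISE_SYSTEM_PROMPT},
            {"role": "user", "content": user_prompt},
        ],
        "max_tokens": max_tokens,  # hard ceiling on response length
        "temperature": 0,          # deterministic, cache-friendly output
    }
```

Note that max_tokens truncates rather than summarizes, so the prompt still has to do the work of asking for brevity; the ceiling is a backstop, not a substitute.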

3. Measure What Matters: The Iteration Cycle

Shift your team's focus from output-based metrics to process-based metrics. The key question is: "How can we increase the number of meaningful experiments we run per day?" Review or even track the "time to validation"—the total time from a code change to the developer seeing its impact. This single metric will often reveal hidden friction points in your system.
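Tracking "time to validation" needs no special tooling to start. A minimal sketch, assuming all you want is wall-clock timing into an append-able log:

```python
import time
from contextlib import contextmanager


@contextmanager
def time_to_validation(log: list, label: str):
    """Record how long one change-to-validated-result cycle takes.

    `log` is any append-able sink; in practice this might feed a dashboard.
    """
    start = time.perf_counter()
    try:
        yield
    finally:
        log.append((label, time.perf_counter() - start))


def iterations_per_hour(log: list) -> float:
    """Rough throughput estimate from the recorded cycle times."""
    total = sum(seconds for _, seconds in log)
    return len(log) * 3600 / total if total else 0.0
```

Even a crude log like this, reviewed weekly, tends to surface the friction points—slow environment startup, cache misses, verbose model output—far faster than intuition does.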


In the age of AI, your sustainable competitive advantage will likely come from your organization's ability to learn and adapt faster than the competition. That learning happens one iteration at a time.

By reducing friction and optimising your development environment for human velocity, you do more than just improve efficiency. You create a system that allows for curiosity, experimentation, and rapid discovery. You empower your most valuable resources—your domain experts—to build the future.

Ultimately, the role of a technology leader in the AI era is that of a systems thinker and a friction hunter. Your mandate is to look beyond the AI model itself and scrutinise the entire human-computer system. Your primary design goal is to enable your experts to have more high-quality thoughts per hour. By focusing on the speed and quality of this feedback loop, you create the conditions for sustainable progress.

If you are navigating the challenge of building high-velocity teams to harness the power of AI, I am open to a discussion. Feel free to book a time to chat or connect on LinkedIn.
