Software Development 3.0 with AI - Exploring the New Era of Programming with Andrej Karpathy
- Martin Swartz
At University 365, we are committed to preparing our students, faculty, and community for the future of technology and innovation. One of the most transformative shifts in recent years has been the evolution of software development driven by artificial intelligence. This new era, which we call Software Development 3.0 with AI, was brilliantly explored by Andrej Karpathy, founding member of OpenAI and former director of AI at Tesla, during his keynote at Y Combinator's AI Startup School in San Francisco.

Karpathy’s insights provide a compelling framework for understanding how software is changing fundamentally again, driven by large language models and AI agents that enable programming in natural language like English.
As an institution dedicated to applied AI education, University 365 embraces this paradigm shift and integrates it into our pedagogy, curriculum, and lifelong learning philosophy. This article dives deep into Karpathy’s vision, connecting it to the mission of University 365 to empower learners to become superhuman in the AI era.
From Software 1.0 to Software 3.0: The Evolution of Programming
Karpathy opens by reflecting on the history of software development and how it has evolved over the past seventy years. For most of this period, software was relatively stable in its fundamental nature: programmers wrote explicit instructions in programming languages like C++ or Python, creating what he terms Software 1.0. This is the traditional paradigm where a developer writes the logic and rules that the computer follows.
Then came a major shift with the rise of neural networks and machine learning models, which Karpathy calls Software 2.0. Instead of explicitly coding rules, developers curate datasets and train neural networks, whose parameters (weights) encode the solution to a problem. The program, in a sense, is not written line by line but learned through optimization. This transition introduced a new way of “programming” computers, but the models were still fixed-function: image classifiers, speech recognizers, and so on.
The newest and most profound transformation is the emergence of Software 3.0, where large language models (LLMs) become programmable computers themselves. Here, the “code” is your prompt, written in English, a natural language, that instructs the LLM to perform complex, dynamic tasks. This new programming paradigm is accessible to anyone who can express themselves in natural language, expanding the horizons of who can be a programmer and how software is created.
Programming in English: The Rise of Software 3.0
One of the most striking revelations Karpathy shares is that we are now programming computers in English. Unlike traditional programming languages, prompts to LLMs use natural language, which is inherently human and intuitive. This means that the barrier to software creation is lowered dramatically, making programming more inclusive and accessible.
For example, to perform sentiment classification, you could write Python code (Software 1.0), train a neural network on labeled data (Software 2.0), or simply prompt an LLM with a short, clear English instruction (Software 3.0). The implications are enormous: software development is no longer confined to specialists with years of coding experience but opens up to anyone who can communicate effectively in natural language.
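To make the contrast concrete, here is a minimal sketch of the same sentiment task in all three paradigms. The toy word lists, the scikit-learn pipeline, and the model name in the Software 3.0 call are illustrative assumptions, not details from the talk:

```python
# Software 1.0: hand-written rules.
POSITIVE = {"great", "love", "excellent"}
NEGATIVE = {"bad", "hate", "terrible"}

def classify_v1(text: str) -> str:
    words = set(text.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "positive" if score >= 0 else "negative"

# Software 2.0: learn the rules from labeled data instead of writing them.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["I love this", "This is terrible"]   # toy labeled dataset
labels = ["positive", "negative"]
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(texts, labels)

def classify_v2(text: str) -> str:
    return model.predict([text])[0]

# Software 3.0: the "program" is an English prompt sent to an LLM.
# (OpenAI's SDK is used as one example provider; the model name is an assumption.)
from openai import OpenAI
client = OpenAI()

def classify_v3(text: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content":
            f"Classify the sentiment of this review as 'positive' or 'negative': {text!r}"}],
    )
    return resp.choices[0].message.content.strip()
```

Notice that in the third version the entire program logic lives in the English sentence; changing the task means changing the prompt, not the code.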
Large Language Models as Utilities, Fabs, and Operating Systems
Karpathy draws a fascinating analogy comparing LLMs to utilities and semiconductor fabs, but ultimately argues that the most apt comparison is to operating systems. This analogy helps us understand the ecosystem of AI and software today.
LLMs as Utilities: Like electricity or water, LLMs are offered as metered services through APIs by providers such as OpenAI, Anthropic, and Google (with Gemini). These providers invest heavily in capital expenditures (capex) to build and train these models, akin to building a power grid. Users pay for access, expecting high uptime, low latency, and consistent quality.
LLMs as Fabs: The enormous cost and complexity of training state-of-the-art models resemble semiconductor fabrication plants, where advanced technology nodes and proprietary secrets are guarded. Some companies own their hardware and training infrastructure (like Google with TPUs), while others are fabless, relying on third-party GPUs.
LLMs as Operating Systems: Most interestingly, Karpathy sees LLMs as a new kind of operating system — a software ecosystem that orchestrates memory, compute, and complex interactions with tools and multimodal inputs. Just as Windows, Mac OS, and Linux shaped personal computing, LLM platforms are shaping the future of computing itself.
This analogy extends to how applications work across different LLMs, much like apps running on different operating systems. For instance, an LLM-powered app like Cursor can run on top of GPT, Claude, or Gemini, offering users a choice, similar to how VS Code runs on Windows, Linux, or Mac.
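That portability can be pictured as a thin provider abstraction: application logic is written once against an interface, and the underlying LLM is swapped like an OS target. The sketch below is a hypothetical design, not the architecture of Cursor or any real product:

```python
# Minimal sketch of provider-agnostic LLM access, analogous to an app
# that runs unchanged on Windows, Linux, or Mac. Provider classes and
# the interface shape are illustrative assumptions.
from typing import Protocol

class LLMProvider(Protocol):
    def complete(self, prompt: str) -> str: ...

class OpenAIProvider:
    def complete(self, prompt: str) -> str:
        raise NotImplementedError  # would call OpenAI's API here

class AnthropicProvider:
    def complete(self, prompt: str) -> str:
        raise NotImplementedError  # would call Anthropic's API here

def summarize(doc: str, llm: LLMProvider) -> str:
    # App logic is written once against the interface;
    # swapping the underlying model is a one-line change at the call site.
    return llm.complete(f"Summarize in one sentence:\n{doc}")
```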
However, we are still in the early days, akin to the 1960s era of computing, when compute was expensive and time-shared on central mainframes; LLM compute today is similarly costly and centralized in the cloud. The personal computing revolution for LLMs has yet to fully materialize, but experiments with running models on devices like Mac Minis hint at what might come.
The Psychology of Large Language Models: People Spirits with Cognitive Quirks
Karpathy offers a compelling metaphor for understanding LLMs: they are like “people spirits,” stochastic simulations of human language and thought. Trained on vast corpora of human text, these models develop an emergent psychology that is humanlike yet distinctly different.
Some key characteristics:
- Encyclopedic Knowledge and Memory: LLMs can recall vast amounts of information, far beyond any individual human’s capacity. Karpathy likens this to the autistic savant character in the movie Rain Man, who has near-perfect memory.
- Cognitive Deficits and Hallucinations: Despite their prowess, LLMs hallucinate, inventing facts or making mistakes no human would, such as botching simple numerical comparisons or letter counts. Karpathy calls this jagged intelligence.
- Limited Long-Term Memory: Unlike humans, who consolidate knowledge over time, LLMs have fixed weights and a “working memory” limited by their context window. They suffer from a form of anterograde amnesia, forgetting context after each session, which complicates persistent learning (a minimal sketch of managing this sliding context window follows the list).
- Security Vulnerabilities: LLMs are gullible and susceptible to prompt injections, data leaks, and other risks, requiring cautious design and supervision.
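Because the weights are frozen and only the context window persists within a session, applications typically manage this working memory explicitly, dropping the oldest messages once the window fills. Here is a minimal sketch of that idea; the word-based token estimate and the window size are rough illustrative assumptions, not any provider's real accounting:

```python
# Minimal sketch of "working memory" management: keep only the most
# recent messages that fit in the model's context window.
def fit_context(messages: list[str], max_tokens: int = 4096) -> list[str]:
    kept: list[str] = []
    used = 0
    for msg in reversed(messages):      # walk from newest to oldest
        cost = len(msg.split())         # crude token estimate (assumption)
        if used + cost > max_tokens:
            break                       # older messages are "forgotten"
        kept.append(msg)
        used += cost
    return list(reversed(kept))         # restore chronological order
```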
Understanding these traits is crucial to designing effective AI applications that leverage LLM strengths while mitigating their weaknesses.
Designing LLM Applications: Embracing Partial Autonomy
One of the most exciting opportunities Karpathy highlights is the design of partial autonomy apps. Instead of directly interacting with an LLM through a generic chat interface, specialized applications integrate LLM capabilities into traditional user interfaces, combining human control with AI assistance.
For example, in coding workflows, rather than copy-pasting code into a chat, apps like Cursor offer a familiar code editor augmented with LLM features. These apps manage context, orchestrate multiple LLM calls (such as embeddings, chat, and code diffs), and provide application-specific graphical user interfaces (GUIs) that make reviewing and accepting AI-generated changes fast and intuitive.
Karpathy stresses the importance of an “autonomy slider” that lets users control how much autonomy the AI has. You can request small code completions or let the AI make sweeping changes. This flexibility allows gradual trust-building and fine-tuned collaboration between human and AI.
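One way to picture the autonomy slider is as an explicit parameter that gates how much scope each request hands to the model. The levels and prompt constraints below are illustrative assumptions, not Cursor's actual implementation:

```python
from enum import Enum

class Autonomy(Enum):
    COMPLETE_LINE = 1    # tab-completion only
    EDIT_SELECTION = 2   # rewrite the highlighted block
    EDIT_FILE = 3        # propose a diff for the whole file
    EDIT_REPO = 4        # agentic changes across the repository

def request_change(task: str, level: Autonomy) -> str:
    # Higher levels hand more scope to the model; lower levels keep
    # the human firmly in control. Only the prompt assembly is sketched.
    scope = {
        Autonomy.COMPLETE_LINE: "complete the current line only",
        Autonomy.EDIT_SELECTION: "edit only the selected region",
        Autonomy.EDIT_FILE: "return a diff for this file",
        Autonomy.EDIT_REPO: "plan and apply changes across files",
    }[level]
    return f"{task}\nConstraint: {scope}"
```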
Another example is Perplexity, an LLM-powered research tool that cites sources and lets users vary the depth of autonomy, from quick searches to deep research sessions. This trend towards partial autonomy is likely to reshape how software products and services evolve.
Human-AI Collaboration: The Generation-Verification Loop
Karpathy emphasizes that humans remain the bottleneck in AI-assisted workflows, especially when verifying AI outputs. To maximize productivity, the generation-verification loop must be sped up. GUIs play a critical role here, leveraging our natural ability to process visual information quickly and reducing the cognitive load of reading dense text.
At the same time, keeping AI “on a leash” is vital. Overeager or overly autonomous agents can produce massive, unmanageable changes that require excessive human oversight. Best practices include crafting concrete prompts and working in small, incremental chunks to maintain control and ensure correctness.
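Put together, these practices amount to a simple cycle in which the human is the gate. The skeleton below is a hypothetical sketch of that generation-verification loop, with `generate` and `apply` standing in for whatever tooling an app provides:

```python
# Hypothetical skeleton of the generation-verification loop: the AI
# proposes small, incremental changes; a human verifies each one.
def generation_verification_loop(tasks, generate, apply):
    for task in tasks:                      # work in small chunks
        proposal = generate(task)           # AI drafts a candidate change
        print(proposal)                     # fast (ideally visual) review
        if input("accept? [y/N] ").lower() == "y":
            apply(proposal)                 # only verified work lands
        # rejected proposals are dropped: the human keeps the AI on a leash
```

Speeding up this loop, whether through better GUIs for the review step or smaller proposals from the generation step, is where much of the productivity gain lies.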
Lessons from Tesla Autopilot: Autonomy is Hard, and Humans Matter
Drawing on his experience at Tesla, Karpathy shares lessons from building partially autonomous driving systems. The Tesla Autopilot software stack transitioned over time from traditional code (Software 1.0) to neural networks (Software 2.0) that “ate through” the codebase, improving capabilities while simplifying the stack.
Yet, even after a decade of development, full autonomy remains elusive, and human supervision is still necessary. This example cautions against overly optimistic timelines for AI agents and highlights the need for careful, incremental progress with human-in-the-loop designs.
The Iron Man Suit Analogy: Augmentation and Autonomy
Karpathy uses the Iron Man suit metaphor to illustrate the spectrum between AI augmentation and full agent autonomy. The suit can be both driven directly by Tony Stark (augmentation) and operate semi-autonomously as an agent.
Currently, with fallible LLMs, the emphasis should be on building Iron Man suits (augmentation tools with human oversight and partial autonomy) rather than fully autonomous agents. Custom GUIs and user experience design are essential to keep human-AI interaction efficient and safe.
Vibe Coding: The Democratization of Programming
One of the most inspiring outcomes of Software Development 3.0 is the democratization of programming through what Karpathy calls vibe coding. This trend captures the feeling of casually building software by prompting LLMs in natural language, making programming accessible to people who may not know traditional languages.
Karpathy shares a heartwarming video of kids vibe coding, highlighting how this new paradigm is wholesome and full of promise for the future. He also recounts his own experience building simple apps in Swift via natural language prompts, without prior expertise, demonstrating the power of AI to lower barriers to software creation.
Challenges in Making AI-Generated Software Production Ready
While generating code with LLMs is fun and fast, the real challenge lies in making applications production-ready. Tasks like authentication, payment integration, deployment, and domain management involve complex manual steps outside of coding itself. Karpathy highlights how these “devops” tasks are still bottlenecks that require human intervention and careful orchestration.
Building for Agents: Preparing Digital Infrastructure for AI Consumers
Karpathy envisions a future where AI agents are first-class consumers and manipulators of digital information, alongside humans and traditional APIs. This calls for building software infrastructure that is agent-friendly.
Just as websites use robots.txt files to instruct web crawlers, Karpathy proposes llms.txt files: simple, markdown-based documents that provide LLMs with clear, machine-readable descriptions of a domain’s purpose and capabilities. This approach avoids error-prone HTML parsing and helps LLMs interact effectively with web content.
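As a sketch of what such a file might contain, here is a hypothetical llms.txt for an invented documentation site (the site, sections, and links are made up for illustration):

```markdown
# Example Docs

> Example Docs hosts the API reference and guides for the Example service.
> Agents should prefer the markdown pages linked below over parsing HTML.

## Docs

- [Quickstart](https://example.com/docs/quickstart.md): install and make a first request
- [API reference](https://example.com/docs/api.md): endpoints, parameters, and errors
```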
Some companies, like Vercel and Stripe, are already adapting their documentation to be LLM-friendly by providing markdown docs and replacing ambiguous instructions like “click here” with precise commands that an AI agent could execute.
Tools for LLM-Friendly Data Ingestion
Karpathy also highlights tools that transform conventional developer resources into LLM-consumable formats. For instance, changing a GitHub repository URL’s domain from github.com to gitingest.com (the GitIngest tool) concatenates all of the repository’s files into a single text blob ready for LLM querying. More advanced tools like DeepWiki analyze repos and generate rich documentation pages optimized for AI interaction.
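The URL trick itself is just a host substitution, as in this small illustrative helper (the exact response format is the service's to define):

```python
def to_gitingest(github_url: str) -> str:
    # e.g. https://github.com/user/repo -> https://gitingest.com/user/repo,
    # which serves the repository flattened into a single text digest
    return github_url.replace("https://github.com/", "https://gitingest.com/", 1)

print(to_gitingest("https://github.com/user/repo"))
```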
These tools bridge the gap between human-centric software interfaces and AI-friendly data representations, accelerating the integration of LLMs into software development workflows.
Conclusion: Embracing the Software Development 3.0 Era at University 365
The era of Software Development 3.0 with AI is upon us, transforming programming from coded logic to data-tuned neural networks and now to natural language prompts that program large language models. This profound shift opens up unprecedented opportunities and challenges for developers, educators, and learners alike.
At University 365, we recognize the critical importance of staying at the forefront of this revolution. Our mission to make every learner superhuman aligns perfectly with the demands of this new software era, where fluency in multiple programming paradigms, human-AI collaboration, and agent-ready infrastructure will define success.
We are committed to equipping our students and faculty with the knowledge, skills, and mindset to thrive in this exciting, evolving landscape. Through neuroscience-based pedagogy, microlearning, AI coaching, and cutting-edge curricula, University 365 stands as a beacon of innovation and human-centric education in the AI age.
The future of software will certainly still involve writing code, but it will also be about partnering with AI, designing partial autonomy, and building systems that empower all of us to create, innovate, and flourish.