A map of opportunities in the AI agent space
A preliminary thesis mapping out the problem space and investment opportunities related to AI agents across the technology stack.
Introduction
An AI agent is an autonomous software program that acts on behalf of users. It perceives its environment (through sensors, reading data, processing text and images, or user input), takes actions based on its understanding of that environment and its training data (e.g. controlling robots, generating text, or making recommendations), and pursues specific goals, such as winning a game, optimising a process, accomplishing a task, or providing information. Some AI agents can learn and adapt from new experiences and information, improving their performance over time. AI agents are used in personal assistants (Alexa, Google Assistant), chatbots (ChatGPT, Gemini), customer service, fraud detection, algorithmic trading, medical diagnosis, drug discovery, gaming, and content creation (images, video, and music). As they become more sophisticated, they will likely be catalysts for the widespread use of AI in our lives.
The AI agent problem space
We're currently just scratching the surface of how we will use AI agents in our lives. If we believe that widespread use of AI agents will be a primary trend over the next decade, it’s useful to identify gaps and valuable problems across the AI agent technology stack for investment opportunities.
(1) Foundation Models
Foundation models are the basis for AI agent development. Examples include large language models (LLMs), computer vision models, and reinforcement learning models. Today, the competitive frontier focuses on improving the accuracy and performance of these models to enhance AI agents' capabilities. Despite fierce competition among major players (OpenAI, Google, Anthropic) and the need for significant resources to make performance breakthroughs, this space is still open for new model development and investment.
New models may differ in:
Architecture - From increasing the number of parameters, to adaptations of the transformer architecture like “Mixture of Experts” (multiple “expert” sub-networks, each specialised in learning different parts of the input data, combined by a gating network, e.g. Mixtral 8×7B), to new non-transformer architectures like Structured State Space Sequence (S4) models, which allow efficient scaling and processing of longer sequences (and hence longer context windows), e.g. Mamba and StripedHyena.
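To make the Mixture of Experts idea concrete, here’s a minimal sketch in plain Python: a gating network scores a handful of “expert” networks, the top-scoring experts process the input, and their outputs are combined using the renormalised gate scores. All dimensions and weights here are toy values, not anyone’s production architecture.

```python
import math
import random

random.seed(0)

# Toy setup: 4 experts, each a simple linear map on a 4-dim input.
N_EXPERTS, DIM = 4, 4
experts = [[[random.gauss(0, 1) for _ in range(DIM)] for _ in range(DIM)]
           for _ in range(N_EXPERTS)]
gate = [[random.gauss(0, 1) for _ in range(N_EXPERTS)] for _ in range(DIM)]

def matvec(m, v):
    # Multiply vector v by matrix m (rows index the input dimension).
    return [sum(m[j][i] * v[j] for j in range(len(v))) for i in range(len(m[0]))]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, top_k=2):
    """Score every expert with the gating network, keep only the top_k,
    and combine their outputs weighted by the renormalised gate scores."""
    scores = softmax(matvec(gate, x))
    top = sorted(range(N_EXPERTS), key=lambda i: scores[i])[-top_k:]
    total = sum(scores[i] for i in top)
    out = [0.0] * DIM
    for i in top:
        w = scores[i] / total
        out = [o + w * e for o, e in zip(out, matvec(experts[i], x))]
    return out

y = moe_forward([1.0, -0.5, 0.3, 0.8])
```

The efficiency win is that only `top_k` of the experts run per input, so parameter count can grow without a proportional increase in compute per token.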
Optimisation - Improving model performance (accuracy, speed, resource utilisation, scalability, robustness) through techniques like quantisation (reducing numerical precision for better performance at lower resource utilisation) and fine-tuning (further training a pre-trained model on a smaller, domain-specific dataset to adapt it to a specific task or domain, e.g. Smaug-72B).
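Quantisation can be illustrated in a few lines: the sketch below maps float weights to int8 values and back, trading a small reconstruction error for a 4× reduction in storage (1 byte instead of 4 for float32). The weights are illustrative.

```python
# Symmetric int8 quantisation: map floats in [-max_abs, max_abs] onto [-127, 127].
def quantise(weights):
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantise(q, scale):
    return [v * scale for v in q]

weights = [0.42, -1.37, 0.05, 0.99, -0.63]
q, scale = quantise(weights)
restored = dequantise(q, scale)

# Each int8 value needs 1 byte instead of 4 for float32; the price is a small
# rounding error, bounded by half the quantisation step (scale / 2).
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Real quantisation schemes (per-channel scales, 4-bit formats, quantisation-aware training) are more involved, but the core trade-off is the same.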
Other aspects - Such as context window (Google’s Gemini 1.5 and models based on Hyena architecture enable million-token length context windows), multi-modality, and interpretability or explainability (important in healthcare and finance).
New conceptual approaches - Like Large Action Models (LAMs), such as Rabbit OS’s LAM, which go beyond simply understanding and responding to language: they are designed to take actions in the real world based on what users tell them, combining different model architectures and techniques to understand user intent, predict the steps needed to achieve the goal, and then execute those actions, such as booking an Uber, filling out a form online, or navigating a software interface.
As models advance, competition shifts to new layers of improvements, optimisations and combinations of the above, seeking opportunities with lower marginal investment cost. The open source versus closed source debate also raises the question of whether open source model development drives faster innovation, with open source models steadily catching up to closed source ones.
But as the space becomes more competitive, it’s possible that models may become commoditised, the value accrued at the model level becomes less significant, and more value accrues at the stage closest to the user, i.e. the agent stage.
(2) Infrastructure for building and deploying models
Aside from model architecture, two key ingredients for AI models are training data and compute resources.
High-quality and diverse training data is essential for training AI models, so infrastructure and services for collecting and preparing training data present investment opportunities. Selecting data to train the model, and rating and cleaning data to improve dataset quality, are valuable problems to build products and services around. Larger models need to be trained on larger datasets, so solutions for data availability, accessibility (and perhaps ownership), collection, and increasing training efficiency with less data (e.g. pruning and optimising datasets, or using synthetic data for fine-tuning) are needed.
Beyond the supply of training data, the next step is preparing data for model training. Here, tools for data labelling (especially for large scale datasets, which can be time-consuming and expensive), cleaning and scrubbing data for errors, analysis (profiling the dataset across various metrics), and curation (filtering data for the best examples for training) are becoming more important to enhance model performance.
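To make the curation steps above concrete, here’s a minimal sketch of a cleaning pass over raw text records: exact-duplicate removal, a length filter, and a crude quality heuristic. Real pipelines use far more sophisticated deduplication (fuzzy or semantic) and quality scoring; the thresholds here are arbitrary.

```python
# A toy data-curation pass: drop duplicates, short records, and records with a
# low ratio of alphabetic characters (a stand-in for a real quality score).
def curate(records, min_len=20, min_alpha_ratio=0.6):
    seen, kept = set(), []
    for text in records:
        norm = " ".join(text.split()).lower()  # normalise whitespace and case
        if norm in seen or len(norm) < min_len:
            continue
        alpha_ratio = sum(c.isalpha() for c in norm) / len(norm)
        if alpha_ratio < min_alpha_ratio:
            continue
        seen.add(norm)
        kept.append(text)
    return kept

raw = [
    "The quick brown fox jumps over the lazy dog.",
    "The quick  brown fox jumps over the lazy dog.",  # near-duplicate (whitespace)
    "short",                                          # too short
    "1234567890 !!! ??? 0987654321 ### $$$ %%%",      # low alphabetic ratio
]
clean = curate(raw)
```

Even this trivial pass shows why curation is a product opportunity: every filter embeds a judgment about what “good” training data looks like.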
Generally speaking, in the space of data and training, innovative techniques for model training and validation can open up new opportunities in foundation model development.
Training and operating models require significant compute resources. The computational power and infrastructure available for training and inference directly impact the speed and efficiency of AI agent development and deployment, so provision of compute resources to enable faster training, larger-scale experiments, and real-time inference and decision-making is another vector of opportunity in model development and deployment.
The need for training compute grows faster than Moore's law as model size increases. If the use of AI becomes widespread with billions of users making frequent queries daily, the demand for inference compute may even surpass training demand, leading to significant growth in GPU and cloud compute demand, maybe even hitting a wall of diminishing marginal returns at some point. Opportunities therefore exist in scaling AI compute resources through centralised or decentralised cloud compute infrastructure, or exploring new methods to improve model performance that rely less on compute growth.
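To get a feel for the training-versus-inference trade-off, here’s a back-of-envelope sketch using the common heuristics of roughly 6 × parameters × tokens FLOPs for training and 2 × parameters FLOPs per generated token for inference. Every input number below is illustrative, not a claim about any particular model or service.

```python
# Back-of-envelope: how quickly could global inference demand overtake the
# one-off training cost of a large model? All figures are illustrative.
params = 70e9           # a 70B-parameter model
train_tokens = 2e12     # trained on 2T tokens

train_flops = 6 * params * train_tokens  # ~6*N*D heuristic

users = 1e9             # a billion users...
queries_per_day = 10    # ...making 10 queries a day...
tokens_per_query = 500  # ...at ~500 generated tokens each
daily_inference_flops = 2 * params * users * queries_per_day * tokens_per_query

# Days of global inference needed to match the entire training run:
days_to_match_training = train_flops / daily_inference_flops
```

Under these assumptions, worldwide inference matches the whole training budget in roughly a day, which is why inference compute may dominate demand at scale.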
Beyond data and compute, infrastructure and tools for model development and deployment offer a broader opportunity. These platforms make it easier to build, customise and run AI models by providing libraries, ready-made integrations, pre-trained models, training datasets, and tools for fine-tuning models (especially open source models) on your own data. Such platforms include Hugging Face (with the largest open-source collection of models, datasets, demos and metrics, for exploring, experimenting, collaborating, and building AI technology) and Together.ai (a cloud platform for building and running generative AI models). Also often overlooked are tools to better evaluate models in order to iterate and fine-tune them more effectively.
(3) Infrastructure and technical solutions for building AI agents
If AI agents are the “application layer” of AI, then an essential layer beneath it is the infrastructure and technical solutions for building them. Here, the broad problems and pivotal areas needing generalised solutions include:
What is the general architecture and what are the building blocks of an AI agent? What do they need to function autonomously and accomplish goals on our behalf, i.e. act as our agents? e.g. domain-specific knowledge bases, interfaces for interaction with human users and other AI agents, APIs for inputs, actions for outputs, etc.
How do AI agents communicate one-to-one, one-to-many, and many-to-many with humans and other AI agents? Do we need protocols for AI agents to communicate autonomously and coordinate with each other?
How do AI agents transact securely on our behalf? Do they need wallets for carrying out transactions, perhaps within a decentralised financial system? Or more generally, how do AI agents prove that they’ve taken particular actions on our behalf, and verify their identity and provenance, all of which will be critical for fostering trust and reliability in their operations?
How do we optimise and enhance agent performance?
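To make the first question concrete, here’s a minimal sketch of an agent’s skeleton: a perceive-plan-act loop over pluggable tools. The tool names and the rule-based “planner” are hypothetical stand-ins; a real agent would use an LLM to interpret the goal and choose the next action.

```python
# Hypothetical tools standing in for real API integrations.
def search_flights(query):
    return [{"flight": "XY123", "price": 199}]

def book(option):
    return {"status": "booked", "flight": option["flight"]}

TOOLS = {"search_flights": search_flights, "book": book}

def run_agent(goal, max_steps=5):
    """Loop: observe state, pick the next tool (here by trivial rules, in a
    real agent by an LLM), execute it, and stop once the goal is achieved."""
    state = {"goal": goal, "options": None, "booking": None}
    for _ in range(max_steps):
        if state["options"] is None:
            state["options"] = TOOLS["search_flights"](goal)       # perceive
        elif state["booking"] is None:
            cheapest = min(state["options"], key=lambda o: o["price"])  # plan
            state["booking"] = TOOLS["book"](cheapest)             # act
        else:
            return state["booking"]
    return state["booking"]

result = run_agent("cheap flight to Lisbon")
```

The building blocks listed above map directly onto this skeleton: knowledge bases and APIs feed the perceive step, interfaces carry the goal in, and actions carry the results out.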
For example, a development framework like LangChain, which provides tools to develop applications powered by LLMs (e.g. by making it easier to connect LLMs to external or proprietary data sources), productionise them (by inspecting, testing and monitoring applications in order to constantly improve them), and deploy them, addresses some aspects of the first and fourth points. Meanwhile, a framework like CrewAI, which orchestrates role-playing autonomous AI agents and provides the backbone for sophisticated multi-agent interactions so that agents can work together seamlessly on complex tasks, addresses the second point to some degree.
We’re just starting to see generalised solutions being developed in this space, and there’s still significant potential for advancement.
(4) AI agents
As industries mature, value capture tends to shift closer to the user, as we saw with the Internet (Internet applications vs Internet service providers) and mobile (mobile apps vs mobile networks) industries. AI agents are the closest layer to the user and therefore have the potential to capture the most value.
But as the biggest AI companies (OpenAI, Google, Anthropic, etc.) launch their own models and chatbots (which are themselves general AI agents powered by those models), are most independent AI agents just wrappers around a third-party LLM, easily replicated or made obsolete by the incumbent model owner? Is there a case for customised AI agents? What do custom AI agents need to be defensible?
While the latest general purpose agents built on the largest LLMs are indeed very capable, custom AI agents tailored to specific markets and use cases, trained on relevant datasets, and equipped with appropriate interfaces for their use cases can still present investment opportunities. These include:
Domain-specific AI agents - General LLM chatbots may not be able to provide specific answers to questions or tasks that require deep domain knowledge or expertise, for example, in fields like medicine or law. One approach is to take a pre-trained (closed or open source) model, further train it on a domain-specific dataset, and then provide access to this model through a custom agent. Investibility relies on the size, proprietary nature, and effectiveness of the training dataset, as well as any fine-tuning done to make the model more effective at the domain-specific task. In coding, AI agents like GitHub Copilot and CodeWhisperer (which lean more towards code autocompletion), Sudocode (a full-on coding agent) and Codegen (which automates mundane and repetitive software engineering tasks like codebase-wide migrations and refactoring by leveraging a multi-agent system for complex code generation) help users go from an idea expressed in natural language to programming, debugging and deployment. Even sub-domains within coding are fair game, e.g. Rosebud.ai (a nocode AI game maker) and v0.dev (which generates code specifically for user interfaces from a text prompt). Outside of coding, Scenario for generative AI game asset creation, Luma AI’s Genie for creating 3D models, and Suno for music creation are just a handful of examples. Retrieval Augmented Generation (RAG) is another way to build domain-specific AI agents on top of general LLMs, applicable to enterprise AI agents that need access to proprietary datasets. In fact, private or proprietary verticalised datasets can unlock new use cases and the AI agents to deliver them. For example, in health, a domain-specific agent could provide personalised medicine; in entertainment, it could generate shows on demand; in games, it could generate game worlds, game content and playable game designs on demand; and in e-commerce (combined with Just-In-Time manufacturing), you could create and buy anything you want.
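The RAG pattern mentioned above can be sketched in a few lines: retrieve the most relevant document from a private corpus, then prepend it to the prompt sent to a general model. Real systems use learned embeddings and vector databases rather than the toy bag-of-words similarity below, and the corpus and prompt format are illustrative.

```python
import math
import re
from collections import Counter

# A hypothetical proprietary corpus the general LLM has never seen.
CORPUS = [
    "Our refund policy allows returns within 30 days of purchase.",
    "The enterprise plan includes SSO and a 99.9% uptime SLA.",
    "Support is available 24/7 via chat and email.",
]

def vectorise(text):
    # Toy stand-in for an embedding model: a bag-of-words count vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query, k=1):
    q = vectorise(query)
    return sorted(CORPUS, key=lambda d: cosine(q, vectorise(d)), reverse=True)[:k]

def build_prompt(query):
    # Augment the user's question with retrieved context before calling the LLM.
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("What is the refund policy?")
```

The defensibility argument follows directly: the value sits in the corpus and the retrieval quality, not in the general model behind the prompt.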
AI agents with use case-specific user interfaces - Most AI agents today built on top of LLMs are chat-based. What other forms can AI agents take from a UI perspective? Here, opportunities for innovation (and investment) lie in building the right user interface for the specific use case. For example, in AI agents for nocode website building, makereal.tldraw.com turns drawings into working software and websites. Another example is the use of ComfyUI’s custom nodes (specifically the Painter node) to turn sketches into Stable Diffusion XL-generated images and game assets. Matrices offers an innovative twist on analytical research by enabling users to provide an empty or partially filled spreadsheet as the input, for the agent to do the research and fill it in on their behalf, while Sudowrite provides not only an appropriate interface but an entire structured framework and process for writing fictional novels using AI. AI agents with domain specialisation and tailored user interfaces offer defensible investment opportunities that large incumbents building general AI models might not find as cost-efficient to explore and build.
Action-oriented AI agents - AI agents that can perform actions autonomously and on behalf of human users are the next frontier. OpenAI is developing AI agents that can complete complex tasks autonomously, like transferring data from a document to a spreadsheet, filling out expense reports and entering them into accounting software by taking over a user’s device with permission (achieved by training on examples of humans using computers), and performing web-based tasks, such as gathering public data about companies, creating itineraries, and booking flight tickets. Another direction is agent-driven commerce, where AI agents assist consumers in discovering and evaluating purchasing options, and making unsupervised purchasing decisions. This is especially powerful if these agents can be personalised and embed user preferences and context. Software development is another domain where action-orientation is impactful. “Action” here entails writing code based on a product request or brief, something general AI models and specialised AI agents like Sudocode and Cognition Labs’ Devin are working on. But at a broader level, in spaces where real world actions can be undertaken digitally through APIs and real world fulfilment can be delivered through digitally connected logistics, AI agents could write and execute custom code to perform IRL (in real life) actions and fulfilment to achieve a specific goal. In this scenario, AI coding agents connected to the right APIs would effectively become general purpose action-oriented AI agents, but there’s still an argument for domain-specificity here, because taking actions requires more specialised integrations and permissions, which a general agent may not have. That said, Big AI isn’t about to throw in the towel yet, with OpenAI enabling actions through plug-ins and potentially moving from an “ask anything” app to a “do everything” app.
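The point about permissions can be sketched simply: before executing any real-world action, an agent checks it against the scopes the user has explicitly granted. The scope names and actions below are hypothetical, but the gate-before-execute pattern is the crux of why action-oriented agents need specialised integrations.

```python
# Scopes the user has explicitly granted to this (hypothetical) agent.
GRANTED_SCOPES = {"calendar:write", "email:read"}

def execute_action(action, scope, granted=GRANTED_SCOPES):
    """Run an action only if the user granted the scope it requires;
    otherwise refuse and report the missing permission."""
    if scope not in granted:
        return {"ok": False, "error": f"missing scope: {scope}"}
    return {"ok": True, "result": action()}

# A permitted action succeeds; an unpermitted one is refused, not executed.
booked = execute_action(lambda: "meeting booked", "calendar:write")
purchase = execute_action(lambda: "order placed", "payments:write")
```

A general agent without these integration-specific grants simply cannot take the action, which is where domain-specific agents retain an edge.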
When considering AI agents as investible propositions, the question is how to package the use case into a product that makes it easier for users to achieve that specific goal, compared to asking a general chatbot to do it. We should also look out for AI agents that unlock or enable new (human) behaviours, just as the most successful web startups of the previous era (e.g. Airbnb, Uber, TikTok, Instagram) unlocked new behaviours in their time.
(5) Platforms for development and distribution of AI agents
Given how early we are in the industry’s recent development spurt, where we’re still competing on foundation models and just beginning to explore agent applications, it may seem premature to discuss platforms for development and distribution of AI agents. But OpenAI has already fired the first salvo with plugins that turn ChatGPT into a platform for developers and businesses to build on, its GPT Builder for the broader market, and the GPT Store. We’ve seen how this played out in the mobile apps industry, where platforms can capture significant value if they become the main channel for app discovery and distribution. Hugging Face assistants come hot on the heels of the GPT Store, offering an open source alternative to OpenAI’s platform without the need for a paid ChatGPT Plus subscription, and with the ability to choose between several open source LLMs to power your assistant. (Here’s a cooking assistant I whipped up in less than ten minutes for you to try.) This “store” of third-party assistants demonstrates how fast the open source community catches up to closed rivals in the space.
That said, I think the “distribution via store” model is ripe for disruption, so it’ll be interesting to see what new angles startups come up with to address this space, especially as we think about what an open platform for building and distributing AI agents could look like.
From an ecosystem of AI agents to an open web for AI
Opportunities in the AI agent space are vast and varied. From foundation models and infrastructure to AI agent development and platforms for their distribution, every stage of the tech stack offers room for innovation, investment, and growth. As AI agents become increasingly sophisticated and find application in various industries, I’m optimistic that we’ll uncover ever more opportunities in an almost fractal-like manner.
Perhaps we’ll even arrive at an ecosystem of AI agents operating in an open web that fosters specialisation and cooperation among different agents. And by leveraging open source and maybe even decentralised approaches, this ecosystem could encourage transparency, open standards, and interoperability, creating an environment where AI agents seamlessly communicate and work together, enhancing their individual capabilities and providing users with comprehensive and useful experiences. Developers and users could build, share, and access AI models and agents easily, unlocking a truly open AI landscape.
This envisioned open web for AI could promote innovation and address concerns around privacy, security, and monopolisation of AI technology. By fostering an ecosystem of AI agents on an open web, we might create a dynamic and thriving AI landscape that benefits individuals, businesses, and society as a whole.
(While it’s arguable whether custom GPTs are merely wrappers around ChatGPT, and therefore too indefensible to capture much value, I wouldn’t underestimate how custom GPTs and the GPT Store might evolve over time.)