Commercial Opportunities for Venture Capital with Artificial Intelligence

NEWS

June 1 2024
10 minutes
Words: Charles Schrager, Mike Townend

April 2024 Update – Commercial opportunities for venture capital in Artificial Intelligence

 

The Isomer Capital Accelerant Fund Investment Thesis

Executive Summary

The advances in AI research are such that we are very close to a tipping point, where model outputs will be consistently useful in creating value. This will make investing in AI a priority for corporates and investors alike. It is likely that parts of the infrastructure/application layer will see 100x-plus returns on capital. In our view, this is not as fantastic, in the truest sense of the word, as it seems, especially when one considers the returns from M&A that followed full-scale Web2 adoption in the early 2000s.

1.0 The Market Opportunity: $2T

Our original aggregate-based forecast was that the TAM for AI/ML would be close to $90bn by 2026. We believe that most corporates will see the need to reallocate capex spending to this area, given the potential cost-to-serve advantage, the value to be generated from dynamic decision making, revenue improvements and, ironically, the fear of being left behind. European companies' capex spend is about $280bn per year[1], and we believe it would not be unreasonable for between 5% and 10% of capex (note this would only be 0.25-0.5% of sales) to be deployed in this area, which easily validates the $90bn. Spend today is a fraction of this amount, as very few companies had a budget for AI at the start of 2023.

Once deployments move past the experimental stage and into production, we think spending on AI/ML could reach 2.5% of sales. This seems neither a stretch nor does it imply a one-for-one substitution of capex (currently around 5.5% of sales), as substitution of labour cost will form an important part of it, and labour cost averages around 70% of sales[2].

The value for newcomers (20% of the value capture) would appear, using a 10x EV/sales multiple on ARR, to increase from $200bn in 2026 to as much as $2 trillion a short time later.
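To make the arithmetic explicit, here is a minimal sketch in Python. The inputs are the estimates quoted above; the newcomer share and the multiple are our working assumptions rather than market data:

```python
# Rough arithmetic behind the headline figures above (illustrative only).
tam_2026 = 90e9            # forecast AI/ML TAM by 2026 ($90bn)
newcomer_share = 0.20      # assumed share of value captured by new entrants
ev_sales_multiple = 10     # assumed 10x EV/sales applied to ARR

newcomer_arr_2026 = tam_2026 * newcomer_share               # ~$18bn of ARR
newcomer_ev_2026 = newcomer_arr_2026 * ev_sales_multiple    # ~$180bn, i.e. the ~$200bn cited

# A $2 trillion outcome at the same multiple implies the newcomer revenue
# pool growing roughly tenfold from that 2026 starting point.
implied_arr_for_2tr = 2e12 / ev_sales_multiple              # $200bn of newcomer ARR

print(f"Newcomer EV in 2026: ~${newcomer_ev_2026 / 1e9:.0f}bn")
print(f"Newcomer ARR implied by $2tn: ~${implied_arr_for_2tr / 1e9:.0f}bn")
```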

[1] Our thanks to UBS HOLT for the data

[2] US Bureau of Labor Statistics

2.0 Investment selection: Positioning for success today

2.1 Is it possible to beat the large language models (LLMs)?

 

We think that deploying a conventional LLM architecture in pursuit of competitive advantage is unlikely to be successful. There is a consistent and widening gap between the largest models and the rest of the pack in terms of perplexity (accuracy). LMSYS, one of the better performance-benchmarking projects, has moved to evaluating models using chatbot benchmarks rather than standardised tests. This is the most likely direction of travel for model usage: most of us would much rather ask a simple question than do prompt engineering.

 

GPT-4 continues to lead the field by a large margin. A key question for an existing investor in an LLM is whether the $100m or so of investment will yield a return. After all, as Nathan on our team puts it, who wants something that's a bit better than GPT-3 but a long way from GPT-4?

We think the key to value, like in all competitive scientific endeavours, is in doing things differently.

We continue to be big fans of the potential for combined hardware/software solutions such as state space models. These models have characteristics that transformers cannot match: they provide around 5x the inference throughput of an equivalent model; their performance actually improves with sequence length (the opposite is true of transformers); they can model long-range dependencies (that is, 'holding in mind' the parts of the data that are most relevant to one another, despite being far apart in the sequence); and training and inference scale linearly with sequence length, as opposed to quadratically in a transformer, which cannot model anything outside a finite window.
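To make the scaling point concrete, here is a minimal sketch of the linear-time recurrence at the heart of a state space model. The dimensions and parameters are illustrative; real SSMs such as S4 or Mamba add learned discretisation, selectivity and hardware-aware scan implementations:

```python
import numpy as np

def ssm_scan(A, B, C, x):
    """Minimal linear state space recurrence: h_t = A h_{t-1} + B x_t, y_t = C h_t.

    Each step touches only the fixed-size state, so time and memory grow
    linearly with sequence length, unlike self-attention, where every token
    attends to every other token (quadratic cost).
    """
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:                  # one pass over the sequence
        h = A @ h + B * x_t        # update the compressed 'memory' of the past
        ys.append(C @ h)           # read out the current output
    return np.array(ys)

# Illustrative (untrained) parameters: a 16-dimensional state, scalar inputs.
rng = np.random.default_rng(0)
A = 0.95 * np.eye(16)              # slowly decaying state -> long-range memory
B, C = rng.normal(size=16), rng.normal(size=16)
y = ssm_scan(A, B, C, rng.normal(size=1000))
print(y.shape)                     # (1000,) -- cost scaled linearly with the 1,000 steps
```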

2.2 Generative AI is a crowded space: how do you make money from it?

$29bn was invested by the venture industry last year into over 690 generative AI companies. We think there are a number of potential problems affecting the durability of generative AI solutions:

  1. Functionality is being bundled with existing offerings by the incumbents – the path of least resistance and a very low marginal cost for the consumer.
  2. There are freemium alternatives in all the major generative AI categories.
  3. We are in the experimentation phase and switching costs are very low – we are not sure ARRs can be compounded with much certainty.
  4. Emergent, unplanned 'meta-intelligence' capabilities are being shown by the large models, which should supplant current approaches to things like coding, indexing and agentic behaviours.

Where we believe the investor wins is by investing in captive, self-reinforcing datasets: the sort of dataset which self-enhances through usage, leading to a better and better outcome for the user, and which is not replicable from common datasets. Companies of this type create a huge cost-to-serve advantage for their customers, which continues to increase over time, and hence so does their ROI. It is the classic PIMS (Profit Impact of Market Strategy) effect applied to data: your moat becomes larger over time and more costly to replicate. There are a number of companies within specific industry verticals that have these characteristics, and Accelerant intends to invest in a number of them.

2.3 Ontology

This is really a fancy word for classification, something LLMs do pretty well. The intersection with value creation occurs as you introduce a customer's proprietary frameworks for use in combination with their own data, and create increasing amounts of inference from the patterns that become visible. Early movers who have effectively done their own 'pre-training' have a proven solution that is generalisable across geographies. We see this in areas as diverse as crime detection, such is the universal nature of human behaviours and their unintended 'tell-tales'. Accelerant has identified companies in these spaces.

2.4 Data reading

It's a fact that 90% of data captured is non-textual. Out of necessity, for better understanding and insight, a number of new methods will have to be derived here. It is really interesting, from both a compute and a training perspective, that the most substantial part of today's effort is being deployed on textual analysis and that multi-modal analysis is in its infancy. Multi-modal analysis is high-dimensional in nature: a single data point in a multi-modal dataset is defined by many more features, which naturally leads to an exponentially increasing compute requirement compared with text. It is likely that there will be significant breakthroughs we cannot currently see, from areas of primary research that are in all probability tangential to the mainstream work in the area.
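A rough sense of the scale gap, using common rules of thumb rather than measurements of any particular model:

```python
# Rough, illustrative comparison of the sequence lengths a model must process.
# Patch size, frame rate and the word-to-token ratio are rules of thumb,
# not measurements of any particular model.
text_tokens = int(1000 * 1.3)                  # a ~1,000-word document, ~1.3 tokens per word

patch = 16
image_tokens = (224 // patch) ** 2             # a 224x224 image as 16x16 patches -> 196 tokens

fps, seconds = 10, 60
video_tokens = image_tokens * fps * seconds    # one minute of video at 10 frames per second

print(text_tokens, image_tokens, video_tokens)  # 1300 196 117600
# Self-attention cost grows with the square of these lengths, which is one reason
# multi-modal training is so much more compute-hungry than text.
```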

The following charts show that not much effort has been directed towards multi-modal understanding; the majority is still focused on language.

2.5 Basic human task substitution, now that we have human expert-level capabilities

 

The key to winning products in this area is reliability and accuracy; after all, you're unlikely to substitute for a worse outcome. Agents are still poor at giving semantically clever responses to what one might call 'one-shot' questions that are not precisely phrased: the stochastic parrot remains alive and well. Core task performance is pretty much at human expert levels; it's dealing with those humans that's the problem.

 

We think, though, that it won't be long before they triumph at 'chain of thought' or so-called 'long horizon' tasks (in logic terms, 'if/then' reasoning at its core). Say, 'Ask Mary whether she can meet on Tuesday or Thursday next week' – a task which involves 'looping' through emails, calendars, scheduling conflicts, work preferences and so on. The issue is that every step in a chain-of-thought process has to be right. We suspect the key is reinforcement learning from human feedback (RLHF), which, once populated, will have significant value in complex domains[1].

[1] This is for the sake of example; we are not talking about supplanting Microsoft and Google, which will doubtless continue to dominate the tasks relating to scheduling our lives!
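As an illustration of the 'looping' involved, here is a minimal sketch of the scheduling task above. The calendar and email functions are toy, hard-coded stand-ins, not any real product or API:

```python
from datetime import date

# Toy, hard-coded stand-ins for real calendar and email integrations --
# purely illustrative, not any actual product or API.
CALENDARS = {
    "Mary": {date(2024, 4, 16): ["10:00"], date(2024, 4, 18): ["14:00", "15:00"]},
    "me":   {date(2024, 4, 16): ["09:00"], date(2024, 4, 18): ["15:00"]},
}

def free_slots(person, day):
    return CALENDARS.get(person, {}).get(day, [])

def send_email(to, body):
    print(f"To {to}: {body}")

def schedule_with(person, candidate_days):
    """Long-horizon 'if/then' loop: every step has to be right for the chain to succeed."""
    for day in candidate_days:                       # loop over the days offered
        overlap = [s for s in free_slots(person, day) if s in free_slots("me", day)]
        if overlap:                                  # a mutually free slot exists
            send_email(person, f"Can we meet on {day} at {overlap[0]}?")
            return day
    return None                                      # no slot found -> hand back to a human

schedule_with("Mary", [date(2024, 4, 16), date(2024, 4, 18)])
# Emails Mary about Thursday the 18th at 15:00; a mistake at any step breaks the chain.
```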

3.0 The White Space: the underdeveloped and underexploited niches

3.1 Inference is becoming much faster and cheaper

 

There is much talk about the remarkable size of context windows, enabling a 'working memory' many times larger than our own. This is certainly an asset when it comes to discovery and creative insight – surely the more unrelated things you are able to bang together, the more likely you are to produce some remarkably useful combinations that we haven't seen before.

 

The less talked-about aspect is what is called the 'residual stream', or, as the software engineers would knowingly joke to one another, 'the poor man's adaptive compute'. This is the optimisation of the combination of CPU usage, memory and storage: flex one of these up significantly and you should be able to use less of the others.

 

It becomes most useful where you're seeking to solve the hardest questions. Simply put, if I ask you a difficult question you will spend a long time answering it, and probably keep going back over it. The 'going back over it' part is what in machine learning is called the forward pass. Forward passes generate more tokens, and the good news is that tokens scale linearly: you can do a lot of them within a big context window.
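A minimal sketch of the idea, with a toy generate() function standing in for a real model's forward pass: each extra 'going back over it' pass adds tokens roughly linearly, and the context window caps how many passes you can afford:

```python
# Illustrative only: generate() is a toy stand-in for one forward pass of a model.
def generate(working: str) -> str:
    return working[-50:] + " ...revised"          # pretend 'going back over it' step

def refine(question: str, passes: int, context_limit: int) -> str:
    transcript = question
    for _ in range(passes):                       # each pass is one more forward pass
        draft = generate(transcript)              # revisit the working so far
        transcript += "\n" + draft                # token cost grows linearly per pass
        if len(transcript) > context_limit:       # the context window caps how many
            break                                 # passes you can afford
    return transcript

answer = refine("A difficult question", passes=5, context_limit=2_000)
print(len(answer))  # grows roughly linearly with the number of passes taken
```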

 

The popular view amongst neuroscientists is that the human brain uses about 5-7 levels of revision (a very small number of forward passes, to you and me) because it is very sample-efficient. The residual stream does something very similar, they think: it compresses the information that gets modified by its 'tributaries' over time, holding a large amount of information in a very reduced form. That makes it compute-efficient, while still allowing the underlying data to be recomposed at any point.

 

The breakthroughs in inference will most likely run for a long time, as we are a long way from the power of the sample-efficient brain. GPT-4 has been trained on 1 trillion tokens, whereas estimates for the number of brain synapses are between 30 and 300 trillion. Compute will still have a profound effect in the near term – for example, the Gemini team estimate that if they increased their compute 10-fold they would see a 5-fold increase in capability – but it is sample efficiency that will likely be one of the bigger contributors to improvements over time.

 

3.2 Model accuracy

 

We have previously talked about our favoured methods for improving this, notably counterfactual and symbolic execution methods, which allow data scientists to identify model errors and increase accuracy. Levels of model adoption will increase exponentially with small increments in accuracy, as the thresholds of acceptable accuracy relative to cost are passed.
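A minimal sketch of the counterfactual idea, with a toy model and made-up data rather than any particular tool: perturb one supposedly irrelevant input at a time and flag the cases where the decision flips:

```python
# Toy 'model': approves a case when income is high enough relative to debt.
# postcode_id should be irrelevant, but imagine a learned model that leaks it.
def model(income, debt, postcode_id):
    return (income - 2 * debt + 0.3 * postcode_id) > 50

def counterfactual_errors(rows, feature, delta):
    """Flag rows where nudging one supposedly irrelevant feature flips the decision."""
    flagged = []
    for income, debt, postcode_id in rows:
        before = model(income, debt, postcode_id)
        kwargs = {"income": income, "debt": debt, "postcode_id": postcode_id}
        kwargs[feature] += delta                      # the counterfactual intervention
        after = model(**kwargs)
        if before != after:
            flagged.append((income, debt, postcode_id))
    return flagged

rows = [(60, 10, 5), (55, 5, 100), (48, 2, 20)]
print(counterfactual_errors(rows, "postcode_id", delta=50))
# Any row returned is evidence the model is sensitive to a feature it should not use.
```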

 

3.3 Zero-based knowledge forecasting

 

We're big fans of Bayesian flow methodology. Simply put, it allows you to determine the factors driving an outcome in your business (dataset) without having to set, at the outset, the factors you think are driving it. This adds significantly to a business's competitive advantage, we think. To use the well-worn adage, 'everything will look like a nail if all you have is a hammer' – in other words, if you decide in advance what you are looking for, you will most likely screen out valuable signals that don't fit your criteria, and you run the risk of ignoring dynamic change in the effect of individual factors and in how they work in combination. In summary, most businesses are 'GDP' businesses, and if you can make them grow, say, 0.5% faster or reduce operating costs by a few points, the leverage to their equity values is very significant.
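A back-of-the-envelope illustration of that leverage, with assumed round numbers throughout:

```python
# Illustrative operating-leverage arithmetic with assumed, round numbers.
revenue = 1_000.0            # $m
operating_margin = 0.10      # 10% margin -> $100m operating profit
ev_multiple = 12             # assumed EV / operating profit multiple

base_profit = revenue * operating_margin
base_ev = base_profit * ev_multiple                        # $1,200m

# Scenario: take two points off operating costs (margin goes from 10% to 12%).
improved_profit = revenue * (operating_margin + 0.02)
improved_ev = improved_profit * ev_multiple                # $1,440m

print(f"Profit +{improved_profit / base_profit - 1:.0%}, "
      f"enterprise value +{improved_ev / base_ev - 1:.0%}")
# A two-point margin gain lifts profit, and value at a constant multiple, by ~20%;
# the uplift to equity value is larger still once debt is netted off.
```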

 

3.4 Creating new data

 

We currently struggle to provide truly profound insights into 'smaller' dimensional problems like language, even with the entire canon of the internet at our disposal as a training dataset, and even though we are "only" dealing with seven dimensions: lexical (meanings, frequencies and relationships), syntactic (grammar and structure), semantic (meaning and interpretation), pragmatic (context of the text), discourse (coherence), stylistic (tone, word choice, sentence structure) and structural (organisation and formatting). It is worth listing them to put the dimensional problems of image analysis and robotics into context. The three-dimensional nature of movement, vision and object interaction almost certainly requires a training dataset far larger than the internet. It also requires a degree of definitional standardisation, as we are in territory where there is no common language that we would understand to describe the features of this landscape. That said, we already have analogues that will provide a very useful starting point. For example, we have no human understanding of the embeddings that are used to control, say, an Airbus, and yet they are functionally very effective. As far as the data is concerned, it is well accepted that 90% of data stored is not in textual form, so you can begin to see that adapting the lines of enquiry should yield significant advances in understanding. Perhaps those millions of miles driven in driverless cars won't have been wasted after all: they can be pivoted towards a new set of problems.

4.0 Other Interesting stuff

4.1 Graphics that tell a story

4.2 Research papers etc.

 

The Anthropic Superposition paper – a model can learn compression strategies such that it can have more features than it has parameters

https://transformer-circuits.pub/2022/toy_model/index.html

 

The problems with more features than parameters https://transformer-circuits.pub/2023/monosemantic-features

 

AI-Augmented Predictions: LLM Assistants Improve Human Forecasting Accuracy https://arxiv.org/abs/2402.07862

 

Approaching Human-Level Forecasting with Language Models. https://arxiv.org/abs/2402.18563

 

Taming Pre-trained LLMs for Generalised Time Series Forecasting via Cross-modal Knowledge Distillation https://arxiv.org/abs/2403.07300

 

Time Series Forecasting with LLMs: Understanding and Enhancing Model Capabilities https://arxiv.org/abs/2402.10835

 

The Isomer Accelerant team – April 2024

Contacts:

Charles Schrager, Partner: cs@isomercapital.com

Mike Townend, Partner: mt@isomercapital.com

 

 
