GenAI - The Era of Reasoning
In the two years since ChatGPT came into our lives (launched Nov ‘22) and disrupted the way we search, write notes, and plan itineraries, the Generative AI landscape has evolved rapidly, pushing boundaries and opening new arenas in which LLMs can play.
The market structure has solidified, with a key set of players and alliances backed by vast capital forming the foundation layer. Only scaled players with economic engines, like Microsoft/OpenAI, AWS/Anthropic, Meta, and Google/DeepMind, remain in play, and their focus is now on developing and scaling the reasoning layer.
A recent report by Sequoia gives an overview of what these players are focusing on and takes a shot at predicting what the future could look like. Here are three things that you need to know from the report.
System 2 Thinking in LLMs
While all of us are still admiring how rapidly ChatGPT answers questions that would take us hours to work out on our own, OpenAI has been training o1 (also called Q* or Strawberry) to work on mathematical problems that are yet to be solved. LLMs can now “stop and think,” or apply System 2 thinking, before responding.
Remember the time, way back in school, when you were given two sets of data (like words and their meanings, or equations and their solutions), and you had to match the items in the left column with the most accurate matches on the right? You would call upon your memory of having made these matches multiple times in the past to match them rapidly now. This is System 1 thinking, or “thinking fast,” as psychologist Daniel Kahneman called it.
In System 1, a model is pretrained on millions of chess moves or petabytes of internet-scale text (LLMs). Its job is to mimic patterns at rapid speed. The answers come from what’s been fed into it, and it’s not expected to develop anything new.
The LLMs you have been using, like GPT-4, have been doing just that. They’re playing a rapid game of pattern matching to the closest accurate result. You either know the capital of Hungary, or you don’t. You can’t logically deduce that answer.
A System 2 model does not pick knee-jerk responses from a database; it runs a search or simulation across a wide range of potential future scenarios, scores those scenarios on the probability of the best outcomes, and chooses the one with the highest score. Imagine entering an unexplored maze. The way out will not be instantly obvious or something that can be guessed correctly from memory. We have to take a path, end up in a few wrong turns, apply reasoning to understand the failure, retrace our steps, and try a new way. System 2 would ideally go through this process, scoring each route based on its failures and successes.
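To make the search-and-score idea concrete, here is a minimal Python sketch of backtracking search over a toy maze. The maze, the coordinates, and the “shorter is better” scoring rule are all illustrative assumptions, not how o1 actually works under the hood:

from typing import List, Optional, Tuple

Maze = List[str]                   # '#' = wall, '.' = open, 'E' = exit
Path = List[Tuple[int, int]]

def solve(maze: Maze, pos: Tuple[int, int], path: Path) -> Optional[Path]:
    """Depth-first search with backtracking: try a route, hit a dead end,
    retrace, and try another (the 'stop and think' loop in miniature)."""
    r, c = pos
    if maze[r][c] == '#' or pos in path:   # wall, or already visited
        return None
    path = path + [pos]
    if maze[r][c] == 'E':                  # found the exit
        return path
    candidates = []
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(maze) and 0 <= nc < len(maze[0]):
            result = solve(maze, (nr, nc), path)
            if result:
                candidates.append(result)
    # "Score" every route that reached the exit; here, shorter is better.
    return min(candidates, key=len) if candidates else None

maze = ["#####",
        "#..E#",
        "#.###",
        "#...#",
        "#####"]
print(solve(maze, (3, 1), []))             # prints the winning route

A System 1 model would have to have memorized this exact maze; the search above can handle a maze it has never seen, at the cost of extra compute.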
Given enough compute time, System 2 models can surprise humans with untrained answers, the way AlphaGo did against legendary Go master Lee Sedol. o1 is showing the ability to backtrack when it gets stuck!
This leap from pre-trained instinctual responses (“System 1”) to deeper, deliberate reasoning (“System 2”) is the next frontier for AI. Audits of o1 have shown exciting results that resemble how humans think and reason. It demonstrates the ability to think about problems the way a human would and in new ways that humans would not!
While the “scoring” is somewhat straightforward in scenarios like playing a game (you either win or don’t) or coding (you can test the code), there is work to be done in open-ended, unstructured domains. How do you score an essay or a design?
How do we tap into the powerful GenAI models?
These models are getting more and more powerful. But are we equipped to tap into their potential?
Think of GenAI models as layers. For the last two years, scientists focused on scaling models on large amounts of data and matching them with questions at record speed. Now the focus is on the reasoning layer, where the model is asked to use all that data and take its time to think before responding.
As OpenAI, Anthropic, Google, and Meta scale their reasoning layers, the models have largely failed to make it into the application layer as products, with the exception of ChatGPT. Researchers are focusing on horizontal, general-purpose reasoning; we still need application- or domain-specific reasoning to deliver helpful AI agents.
Consumers staring at a blank prompt often don’t know what to ask. The key is in the application layer. Two years ago, apps that provided purpose-specific UIs for interacting with GPT models were dismissed as just wrappers (Copy.ai, Perplexity, etc.). These have actually been the most effective ways of tapping into LLMs.
These “wrappers” are not just UIs on a model. They have sophisticated cognitive architectures that typically include multiple foundation models with some sort of routing mechanism on top, vector and/or graph databases, compliance, and application logic that mimics the way a human might think through a workflow.
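As a sketch of what such an architecture can look like, here is a toy router in Python that sends each request to one of several models and applies a compliance step on the way out. The model names, routing rules, and redaction logic are illustrative assumptions, not any specific product’s implementation:

from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Request:
    user_id: str
    text: str

def route(req: Request) -> str:
    """Pick a foundation model based on the kind of work requested."""
    if "code" in req.text.lower():
        return "code-model"                # hypothetical code-tuned model
    if len(req.text) > 500:
        return "long-context-model"
    return "general-model"

def redact(answer: str) -> str:
    """Compliance layer: strip content the application must not return."""
    return answer.replace("SECRET", "[redacted]")

# Stand-ins for real model APIs; production code would also consult a
# vector or graph database for retrieval before calling the model.
MODELS: Dict[str, Callable[[str], str]] = {
    "code-model": lambda t: f"[code-model] draft for: {t}",
    "long-context-model": lambda t: f"[long-context-model] summary of: {t[:40]}",
    "general-model": lambda t: f"[general-model] answer to: {t}",
}

def handle(req: Request) -> str:
    return redact(MODELS[route(req)](req.text))

print(handle(Request("u1", "Write code to parse a CSV file")))

The point of the design is that no single model serves every request; the routing and application logic around the models is where the product lives.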
Take the example of a Factory Droid executing a migration plan to update a service from one backend to another. It breaks down all of the dependencies, proposes relevant code changes, adds unit tests, and waits for a human to review. Then, after approval, it runs the changes across all of the files in a dev environment and merges the code if all the tests pass, just the way a human would.
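Here is a rough Python sketch of that plan-propose-review-apply loop. The step names and gates are assumptions for illustration; Factory’s actual internals are not public in this form:

def plan_migration(service: str) -> list[str]:
    # Break the work into reviewable steps; real dependency analysis goes here.
    return [f"update imports in {service}",
            f"swap backend client in {service}",
            f"add unit tests for {service}"]

def propose_change(step: str) -> str:
    return f"diff for: {step}"             # stand-in for generated code changes

def human_approves(diff: str) -> bool:
    return True                            # stand-in for a real review gate

def tests_pass(service: str) -> bool:
    return True                            # stand-in for running the test suite

def migrate(service: str) -> None:
    diffs = [propose_change(s) for s in plan_migration(service)]
    if not all(human_approves(d) for d in diffs):
        print("stopped: reviewer rejected a change")
        return
    # Apply in a dev environment; merge only if the whole suite is green.
    if tests_pass(service):
        print(f"merged migration for {service}")
    else:
        print("rolled back: tests failed")

migrate("payments-service")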
With deep understanding of a domain and its use cases, new cognitive architectures and user interfaces can shape how these reasoning capabilities are delivered to users.
Which real-life applications of GenAI should I explore?
AI tools and apps are gaining an edge in turning labor into software. Manual tasks that drain resources, are operational in nature, and are prone to human error are being replaced by AI tools. Efficiency is measured in outcomes, not per seat.
Many of these need a human in the loop and are taking the copilot model. A few, like Sierra, operate on a 100% resolution model.
The use cases will be vertical- and function-specific. It could be resolving customer issues conversationally, reading regulatory documents and breaking them down for implementation, creating relationship manager (RM) training modules, or even helping find cross-sell opportunities.
Banks like BBVA have started categorizing customer spending in their banking apps and offering personalized savings advice. JPMorgan Chase has filed a patent application for a GenAI service to help investors select equities. A leading Asian bank is using GenAI to automate the ESG reporting that RMs were manually summarizing for B2B customers. The use cases are limitless.
Once these models can do more, we will have teams accomplishing a lot more!
Have questions on specific applications for your function? Happy to chat.
AI Glossary:
LLM or Large Language Models – A type of machine learning/deep learning model that can perform a variety of natural language processing (NLP) and analysis tasks, including translating, classifying, and generating text; answering questions in a conversational manner; and identifying data patterns
System 1 and System 2 Thinking - Two distinct modes of cognitive processing introduced by Daniel Kahneman in his book Thinking, Fast and Slow. System 1 is fast, automatic, and intuitive, operating with little to no effort. This mode allows us to make quick decisions and judgments based on patterns and experiences. In contrast, System 2 is slow, deliberate, and conscious, requiring intentional effort. This type of thinking is used for complex problem-solving and analytical tasks where more thought and consideration are necessary.
Sources:
Sequoia Capital, “Generative AI’s Act o1” by Sonya Huang and Pat Grady, October 2024