A Product Builder's introduction to Large Language Models
Understanding the tech and core capabilities of LLMs
This is the first in a series of posts covering Generative AI for product builders.
I’ve lost count of the number of times the words ‘ChatGPT’ or ‘GenAI’ come up at the workplace these days. They're the shiny new objects in ML, promising to transform not just how we engage with digital devices, but perhaps even the future of human interaction itself.
You've likely heard of GPT-3, OpenAI's cutting-edge language model that's been making waves for its ability to write poems, answer questions, translate languages, and even draft emails that are almost indistinguishable from those written by humans. You’ve probably heard that Microsoft, Google, and AWS are all building or integrating LLMs into their product offerings in some form or another. In fact, several dozen new LLMs, each with its own flavours and idiosyncrasies, are already in production.
As a product manager, you've likely found yourself in my shoes, wrestling with questions about the real potential of Large Language Models (LLMs) and the role they could play in your product strategy. When I started introducing myself to LLMs, the possibilities seemed endless, vague, and hyper-realistic at the same time. In the face of uncertainty, I default to developing frameworks that can stand the test of rapidly changing landscapes and ground my thought process, allowing me to produce something tangible.
This series is an exploration of some of these frameworks and mental models I use to think through the applications of GenAI. Over the next several posts, we'll peel back the layers of hype and delve into the real world of LLMs. We'll examine what they are, how they work, what they're capable of, and where they fall short. We'll explore how they're currently being used and their potential for future applications. And most importantly, we’ll discuss how to build products using LLMs.
What are Large Language Models?
An overly simplistic (yet considerably powerful) way to look at large language models is that LLMs are sophisticated next word predictors. LLMs parse the context that is provided to them and try to react to that context. They do so one word at a time.
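To make the ‘next word predictor’ idea concrete, here is a deliberately toy sketch - a bigram model that always picks the word that most often followed the current word in its training text. A real LLM conditions on vastly more context with billions of parameters, but the generation loop - predict a word, append it, repeat - is the same:

```python
from collections import Counter, defaultdict

# Toy illustration (not a real LLM): for each word, count which words
# follow it in the training text, then generate by repeatedly picking
# the most frequent follower of the last word produced.
def train_bigram(text):
    counts = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def generate(counts, start, n_words):
    out = [start]
    for _ in range(n_words):
        followers = counts.get(out[-1])
        if not followers:
            break  # never saw this word mid-sentence; stop
        out.append(followers.most_common(1)[0][0])  # greedy pick
    return " ".join(out)

corpus = "the cat sat on the mat and the cat slept on the mat"
model = train_bigram(corpus)
print(generate(model, "the", 4))  # "the cat sat on the"
```

Real models predict a probability distribution over the whole vocabulary and sample from it, rather than greedily taking the single most frequent follower - but the one-word-at-a-time structure carries over directly.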
But wait, this tech has been around for a while - remember what your phone’s keypad does when you’re typing?
LLMs are essentially this. What, then, makes them so much more powerful than autocomplete suggestions? Three factors contribute to the existence and power of LLMs:
New developments in Natural Language Algorithms: A 2017 paper, “Attention Is All You Need” (Vaswani et al., 2017), became the defining piece of scientific literature enabling the development of Large Language Models. Again, oversimplifying: previous generations of next word predictors (like the one on your phone) preserved very little context in memory - sometimes as little as just the 2-3 words that came before the prediction. The attention mechanism introduced in that paper addresses this limitation by letting the model weigh every word in the input against every other word, so much longer stretches of context can inform each prediction.
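For the curious, the heart of that paper - scaled dot-product attention - can be sketched in a few lines of plain Python. Each position compares its ‘query’ vector against every position's ‘key’ and takes a weighted average of the ‘values’, which is how transformers keep the whole context in view rather than just the last few words (the tiny vectors below are made up purely for illustration):

```python
import math

def softmax(xs):
    # Turn raw scores into weights that are positive and sum to 1.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention(queries, keys, values):
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # How well does this query match each position's key?
        scores = [dot(q, k) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Output = weighted average of all value vectors.
        out = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
        outputs.append(out)
    return outputs

# One query over three positions: the query [1, 0] matches the second
# key best, so the output leans toward the second value vector.
Q = [[1.0, 0.0]]
K = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
V = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
print(attention(Q, K, V))
```

Real transformers run many such attention ‘heads’ in parallel over learned vectors, but the core idea - every position can attend to every other position - is exactly this.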
Access to large amounts of training data: The performance of any ML model depends in part on the quality, variety, and volume of data it is exposed to. LLMs take training data to the extreme - ChatGPT was trained on a very large fraction of the publicly available textual internet as of 2021. Access to such a large corpus of data gives the LLM the ability to make informed predictions often grounded in fact, emulate multiple styles of writing, and tailor responses to context. As the saying goes, if you can think it, it exists on the internet. And if it exists on the internet, it was probably part of ChatGPT’s training dataset.
Large amounts of processing power: Processing such large amounts of data with complex algorithms requires massive processing power. GPT-3 was reportedly trained on a supercomputer with 285,000 processor cores and 10,000 graphics cards, equivalent to roughly 800 petaflops of processing power. By some estimates, the energy consumed in training a model like ChatGPT could power a small town for a day.
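For a rough sense of scale, a widely used back-of-the-envelope rule estimates training compute at about 6 floating point operations per parameter per training token. Plugging in GPT-3's publicly reported figures (both numbers below are approximations, not official accounting):

```python
# Back-of-the-envelope training-compute estimate using the common
# ~6 FLOPs-per-parameter-per-token rule of thumb.
params = 175e9   # GPT-3's reported parameter count (~175 billion)
tokens = 300e9   # roughly the reported number of training tokens

total_flops = 6 * params * tokens
print(f"{total_flops:.2e} FLOPs")  # 3.15e+23 FLOPs

# At a sustained 1 petaflop/s (1e15 FLOPs per second), that works out to:
seconds = total_flops / 1e15
print(f"~{seconds / 86400:.0f} days at 1 PFLOP/s")  # ~3646 days
```

That is roughly ten years on a single petaflop machine - which is why training runs are spread across thousands of GPUs in parallel, and why only a handful of organisations train these models from scratch.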
Combine algorithmic advancements, access to the entire textual data of the internet, and massive processing power, and the results look like this:
[source: The Sydney Morning Herald]
Looks great, doesn’t it? What does this mean in practice?
A scalable mental model for understanding LLMs
I began this piece by mentioning that I would dive into some mental models and frameworks for understanding LLM applications. Such mental models are needed because the tech is evolving fast: new breakthroughs in LLM architecture, and in the tooling surrounding it, arrive so quickly that weekly summaries of advancements are everywhere, and even those of us in the industry struggle to keep up with what’s coming next.
This is all a lot of noise, though. The core of the tech is not changing. I mentioned before that LLMs are essentially next word predictors. It is easy to see from this interpretation why LLMs are classified as Generative AI - they ‘generate’ text-based outputs to user queries. This generated text can be quite malleable, allowing you to specify the style and word limit, the audience for the generated text, and even the format of the output. Here are some example prompts and responses:
Me: Write a piece on the ethical implications of AI in the style of New York Magazine. Limit the output to 100 words
ChatGPT: As our digital realm becomes increasingly suffused with artificial intelligence, we are compelled to confront the ethical specter that looms over this brave new world. AI, a dazzling testament to human innovation, is capable of awe-inspiring feats yet brings with it profound moral quandaries. With each algorithm, we must grapple with questions of privacy, accountability, and bias. Are we ready to relinquish control to AI, and in doing so, to whom are we ceding this power? We stand at a precipice, poised to rewrite our society's ethical playbook in response to this technological enigma. The choices we make now will shape our collective future.
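Note that the style and word-limit controls in the prompt above are nothing more than plain text. When building a product on top of an LLM, such prompts are typically assembled programmatically; a minimal, hypothetical helper (the function and parameter names here are illustrative, not any real API) might look like this:

```python
# Hypothetical prompt-assembly helper: each optional control just
# appends an instruction sentence to the task text.
def build_prompt(task, style=None, audience=None, word_limit=None):
    parts = [task]
    if style:
        parts.append(f"Write in the style of {style}.")
    if audience:
        parts.append(f"The audience is {audience}.")
    if word_limit:
        parts.append(f"Limit the output to {word_limit} words.")
    return " ".join(parts)

prompt = build_prompt(
    "Write a piece on the ethical implications of AI.",
    style="New York Magazine",
    word_limit=100,
)
print(prompt)
```

The point is that there is no special ‘style API’ under the hood - the model simply treats these instructions as part of the context it is predicting the next words from.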
LLMs are also a powerful summarisation tool that can parse entire pages of documents and produce meaningful responses. They can extract the most important information and preserve the gist of information quite well.
Me: Summarise this article in 100 words: https://www.cnbc.com/2023/05/30/everyone-is-a-programmer-with-generative-ai-nvidia-ceo-.html
ChatGPT: Nvidia's CEO, Jensen Huang, predicts a new computing era where anyone can be a programmer just by speaking to the computer. He introduced the DGX GH200, an AI supercomputer platform aimed at building generative AI models. Generative AI can produce various types of content, including text, imagery, audio, and synthetic data. Huang highlighted that this computing era is special because generative AI can understand multimodality, impacting every industry. It will not only enhance new applications but also improve old ones. Huang believes that every existing application will be better because of AI, and its easy usability will drive rapid progress.
Let’s take this a step further. To respond to a prompt, one must - at least in theory - understand what the prompt is trying to achieve. This isn’t really how it works with LLMs: an LLM doesn’t necessarily have an ‘understanding’ of the prompt - it is simply producing a logical sequence of words that are a likely response to a query. Setting that caveat aside, however, allows an interesting application to emerge. Instead of asking the LLM to respond to a prompt, what if we ask it to ‘explain’ the prompt? Here’s an example snippet of me asking the LLM to identify the intents expressed in a query:
Me: You are a shopping assistant bot. Parse this user query for intents: "What is a good toaster for light daily use? I like my bread crisp and golden brown. I want something that is not too expensive." Specify the intents expressed in the query in the following format:
"Product being mentioned: ____; Intent 1, score (0-1) Intent 2, score (0-1) Intent 3, score (0-1) Intent 4, score (0-1) Intent 5, score (0-1)" Produce 5 intents.
ChatGPT: Product being mentioned: Toaster;
Intent 1: Product recommendation; Score: 0.9
Intent 2: Frequency of use; Score: 0.7
Intent 3: Desired toast outcome; Score: 0.8
Intent 4: Price preference; Score: 0.9
Intent 5: Affordability; Score: 0.7
The LLM correctly identified the product being talked about, the user’s intent to get product recommendations, the desire for these products to be within a specific price range, and so on. This level of precision is very hard to achieve with other query-parsing algorithms in production.
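This is also why asking for a fixed output format (as the prompt above does) matters in practice: the reply becomes machine readable. Here is a sketch of turning the response shown above into a plain Python dict - the parsing code is mine, not part of any LLM toolkit:

```python
import re

# The LLM's response from the intent-parsing example above.
response = """Product being mentioned: Toaster;
Intent 1: Product recommendation; Score: 0.9
Intent 2: Frequency of use; Score: 0.7
Intent 3: Desired toast outcome; Score: 0.8
Intent 4: Price preference; Score: 0.9
Intent 5: Affordability; Score: 0.7"""

# Because we dictated the format, simple regexes recover the structure.
product = re.search(r"Product being mentioned:\s*(.+?);", response).group(1)
intents = {
    name.strip(): float(score)
    for name, score in re.findall(
        r"Intent \d+:\s*(.+?);\s*Score:\s*([\d.]+)", response)
}
print(product)                       # Toaster
print(intents["Price preference"])   # 0.9
```

In a real product you would also handle the model deviating from the requested format - a retry with a firmer instruction is a common fallback - but the core pattern of ‘prompt for structure, then parse’ is exactly this.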
The three examples above provide a good framework for thinking about the capabilities of LLMs:
LLMs can understand the user’s query and distill it in machine readable formats.
They can generate a response to these queries, supporting specifications on audiences, size and format of text, and additional context on the response.
They can summarise long text into digestible bits of information.
These three capabilities have a wide range of applications: building sales and support chatbots, writing long-form texts and articles, making search and catalogue experiences better, enabling personalisation, generating marketing and media content, and even creating learning plans on a topic of your choice. Permutations and combinations of the three basic capabilities I’ve mentioned above also lead to a bunch of emergent applications like structuring, drawing outlines, translation, moderation, and so forth - really, the limit only exists in the creativity with which one combines these capabilities. This is a topic for another time, though.
I’ll stop this post here, with the following key takeaways:
LLMs are essentially next word predictors, with no hidden ‘logic’ or ‘conscience’ working behind the scenes.
Three core capabilities of LLMs provide the cornerstone for product applications, namely, query understanding, content generation, and content summarisation.
Combining these core capabilities leads to a wide range of applications limited only by one’s creativity.
The next few posts will build upon these capabilities and develop a few practical application frameworks. I’ll also cover some end-to-end product flows and discuss some of the limitations and production considerations of LLMs. If there’s anything more you would like to discuss, I’d love to hear about it!