GPT: The Story Of The Technology Behind ChatGPT

You’ve probably heard of ChatGPT, but how much do you know about plain old GPT?

It might seem like ChatGPT came out of nowhere, but it was actually the result of years of research focused on a series of language models.

In this post, we’ll walk through a timeline that includes each model in the GPT series.

Along the way, we’ll explain key developments that eventually made ChatGPT possible.

The meteoric rise of ChatGPT

OpenAI released ChatGPT on November 30, 2022, and it crossed 1 million users just five days later, according to a tweet from Sam Altman, CEO of OpenAI.

[Embedded tweet from Sam Altman announcing the ChatGPT launch]

By January 2023, ChatGPT had reached 100 million monthly active users, making it the fastest-growing consumer app in history.

Although its unprecedented user growth might make it seem like ChatGPT appeared overnight, it was actually the successor of a long line of models in the GPT series that were each developed based on breakthroughs in AI research.

Intro to GPT

GPT stands for Generative Pre-trained Transformer.

To put it simply, it’s a language model: a machine learning model that assigns probabilities to sequences of words, estimating how likely each possible next word is given the words that came before it.

The model is “trained” using a large dataset of text. You can think of the training process as a robot reading a lot of books very quickly, similar to how a human learns.

Once the language model is trained, a simple use case would be for the model to predict the most appropriate word to fill a blank space in a sentence.

Building on this basic functionality, more advanced language models like GPT now have the ability to engage in human-like conversation.
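
To make the idea of “predicting the next word” concrete, here’s a tiny, purely illustrative Python sketch of the simplest possible language model, one that just counts which word tends to follow which. The corpus and function names are made up for this example; GPT’s transformer works very differently under the hood, but the basic goal of estimating what comes next is the same.

```python
# A toy "language model": estimate which word is likely to come next
# by counting word pairs in a tiny made-up corpus. Illustrative only.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count how often each word follows each other word (a simple bigram model).
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def next_word_distribution(word):
    """Return an estimated probability for each word that might come next."""
    counts = following[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_distribution("the"))  # {'cat': 0.25, 'mat': 0.25, 'dog': 0.25, 'rug': 0.25}
print(next_word_distribution("sat"))  # {'on': 1.0}
```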

GPT-1

On June 11, 2018, OpenAI released a research paper titled “Improving Language Understanding by Generative Pre-Training” (PDF), introducing what became known as GPT-1.

“Our approach is a combination of two existing ideas: transformers and unsupervised pre-training … Our system works in two stages; first we train a transformer model on a very large amount of data in an unsupervised manner—using language modeling as a training signal—then we fine-tune this model on much smaller supervised datasets to help it solve specific tasks.”

— OpenAI, “Improving language understanding with unsupervised learning”

Before GPT-1, the best models commonly used supervised learning, which has two major limitations, according to Priya Shree:

  1. It requires a large amount of annotated data for learning a particular task (making it expensive and time-consuming to train large language models).
  2. It fails to generalize for tasks other than what it has been trained for.

Unsupervised pre-training solved the first limitation, and fine-tuning the model on smaller, task-specific supervised datasets helped it begin to solve the second.
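
To show the shape of that two-stage recipe, here’s a deliberately oversimplified Python sketch. The ToyLanguageModel class and its data are invented for this illustration and don’t reflect how a real transformer is trained; they only show where the huge unlabeled corpus (stage one) and the small labeled dataset (stage two) each come in.

```python
# Illustrative only: a toy stand-in for GPT-1's two-stage recipe of
# unsupervised pre-training followed by supervised fine-tuning.
# ToyLanguageModel is invented for this sketch; it is not a real transformer.

class ToyLanguageModel:
    def __init__(self):
        self.vocabulary = set()
        self.task_examples = []

    def pretrain(self, unlabeled_texts):
        # Stage 1: learn from a very large amount of raw, unlabeled text.
        # A real model would learn to predict the next token here.
        for text in unlabeled_texts:
            self.vocabulary.update(text.lower().split())

    def finetune(self, labeled_examples):
        # Stage 2: adapt to one specific task using a much smaller labeled dataset.
        # A real model would nudge its pre-trained weights toward the task labels.
        self.task_examples = list(labeled_examples)

model = ToyLanguageModel()
model.pretrain(["mountains of raw text scraped from books and the web ..."])
model.finetune([("this movie was wonderful", "positive"),
                ("this movie was dreadful", "negative")])
print(len(model.vocabulary), "words seen during pre-training")
```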

GPT-2

On February 14, 2019, OpenAI announced GPT-2 (PDF).

“GPT-2 … was trained simply to predict the next word in 40GB of Internet text.”

— OpenAI, “Better language models and their implications”

However, due to “concerns about malicious applications of the technology,” OpenAI did not release the fully trained model.

Instead, they released “a much smaller model for researchers to experiment with.”

In terms of the relationship to its predecessor, GPT-2 was “a direct scale-up … with more than 10X the parameters and trained on more than 10X the amount of data.”

How many parameters? 1.5 billion

How much data? 8 million web pages

On November 5, 2019, OpenAI released the full model, citing “no strong evidence of misuse so far.”
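
Because the full GPT-2 weights are now public, anyone can experiment with them. Here’s a minimal sketch that assumes the open-source Hugging Face transformers package (which this article doesn’t otherwise cover) is installed:

```python
# A minimal sketch of experimenting with the released GPT-2 weights.
# Assumes the open-source Hugging Face `transformers` package is installed:
#   pip install transformers
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# GPT-2 was trained to predict the next word, so it simply continues the prompt.
result = generator("The history of language models began", max_length=30)
print(result[0]["generated_text"])
```

The default “gpt2” checkpoint is the smallest released size (roughly 124 million parameters); the full 1.5-billion-parameter version is available under the name “gpt2-xl”.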

GPT-3

On May 28, 2020, GPT-3 was introduced in a paper (PDF) authored by 31 OpenAI researchers and engineers.

This is an excerpt from the Abstract of the paper:

“Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting.”

— OpenAI, “Language Models are Few-Shot Learners”

If you’ve been keeping track, the prior model was trained with 1.5 billion parameters and GPT-3 was trained with 175 billion parameters. That’s a massive scale-up, increasing the number of parameters by over 100x.

On June 11, 2020, OpenAI made GPT-3 available via the company’s first commercial product, the OpenAI API. It was only available in a limited beta, and you had to join a waitlist.
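
To give a feel for what the “few-shot” prompting from the abstract looks like through that API, here’s a hedged sketch. It assumes the pre-1.0 openai Python package and the base “davinci” GPT-3 model; the prompt, parameters, and placeholder API key are invented for illustration and aren’t taken from this article.

```python
# A sketch of few-shot prompting against the GPT-3 completions API.
# Assumes the pre-1.0 `openai` Python package and the base "davinci" model;
# these details are illustrative, not taken from the article.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

# "Few-shot" means showing the model a handful of examples directly in the
# prompt instead of fine-tuning it on a labeled dataset.
prompt = """Translate English to French.

English: cheese
French: fromage

English: bread
French: pain

English: apple
French:"""

response = openai.Completion.create(
    model="davinci",
    prompt=prompt,
    max_tokens=5,
    temperature=0,
)
print(response["choices"][0]["text"].strip())  # expected: something like "pomme"
```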

On March 25, 2021, OpenAI reported that “more than 300 applications … and tens of thousands of developers” were using GPT-3.

On November 18, 2021, OpenAI announced that they would be removing the waitlist, allowing all developers (in supported countries) to sign up for the API.

GPT-3.5

Now, the moment you’ve been waiting for: the introduction of ChatGPT.

On November 30, 2022, OpenAI released ChatGPT to the general public.

Whereas GPT-3 was really only accessible to tech-savvy developers via the API, ChatGPT made GPT-3.5 available to everyone through an easy-to-use chat interface.

“We’ve trained a model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer followup questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests.”

— OpenAI, “Introducing ChatGPT”

ChatGPT is actually just one of the systems fine-tuned from GPT-3.5. Another one, text-davinci-003, is better at long-form, “high-quality” writing.
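
For developers, the same GPT-3.5 family later became reachable through a chat-style API endpoint that mirrors ChatGPT’s dialogue format. The sketch below assumes the pre-1.0 openai Python package and the gpt-3.5-turbo model; the messages are invented for illustration.

```python
# A sketch of the dialogue format, using the pre-1.0 `openai` Python package's
# chat endpoint with the gpt-3.5-turbo model. The messages are invented.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

messages = [
    {"role": "system", "content": "You are a helpful writing assistant."},
    {"role": "user", "content": "In one sentence, what does GPT stand for?"},
]

response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
reply = response["choices"][0]["message"]["content"]
print(reply)

# Because the input is a running list of messages, a follow-up question is just
# another message appended to the same conversation.
messages.append({"role": "assistant", "content": reply})
messages.append({"role": "user", "content": "Now explain it like I'm five."})
followup = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
print(followup["choices"][0]["message"]["content"])
```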

Read more here about the different GPT models and associated apps/systems.

GPT-4

On March 14, 2023, OpenAI announced GPT-4.

As of this writing, this is the latest model in the series.

It can accept image inputs, in addition to text inputs, and emit text outputs.

As one example of the improvement over its predecessor, GPT-4 “passes a simulated bar exam with a score around the top 10% of test takers; in contrast, GPT-3.5’s score was around the bottom 10%.”

In terms of the number of parameters, several sources claim that GPT-4 has over 1 trillion, although OpenAI has not officially disclosed the figure.

The chart below from TechTarget shows the number of parameters for several more recent transformer-based language models.

[Bar chart: number of parameters for several transformer-based language models, including GPT-2, GPT-3, and GPT-4. Credit: TechTarget]

If you want to use GPT-4, you’ll have to sign up for the paid version of ChatGPT (called ChatGPT Plus), or you can join the API waitlist for developers.

Conclusion

The rate at which language models, and AI in general, are advancing is mind-boggling. And maybe a little scary.

It is encouraging to see that OpenAI seems to be conscious of putting certain safeguards in place as it releases updated models.

In any case, it seems like AI is here to stay and we’re already seeing the far-reaching impacts this is having across all areas of human life.

In particular, AI text generators like ChatGPT are impacting anything that involves using words, and people employed in professions involving words are reacting in one of three ways: protesting against AI, struggling to keep up with AI, or learning as much as they can about how to use AI.

At AI Writers Academy, we focus on learning how to use AI.

Read more on our blog about how to use AI for writing.