Released to the public at the end of 2022, OpenAI’s ChatGPT has quickly become the fastest-growing internet application ever and a disruptive mainstream phenomenon. The introduction of ChatGPT and its successors marks the beginning of a new era in our relationship with technology.
But as we are becoming increasingly reliant on artificial intelligence and machine learning in almost every facet of our lives, we need to be more aware of the inner workings of these technologies.
Looking just a little below the surface can help us be more intentional with the use of AI, assess its outputs more critically, and fully tap into the available machine learning expertise.
In this article, we will delve into the history of ChatGPT and figure out how the tool functions.
It all started 30 years ago
Machine learning researchers and the scientific community in general have been captivated by the idea of artificial text generation for decades.
The core technology behind ChatGPT has its roots in recurrent neural networks (RNNs), which can be traced back to the 1980s. A neural network is a type of AI loosely modeled on the human brain.
The word ‘recurrent’ refers to an RNN’s ability to retain a memory of past inputs, which allows for a deeper understanding of context and opens up predictive capabilities.
RNNs have been widely used in natural language processing products, including Siri and Google’s voice search.
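To make the “memory” idea concrete, here is a minimal toy sketch (random weights, made-up numbers, not any production model): an RNN reads a sequence one token at a time, and each step folds the new input into a hidden state that carries information from everything read so far.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, embed_size = 4, 3
W_h = rng.normal(size=(hidden_size, hidden_size))  # recurrence weights (memory -> memory)
W_x = rng.normal(size=(hidden_size, embed_size))   # input weights (word -> memory)

def rnn_step(h, x):
    """One recurrent step: the new state depends on the previous state and the input."""
    return np.tanh(W_h @ h + W_x @ x)

tokens = [rng.normal(size=embed_size) for _ in range(5)]  # stand-in word vectors
h = np.zeros(hidden_size)
for x in tokens:          # strictly sequential: step t must wait for step t-1
    h = rnn_step(h, x)

print(h.shape)  # the final state summarizes the whole sequence
```

The loop is the key limitation discussed below: each step depends on the previous one, so the computation cannot be parallelized across the sequence.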
In 1997, the invention of a new type of RNN called the Long Short-Term Memory (LSTM) network was a major step toward advancing text generation.
The ability to store more data for longer further deepened the algorithm’s understanding of words and how they relate to each other.
So, researchers figured out how to make machines talk nearly 30 years ago, but only recently have chatbots become as capable as they are today.
Transformers as the main enabler
The real breakthrough for natural language processing happened in 2017, when the Google Brain team developed a new type of deep learning model dubbed, simply, the transformer.
It’s hard to adequately explain the major advantages of transformers without diving into the intricacies of machine learning, but, generally speaking, transformers can process input data all at once instead of sequentially.
In other words, RNNs make sense of text word by word, while transformers analyze all the text as a single set of data. Transformers can also take into account the position of a word in a sentence, enabling them to decipher the meaning of words more precisely.
Importantly, compared to RNNs, transformer-based models can be trained significantly faster, allowing for much larger training datasets.
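The two ideas above, relating every word to every other word at once and injecting word position, can be sketched in a few lines. This is a hypothetical toy version of self-attention (a single head, no learned weights, made-up dimensions), not the full transformer architecture:

```python
import numpy as np

rng = np.random.default_rng(1)
seq_len, d = 5, 8
X = rng.normal(size=(seq_len, d))  # stand-in embeddings for 5 tokens

# Positional information: add a sinusoidal signal so each row also
# encodes where in the sentence its token sits.
pos = np.arange(seq_len)[:, None]
freqs = 1.0 / (10000 ** (np.arange(0, d, 2) / d))
pe = np.zeros((seq_len, d))
pe[:, 0::2] = np.sin(pos * freqs)
pe[:, 1::2] = np.cos(pos * freqs)
X = X + pe

# Attention: one matrix product scores every token against every other
# token simultaneously -- no sequential loop over positions.
scores = X @ X.T / np.sqrt(d)                                   # (5, 5) pairwise relevance
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
output = weights @ X                                            # each row mixes all tokens

print(output.shape)  # (5, 8): every position updated in one shot
```

Because the whole computation is a handful of matrix multiplications with no step-by-step dependency, it maps naturally onto GPUs, which is exactly why transformer-based models train so much faster than RNNs.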
The GPT Era
The first version of the Generative Pre-trained Transformer (GPT-1) was released in 2018 by OpenAI. By combining transformers with unsupervised pre-training, OpenAI managed to create what we now call a large language model.
This time, instead of training the model with annotated data, OpenAI allowed the model to detect patterns on its own. Given that data annotation is a very laborious and time-consuming process, the company was able to drastically enlarge the training dataset.
GPT-2 was released in early 2019, less than a year after the first version, and the third iteration followed in 2020. From this point on, it was all about enlarging datasets and increasing the number of model parameters.
Note that these improvements were rapid rather than gradual. To give a sense of scale, GPT-2 has 1.5 billion parameters that engineers could adjust during training, while GPT-3 has 175 billion.
However, because GPT-3 was trained on one of the largest datasets ever assembled using unsupervised learning, the model mirrored all the good and bad that the internet has to offer.
GPT-3 produced notably biased output on many sensitive subjects, including race, sex, and religion. To address this problem, OpenAI came up with InstructGPT, a far less offensive and opinionated sibling of GPT-3.
To achieve this, OpenAI brought human judgment back into the equation. The company hired 40 contractors to rate and rank the model’s answers, with the ultimate goal of reducing toxic language and misinformation.
The current paid version of ChatGPT uses GPT-4, the most reliable, stable, and creative version of the model. Beyond a larger training dataset, GPT-4 can also process both text and images instead of text only. For example, you can now give ChatGPT a photo of the ingredients you have, and it will come up with a recipe.
The development of GPT models, and of natural language processing in general, will continue to open up new possibilities.
From creating engaging conversational experiences with chatbots to providing deeper insights into the way we communicate, these advancements are undoubtedly revolutionizing our world in profound ways. It’s safe to say that the chatbot revolution is well and truly underway.