The GPT revolution: a powerful language model that generates human-level text
GPT-3 is an unsupervised language model that can learn without any annotated data. By contrast, other language models use supervised learning, which means that the model learns from data that has been annotated by humans. In addition, GPT-3 is significantly more accurate than other language models on a variety of tasks.
The description above was generated by the language model GPT-3: the sentences are well written and the content is fairly accurate. In this article, we will dive into the capabilities of GPT, how it all started, and its limitations and concerns.
In recent years, significant progress has been made in Natural Language Processing (NLP). NLP is a subfield of computer science that enables machines to understand text and spoken words the way humans do. It focuses on linguistic tasks like text generation, summarization, and reading comprehension. One player that stands out is OpenAI, a research and development company focused on machine learning, which received a $1 billion investment from Microsoft in 2019. Back in 2018, OpenAI released a paper about generative pre-training (GPT). The paper proposed a method to train a language model on unlabeled data and then fine-tune it by providing task-specific examples. An advantage is that the same language model can be reused for different tasks; prior to this work, you would train a separate language model to solve each specific task. OpenAI claimed that the GPT model outperformed other models that were created to solve a specific task.
"Our general task-agnostic model outperforms discriminatively trained models that use architectures specifically crafted for each task, significantly improving upon the state of the art in 9 out of the 12 tasks studied." - OpenAI
GPT: Generative pre-training transformer model
- Paper released June 2018
- Proposed generative pre-training transformer model
- Trained with the BookCorpus dataset
- 117M parameters
After GPT-1, its successors GPT-2 and GPT-3 became even more powerful. The architecture didn't change, but more parameters were added and the models were trained on larger datasets.
GPT-2: "Because of malicious risks we don't release the full model"
- 124 million model released Feb 2019
- 774 million model released Aug 2019
- 1.5 billion model released Nov 2019
- Trained with the WebText dataset (40GB of internet text)
GPT-2 was able to infer the nature of a task from the prompt and provide an answer. This is what they call zero-shot task transfer. For example, you can give the model a prompt containing an English sentence followed by the word "French"; the model infers that it should translate the English sentence into French (a small code sketch follows the example below).
English: What rooms do you have available?
French: Quelles chambres avez-vous de disponible?
English: Where is the restroom?
French:
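To show the mechanics, here is a minimal sketch of sending this prompt to the publicly released GPT-2 weights with the Hugging Face transformers library (not OpenAI's own tooling). Note that the small 124-million-parameter checkpoint will often not produce a correct translation; the point is only how the prompt is structured and completed.

from transformers import pipeline

# Load the publicly released 124M-parameter GPT-2 checkpoint.
generator = pipeline("text-generation", model="gpt2")

prompt = (
    "English: What rooms do you have available?\n"
    "French: Quelles chambres avez-vous de disponible?\n"
    "English: Where is the restroom?\n"
    "French:"
)

# The model continues the prompt, ideally completing the last "French:" line.
output = generator(prompt, max_length=60, num_return_sequences=1)
print(output[0]["generated_text"])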
OpenAI scraped Reddit and pulled data from outbound links that received at least three upvotes, to ensure that the scraped articles were of reasonable quality (or at least to reduce the risk of low-quality articles). The model was trained on 40GB of internet text (8 million web pages), a dataset they call WebText. The model contains 1.5 billion parameters, more than ten times as many as GPT-1. For the initial release, OpenAI didn't publish the full model but 'only' a 124-million-parameter version, because the creators were afraid that people would use the model maliciously. However, the complete model was eventually released in two further stages. Looking back, it feels like the release strategy was part of a marketing plan to create hype.
GPT-3: Shockingly good!
- 175 billion model released Jun 2020
- Commercialized, pay per use
- Access the model through API
- Trained with Common Crawl, WebText2, two book corpora, and English Wikipedia
"OpenAI’s new language generator GPT-3 is shockingly good"
"Playing with GPT-3 feels like seeing the future"
"OpenAI's GPT-3 model is a true revolution"
OpenAI announced GPT-3 in June 2020 and the first results were astounding. You can hardly (or not at all) tell the difference between text generated by GPT-3 and text written by humans. At this point, OpenAI decided to commercialize the model by putting an API in front of it. You could sign up for the beta program and try out the API for free for two months; after that, you would be charged per API call. The model contains 175 billion parameters, more than 100 times as many as GPT-2. GPT-3 was trained on a dataset that combines Common Crawl (570GB after filtering), WebText2, two book corpora, and English-language Wikipedia. Because of the large number of parameters and the huge dataset it was trained on, GPT-3 is good at few-shot and zero-shot learning. That means the model can do all kinds of tasks: generating summaries, SQL queries, JavaScript code, social media posts, classifications, conversations, translations, and many more. In the prompt, you tell the model what it should do and optionally provide a few examples.
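As an illustration of how the API was used during the beta, here is a minimal sketch with the openai Python package's Completion endpoint; the engine name, prompt, and parameter values are just examples, and the interface has evolved since.

import openai

openai.api_key = "YOUR_API_KEY"  # obtained after signing up for the beta program

prompt = (
    "Summarize the following text in one sentence:\n"
    "OpenAI announced GPT-3 in June 2020. The model contains 175 billion parameters "
    "and is accessed through a paid API.\n"
    "Summary:"
)

response = openai.Completion.create(
    engine="davinci",   # beta-era engine name, illustrative
    prompt=prompt,
    max_tokens=40,
    temperature=0.7,
)
print(response["choices"][0]["text"].strip())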
Because the model is trained on a huge amount of internet text, it has picked up a lot of knowledge. For example, take this prompt that lists companies and the categories they fall into (the text in bold was generated).
The following is a list of companies and the categories they fall into
Facebook: Social media, Technology
LinkedIn: Social media, Technology, Enterprise, Careers
Uber: Transportation, Technology, Marketplace
Unilever: Conglomerate, Consumer Goods
Mcdonalds: Food, Fast Food, Logistics, Restaurants
FedEx: Logistics, Transportation
It's also capable of writing creative text.
Topic: Breakfast
Two-Sentence Horror Story: He always stops crying when I pour the milk on his cereal. I just have to remember not to let him see his face on the carton.
###
Topic: Wind
Two-Sentence Horror Story:
I was walking down the street when a cold gust of wind blew past me. I looked up and saw a man without a face.
Or write a restaurant review based on a few keywords.
Write a restaurant review based on these notes:
Name: The Blue Wharf
Lobster great, noisy, service polite, prices good.
Review:
The Blue Wharf is a restaurant that is located in a noisy area, but the lobster is great. The service is polite, and the prices are good.
Final thoughts
Yes, GPT is a powerful language model because it can generate high-quality text, and there are many use cases where you can apply it. However, there are some limitations and concerns. Even though you can tell the model what to generate by using the prompt, it's hard to control the actual output. You can tweak your prompt to be more specific or adjust settings such as the sampling temperature (a sketch follows below). Occasionally, you will see false facts or wrong statements in the generated text. One way to deal with this is to let a human review the text, or to automate that review with another machine learning model.
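To give an idea of what adjusting settings means in practice, here is a sketch using the beta-era Completion endpoint of the openai Python package; the parameter values are illustrative, not recommendations.

import openai

openai.api_key = "YOUR_API_KEY"

response = openai.Completion.create(
    engine="davinci",
    prompt="Write a restaurant review based on these notes:\n"
           "Name: The Blue Wharf\n"
           "Lobster great, noisy, service polite, prices good.\n"
           "Review:",
    max_tokens=80,
    temperature=0.3,        # lower temperature -> less random, more predictable text
    top_p=1.0,              # nucleus sampling; 1.0 keeps the full distribution
    frequency_penalty=0.5,  # discourage repeating the same phrases
    stop=["###"],           # stop generating when this separator appears
)
print(response["choices"][0]["text"])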
OpenAI was founded in 2015 as a non-profit artificial intelligence research company. Its original goal "is to advance digital intelligence in the way that is most likely to benefit humanity as a whole, unconstrained by a need to generate financial return". You may now wonder how 'open' OpenAI actually is. In contrast to GPT-2, its successor GPT-3 has not been made publicly available, and the use of the model has been commercialized. OpenAI claimed that it needed to do this in order to improve and continue its research, but the fact that Microsoft invested a billion dollars in the company probably plays a big role. As a result, many people were disappointed in OpenAI and started their own initiatives to replicate GPT-3: https://blog.eleuther.ai/why-release-a-large-language-model/.
In the meantime, Microsoft, which holds an exclusive license for GPT-3, is integrating the language model into its products. In May 2021, Microsoft announced that it had integrated GPT-3 into Power Apps, which allows users to use natural language to generate Power Fx queries. Just recently, it launched GitHub Copilot, an extension that generates code for developers. Copilot is powered by a descendant of GPT-3 (OpenAI Codex) that was trained on billions of lines of code from public GitHub repositories.
GPT-3 is a powerful language model that can generate text you can't distinguish from text written by humans. It's expected that more companies and communities will try to develop similar language models. However, with the current architecture of these models, you need a huge dataset and a lot of money to train them: the estimated cost of training GPT-3 is 4.6 million dollars. This makes it difficult for new researchers and companies with limited computational resources to enter the market.