
Fundamentals of LLMs: A story from the history of GPTs to the future

Introduction:

There have been a lot of developments in LLMs, and I have not gone through any of them yet. In the next few parts, I will talk about LLMs and the ecosystem that has developed around them, and will try to reach the cutting, or rather bleeding, edge.

Let's go through the main concepts first.

What is an LLM?

LLM[1] stands for large language model. The term mainly refers to big, deep-learning-based transformer models that perform natural language understanding and natural language generation tasks much better than the earlier generations of models in NLP history.

LLMs are generally quite big, on the order of 10-100 GB or more, and the larger ones can't fit into a single machine's RAM. So most LLMs are served for inference on bigger GPU clusters, and they are quite computationally expensive.
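To get a feel for these sizes, here is a rough back-of-the-envelope sketch. It assumes each parameter is stored as a 16-bit number (2 bytes) and counts only the weights, not the extra memory needed while running the model. The function name is my own, and the parameter counts are the ones quoted later in this article.

```python
def model_size_gb(num_params, bytes_per_param=2):
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes).

    Assumes 2 bytes per parameter (16-bit precision) by default.
    """
    return num_params * bytes_per_param / 1e9


# Parameter counts as quoted in this article.
models = {"GPT-1": 117e6, "GPT-2": 1.5e9, "GPT-3": 175e9}

for name, n_params in models.items():
    print(f"{name}: ~{model_size_gb(n_params):.1f} GB of weights at 16-bit precision")
```

Under these assumptions GPT-2 needs about 3 GB, which is why it fits on one PC, while GPT-3 needs about 350 GB, far more than a single ordinary machine can hold.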

What was the first true LLM?

The BERTs

Transformers were introduced in 2017 by Vaswani et al. in their revolutionary paper, "Attention Is All You Need". After that came BERT, followed by a whole series of BERT variants. BERT models were good at sentence interpretation, masked-word prediction and other tasks. But BERT models were not able to generate long and comprehensive pieces of text.

First of the GPTs

At the same time, OpenAI had started working on their GPT series. The first of them, GPT-1, came out in 2018. GPT stands for generative pre-trained transformer. GPT-1 came with 117 million parameters. GPT-2 came out in 2019 with 1.5 billion parameters. Even GPT-2 could still be loaded and run on a single PC.

GPT-2 was very good at generating sentences and coherent text; it was possible to generate small but comprehensive paragraphs even with GPT-2.

But GPT-2 still had some problems. The model used to get stuck in repetition hell, i.e. after a certain point it would keep churning out the same tokens in series; GPT-2 would also sometimes generate a lot of garbage characters. Still, GPT-2 was on par with the BERT models on many performance measures, and people started solving generative tasks with it.

The winner steps into the arena

GPT-3 came out in May 2020. GPT-3 was planned as a scaled-up version of GPT-2 with a lot more parameters. It had 175 billion parameters, and among the language models available at its release, GPT-3 was the biggest. And GPT-3, in my humble opinion, was the first true LLM. For comparison, BERT had 110 million parameters in BERT-base and 340 million in BERT-large. No model at that time had as many parameters as GPT-3.

And OpenAI recognized the real game with the arrival of GPT-3. For the first time, this model felt as if we were talking to someone, not something. And the model performed strongly on most NLP tasks, be it generation, question answering or summarization; you name it.

GPT-3 was not open-sourced, though. OpenAI, a company that started as a non-profit with the aim of guarding AI from the evils of the big and bad, put its best model behind API-based services. They started giving out API access on a request basis.

The best-known researchers and high-impact industry folks got access first; after that it was first come, first served. I remember applying for GPT-3 access back in November 2020 and receiving it at the beginning of February 2021.

The model was trained on a corpus of Reddit-linked web pages and a big chunk of the internet, and its performance was surreal. I had done my fair share of work with GPT-2 on generation tasks and on extractive and abstractive summarization, and one pass through the white and green playground of GPT-3 swept me off my feet.

GPT-3 still had some of GPT-2's problems, though. It would still generate silly replies and character garbage (a jumble of a lot of characters), as well as repetitive sentences or tokens. GPT-3 also needed a limit on how many tokens to generate, and would often stop mid-sentence, leaving dangling sentences. This was one of the problems we had to work around at that time.

GPT-3 was also unable to generate fully working code. It would come up with a good number of code blocks, and the smaller blocks would often work. But GPT-3 was not yet able to generate complete, running programs. It was still possible to get GPT-3's help with a simple coding problem, or to generate a bulk of code and debug it later.
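One simple way to handle dangling sentences, shown here only as a minimal sketch and not as the exact workaround anyone used back then, is to cut the generated text at the last sentence-ending punctuation mark. The function name is hypothetical.

```python
def trim_dangling_sentence(text):
    """Drop a trailing unfinished sentence from generated text.

    Keeps everything up to the last '.', '!' or '?'. If no such
    character exists, the text is returned unchanged.
    """
    last_end = max(text.rfind(ch) for ch in ".!?")
    if last_end == -1:
        return text
    return text[: last_end + 1]


generated = "The model is large. It was trained on"
print(trim_dangling_sentence(generated))
```

This is deliberately naive: it treats every period as a sentence end, so abbreviations and decimal numbers can fool it.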

The era of the GOAT, ChatGPT

OpenAI then produced InstructGPT, a model that is able to follow instructions and generate responses based on them. ChatGPT, a more conversation-focused sibling of InstructGPT, was fine-tuned using the same kind of data as the previous GPTs but at a larger scale, together with reinforcement learning from human feedback (RLHF).

ChatGPT launched on 30 November 2022; and through 2023, ChatGPT, with its GPT-3.5 and GPT-4 models, became a worldwide sensation. It prompted a whole new culture around AI, and its human-like capacity propelled the conversation about AI vs. humans.

ChatGPT has versions that passed medical and bar exams, and it can answer your questions in a specific or generic manner, in whatever tone you prefer.

ChatGPT is the first of the models in the GPT series that is also able to generate code very comprehensively. ChatGPT has become a productivity boost for almost everything; and it has set off a new gold rush, since everyone thinks AI will be the new differentiator and learning how to best use ChatGPT is the way to get the new metaphorical gold.

The ChatGPT story is still evolving, with thousands of prompt books, courses, papers and research works coming out every week. But for the scope of this article, we will move on from here and get into the next developments.

What are the trends that came up after the invention of LLMs?

After ChatGPT, both the general public and industry approached LLMs with extreme interest. To quote Chip Huyen[5],

"It’s easy to make something cool with LLMs, but very hard to make something production-ready with them."

Industry looked into using ChatGPT or similar LLM tools in production, but that raised a number of different problems, such as hallucination, lack of context, cost and latency, and the need for proper prompting techniques. We will discuss each of these ideas in detail in the following sections.

Before that, let's delve into the concept of prompting.

What is Prompting?

Prompting refers to the art and science of crafting a prompt that gives the LLM the correct instructions in order to get the desired output.

Why does prompting need to be learned?

Up to the invention of LLMs, programming was done in programming languages, where the syntax and format made sure that the language gave exact instructions to the machine; i.e. the code written was not open to interpretation and was not subjective.

In the case of LLMs, we give the LLM an instruction written in natural language, which the model processes to produce an output.

Now, as you may know, an LLM is, under the hood, an AI model that takes a text, processes it through its deep learning layers and produces an output. This process is stochastic, i.e. the output can vary with each request, and there is a certain amount of randomness in it. Because of this stochastic behavior, the same prompt can produce different outputs from an LLM.
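A toy sketch can make this randomness concrete. It assumes the model has already produced a score (logit) for each candidate next token, and it uses temperature sampling, one common way of turning those scores into a random choice. The token names and scores are made up for illustration and are not from any real model.

```python
import math
import random


def sample_next_token(logits, temperature, rng):
    """Pick one token from a {token: logit} dict.

    temperature == 0 means greedy decoding (always the top token);
    higher temperatures flatten the distribution and add randomness.
    """
    if temperature == 0:
        return max(logits, key=logits.get)
    scaled = {tok: value / temperature for tok, value in logits.items()}
    top = max(scaled.values())  # subtract the max for numerical stability
    weights = [math.exp(value - top) for value in scaled.values()]
    return rng.choices(list(scaled), weights=weights, k=1)[0]


# Hypothetical scores for the next token.
toy_logits = {"approve": 1.0, "reject": 1.5, "review": 0.2}

rng = random.Random(0)
greedy = {sample_next_token(toy_logits, 0, rng) for _ in range(200)}
sampled = {sample_next_token(toy_logits, 1.0, rng) for _ in range(200)}
print("temperature 0:", greedy)
print("temperature 1:", sampled)
```

With temperature 0 the same token comes out every time, while with temperature 1 repeated calls on the very same input give different tokens.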

Now imagine you are trying to score an essay question, or trying to assess someone's creditworthiness, using an LLM. What if someone is rejected 5 times when you ask the LLM about their loan, and then approved the 6th time?

Would that be an acceptable model? Never.

The problem we just described, where we can't expect a consistent output from an LLM, is called the self-consistency problem of LLMs.

LLMs are also seen to answer questions incorrectly or without "much thinking" when prompted with simple techniques, and they may not pick up tone, format or guidance unless prompted properly.

These are the reasons why one line of research went into finding out how to prompt, what the different prompting techniques are, how to get consistent output from a stochastic model, and so on. We will discuss the details of prompting techniques in a later post.
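As a small taste of how consistency can be pulled out of a stochastic model, here is a sketch of one simple idea: ask several times and take the majority answer. The `toy_llm_decision` function is a hypothetical stand-in for a real LLM call, built to reject about 5 times out of 6 like the loan example above.

```python
import random
from collections import Counter


def toy_llm_decision(rng):
    """Hypothetical stand-in for one stochastic LLM call.

    Rejects about 5 times out of 6, otherwise approves.
    """
    return "reject" if rng.random() < 5 / 6 else "approve"


def majority_vote(n_samples, rng):
    """Ask the toy model n_samples times and return the most common answer."""
    votes = Counter(toy_llm_decision(rng) for _ in range(n_samples))
    return votes.most_common(1)[0][0], votes


rng = random.Random(42)
decision, votes = majority_vote(101, rng)
print(decision, dict(votes))
```

A single call can go either way, but the majority over many calls is far more stable. The price is many more model calls per decision, which matters for cost and latency.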

 

What is hallucination?

One of the biggest problems that came up when people started asking LLMs all sorts of questions was that LLMs started lying.

Yes, if you haven't seen it first hand, an LLM can lie a lot. Whenever you ask it a question about something it doesn't inherently know, or wasn't trained on, it can "cook up" an answer out of its learned concepts, and it will quietly write that answer without giving you any warning or saying "I don't know about this".

This problem again arises from the structure of an LLM. An LLM, being inherently a language model, doesn't have the concept of not knowing. It simply generates the next best word again and again until a whole answer comes out. But not knowing an answer is a different concept, a discriminative one. So what does an LLM do when it comes across a question it doesn't know the answer to? It predicts the best-sounding answer according to what it has learned.
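The point that a language model always produces something can be seen in a toy sketch. The final softmax step turns the scores into probabilities that sum to 1, so picking the most probable token always returns a token, whether the model is confident or nearly guessing. The flight numbers and scores below are invented purely for illustration.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical candidate answers (made-up flight numbers).
vocab = ["AC101", "AC202", "AC303", "AC404"]


def greedy_pick(logits):
    """Return the most probable token and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return vocab[best], probs[best]


confident = greedy_pick([8.0, 1.0, 0.5, 0.2])   # the model "knows" the answer
unsure = greedy_pick([1.01, 1.0, 0.99, 0.98])   # the model is almost guessing
print("confident:", confident)
print("unsure:   ", unsure)
```

Both calls return a flight number that looks equally definite in the output text. Only the probability, which the end user never sees, reveals that the second answer is close to a coin toss among four options.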

ChatGPT currently doesn't do that as much as GPT-3 used to, but it is still possible to get hallucinated answers from it. The way OpenAI solved it, or attempted to solve it, for ChatGPT is by adding guardrails, i.e. software that monitors and corrects, improves or changes the raw outputs based on common sense and other needs.

A lot of other models hosted by services that compete with ChatGPT have also put fact-checking systems in place, both to include real facts alongside the machine-generated result and to check their answers thoroughly for hallucinations.

Hallucinations can also hurt your use-case, especially when you use ChatGPT where very specific information is needed. For example, if the chatbot in an airline's ticketing system used ChatGPT and started cooking up flight numbers when asked about flights from location A to location B, the airline would soon have to shut the service down because of the lawsuits it would get.

Interested folks can read the story of the Air Canada chatbot mishap here[6].

Hallucinations, among other reasons, led to one of the biggest research trends in LLMs these days (at the time of writing, April 2024), which is RAG.

What is RAG?

RAG, or retrieval-augmented generation, is a process or framework for optimizing the output of an LLM using an authoritative knowledge base outside its training dataset: relevant information is retrieved from that knowledge base and used to augment the LLM's generation.

While RAG is a full framework, and hundreds of research and industrial works have built on it since its inception by Meta researchers[7], in short, RAG works on 3 fronts:

1. creating a separate database of information specific to the use-case of the LLM

2. optimizing the retrieval of information from that database using vector database architecture and advanced searching and indexing techniques (a very active research area)

3. generating the output from the retrieved information together with an enhanced version of the rudimentary prompt provided by the end user (an active research area covering how to enhance the prompt, how to best use the retrieved information and other interlinked concepts)
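The three fronts above can be sketched in a few lines. This is only a toy under loud assumptions: the "database" is a Python list of made-up airline facts, the "embedding" is a plain word count instead of a learned vector, retrieval is cosine similarity over those counts instead of a real vector database, and the final step only builds the enhanced prompt without calling any LLM.

```python
import math
import re
from collections import Counter

# Front 1: a tiny use-case-specific "database" (hypothetical airline facts).
DOCUMENTS = [
    "Flight XY101 flies from Delhi to Mumbai every morning.",
    "Checked baggage allowance is 23 kg per passenger.",
    "Refunds are processed within 7 working days.",
]


def embed(text):
    """Toy embedding: bag-of-words counts (a real system would use learned vectors)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, k=1):
    """Front 2: return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(DOCUMENTS, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, k=1):
    """Front 3: wrap the user's question with the retrieved context."""
    context = "\n".join(retrieve(query, k))
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


print(build_prompt("What is the baggage allowance?"))
```

The prompt that comes out carries the baggage fact with it, so the LLM can answer from the supplied context instead of cooking up a number.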

We will try to discuss RAG in another article later, but for the scope of this article we will limit the discussion here. Interested readers can refer to the further reading in [7], [8] and [9].

What are SLMs?

Apart from this, another line of research focused on reducing LLM sizes and speeding up inference to get faster, smaller and better models for specific use-cases. From this research front came a new set of models called SLMs (small language models), typically having only a few billion parameters and much smaller sizes (in the 1-10 GB range).

 

Conclusion:

There is a lot more work going on in understanding hallucination, reducing model sizes, and defining advanced model performance. We have just scratched the surface of the LLM world with this article. But give yourself kudos for finishing this article and getting through the beginnings of LLMs.

If you liked this story, stay tuned and come back for the next story in this series, which will dive deeper into the LLM world.

References:

[1] What is LLM? (Analytics Vidhya)

[2] Brief history of GPT models

[3] ChatGPT wiki

[4] Four LLM trends since ChatGPT

[5] Chip Huyen: Building LLM applications for production

[6] Air Canada chatbot mishap shows GenAI issues

[7] Retrieval augmented generation from Meta

[8] AWS talks about RAG

[9] Prompting Guide AI: What is RAG
