
Fundamentals of LLMs: a story from the history of GPTs to the future

Introduction:

There have been a lot of developments around LLMs, and I have not covered any of them so far. Over the next few parts, I will talk about LLMs and the ecosystem that has developed around them, and will try to work up to the cutting, or more like bleeding, edge.

Let's go through the main concepts first.

What is an LLM?

LLM[1] stands for large language model: mainly large, deep-learning-based transformer models that can perform natural language understanding and natural language generation tasks much better than earlier generations of models in NLP history.

LLMs are generally quite big, on the order of 10-100 GB, and often can't fit in even one machine's RAM. So most LLMs are served from larger GPU cluster systems and are computationally expensive to run.

What was the first true LLM?

The BERTs

Transformers were invented in 2017 by Vaswani et al. in their revolutionary paper "Attention Is All You Need". After that came BERT, and then a whole series of BERT-style models. BERT models were good at sentence understanding, masked-word prediction and similar tasks. But BERT models were not able to generate long, coherent pieces of text.

First of the GPTs

At the same time, OpenAI had started working on its GPT series. GPT stands for generative pre-trained transformer. The first model, GPT-1, came out in 2018 with 117 million parameters. GPT-2 came out in 2019 with 1.5 billion parameters. Even GPT-2 could still be loaded on a single PC and worked with locally.

GPT-2 was quite good at generating sentences, and it was possible to generate coherent short paragraphs even with GPT-2.

But GPT-2 still had some problems. The model would get stuck in a repetition loop, churning out the same tokens in series after a certain point, and it would sometimes generate a lot of garbage characters. Still, GPT-2 was on par with the BERT models on many performance measures, and people started solving generative tasks with it.

The winner steps into the arena

GPT-3 came out in May 2020. It was planned essentially as GPT-2 scaled up, with a lot more parameters: 175 billion of them, making it the biggest model around in 2020. And GPT-3, in my humble opinion, was the first true LLM. For comparison, BERT had 110 million parameters in BERT-base and 340 million in BERT-large; no model at that time came anywhere close to GPT-3's size.

And OpenAI recognized what it had from the inception of GPT-3. For the first time, a model felt as if we were talking to someone, not something. And the model beat the state of the art on most NLP tasks: generation, question answering, summarization, you name it.

GPT-3, though, was not open-sourced. OpenAI, a company that started as a non-profit meant to guard AI from the evils of the big and bad, put its best model behind the bars of an API-based service. It started giving out API access on a request basis.

The biggest names in research and industry got access first; after that it was first come, first served. I remember applying for GPT-3 access back in November 2020 and receiving it at the beginning of February 2021.

The model was trained on a Reddit-links-based corpus and a big chunk of the internet, and its performance was surreal. I had done my fair share of work with GPT-2-based generation tasks and extractive and abstractive summarization, and one sweep of GPT-3's white-and-green playground swept me off my feet.

GPT-3, though, still had some of GPT-2's problems. It would still generate silly replies, character garbage (a jumble of many characters), and repetitive sentences or tokens. GPT-3 also needed an explicit limit on how many tokens to generate, and it would often produce dangling, cut-off sentences at that limit. This was one of the problems we had to work around at the time.

GPT-3 was also unable to generate coherent code reliably. It would produce a good amount of code, and small blocks would often work, but it could not yet generate fully running programs. Still, it was possible to get GPT-3's help on a simple coding problem, or to generate a bulk draft and debug it later.

The era of the GOAT, ChatGPT

OpenAI next produced InstructGPT, a model able to follow instructions and generate responses accordingly. The fine-tuning of ChatGPT, a more conversation-focused sibling of InstructGPT, used data in the spirit of the previous GPTs but at increased scale, as well as reinforcement learning from human feedback (RLHF).

ChatGPT launched on 30 November 2022, and into the new year of 2023 its GPT-3.5 and then GPT-4 models became a worldwide sensation. It prompted a whole new culture around AI and, with its human-ish capabilities, propelled the conversation about AI vs. humans.

ChatGPT has versions that have passed medical and bar exams, and it can answer your questions in a specific or generic manner, in whatever tone you prefer.

ChatGPT is also the first model in the GPT series that can generate code comprehensively. It has become a productivity boost for almost everything, and it has set off a new gold rush in human civilization: everyone thinks AI will be the new differentiator, and learning how best to use ChatGPT is the way to the new metaphorical gold.

The ChatGPT story is still evolving, with thousands of prompt books, courses, papers and research efforts appearing every week. But for the scope of this article, we will move on and get into the next developments.

What trends came up after the invention of LLMs?

After ChatGPT, the masses and industry approached LLMs with extreme interest. To quote Chip Huyen[5]:

"It’s easy to make something cool with LLMs, but very hard to make something production-ready with them."

Industry looked into using ChatGPT and similar LLM tools in production, but a number of problems arose from that: hallucination, lack of context, cost and latency, the need for proper prompting techniques, and so on. We will discuss each of these ideas in detail in the following sections.

Before that, let's delve into the concept of prompting.

What is Prompting?

Prompting refers to the art/science of crafting a prompt, the written instructions that give an LLM the right directions to produce the desired output.

Why does prompting need to be learned?

Up until the advent of LLMs, programming was done in programming languages, where syntax and formal structure ensured that the code gave exact instructions to the machine; i.e., the code written was not open to interpretation or subjective.

In the case of LLMs, we provide the model a written natural-language instruction, which it processes to produce an output.

Now, as you may know, under the hood an LLM is an AI model that takes text, processes it through its deep learning layers and produces an output. This process is stochastic: there is a certain amount of randomness in the sampling, so the output can vary from one request to the next, even for the same prompt.
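
You can see and control this randomness directly. Most LLM APIs expose a temperature parameter for it; here is a minimal sketch using the OpenAI Python SDK (it assumes you have an API key set in your environment, and the model name is just an example):

    from openai import OpenAI

    client = OpenAI()  # reads the OPENAI_API_KEY environment variable

    prompt = "Summarize the plot of Hamlet in one sentence."

    # temperature=0 makes sampling (almost) greedy, so repeated calls give
    # near-identical outputs; higher values add randomness to each call.
    for temperature in (0.0, 1.0):
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",  # example model name
            messages=[{"role": "user", "content": prompt}],
            temperature=temperature,
        )
        print(temperature, "->", response.choices[0].message.content)

Run the loop a few times: at temperature 0.0 the answers barely change, while at 1.0 each run can read quite differently.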

Now imagine you are trying to score an essay question, or to underwrite someone's creditworthiness, using an LLM. What if someone's loan application were rejected five times and then approved on the sixth ask?

Would that be an acceptable model? Never.

The problem we just described, where we cannot expect consistent output from an LLM, is called the self-consistency problem of LLMs.
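
One common mitigation, borrowed from the self-consistency prompting technique, is to sample the model several times and take a majority vote over the answers. A minimal sketch; ask_llm here is a hypothetical stand-in (simulated with weighted random draws) for whatever real LLM call you use:

    import random
    from collections import Counter

    def ask_llm(prompt: str) -> str:
        # Hypothetical stand-in for a real LLM call; it simulates a
        # stochastic model that answers "approve" 80% of the time.
        return random.choices(["approve", "reject"], weights=[0.8, 0.2])[0]

    def majority_vote(prompt: str, n_samples: int = 5) -> str:
        # Sample the model several times and return the most common
        # answer, smoothing out run-to-run randomness.
        answers = [ask_llm(prompt) for _ in range(n_samples)]
        return Counter(answers).most_common(1)[0][0]

    print(majority_vote("Should this loan application be approved?"))

Voting does not make the model deterministic, but it makes flips like the five-rejections-then-one-approval scenario above much less likely.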

LLMs are also seen to answer questions incorrectly, or without "much thinking", when given naive prompts; and they may not pick up on the desired tone, format or guidance without a proper way of prompting.

These are the reasons one lane of research went into how to prompt: what the different prompting techniques are, how to get consistent output from a stochastic model, and so on. We will discuss prompting techniques in detail in a later post.


What is hallucination?

One of the biggest problems that came up when people started asking LLMs all sorts of questions was that the LLMs started lying.

Yes, in case you haven't seen it first-hand: an LLM can lie a lot. Whenever you ask it a question it doesn't inherently know about, or wasn't trained on, it can "cook up" an answer out of its learned concepts and write it out silently, without giving you any warning or saying "I don't know about this."

This problem again arises from the structure of an LLM. An LLM, being inherently a language model, has no concept of not knowing. It simply keeps generating the next most likely word, over and over, until a whole answer comes out. Knowing that you don't know is a different concept, a discriminative concept. So what does an LLM do when it comes across a question it doesn't know the answer to? It predicts the best-sounding answer according to what it has learned.
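
You can see this mechanic with even a small, open model. Here is a sketch using the Hugging Face transformers pipeline with GPT-2; the prompt asks about a fictional place, so no factual answer exists, yet the model still completes it fluently:

    from transformers import pipeline

    # GPT-2 is small enough to run locally and makes the point clearly.
    generator = pipeline("text-generation", model="gpt2")

    # Wakanda is fictional, so there is no true answer; the model still
    # predicts the most plausible-sounding continuation rather than
    # saying "I don't know".
    result = generator("The capital city of Wakanda is", max_new_tokens=10)
    print(result[0]["generated_text"])

The model is doing exactly what it was trained to do, predicting likely next tokens, which is why the confident-sounding nonsense comes with no warning attached.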

ChatGPT currently doesn't do this as much as GPT-3 used to, but it is still possible to get hallucinated answers from it. The way OpenAI attempted to solve it for ChatGPT was to add guardrails, i.e. software that monitors the raw outputs and corrects, improves or changes them based on common sense and other requirements.

Many competing services hosting other models have also put fact-checking systems in place, injecting real facts alongside the machine-generated result and checking the answers thoroughly for hallucinations.

Hallucinations can hurt especially when you use ChatGPT in a use case that depends on very specific information. For example, if an airline ticketing service had a chatbot built on ChatGPT, and that chatbot started cooking up flight numbers when asked about flights from location A to location B, the airline would soon have to shut the service down under the lawsuits it would attract.

Interested folks can read the story of the Air Canada chatbot mishap here[6].

Hallucination, among other reasons, led to one of the biggest current research trends in LLMs (as of this writing, April 2024): RAG.

What is RAG?

RAG, or retrieval-augmented generation, is a process or framework for improving the output of an LLM using an authoritative knowledge base outside its training data: the knowledge base is referred to, retrieved from, and used to augment the LLM's generation.

While RAG is a full framework, with hundreds of research and industrial works built on it since its inception by Meta researchers[7], in short it works on three fronts (see the sketch after this list):

1. creating a separate database of information specific to the LLM's use case

2. optimizing retrieval of information from said database using vector database architectures and advanced search and indexing techniques (a very active research area)

3. generating outputs using the retrieved information and a prompt enhanced from the rudimentary prompt provided by the end user (an active research area: how to enhance the prompt, how best to use the retrieved information, and other interlinked concepts)
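
To make these three fronts concrete, here is a minimal, self-contained sketch. It stands in TF-IDF for a real embedding model plus vector database, and the airline documents are made-up examples; a production system would use learned embeddings, a proper vector store and a real LLM call at the end:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # Front 1: a use-case-specific document store (toy example).
    documents = [
        "Flight AB123 operates daily between Toronto and Vancouver.",
        "Checked baggage allowance is one 23 kg bag per passenger.",
        "Bereavement fares can be requested within 90 days of travel.",
    ]

    # Front 2: index the documents and retrieve the closest matches.
    # (TF-IDF stands in here for an embedding model + vector database.)
    vectorizer = TfidfVectorizer()
    doc_vectors = vectorizer.fit_transform(documents)

    def retrieve(query: str, k: int = 2) -> list[str]:
        query_vector = vectorizer.transform([query])
        scores = cosine_similarity(query_vector, doc_vectors)[0]
        top = scores.argsort()[::-1][:k]
        return [documents[i] for i in top]

    # Front 3: augment the user's rudimentary prompt with retrieved context.
    def build_rag_prompt(question: str) -> str:
        context = "\n".join(retrieve(question))
        return (
            "Answer using ONLY the context below. If the answer is not "
            "in the context, say you don't know.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}"
        )

    # The enhanced prompt would then be sent to any LLM of your choice.
    print(build_rag_prompt("Which flight goes from Toronto to Vancouver?"))

Grounding the answer in retrieved text, and instructing the model to admit when the context lacks the answer, is exactly what curbs the hallucination problem described above.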

We will try to discuss RAG in another article later, but for the scope of this one we will stop here. Interested readers can refer to the further reading in [7], [8] and [9].

What are SLMs?

Separately, another line of research focused on reducing LLM sizes and improving inference speed, to get faster, smaller and better models for specific use cases. From this research front came a new set of models called SLMs (small language models), typically having only a few billion parameters and weighing in at single-digit gigabytes (roughly the 1-10 GB range).


Conclusion:

A lot more work is going on in understanding hallucination, reducing model sizes and defining better measures of model performance. We have just scratched the surface of the LLM world with this article. But give yourself kudos for finishing it and getting through the beginnings of LLMs.

If you liked this story, stay tuned and come back for the next story in this series, diving deeper into the LLM world.

References:

[1] What is an LLM? (Analytics Vidhya)

[2] A brief history of GPT models

[3] ChatGPT (Wikipedia)

[4] Four LLM trends since ChatGPT

[5] Chip Huyen: Building LLM applications for production

[6] Air Canada chatbot mishap shows GenAI issues

[7] Retrieval-augmented generation (Meta)

[8] AWS on RAG

[9] Prompting Guide AI: What is RAG?
