ChatGPT Principles for Dummies
RAG, Long Context, and the Training Journey of Large Language Models
Large language models are based on artificial neural networks. Simply put, the basic version of ChatGPT takes an input (a string of numbers) and combines it with a large collection of "fixed" numbers (the model's weights) to compute the corresponding result.
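A minimal sketch of that idea (a toy two-layer network in Python, nothing like ChatGPT's real scale or architecture; all names here are illustrative): the input is numbers, the weights are fixed numbers, and the output is computed from both.

```python
import numpy as np

# Toy "model": the weights are the fixed numbers learned during training.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 8))  # fixed weights, layer 1
W2 = rng.normal(size=(8, 3))  # fixed weights, layer 2

def forward(x):
    """Turn an input vector of numbers into an output vector of numbers."""
    h = np.maximum(0, x @ W1)  # linear transform + ReLU nonlinearity
    return h @ W2              # final linear transform

x = np.array([0.5, -1.2, 0.3, 2.0])  # the input, encoded as numbers
print(forward(x))                    # the corresponding result
```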
Later, products such as https://www.perplexity.ai/, https://poe.com/, GPTs, Gemini 1.5, and https://kimi.ai/ emerged, along with related concepts like RAG (Retrieval Augmented Generation) and Long Context.
Training
There is also a concept that is often misused: "training." Technically, the coarsest precise classification distinguishes pre-training from fine-tuning. Their defining characteristic is that both change the weight parameters, which are the aforementioned "fixed" numbers.
Some people call having a dialogue "training" GPT, and others call uploading documents to GPTs "training" GPT, but neither is real training in the technical sense, because neither changes the model's parameters at all.
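A hedged sketch of the distinction, using PyTorch purely as an illustrative stand-in: inference (what a chat message triggers) leaves the weights untouched, while a training step actually updates them.

```python
import torch

model = torch.nn.Linear(4, 2)  # toy model with trainable weight parameters
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, target = torch.randn(1, 4), torch.randn(1, 2)
before = model.weight.clone()

# Inference (a "conversation"): the weights stay fixed.
with torch.no_grad():
    _ = model(x)
assert torch.equal(model.weight, before)  # unchanged

# Training (pre-training or fine-tuning): the weights change.
loss = torch.nn.functional.mse_loss(model(x), target)
loss.backward()
optimizer.step()
assert not torch.equal(model.weight, before)  # changed
```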
RAG
Then there is the difference between uploading documents and having a conversation. The two differ more in length than in kind. Current large models can only handle a limited context length, initially just a few thousand words, so if you want to cover as much content as possible, you have to make trade-offs about what to include. That is essentially RAG, Retrieval Augmented Generation, which could also be called selective context. If it is a compromise, why call it "augmentation"? Because RAG has another advantage: it can reduce hallucinations, and even outright errors, in the model's output. It also augments the answer by putting relevant content into the context for the model to consult. But the premise of augmentation is consulting the right content; if the retrieved content is a poor fit, the model's unassisted answer may actually be better, because the model was trained on the material as a whole and often has a deeper, more comprehensive understanding of it.
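A minimal sketch of the RAG loop (a toy bag-of-words similarity stands in for a real embedding model, and the documents are made up): select the passages most relevant to the question, then place them in the context as reference material.

```python
from collections import Counter
import math

def embed(text):
    """Toy 'embedding': a word-count vector (real systems use neural embeddings)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Similarity between two count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

documents = [
    "RAG retrieves relevant passages and adds them to the prompt.",
    "Long context lets the model read a whole document directly.",
    "Fine-tuning changes the model's weight parameters.",
]

def retrieve(question, k=1):
    """Selective context: keep only the k passages most similar to the question."""
    q = embed(question)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

question = "How does RAG use the prompt?"
context = "\n".join(retrieve(question))
prompt = f"Answer using this context:\n{context}\n\nQuestion: {question}"
print(prompt)  # this assembled prompt is what gets sent to the language model
```

If the retrieval step picks the wrong passage, the model is "augmented" with noise, which is exactly the failure mode described above.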
Long Context
Large language models have limited context. Beyond the purely technical hurdles, there is cost: as the context grows longer, the demand for computing power grows rapidly, roughly quadratically with context length for standard attention. That means few can afford it, and responses also take longer, so users cannot wait. The model's processing of context is a bit like real-time training, so it is inherently time-consuming. For the foreseeable future, context length will not grow without bound, so RAG will keep its place.
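A back-of-the-envelope illustration of that cost (assuming standard self-attention, whose work grows with the square of the context length; real systems apply many optimizations):

```python
# Standard self-attention compares every token with every other token,
# so the work per layer scales with the square of the context length.
for tokens in [1_000, 10_000, 100_000, 1_000_000]:
    pairwise = tokens ** 2  # token-to-token comparisons per attention layer
    print(f"{tokens:>9} tokens -> {pairwise:.0e} pairwise comparisons")

# 10x more context => ~100x more attention work per layer,
# before even counting the extra memory.
```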
Prompts
The input mentioned above is what is commonly referred to as the prompt. Some prompts express what you need; others guide how the model should approach the task.
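In the chat-message format most products use, those two kinds of prompt often map to different roles (a sketch; exact field names and schemas vary by vendor):

```python
# Illustrative chat-style prompt; the schema here is not any specific vendor's.
messages = [
    # Guides the model's processing approach:
    {"role": "system", "content": "You are a careful editor. Answer only from the given context."},
    # Expresses the actual need:
    {"role": "user", "content": "Summarize the attached report in three bullet points."},
]
```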
Related Products
Perplexity is a form of RAG: it selects useful content from search-engine results for the model to consult, both at the page level and within pages. If the selected references are poor, the results can be poor as well.
Poe does not make models itself; it is a model aggregator. Of course, enhanced features such as GPTs and other offerings from the model companies will not be the same there, but Poe has its own bots similar to GPTs.
GPTs have RAG functionality, but the implementation is not great. The so-called "actions" are not an additional capability: since large language models can write code, they naturally have the ability to call APIs.
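A sketch of how such an "action" can work (the tool name, call format, and output below are all hypothetical): the model emits a structured call as text, and the host application executes it and feeds the result back.

```python
import json

def get_weather(city):
    """Stand-in for a real external API."""
    return {"city": city, "forecast": "sunny"}

TOOLS = {"get_weather": get_weather}

# Hypothetical model output: the model "wrote" this call as plain text.
model_output = '{"tool": "get_weather", "arguments": {"city": "Paris"}}'

call = json.loads(model_output)                    # parse the model-written call
result = TOOLS[call["tool"]](**call["arguments"])  # the host performs the call
print(result)  # the result is placed back into the model's context
```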
Gemini 1.5 and Kimi are similar in that both provide an ultra-large context. Some vendors take advantage of the confusion and pass off RAG as long context. The simplest way to tell them apart: an ultra-large context will definitely be slower, while RAG can be very fast.
Here we won't discuss multimodal capabilities such as image recognition, drawing, and voice, nor agents, which are more a promise of the future than a present capability.
Fundamentally improving the capabilities of large language models is not something ordinary people can do. Platforms sometimes provide a degree of fine-tuning capability, but it usually amounts to adjustments in preferences, style, and format.
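Platform fine-tuning of that kind typically means supplying example pairs; a sketch of what such a dataset might look like (the JSONL schema here is illustrative, not any specific vendor's):

```python
import json

# Each example pairs a prompt with the desired completion. Fine-tuning on
# data like this mostly shapes style and format, not fundamental capability.
examples = [
    {"prompt": "Summarize: quarterly sales rose 10%.",
     "completion": "- Sales: +10% QoQ"},
    {"prompt": "Summarize: customer churn fell slightly.",
     "completion": "- Churn: slightly down"},
]

with open("finetune.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")  # one JSON object per line (JSONL)
```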
Verse 1:
In the digital sea, we're drifting, AI's plotting the course,
With words as anchors, and data as force.
In the land of RAG and lengthy contexts, we dive deep,
Hoping the secrets of the neural nets we can keep.
Chorus:
Oh, the AI sings in codes and strings,
In a world where data is the king.
Long context dreams, and RAG seems,
To be the magic, or so it deems.
Verse 2:
Prompting the journey with a wordy key,
Seeking simplicity in a complex sea.
Poe and Perplexity, the game's delight,
Where models perform, in the tech's spotlight.
Chorus:
Oh, the AI sings in codes and strings,
In a world where data is the king.
Long context dreams, and RAG seems,
To be the magic, or so it deems.
Bridge:
In the race of bytes and bits, where illusions fleet,
Truth and tech in a dance, can they ever meet?
Slow is the long context, fast is the RAG,
In the marketing fog, truths and tales wag.
Verse 3:
On the stage of AI, every player plays its part,
Satire it is: we chase tomorrow with a blinded heart.
Remember in this tech marathon, it’s clear,
Insight, not speed, brings the truth near.
Chorus:
Oh, the AI sings in codes and strings,
In a world where data is the king.
Long context dreams, and RAG seems,
To be the magic, or so it deems.
Outro:
In the algorithm's echo, we find our tale,
A satirical song of tech, in scale.
Where the future is a script, yet unwritten,
In the saga of AI, we are smitten.