Glossary

Published

February 1, 2025

Note

This glossary is a work in progress.

Inference

Pretraining

Supervised finetuning

Reward Modeling

Reinforcement Learning

Reinforcement Learning from Human Feedback (RLHF) ChatGPT is an RLHF model

Reinforcement Fine-Tuning


Transformer architecture A deep learning architecture introduced in the 2017 paper “Attention Is All You Need” by Vaswani et al.

Recurrent Neural Network (RNN)

Supervised fine-tuning (SFT)

Distillation

Mixture of Experts (MoE) Rather than having a single model that tries to do everything: keeping one giant model in memory is very expensive, and every inference pass has to run through the whole thing, which is inefficient. In an MoE, an early stage of inference called the gating network routes the input to different parts of the model that are specialized in different things, so you don’t have to keep the whole model in memory all the time. The “experts” are not predefined; they emerge during training.
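
A minimal PyTorch sketch of the routing idea (the layer sizes, expert count, and top-k value are made up for illustration, not taken from any particular model):

```python
# Illustrative sketch of a Mixture-of-Experts layer: a gating network scores
# the experts per token and only the top-k experts process that token.
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    def __init__(self, dim=64, num_experts=4, top_k=2):
        super().__init__()
        # The gating network: a small layer that scores each expert per token.
        self.gate = nn.Linear(dim, num_experts)
        # The "experts": independent feed-forward blocks. Their specialities
        # are not predefined; they emerge during training.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, dim * 4), nn.ReLU(), nn.Linear(dim * 4, dim))
            for _ in range(num_experts)
        ])
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, dim)
        scores = self.gate(x).softmax(dim=-1)                # expert relevance per token
        weights, indices = scores.topk(self.top_k, dim=-1)   # pick the best k experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e                 # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

moe = TinyMoE()
tokens = torch.randn(10, 64)
print(moe(tokens).shape)  # torch.Size([10, 64])
```

In a real MoE only the selected experts run for each token, which is where the memory and compute savings come from.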

Inference Inference is just generating output for a given input.
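
A toy illustration of that loop: generation is just repeatedly predicting the next token from what is there so far. The “model” here is a made-up bigram lookup table, purely to show the shape of the loop.

```python
# Illustrative only: inference as a loop of next-token predictions.
toy_model = {"the": "cat", "cat": "sat", "sat": "down"}  # made-up "next word" table

def run_inference(prompt, max_new_tokens=5):
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        next_token = toy_model.get(tokens[-1])  # "predict" the next token
        if next_token is None:                  # nothing more to generate
            break
        tokens.append(next_token)
    return " ".join(tokens)

print(run_inference("the"))  # -> "the cat sat down"
```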

NLP model Natural Language Processing model. A broad category that includes LLMs, but also things like specialist translation models or text classification models (e.g., sentiment analysis) that just focus on one thing.

LLM A Large Language Model. A specific type of NLP model that is large, deep-learning-based, and trained on massive amounts of text data.

Distillation Ask a big LLM a bunch of questions in a particular field, and use its answers to teach a smaller, specialized model. You can also distill a big LLM broadly, which strips out the wasteful stuff and gives you a smaller, more efficient model.
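
A minimal sketch of the classic distillation objective: train the small “student” to match the output distribution of the big “teacher”. The models and data below are random stand-ins, just to show the loss, not a real training setup.

```python
# Sketch of knowledge distillation: the student is nudged toward the
# teacher's predicted distribution via a KL-divergence loss.
import torch
import torch.nn.functional as F

teacher = torch.nn.Linear(16, 100)   # stand-in for the big model
student = torch.nn.Linear(16, 100)   # stand-in for the small model
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

x = torch.randn(8, 16)               # a batch of "inputs" (e.g. questions)
with torch.no_grad():
    teacher_probs = teacher(x).softmax(dim=-1)       # what the teacher predicts

student_log_probs = F.log_softmax(student(x), dim=-1)
loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
loss.backward()
optimizer.step()                     # move the student toward the teacher
```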

Chain of thought The LLM first thinks through a process for solving a problem, and then works through that process. Like the difference between System 1 and System 2 thinking as defined by Daniel Kahneman. One way to train a system like this is to give it a lot of problems and reward it depending on whether its answer was right or wrong, and whether it wrote its reasoning in the correct format. This method is good because there are lots of datasets with questions and answers, but few with questions, reasoning, and answers; the process doesn’t need the reasoning in the training data.
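
A sketch of that reward, assuming a made-up output format where the model puts its reasoning inside <think> tags and then writes ANSWER: followed by the final answer. The tags and scores are illustrative, not from any particular system.

```python
# Sketch of a reward for RL-style training of a reasoning model: we only need
# (question, answer) pairs, not written-out reasoning, to score the output.
import re

def reward(model_output: str, correct_answer: str) -> float:
    score = 0.0
    # Reward the expected format: reasoning inside tags, then a final answer.
    if re.search(r"<think>.*</think>\s*ANSWER:", model_output, re.DOTALL):
        score += 0.5
    # Reward getting the final answer right, regardless of the reasoning text.
    final = model_output.rsplit("ANSWER:", 1)[-1].strip()
    if final == correct_answer.strip():
        score += 1.0
    return score

print(reward("<think>2+2 is 4</think> ANSWER: 4", "4"))  # -> 1.5
```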

LLM as a judge Asking an LLM to judge whether the output of another LLM is good or not. Weird approach? But it might work?
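
A sketch of what this looks like in practice; `call_llm` is a hypothetical helper standing in for whatever API you use to reach the judge model, and the grading scale is made up.

```python
# Sketch of LLM-as-a-judge: build a grading prompt and ask a second model
# to score the first model's answer.
def judge(question: str, candidate_answer: str) -> int:
    prompt = (
        "You are grading an answer.\n"
        f"Question: {question}\n"
        f"Answer: {candidate_answer}\n"
        "Rate the answer from 1 (bad) to 5 (excellent). Reply with only the number."
    )
    reply = call_llm(prompt)  # hypothetical call to the judge model
    return int(reply.strip())
```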

Retrieval Augmented Generation (RAG) A fancy acronym for a very simple trick: if you have a question, first you search your documentation for sections that might be relevant to that question, and you copy all of that material in alongside your question.

Getting RAG working really well is actually very difficult, because how do we pick the information that is relevant to what the user is asking? People ask in weird ways.
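
A deliberately naive sketch of the trick, using word overlap as the “search”. Real systems use embeddings and much smarter retrieval, and the documentation snippets here are made up; this just shows the shape.

```python
# Naive RAG sketch: find the doc sections that share the most words with the
# question, then paste them into the prompt ahead of the question.
docs = {
    "install": "Run pip install ourtool, then restart your shell.",
    "auth": "Create an API key in the dashboard and export OURTOOL_KEY.",
}

def retrieve(question: str, k: int = 1):
    q_words = set(question.lower().split())
    scored = sorted(docs.items(),
                    key=lambda item: len(q_words & set(item[1].lower().split())),
                    reverse=True)
    return [text for _, text in scored[:k]]

def build_prompt(question: str) -> str:
    context = "\n".join(retrieve(question))
    return f"Use this documentation:\n{context}\n\nQuestion: {question}"

print(build_prompt("How do I install the tool?"))
```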

Fine-tuning Take an existing model and refine it for a particular problem. Mostly an expensive waste of time, according to Simon Willison.

For example, fine-tuning a model specifically on your own documentation just doesn’t work: fine-tuned models can actually hallucinate more than models that haven’t been fine-tuned. Fine-tuning to “add new facts into the model” just doesn’t work, which seems counterintuitive.

Fine-tuning can work for certain specific tasks, for example making a model that is really good at SQL.
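
A sketch of what the training data for that kind of task-specific fine-tune might look like: lots of (instruction, SQL) pairs that then get fed into a supervised fine-tuning run on a base model. The examples and table names are made up.

```python
# Illustrative text-to-SQL fine-tuning data: prompt/completion pairs.
sql_examples = [
    {"prompt": "List all customers from France.",
     "completion": "SELECT * FROM customers WHERE country = 'France';"},
    {"prompt": "Count orders placed in 2024.",
     "completion": "SELECT COUNT(*) FROM orders WHERE YEAR(order_date) = 2024;"},
]
# A supervised fine-tuning run would train the base model to map each prompt
# to its completion, rather than trying to teach it new facts.
```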