A.I. Glossary

A glossary of common AI terms for non-AI professionals.

(Image produced with Stable Diffusion and the single-word prompt, "ai".)

AI news is all over the place these days, and it can be difficult to cut through the hype, especially if you don't work in AI or data science. However, more and more people need to know at least the basics, so they can make important decisions for their own work, or on behalf of their business.

With that in mind, I've put together this glossary of terms that you might see or hear in the media, on tech company websites, in vendor sales calls, and in industry white papers. It's geared towards non-AI professionals — people who aren't building AI applications but who may need to use or evaluate AI applications.

I hope you find it useful. And I intend it to be a living document, so please email me if you have questions, suggestions, corrections, or other terms you think I should add!

Note: terms are in alphabetical order, so feel free to jump around to find what you need. Also, when a term in a definition is also included in this glossary, I put it in italics, so you know to check that one out, too.

Abstractive summarization — When a model is used to summarize a document, it can either find one or more quotes that contain the core meaning of the document, or it can paraphrase the document. When the model summarizes the document by paraphrasing, that is called abstractive summarization (contrast with extractive summarization).

Algorithm — A set of precise mathematical instructions for a computer to follow. In social media, we're usually talking about content recommendation algorithms, which use information about us to suggest relevant content or ads. In AI, we're usually talking about the processes by which models are trained to help us perform specific kinds of tasks.

API — This stands for application programming interface. It's basically the language and protocol computers use to talk to each other. APIs come into play for AI when the models are so big that they have to live on a server, rather than directly on a user's device. Most apps being built around things like ChatGPT, DALL-E, or Midjourney use APIs to connect the app you're using with the core model.
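If you're curious what "talking to a model over an API" looks like in practice, here's a toy sketch in Python. The endpoint URL, payload fields, and API key are all made up for illustration; every real provider defines its own.

```python
import json
import urllib.request

# Hypothetical endpoint -- real providers each publish their own URL,
# payload shape, and authentication scheme.
API_URL = "https://api.example.com/v1/generate"

def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Package a natural-language prompt as an HTTP request to a model server."""
    payload = json.dumps({"prompt": prompt, "max_tokens": 50}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

req = build_request("Write a haiku about autumn.", api_key="sk-...")
# Actually sending it would look like: urllib.request.urlopen(req).read()
# (not executed here, since the endpoint is made up)
```

The app you interact with builds a request like this behind the scenes, sends it to the server where the model lives, and displays the response it gets back.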

Artificial intelligence (AI) — AI is an imprecise term that means different things to different people. Typically, it refers to the use of machine-learning-based tools in applications that humans interact with directly, like a chatbot. However, it is also sometimes just a branding term that makes a product that processes data computationally sound fancy and futuristic. So keep your hype detectors on!

Chatbot — An app that accepts a user prompt in the form of natural language and returns a response based on that prompt, in line with some underlying data model. In contrast to a more straightforward question-and-answer application, a chatbot typically "remembers" earlier prompts and responses, taking at least some conversational history into account when generating new responses.

Checkpoint — When training a model, certain processes are repeated many times along the way, until a satisfactory (enough) result has been obtained. Every step along the way is a checkpoint. Sometimes a pretrained model is referred to as a checkpoint, because someone can take it as a starting point and pick up the training process via fine-tuning. Sometimes fine-tuned models are also referred to as checkpoints, reflecting the open-ended nature of the training process, and inviting users to continue fine-tuning and customizing the model to suit their own needs. (If you use the image generation software Stable Diffusion, you will see such fine-tuned versions of standard models referred to as checkpoints and available for download.) (See also embedding.)

Classification — Comparing an object to a model in order to provide a label for that object, based on one of the classes/categories the model learned in training. A common example from computer vision involves comparing an image to an image model to identify objects, text, or even logos present in the image. When you click on images of bicycles, crosswalks, or traffic lights to log in to a website, you are likely helping a model for self-driving cars get better at identifying those objects in images fed to the model by the car's camera.

Computer Vision — A field of machine learning that deals with the analysis of images, including object detection and classification.

CPU — The central processing unit of a computer. This is where the main computational action happens in a computer, separate from the memory and hard drive, where information is stored short- or long-term. Note that people sometimes refer to the non-monitor part of a desktop computer (what we used to call the "tower") as "the CPU." While usually a fine abbreviation, it can be confusing in AI, because the tower/box can contain both a CPU and a GPU (graphics processing unit), especially if the machine is designed for gaming or media editing. In AI, this distinction is important. CPUs can perform one mathematical operation at a time, even if they can do millions of them per second. GPUs, however, can perform multiple complex operations simultaneously, and therefore complex AI tasks like training or fine-tuning a model will be performed on GPUs rather than CPUs, saving minutes, hours, even days or weeks of processing time. Because not every computer comes with a dedicated GPU (especially laptops and Macs), the AI or machine learning tasks that can be done on a CPU-only machine are limited.

Decoding — Because humans understand text, images, audio, video, etc., but models understand numbers, data must be converted to numbers when input into a model and converted back into human-readable data when output by the model. The process of converting numerical representations of data into text, images, audio, video, or other forms of data that humans can understand is called decoding. (See also encoding.)

Diffusion — Diffusion models are probabilistic models that learn how to create things like images and videos by deconstructing existing images and videos through the addition of "noise" to each image (think static on an old TV), and then remaking images of the same kinds of objects by reversing the process. Research in the past few years has shown diffusion-based models to be significantly more accurate and efficient at generating believable images than other kinds of models. As a result, the tools making the news today for their uncanny and lifelike image-creation abilities are all based on diffusion models.

Embedding — Essentially a "plugin" for a model that adds a single new object/class/concept to an existing model. When you perform such single-entity fine-tuning, the result is a small file containing only the mathematical representation of the newly trained object — in contrast to a checkpoint, which is a large file (or set of files) that fully replaces the preexisting model. Because you embed this new object/class/concept into an existing model, rather than replacing the whole model, it is called an embedding.

Encoding — Because humans understand text, images, audio, video, etc., but models understand numbers, data must be converted into numbers when input into a model and converted back into human-readable data when output by the model. The process of converting input data (text, images, audio, video) into numbers that the algorithm can understand — for example, for the purpose of training or fine-tuning a model — is called encoding.
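Here's a toy illustration of the encoding/decoding round trip, using a made-up six-word vocabulary. Real models use far more sophisticated numerical representations, but the basic idea of converting text to numbers and back is the same.

```python
# Build a tiny vocabulary mapping each word to a number.
text = "the cat sat on the mat"
vocab = {word: i for i, word in enumerate(sorted(set(text.split())))}
inverse_vocab = {i: word for word, i in vocab.items()}

def encode(sentence: str) -> list[int]:
    """Convert human-readable text into numbers a model can work with."""
    return [vocab[word] for word in sentence.split()]

def decode(numbers: list[int]) -> str:
    """Convert the model's numbers back into human-readable text."""
    return " ".join(inverse_vocab[n] for n in numbers)

encoded = encode("the cat sat")
print(encoded)          # [4, 0, 3]
print(decode(encoded))  # the cat sat
```

Everything a model "reads" or "writes" passes through a conversion like this on the way in and on the way out.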

Extractive summarization — When a model is used to summarize a text document, it can either find one or more quotes that contain the core meaning of the document, or it can paraphrase the document. When the model identifies quotes that summarize the document, that is called extractive summarization (contrast with abstractive summarization).
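To make the contrast with abstractive summarization concrete, here's a deliberately naive extractive summarizer in Python: it scores each existing sentence by how frequent its words are across the whole document and returns the top scorer verbatim. Production summarizers are far more sophisticated; this only illustrates the "select existing sentences" idea.

```python
import re
from collections import Counter

def extractive_summary(document: str, num_sentences: int = 1) -> str:
    """Select the highest-scoring existing sentence(s) as the summary.

    Naive heuristic: sentences whose words occur most often in the
    document overall are assumed to carry its core meaning.
    """
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    word_counts = Counter(re.findall(r"[a-z]+", document.lower()))

    def score(sentence: str) -> float:
        words = re.findall(r"[a-z]+", sentence.lower())
        return sum(word_counts[w] for w in words) / max(len(words), 1)

    best = sorted(sentences, key=score, reverse=True)[:num_sentences]
    return ". ".join(best) + "."

example = ("Models learn patterns. Models make predictions from patterns. "
           "The weather is nice.")
print(extractive_summary(example))  # Models learn patterns.
```

Note that the output is a sentence copied straight from the input; an abstractive summarizer would instead write a new sentence of its own.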

Fine-tuning — The process of creating a custom model by starting with a general model (also called a pre-trained model) and using it as the starting point for additional training on custom data. Creating a custom model from scratch can — in the case of large language models (LLMs) — take several months and hundreds of thousands, or even millions, of dollars. However, when a general model already exists, fine-tuning it with additional data tailored to your purposes can create a highly effective, customized model with very few resources (and significantly less environmental impact).

Generative AI — Artificial intelligence applications that can be used to generate new content, typically based on a simple prompt provided by a human user. This can involve the creation of wholly new images, videos, songs, blocks of text, etc. (like Midjourney or Stable Diffusion). It can also involve completing an image, video, song, block of text, etc. based on a beginning provided by a human user (like text message autocompletion).

GPU — A graphics processing unit (see CPU above). GPUs make it possible for some desktop computers to handle moderately complex machine-learning tasks, like training and fine-tuning models. Large, complex operations will often be performed on GPU clusters — computers containing networks of GPUs linked together to perform these complex operations as a single unit.

Inference — In short, inference is the term for using, or applying, AI models. Once a model is trained, it is used for making predictions, classifying objects, even generating art. All of that is called inference.

Large Language Model (LLM) — Models from the field of natural language processing (NLP) that are trained on very large datasets (like the text of whole libraries or millions of websites) using complex deep learning algorithms. There isn't a hard-and-fast boundary between "language models" and "large language models", so it's easiest (especially for non-machine-learning-engineers) to focus on the applications they enable. Where many kinds of language models can be used to extract entities (names, locations, etc.) from documents, summarize documents, and find answers to questions, LLMs are particularly good at tasks that require zero-shot predictions — essentially fast, lightweight solutions to NLP problems without fine-tuning or custom training. This zero-shot capability makes LLMs far superior when it comes to things like chatbots (like ChatGPT), translating text instructions into computer code, etc. (See zero-shot learning, below, for more information.) LLMs also don't require supervised learning in the same way or to the same extent as many other language models, which allows LLMs to train on a much larger dataset than their NLP siblings.

Machine learning — The process of applying an algorithm (the machine) to a dataset in order to uncover previously unknown patterns in the data (the learning). These patterns become the basis of a model, and that model is used to make predictions or classifications about new data.

Model — Any mathematical representation of data is a model. In AI and machine learning, the term typically refers to representations of large, general things like languages and bodies of knowledge. (See also embeddings and checkpoints.)

Natural Language Processing (NLP) — A sub-field of data science that uses machine learning to analyze and "understand" human language. It can involve relatively simple processes like entity extraction (finding all the names or locations in a document) or more complicated processes like summarizing large documents or translating them into other human languages (like English to Spanish) or other modes of language (informal to formal, text messages to emails, etc.). Large Language Models (LLMs) are examples of NLP.

Pretrained model — A general model that is used as the basis of fine-tuning is referred to as a pretrained model. In some cases, models are only pretrained — that is, they form a solid basis for other models, but require fine-tuning to perform well enough to use in an application. However, any model that is used as the basis for a fine-tuned model is considered "pretrained," even if it can stand on its own.

Script — A unit of code contained in a single text file. An application is typically made up of many scripts. These days, many — but certainly not all — machine learning scripts are written in the Python programming language.

Supervised learning — A machine learning process where both inputs and outputs are used to generate the model. For example, a set of questions paired with their respective answers can be used to train a question-answering model to understand patterns that help it answer new questions not in the training dataset. (Contrast with unsupervised learning.)
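A toy version of the supervised idea in Python: a handful of labeled examples (inputs paired with output labels) are used to label a new input by finding its closest labeled neighbor. The data and labels here are invented for illustration; real models are far more sophisticated, but the input/output pairing is the key point.

```python
# Each training example pairs an input with its known output label.
training_data = [
    # (hours of sunshine, inches of rain) -> label
    ((9.0, 0.1), "nice day"),
    ((8.0, 0.0), "nice day"),
    ((2.0, 1.5), "gloomy day"),
    ((1.0, 2.0), "gloomy day"),
]

def classify(point: tuple[float, float]) -> str:
    """Label a new point using its closest labeled training example."""
    def distance(example):
        (x, y), _label = example
        return (x - point[0]) ** 2 + (y - point[1]) ** 2
    _, label = min(training_data, key=distance)
    return label

print(classify((7.5, 0.2)))  # nice day
```

Because the training data includes the "answers," the resulting model can transfer those answers to new inputs it has never seen.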

Tokens — Words or parts of words that are treated as individual units by a model. Tokens and tokenization are primarily important for data scientists and engineers creating models, pipelines, and applications. However, APIs like OpenAI's charge users by the token, meaning that all users need to track (and minimize, where possible) their token usage.
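To see how one word can become multiple tokens, here's a toy greedy tokenizer over a tiny made-up vocabulary. Real tokenizers learn much larger vocabularies from data, but the piece-matching idea is similar.

```python
# A tiny, hand-picked vocabulary of word pieces (real ones have tens of
# thousands of pieces, learned from data).
VOCAB = ["summar", "ize", "token", "s", "the", "un", "believ", "able"]

def tokenize(word: str) -> list[str]:
    """Greedily match the longest vocabulary piece at each position."""
    tokens, i = [], 0
    while i < len(word):
        for piece in sorted(VOCAB, key=len, reverse=True):
            if word.startswith(piece, i):
                tokens.append(piece)
                i += len(piece)
                break
        else:
            tokens.append(word[i])  # unknown character becomes its own token
            i += 1
    return tokens

print(tokenize("summarize"))     # ['summar', 'ize']
print(tokenize("unbelievable"))  # ['un', 'believ', 'able']
```

This is why a prompt's token count is usually a bit higher than its word count — and why API bills are measured in tokens, not words.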

TPU — A tensor processing unit is a special kind of processor developed by Google. Unlike the CPU and GPU, it was designed with specific kinds of machine-learning tasks in mind.

Training — The process by which an algorithm processes a dataset in order to identify patterns. The patterns identified make up the resulting model. The dataset used by the algorithm to uncover patterns is the training set or the training data. This identification of patterns is the core element of machine learning (because the machine is learning the patterns).

Transfer learning — Applying insights gained from one machine learning process in a different domain. It can include simply the application of knowledge in a new domain, or it can involve building on the preexisting knowledge to gain new insights (in other words, fine-tuning a pretrained model for a different or more specific purpose).

Translation — In natural language processing (NLP), translation involves taking the underlying meaning of a text and producing a new text with the same meaning. This can involve the common notion of translation — from one human language (like English) into another (like Spanish). But it can also involve the re-presentation of a meaning in a new mode of language — for example, converting an informal set of text messages into a more formal email, converting an academic article full of technical jargon into a blog post for a general audience, or even converting a draft with grammatical and spelling errors into a final version free of such errors.

Unsupervised learning — A machine learning process where inputs are used to generate a model, but unlike in supervised learning, there are no outputs (or answers) to learn from. For example, the grouping of Netflix users into groups of similar movie watchers based on their watch history involves unsupervised learning. In such a case, there is not a predetermined list of categories to sort users into, nor any right answers provided, but the categories are emergent from the data. Put another way, the users sort themselves into groups where members are highly similar to each other but different from members of other groups.
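Here's a bare-bones sketch of the unsupervised idea in Python, loosely following the k-means approach: a list of weekly watch-time figures sorts itself into "light" and "heavy" groups with no labels or right answers provided. The numbers are invented for illustration.

```python
def two_means(values: list[float], steps: int = 10):
    """Split values into two groups around two moving cluster centers."""
    center_a, center_b = min(values), max(values)  # crude starting guesses
    group_a, group_b = [], []
    for _ in range(steps):
        # Assign each value to whichever center it is closer to...
        group_a = [v for v in values if abs(v - center_a) <= abs(v - center_b)]
        group_b = [v for v in values if abs(v - center_a) > abs(v - center_b)]
        if not group_a or not group_b:
            break  # degenerate split; keep the previous grouping
        # ...then move each center to the middle of its group, and repeat.
        center_a = sum(group_a) / len(group_a)
        center_b = sum(group_b) / len(group_b)
    return group_a, group_b

hours_watched = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
light, heavy = two_means(hours_watched)
print(light)  # [1.0, 1.5, 2.0]
print(heavy)  # [9.0, 10.0, 11.0]
```

Notice that no one told the algorithm what "light" or "heavy" means — the two groups emerge purely from how the values cluster together.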

Zero-shot learning — This is a pretty technical concept, but it basically means getting a machine learning model to do something it wasn't explicitly trained to do — for example, an image classifier trained to recognize different makes of car using only American cars, which can nevertheless recognize Hondas or VWs as makes of car. It may sound obscure, but it's a key technique enabling many of the most recent advances in generative AI, so you'll probably run into the term at some point.