GLOSSARY

Key terms and background information on the exhibition THE WORLD THROUGH AI

AI INFRASTRUCTURES
The vast material foundation that makes artificial intelligence possible: data centers humming with servers, undersea cables, electrical grids, cooling systems, and the mines supplying rare minerals for chips. Far from being “virtual” or weightless, AI depends on enormous physical resources—energy, water, land, and human labor. To speak of infrastructures is to remind ourselves that every generated image or text has a concrete, planetary cost.

AI SLOP
The flood of low-quality, mass-produced content generated by AI and scattered across the internet: stereotyped images, formulaic text, spammy videos. The result of the default parameters of the commercial AI models, “slop” is cheap to produce and overwhelming in volume. The term captures the growing anxiety that automated abundance is degrading our shared information environment, burying human expression beneath an undifferentiated sludge of synthetic filler.

ANALYTIC (DISCRIMINATIVE) AI
The branch of artificial intelligence that classifies, sorts, and recognizes rather than creates. It answers questions like “Is this a face?” or “Is this email spam?”—drawing boundaries between categories within existing data. For decades this was AI’s dominant mode, powering facial recognition, medical diagnostics, and surveillance systems. It is the analytic counterpart to generative AI: where one discriminates between things, the other produces new ones.

ARTIFICIAL NEURAL NETWORK
A computational model based on a simplified view of the organization of biological neurons in animal brains, which receive input data that is transformed and sent on to other neurons. Artificial neurons differ from biological neurons in many ways, particularly in how they are calibrated. An artificial neural network is said to be “deep” when it contains more than three layers of neurons. For example, convolutional neural networks* may contain dozens or even hundreds of layers of neurons.

BIOLOGICAL NEURAL NETWORK
A group of neurons that exchange information through electrochemical signals in a network of connections, capable of adapting their relationships to perform specific functions such as perception, memory, or movement.

CHATGPT
The conversational AI system released by OpenAI in November 2022, widely seen as the spark of the generative AI wave we are still living through. Reaching a hundred million users within two months, it brought generative AI out of research labs and into everyday life almost overnight. More than a product, “ChatGPT” became shorthand for a cultural and economic upheaval whose consequences are still unfolding.

CLIP (CONTRASTIVE LANGUAGE-IMAGE PRETRAINING)
Released in 2021 by OpenAI, CLIP jointly encodes natural language descriptions and images in a common latent space* to make it easier to search for, classify, and describe images.

CONVOLUTIONAL NEURAL NETWORK
Artificial neural network* composed of feature detectors (called “filters” or “kernels”) that scan an image to detect simple patterns (such as lines, textures, and color variations). This information is then combined by subsequent layers to infer the presence of increasingly elaborate features: corners, angles, textures, parts of objects, and, finally, the objects themselves.
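
As a rough illustration of a single filter scanning an image, the sketch below slides a small kernel over a tiny grayscale image and records how strongly each position matches. The function name `convolve2d`, the 4x4 image, and the kernel values are all assumptions chosen for this example; in a real convolutional network the kernel values are learned during training rather than written by hand.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (both lists of lists) and return the response map.

    Like most deep learning libraries, this computes a cross-correlation:
    the kernel is not flipped before it is applied.
    """
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    output = []
    for row in range(out_h):
        out_row = []
        for col in range(out_w):
            total = 0
            for i in range(kernel_h):
                for j in range(kernel_w):
                    total += image[row + i][col + j] * kernel[i][j]
            out_row.append(total)
        output.append(out_row)
    return output


# Hypothetical 4x4 image: dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-written kernel that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]

response = convolve2d(image, kernel)
for line in response:
    print(line)
```

The response is strongest in the middle column, exactly where the image passes from dark to bright: the filter has "detected" a vertical line.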

DALL-E
Text-to-image generative AI model* released by OpenAI in 2021.

DATA CENTERS
The physical buildings where the computation behind AI actually happens: vast warehouses packed with servers, storing data and running the calculations that train and operate AI models. They consume enormous amounts of electricity and water for cooling, and their numbers are growing rapidly. Often located far from public view, data centers are the hidden engine rooms of the digital world: the concrete reality beneath the “cloud.”

DATA EMBEDDING
Transformation of data (for example, images or words) into vectors so that this data can be processed by an algorithm. Embedding may take place within a latent space.*

DEEP LEARNING
A subset of machine learning,* deep learning refers to techniques using so-called deep neural networks,* meaning they are composed of multiple successive layers that process information. This enables them to identify increasingly complex features within data. The success of deep learning methods applied to image recognition is what triggered the AI boom from 2012 onward.

EIGENFACE
A face recognition technique based on the abstraction of distinguishing facial features and the creation of a latent space* where images of faces are positioned according to their similarity. When presented with a new face, this enables the identification of the images closest to it, and potential recognition of it.

FACE AND EMOTION RECOGNITION
Analytic AI systems that identify individuals from their facial features, or claim to read inner states—happiness, anger, fear—from expressions. Widely deployed in surveillance, policing, advertising, and border control, they raise sharp concerns about privacy and consent. Emotion recognition is especially contested: many scientists doubt that feelings can be reliably inferred from faces at all, making the technology as questionable as it is powerful.

FOUNDATION MODEL
A large generative AI model,* pretrained on a broad spectrum of mainly unlabeled data, using self-supervised techniques. It is used as a common base for a wide range of specialized tasks, each of which may require fine-tuning.

GENERATIVE ADVERSARIAL NETWORK (GAN)
A generative AI model* introduced in 2014 whose training is based on the interaction between a “generator” neural network, which starts out by generating images made of random combinations of pixels, and a “discriminator” that evaluates whether these images resemble those in the training data (for example, pictures of cats). Through the discriminator’s feedback, the generator learns by trial and error to output similar new images. GANs were widely used by artists during the second half of the 2010s.

GENERATIVE AI MODEL
A model is called “generative” when it is able to produce new data (images, text, etc.) after training on vast amounts of other data.

GENERATIVE PRETRAINED TRANSFORMER (GPT)
Family of language models developed by OpenAI from 2018 onward, which has popularized transformers.* Pretrained on large datasets of unlabeled text, these models can then be fine-tuned to perform specialized tasks, such as translation or code generation. They form the basis of ChatGPT, released in November 2022.

HALLUCINATION
The term for when a generative AI system produces confident output that departs from reality. It invents facts, citations, or events in text, or, in images, conjures impossible details: six-fingered hands, garbled writing, objects that melt into one another. Because these systems generate what is statistically plausible rather than what is true or real, such errors are not glitches but structural features. The word itself is debated: critics argue it wrongly implies that the machine “perceives” or malfunctions, when it is only ever predicting likely patterns.

IMAGE GENERATOR
See image-to-image generative AI model* and text-to-image generative AI model.*

IMAGE-TO-IMAGE GENERATIVE AI MODEL
Generative AI model* that uses an input image or images to produce new images.

IMAGE-TO-TEXT GENERATIVE AI MODEL
See multimodal model.*

IMAGENET
Database created in 2009 at the instigation of researcher Fei-Fei Li, containing fourteen million images manually annotated by crowdworkers (through the crowdsourcing website Amazon Mechanical Turk), used to organize an annual competition (ImageNet Large Scale Visual Recognition Challenge, or ILSVRC) to compare the performances of different computer vision algorithms. In 2012, convolutional neural networks* took part in the competition for the first time, demonstrating their technical superiority and triggering a revolution in the field.

INTERPOLATION
Mathematical operation consisting of estimating missing points in the data as inferred by nearby data.
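
The simplest case is linear interpolation between two known points, sketched below. The function name `lerp` and the sample values are assumptions made for illustration. Generative models apply the same idea in many dimensions at once, for example when moving between two points of a latent space.*

```python
def lerp(x0, y0, x1, y1, x):
    """Estimate the value at `x` on the straight line through (x0, y0) and (x1, y1)."""
    t = (x - x0) / (x1 - x0)  # fraction of the way from x0 to x1
    return y0 + t * (y1 - y0)


# Hypothetical measurements: 10.0 at position 0 and 18.0 at position 4.
# The missing value at position 1 is estimated from its two neighbors.
estimate = lerp(0, 10.0, 4, 18.0, 1)
print(estimate)  # 12.0
```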

LAION-5B
Dataset created by the German nonprofit LAION, consisting of five billion images paired with their description: their “captions” come partially from “alt” tags provided by whoever put them online, in other words, micro-captions which accompany every image published on a website coded in HTML. (These tags are displayed when, for whatever reason, the image doesn’t appear.) The data for LAION-5B comes from Common Crawl, a web archive updated regularly since 2008.

LANGUAGE MODEL
A text-to-text generative AI model* that, during its training (which involves calculating the probability of a sequence of words appearing in a given context), has encoded enough data about linguistic structure and relationships (syntax, relationships between words, sentences, concepts) to be able to generate text (predict the next words), translate from one language to another, analyze opinions, transcribe text from audio data, summarize documents, and so on.
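
To make "calculating the probability of a sequence of words" concrete, here is a deliberately tiny sketch that counts which word follows which in a nine-word corpus. It is a toy bigram counter, not how current language models work (they use neural networks such as transformers*), and the function names and the corpus are assumptions made for this example.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for every word in `text`, which words follow it and how often."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def next_word_probabilities(counts, word):
    """Turn the raw counts for `word` into probabilities for the next word."""
    following = counts[word]
    total = sum(following.values())
    return {candidate: count / total for candidate, count in following.items()}


# Hypothetical training text.
corpus = "the cat sat on the mat the cat slept"
model = train_bigram(corpus)
probabilities = next_word_probabilities(model, "the")

# "Generating" text here means picking the most probable next word.
prediction = max(probabilities, key=probabilities.get)
print(probabilities, prediction)
```

In this corpus "the" is followed twice by "cat" and once by "mat," so the model assigns "cat" a probability of two thirds and predicts it as the next word.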

LARGE LANGUAGE MODEL (LLM)
A language model* whose number of parameters and the size of its training dataset* are vast enough for it to acquire the ability to provide coherent answers to complex or specialized questions, without having been specifically trained to do so.

LATENT DIFFUSION MODEL
Learning technique used by most text-to-image generative AI models* (DALL-E,* Stable Diffusion,* and Midjourney*) that involves training an algorithm by having it add noise (randomly selected pixels) to the training dataset (usually images), then predicting how much noise must be removed to recover the original data. This process allows the algorithm to integrate enough information about the images to be able to generate new ones.

LATENT SPACE
A multidimensional vector space in which digital objects (words, images, sounds) are positioned according to their similarities and differences. It is made up of combinations of distinguishing features retained by the algorithm when it learns to recognize objects within the training dataset.*

LORA (LOW-RANK ADAPTATION)
Low-rank adaptation, or LoRA, is a technique used for adjusting a “generic” generative AI model* (for example, Stable Diffusion*) to process new tasks or fields by modifying its parameters without having to retrain the entire model.

MACHINE LEARNING
The technique of configuring an algorithm on a set of data—called training datasets*—so that it “learns” their underlying relationships. The adjustment process (also called “learning,” “configuration,” or “calibration”) enables the algorithm to classify or predict new data.

MACHINE VISION / COMPUTER VISION
Field of computer science concerned with teaching computers to detect and interpret elements (objects, faces, gestures, situations) in images.

MATRICES
Charts of numbers arranged in rows and columns that can represent input data (for example, the intensity of each pixel in an image) or else the parameters of a neural network.* Processing data using a layer of neurons involves multiplying the matrix of input by the matrix of parameters (before applying a bias vector and an activation function).
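
The computation described above can be sketched in a few lines. The function name `dense_layer`, the numbers, and the choice of ReLU as the activation function are assumptions made for illustration; real networks perform the same steps on much larger matrices.

```python
def dense_layer(inputs, weights, bias):
    """Process `inputs` with one layer of neurons.

    `weights[i][j]` links input i to neuron j; `bias[j]` is added to neuron j.
    The activation used here is ReLU, which replaces negative values with 0.
    """
    outputs = []
    for j in range(len(bias)):
        weighted_sum = sum(inputs[i] * weights[i][j] for i in range(len(inputs)))
        outputs.append(max(0.0, weighted_sum + bias[j]))
    return outputs


# Hypothetical values: 2 inputs feeding a layer of 3 neurons.
inputs = [1.0, 2.0]
weights = [
    [0.5, -1.0, 2.0],
    [1.0, 0.5, -1.0],
]
bias = [0.0, -1.0, 0.5]

print(dense_layer(inputs, weights, bias))  # [2.5, 0.0, 0.5]
```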

METADATA
Information about data. For example, the data of a digital image are its pixels, and the metadata are the resolution, dimensions, file format, creation date, etc., of the image.

MIDJOURNEY
AI research lab known for its eponymous generative AI program that outputs images from natural language descriptions (or prompts*). Released in 2022, Midjourney quickly set a standard in the field, yet it has also sparked controversy (as other text-to-image models like DALL-E* or Stable Diffusion* have done): for using artists’ work without their consent, for making it easy to plagiarize, and even for its role in winning digital art and photography prizes.

MULTIMODAL MODEL
Generative AI models* that are able to pass from one modality (such as text, image, video, or audio) to another. For example, a multimodal model may produce a natural language description of an image (image-to-text) or else output an image based on a natural language description (text-to-image). AI has also revolutionized the field of moving images with text-to-video generative AI models* such as Runway Gen-3 or Sora, which can generate all sorts of videos based on prompts.*

NATURAL LANGUAGE PROCESSING (NLP)
Computer science field concerned with language, its aim being to create programs capable of carrying out textual or oral tasks: translation, question-answering, summarizing, text generation, etc.

NSFW
Abbreviation for “not safe for work,” referring to violent or pornographic content (particularly images).

PROMPT
A prompt is the natural language input which, when given to a generative AI model,* triggers and guides the content produced: a question, a paragraph to change, the description of an image, and so on. How precise the prompt is (examples, specifications, etc.) can cause considerable variations in both the type and quantity of output.

SCRAPING
The process of automatically extracting data from the internet.

SEQUENTIAL DATA
Data arranged in sequences where order matters. For example, the words in a sentence or the notes of a melody. Starting in the late 2010s, transformers* made it possible to process sequential data much more efficiently, leading to the development of large language models* such as GPT-2, GPT-3, and ChatGPT. In turn, this enabled the processing of prompts* in text-to-image generative AI models.*

SLOPAGANDA
A blend of “AI slop*” and “propaganda,” coined in 2025 for AI-generated content designed to manipulate beliefs and emotions for political ends. What sets it apart from older propaganda is scale, speed, and personalization: it can be produced cheaply in vast quantities and micro-targeted to individuals. Often it aims not at factual deception but at building emotional associations that stick through sheer repetition.

STABLE DIFFUSION
Generative AI model* released in 2022 that can generate artificial images using text prompts, one version of which is called “Stable Diffusion XL” or “SDXL.” The model was trained on the LAION-5B database and the LAION-Aesthetics V2 subset, whose code can be viewed and downloaded. Stable Diffusion (as well as DALL-E*) has sparked the same controversies as Midjourney.*

TEXT-TO-IMAGE GENERATIVE AI MODEL
See multimodal model.*

TEXT-TO-TEXT GENERATIVE AI MODEL
See language model* and large language model.*

TEXT-TO-VIDEO GENERATIVE AI MODEL
See multimodal model.*

TRAINING SET
A collection of data, sometimes manually labeled, that an algorithm learns to recognize or complete by trial and error. With each error, the algorithm adjusts its parameters so as to gradually encode enough information about the data (their features, similarities, and differences) to be able to classify or produce new ones.
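
The "trial and error" adjustment can be sketched with a model that has a single parameter. Everything here is an assumption made for illustration: the function name `train`, the three labeled examples, and the learning rate. The update rule is a basic form of gradient descent, one common way of adjusting parameters after each error.

```python
def train(samples, learning_rate=0.05, epochs=200):
    """Fit the one-parameter model `prediction = w * x` to (x, y) pairs."""
    w = 0.0  # initial guess for the parameter
    for _ in range(epochs):
        for x, y in samples:
            error = w * x - y  # how far the current prediction is from the label
            w -= learning_rate * error * x  # nudge w so as to shrink that error
    return w


# Hypothetical labeled training set in which y is always twice x.
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = train(samples)
print(round(w, 3))  # 2.0

# The learned parameter can now be applied to an input the model never saw.
print(round(w * 10.0, 3))  # 20.0
```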

TRANSFORMER
A deep learning* NLP* model proposed in 2017 by a team of Google engineers, whose main innovation is to give a central role to the “attention” mechanism, which consists of weighting the relative importance of each word in relationship to others. This parallel (rather than sequential) treatment requires less computing and has made it possible to multiply the size of training sets* and the number of parameters, contributing to the emergence of large language models.*
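
The weighting at the heart of attention can be sketched as follows. This is a simplified version of scaled dot-product attention: the three two-number "word" vectors are made up, and the same vectors serve as queries, keys, and values, whereas an actual transformer first passes them through learned projection matrices.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """For each query, weight every value by how well its key matches the query."""
    scale = math.sqrt(len(keys[0]))
    outputs, all_weights = [], []
    for query in queries:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
        weights = softmax(scores)
        mixed = [
            sum(w * value[j] for w, value in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical vectors standing for three words of a sentence.
words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(words, words, words)

for row in weights:
    print([round(w, 2) for w in row])
```

Each row of weights shows how much one word "attends" to every word in the sentence. Because no row depends on another, all of them can be computed at the same time, which is the parallel treatment mentioned above.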

UPSCALING
Refers to the process of using AI to increase the resolution of an image or video, usually by adding pixels.

VARIATIONAL AUTOENCODER
An artificial neural network* invented in 2013 by Diederik P. Kingma and Max Welling. It is designed to learn a probabilistic distribution of training sets* in a structured latent space* in order to generate similar data, or even new data, located between the categories of existing data.

VECTORS
The basic way AI systems represent the world: as long lists of numbers. A word, an image, a sound—each is translated into a string of coordinates that fixes its position in a vast mathematical space. Things with similar meanings end up close together. This translation of meaning into numbers is what lets machines “compute” language and images, but it also reduces every nuance to measurable distance.
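
A minimal sketch of "measurable distance," using cosine similarity, one common way of comparing vectors. The three-number vectors for "cat," "dog," and "car" are invented for this example; real embeddings are produced by a trained model and have hundreds or thousands of coordinates.

```python
import math


def cosine_similarity(a, b):
    """Return a score close to 1.0 when two vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, written by hand for illustration only.
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

cat_dog = cosine_similarity(embeddings["cat"], embeddings["dog"])
cat_car = cosine_similarity(embeddings["cat"], embeddings["car"])
print(round(cat_dog, 2), round(cat_car, 2))
```

With these made-up numbers, "cat" scores far closer to "dog" than to "car," which is what "similar meanings end up close together" amounts to in practice.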

WEB CRAWLING
An automated web browsing process used to collect information such as website content and links.

WORLD MODEL
AI systems that don’t just generate a static image or text, but simulate an environment and predict what happens next as a user acts within it. Trained on vast amounts of video, they can conjure explorable, physically plausible worlds from a single prompt. Promoted as tools for training robots and self-driving cars, they extend generative AI from making pictures toward modeling reality itself.


Alban Leveau-Vallier and Antonio Somaini
Partially translated from the French by Michelle Noteboom
Translated into German by Stefan Barmann