What are Embeddings?
How AI converts words, images, and ideas into numbers that capture meaning. The mathematical foundation that makes AI understand similarity.
Computers are great with numbers. Terrible with meaning.
The word "dog" doesn't mean anything to a computer. It's just four letters. But somehow AI can understand that "dog" is more similar to "puppy" than to "airplane."
Embeddings are how AI turns meaning into math.
The core idea
An embedding converts something (a word, sentence, image, song) into a list of numbers called a vector. These numbers capture the "meaning" of the original thing.
Here's the magic: similar things get similar numbers.
Words and their embeddings (simplified to 3 numbers):

- "dog" → [0.8, 0.1, 0.3]
- "puppy" → [0.9, 0.2, 0.4] (very similar to "dog")
- "cat" → [0.7, 0.3, 0.2] (somewhat similar)
- "airplane" → [0.1, 0.9, 0.8] (very different)
The computer can now measure similarity by comparing these number lists. "Dog" and "puppy" have similar numbers, so they're similar concepts.
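A common way to compare those number lists is cosine similarity, which scores how closely two vectors point in the same direction (1.0 = identical direction, near 0 = unrelated). Here's a minimal sketch using the toy 3-number vectors from the table above; the numbers are invented for illustration:

```python
import math

def cosine_similarity(a, b):
    """Score how similar two vectors are: 1.0 = same direction, ~0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# The toy 3-number embeddings from the table above
embeddings = {
    "dog":      [0.8, 0.1, 0.3],
    "puppy":    [0.9, 0.2, 0.4],
    "airplane": [0.1, 0.9, 0.8],
}

dog_puppy = cosine_similarity(embeddings["dog"], embeddings["puppy"])
dog_plane = cosine_similarity(embeddings["dog"], embeddings["airplane"])
```

Running this, `dog_puppy` comes out close to 1.0 while `dog_plane` is much lower, which is exactly the "similar things get similar numbers" idea in action.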
How embeddings are created
Creating good embeddings is like teaching AI the relationships between things by showing it millions of examples.
For word embeddings, you might train on text like:
- "The dog barked loudly"
- "A puppy played in the yard"
- "She walked her dog"
- "The cute puppy wagged its tail"
The AI notices that "dog" and "puppy" appear in similar contexts. They both get walked, they both wag tails, they both play. So their embeddings end up similar.
For image embeddings, you show the AI millions of pictures. It learns that photos of dogs often contain similar shapes, colors, and patterns, so their embeddings cluster together.
The process is called "representation learning" because the AI learns to represent concepts as vectors.
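The "similar contexts → similar representations" idea can be sketched with simple co-occurrence counting. This is a drastic simplification of how Word2Vec-style training actually works, and the tiny corpus below is invented for illustration, but it shows the core mechanism: words that share context words end up with overlapping vectors.

```python
from collections import Counter

# A tiny invented corpus where "dog" and "puppy" appear in similar contexts
sentences = [
    "the dog wagged its tail",
    "the puppy wagged its tail",
    "she walked her dog",
    "she walked her puppy",
    "the airplane flew over the city",
]

def context_vector(word, sentences):
    """Count every other word that appears in the same sentence as `word`."""
    counts = Counter()
    for s in sentences:
        tokens = s.split()
        if word in tokens:
            counts.update(t for t in tokens if t != word)
    return counts

def overlap(a, b):
    """Shared-context score: sum of min counts over common context words."""
    return sum(min(a[w], b[w]) for w in a.keys() & b.keys())

dog = context_vector("dog", sentences)
puppy = context_vector("puppy", sentences)
airplane = context_vector("airplane", sentences)
```

Here `overlap(dog, puppy)` is high because both words co-occur with "wagged", "tail", "walked", and so on, while `overlap(dog, airplane)` is nearly zero. Real models learn dense vectors with gradient descent rather than raw counts, but the intuition is the same.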
Dimensionality: More numbers, more nuance
Real embeddings aren't just 3 numbers. They're usually hundreds or thousands of numbers.
- Word2Vec: 100-300 dimensions
- OpenAI's text embeddings: 1,536 dimensions
- Image embeddings: Often 512-2048 dimensions
More dimensions = more subtle relationships. With only 3 numbers, you might capture "dog vs. airplane." With 1,000 numbers, you can capture "German Shepherd vs. Golden Retriever vs. Chihuahua."
Simple embedding (3D):
- Dimension 1: Animal vs. Object
- Dimension 2: Size
- Dimension 3: Domestication
Complex embedding (1536D):
- Dimension 47: Fluffiness level
- Dimension 234: Historical significance
- Dimension 891: Emotional association
- Dimension 1204: Seasonal relevance
- ...and 1,532 other subtle aspects of meaning
Types of embeddings
Word Embeddings
Each word gets its own vector. "King" might be close to "queen," "monarch," and "royal."
Sentence Embeddings
Entire sentences become vectors. "I love pizza" and "Pizza is delicious" would have similar embeddings.
Image Embeddings
Pictures become vectors. Photos of beaches cluster together, photos of mountains cluster together.
Multimodal Embeddings
The same vector space includes both text and images. The word "dog" and a photo of a dog end up near each other.
The geometric magic
Here's where embeddings get really cool. Relationships between concepts become geometric relationships between vectors.
The famous example: king - man + woman = queen
In embedding space, you can literally do math with concepts:
- Take the vector for "king"
- Subtract the vector for "man"
- Add the vector for "woman"
- The result is very close to the vector for "queen"
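The steps above can be sketched directly in code. The 2-number vectors here are invented to mirror the simplified diagram (one number for "royalty", one for "gender"); real models learn hundreds of dimensions whose meanings aren't this tidy:

```python
# Hypothetical 2-number embeddings: [royalty, gender] (invented for illustration)
vectors = {
    "king":  [0.9, 0.1],
    "queen": [0.9, 0.9],
    "man":   [0.1, 0.1],
    "woman": [0.1, 0.9],
}

def vec_sub(a, b):
    return [x - y for x, y in zip(a, b)]

def vec_add(a, b):
    return [x + y for x, y in zip(a, b)]

def distance(a, b):
    """Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# king - man + woman
result = vec_add(vec_sub(vectors["king"], vectors["man"]), vectors["woman"])

# Find the word whose vector is closest to the result
nearest = min(vectors, key=lambda w: distance(vectors[w], result))
```

With these toy numbers the arithmetic lands exactly on "queen"; with real learned embeddings the result is only *close* to the queen vector, which is why systems return the nearest neighbor rather than an exact match.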
Embedding space (2D simplified):

    queen •         • king
          |         |
    woman •         • man

Moving vertically adds "royalty"; moving horizontally changes "gender". The man → king offset is the same as the woman → queen offset, which is why the arithmetic works.
How AI systems use embeddings
Search and Retrieval
When you search for "dog care tips," the system converts your query to an embedding and finds documents with similar embeddings. This is how RAG (retrieval-augmented generation) systems work: the AI retrieves relevant documents before generating a response, which reduces hallucinations.
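A bare-bones sketch of that retrieval step: embed the query, embed the documents, and rank by similarity. The 3-number vectors below are stand-ins; a real system would call an embedding model and store the document vectors in a vector database:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Stand-in embeddings; a real system would get these from an embedding model
documents = {
    "How to groom your dog":         [0.8, 0.2, 0.1],
    "Airline safety regulations":    [0.1, 0.1, 0.9],
    "Feeding schedules for puppies": [0.7, 0.3, 0.2],
}
query_embedding = [0.75, 0.25, 0.15]  # pretend this is embed("dog care tips")

# Rank documents by similarity to the query, best match first
ranked = sorted(documents,
                key=lambda d: cosine(documents[d], query_embedding),
                reverse=True)
```

Note that "Feeding schedules for puppies" ranks above "Airline safety regulations" even though it shares no words with the query; that's the semantic matching that keyword search misses.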
Recommendation Systems
Netflix converts movies to embeddings. If you like movies with similar embeddings, it recommends others in the same region of embedding space.
Clustering and Classification
Group similar items together by clustering their embeddings. Find outliers by looking for embeddings far from others.
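Outlier detection can be as simple as flagging the embedding with the largest average distance to all the others. A minimal sketch, reusing the toy vectors from earlier (invented numbers, illustration only):

```python
import math

# Toy embeddings from the earlier example
points = {
    "dog":      [0.8, 0.1, 0.3],
    "puppy":    [0.9, 0.2, 0.4],
    "cat":      [0.7, 0.3, 0.2],
    "airplane": [0.1, 0.9, 0.8],
}

def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_distance(word):
    """Average distance from `word` to every other point."""
    others = [w for w in points if w != word]
    return sum(dist(points[word], points[o]) for o in others) / len(others)

# The point farthest, on average, from everything else is the outlier
outlier = max(points, key=mean_distance)
```

Here "airplane" stands out because "dog", "puppy", and "cat" form a tight cluster. Production systems use the same idea at scale with proper clustering algorithms like k-means or DBSCAN.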
Semantic Understanding
AI can understand that "car" and "automobile" mean the same thing because their embeddings are nearly identical.
Real-world applications
Search engines: Find pages that match the meaning of your query, not just keywords.
Customer support: Route support tickets to the right team based on semantic similarity.
Content recommendation: Suggest articles, products, or media based on embedding similarity.
Translation: Words with similar meanings in different languages have similar embeddings.
Fraud detection: Unusual transaction patterns show up as embedding outliers.
E-commerce search without embeddings: You search for "comfortable shoes for running." System finds products containing those exact words. Misses great running sneakers described as "athletic footwear with cushioned sole."
E-commerce search with embeddings: Your query embedding matches products about athletic footwear, sports shoes, jogging sneakers, and cushioned running gear, even if they use different words.
The limitations
Black box: You can't easily interpret what each dimension means. Why does dimension 247 have the value 0.8394? Nobody knows.
Bias: Embeddings inherit biases from training data. If your training data associates "doctor" with "man," the embeddings will too.
Context collapse: A single embedding can't capture all possible meanings of a word. "Bank" (river) vs. "bank" (money) might get confused.
Computational cost: Generating embeddings requires significant compute, especially for large models.
Quality matters
Not all embeddings are equal. Better embeddings capture more nuanced relationships and work better for downstream tasks.
What makes embeddings good?
- Training data quality: Diverse, representative, high-quality training data
- Model architecture: More sophisticated models capture more complex relationships
- Training objectives: How the model is taught to create embeddings matters
- Dimensionality: More dimensions (usually) capture more nuance
The future of embeddings
Embeddings keep getting better:
- Multimodal: Combining text, images, audio, and video in one embedding space
- Dynamic: Embeddings that change based on context
- Specialized: Domain-specific embeddings for medicine, law, etc.
- Efficient: Smaller embeddings that capture the same meaning with fewer numbers
Why embeddings matter: They're the bridge between human concepts and computer math. Every time AI understands similarity, finds relevant content, or makes semantic connections, embeddings are working behind the scenes.
The bottom line: Embeddings turn the messy, ambiguous world of human language and concepts into clean mathematical relationships that computers can work with. They're the foundation that makes semantic search, recommendations, and AI understanding possible.
Embeddings capture meaning in vectors. But modern AI needs more than word-by-word understanding. Next: What are Transformers? It covers the architecture that revolutionized how AI processes sequences.