Neural Networks Hate Text
An introduction to word embeddings, one of the fundamental ideas behind generative AI models.
The Internet is mainly text.
For centuries, we’ve captured most of our knowledge using words, but there’s one problem:
Neural networks hate text.
Judging by how good language models are today, this might not be obvious, but turning words into numbers is more complex than you think.
The most straightforward approach is to use consecutive values to represent each word in our vocabulary. Here is one example:
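The original table isn't reproduced here, but a minimal sketch of this idea, assuming the same four-word vocabulary the article uses later, might look like this:

```python
# A tiny hypothetical vocabulary encoded with consecutive integer values.
vocabulary = ["King", "Queen", "Prince", "Princess"]

# Assign each word the next available number: King -> 1, Queen -> 2, ...
word_to_id = {word: i + 1 for i, word in enumerate(vocabulary)}

print(word_to_id)
# {'King': 1, 'Queen': 2, 'Prince': 3, 'Princess': 4}
```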
Unfortunately, neural networks tend to see what’s not there. Is a Princess four times as important as a King? Of course not, but how does the network know?
Instead of using numerical values, we can use vectors. We call this particular representation “one-hot encoding,” where we use ones and zeros to differentiate each word:
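A minimal sketch of one-hot encoding for the same four-word vocabulary: every word gets a vector with a single one in its own position and zeros everywhere else.

```python
vocabulary = ["King", "Queen", "Prince", "Princess"]

def one_hot(word):
    """Return a vector of zeros with a single 1 at the word's index."""
    vector = [0] * len(vocabulary)
    vector[vocabulary.index(word)] = 1
    return vector

print(one_hot("Queen"))
# [0, 1, 0, 0]
```

Because every vector is the same distance from every other, the network can no longer read a spurious ordering into the values.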
This encoding fixes the problem of a network misinterpreting ordinal values, but according to the Oxford English Dictionary, there are 171,476 words in use. We certainly don’t want to deal with large vectors with mostly zeroes. There must be a better way.
Here is where the idea of “word embeddings” enters the picture.
We know that the words King and Queen are related, just like Prince and Princess are. Word embeddings have a simple characteristic: related words should be close to each other, while words with different meanings should lie far away.
Can we use this property to create a better representation to encode our 4-word vocabulary?
Here is a two-dimensional chart where I placed the words from our vocabulary. I created this by hand, but an actual application would use a neural network to find the best representation:
Something critical becomes apparent: King and Queen are close to each other, just like the words Prince and Princess are. This encoding captures a crucial characteristic of our language: related concepts stay together!
And this is just the beginning.
Notice what happens when we move on the horizontal axis from left to right: we go from masculine (King and Prince) to feminine (Queen and Princess). Our embedding encodes the concept of “gender”!
And if we move on the vertical axis, we go from a Prince to a King and from a Princess to a Queen. Our embedding also encodes the concept of “age”!
We can derive the new vectors from the coordinates of our chart:
The first component represents the concept of “age”: King and Queen have a value of 3, indicating they are older than Prince and Princess with a value of 1. The second component represents the concept of “gender”: King and Prince have a value of 1, indicating male, while Queen and Princess have a value of 2, indicating female.
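The vectors above can be written out directly from the chart. A small sketch, using the coordinates described in the text, shows that the very same offset separates the masculine words from the feminine ones:

```python
# Hand-crafted 2-D embeddings from the chart:
# first component is "age", second component is "gender".
embeddings = {
    "King": (3, 1),
    "Queen": (3, 2),
    "Prince": (1, 1),
    "Princess": (1, 2),
}

def difference(a, b):
    """Component-wise difference between two word vectors."""
    return tuple(x - y for x, y in zip(embeddings[a], embeddings[b]))

# Moving from masculine to feminine is the same step everywhere:
print(difference("Queen", "King"))        # (0, 1)
print(difference("Princess", "Prince"))   # (0, 1)
```

This is the property that makes the famous "King - Man + Woman ≈ Queen" style of vector arithmetic possible in real embedding spaces.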
I used two dimensions for this example because we only have four words, but using more would allow us to represent other practical concepts besides gender and age. For instance, GPT-3 uses 12,288 dimensions to encode its vocabulary.
Embeddings are the backbone of some of the most impressive generative AI models we use today.
It's certainly one of those ideas that have changed the field.
I’m launching a cohort. Sort of.
I want to add $50,000 to your salary.
The cohort will be 9 hours of live content where I’ll teach you how to train, tune, deploy, and monitor machine learning models using AWS.
I’ll run the cohort every month starting in April. You can join here.
But the cohort is just the beginning.
I’m turning this into a community where you pay once for lifetime access. Once you join, you get access to every class, course, talk, and benefit from the community until the end of time.
Let me say that again: no recurrent payments. Ever.
The next cohort that I’m bringing to the community is Time-Series Analysis. I’ll announce it as soon as I have firm dates.
The price to join is 43% off until the end of March.