Generative AI has been much in the news, both in RPG-land and in the general media. I do a fair amount of work in this area, so I thought it might be helpful to give a worked example of how generative AI works. For text, the AI repeatedly tries to guess the next word in the sequence and keeps going until the special "<|end|>" token is its best guess, which means that it is done.
It has no idea of truth, falsehood, safety, appropriateness or tone. It just wants to give you a plausible next word; any qualities of that next word are determined by how it was trained and the previous input. This is why, if you start being aggressive with an AI, or give it weird prompts, it gives aggressive or weird answers -- because that output fits the input you have given it best.
However: Commercial AIs have a "secret sauce" layer on top of this basic operation which filters and adjusts the output to make it more sane. I'm not covering that layer in this post.
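To make that "guess the next token, repeat until <|end|>" loop concrete, here's a toy sketch in Python. The hard-coded table stands in for the real model -- a real LLM computes these probabilities from billions of parameters, and the entries here are invented purely for illustration:

```python
import random

# Toy stand-in for a real model: given the tokens so far, return
# (token, probability) pairs for plausible next tokens.
# These numbers are made up for illustration.
def next_token_probs(tokens):
    table = {
        (): [("David", 0.6), ("The", 0.4)],
        ("David",): [("Bowie", 1.0)],
        ("David", "Bowie"): [("<|end|>", 1.0)],
        ("The",): [("Beatles", 0.62), ("Clash", 0.38)],
        ("The", "Beatles"): [("<|end|>", 1.0)],
        ("The", "Clash"): [("<|end|>", 1.0)],
    }
    return table[tuple(tokens)]

def generate(max_tokens=10):
    tokens = []
    for _ in range(max_tokens):
        options = next_token_probs(tokens)
        words = [w for w, _ in options]
        weights = [p for _, p in options]
        # Pick the next token at random, in proportion to its probability.
        choice = random.choices(words, weights=weights, k=1)[0]
        if choice == "<|end|>":
            break  # the model's chosen guess is "done"
        tokens.append(choice)
    return " ".join(tokens)
```

Note that the loop does nothing but append one sampled token at a time and feed the whole sequence back in -- that really is the entire algorithm.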
So, let's ask an AI:
What is the next item in this sequence: The Sex Pistols, Rolling Stones, Kate Bush?
You and I might look at that list and go "huh, British Pop/Rock artists" and that would generate a list of possibilities and we'd select one as an answer. This is not how GenAI works even slightly. Instead it applies its (currently 7-70 billion or so) parameters to the words "What is the next item in this sequence: The Sex Pistols, Rolling Stones, Kate Bush?" and comes up with a set of possibilities for the next word. Actually, not really a word, but a token, which can be part of a word, as we will see.
Things to note:
- That is the complete description of what it does: choose a plausible next token from a sequence of input tokens.
- From the set of possible next tokens, it chooses one at random with probabilities proportional to how likely it thinks each token is.
- You can control how random that choice is with the temperature parameter. At zero, it will always choose the most likely answer (and so become mostly deterministic). At values over 1 it gets a little wild ...
- For each output token, it has to run the full model -- all of those billions of parameters -- again. This is not cheap.
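The sampling-with-temperature step in the bullets above can be sketched like this. This is a minimal illustration, not any vendor's actual implementation -- real systems add tricks like top-k/top-p filtering on top:

```python
import math
import random

def sample_with_temperature(logits, temperature):
    """Pick a token index from raw model scores (logits).

    temperature near 0 -> almost always the top-scoring token
    temperature > 1    -> flatter distribution, wilder choices
    """
    if temperature == 0:
        # Greedy: always take the most likely token (mostly deterministic).
        return max(range(len(logits)), key=lambda i: logits[i])
    # Softmax with temperature: divide the scores before exponentiating.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Choose one token at random, weighted by its probability.
    return random.choices(range(len(probs)), weights=probs, k=1)[0]
```

Dividing the scores by a large temperature squashes them together, so unlikely tokens get picked more often; dividing by a tiny temperature exaggerates the gaps, so the favourite nearly always wins.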
-------
So, onto our worked example:
What is the next item in this sequence: The Sex Pistols, Rolling Stones, Kate Bush?
I'll set the temperature to 1.2 to get some extreme suggestions and ask for 5 different responses. For each response, I show the token chosen and its probability together with the top 5 alternatives that were given. Note that the possible responses all start with the same basic probabilities:
David (27.11%) • The (19.12%) • Mad (10.09%) • No (9.73%) • Led (5.01%). That's because they all start with the same token sequence. Once the first token is chosen, then they diverge.
Response 1: Not enough information given.
Not (2.29%) -- David (27.11%) • The (19.12%) • Mad (10.09%) • No (9.73%) • Led (5.01%)
enough (16.03%) -- possible (44.07%) • a (22.79%) • enough (16.03%) • able (2.26%) • sure (1.65%)
information (88.32%) -- information (88.32%) • context (6.28%) • data (3.24%) • info (1.63%) • details (0.18%)
given (80.25%) -- given (80.25%) • provided (17.45%) • to (1.12%) • available (0.85%) • for (0.21%)
. (93.24%) -- . (93.24%) • <|end|> (6.30%) • to (0.16%) • yet (0.07%) • here (0.06%)
Response 2: Led Zeppelin
Led (5.01%) -- David (27.11%) • The (19.12%) • Mad (10.09%) • No (9.73%) • Led (5.01%)
Ze (99.99%) -- Ze (99.99%) • Z (0.01%) • ze (0.00%) • Ze (0.00%) • -Z (0.00%)
ppelin (100.00%) -- ppelin (100.00%) • pp (0.00%) • pl (0.00%) • ep (0.00%) • ppe (0.00%)
Response 3: The Beatles
The (19.12%) -- David (27.11%) • The (19.12%) • Mad (10.09%) • No (9.73%) • Led (5.01%)
Beatles (61.99%) -- Beatles (61.99%) • Clash (24.43%) • Smith (3.23%) • Cure (3.03%) • Who (2.50%)
Response 4: The Velvet Underground
The (19.12%) -- David (27.11%) • The (19.12%) • Mad (10.09%) • No (9.73%) • Led (5.01%)
Velvet (2.11%) -- Beatles (61.99%) • Clash (24.43%) • Smith (3.23%) • Cure (3.03%) • Who (2.50%)
Underground (99.99%) -- Underground (99.99%) • Under (0.01%) • underground (0.00%) • Und (0.00%) • (0.00%)
Response 5: David Bowie
David (27.11%) -- David (27.11%) • The (19.12%) • Mad (10.09%) • No (9.73%) • Led (5.01%)
Bowie (100.00%) -- Bowie (100.00%) • Byrne (0.00%) • Bow (0.00%) • bow (0.00%) • (0.00%)
Response 5 is actually the most likely -- 'David' has a 27% chance of being chosen (the highest) and then it's almost certain we'll go with "Bowie" rather than "Byrne" or others.
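You can check that arithmetic directly: the probability of a whole response is the product of the probabilities of each of its tokens, using the numbers reported above.

```python
# Probability of a full response = product of its token probabilities,
# taken from the per-token numbers shown in the responses above.
p_bowie    = 0.2711 * 1.0000            # "David" then "Bowie"
p_beatles  = 0.1912 * 0.6199            # "The" then "Beatles"
p_zeppelin = 0.0501 * 0.9999 * 1.0000   # "Led" then "Ze" then "ppelin"
print(f"{p_bowie:.1%} {p_beatles:.1%} {p_zeppelin:.1%}")
```

"David Bowie" comes out around 27%, "The Beatles" around 12%, and "Led Zeppelin" around 5% -- so even though "The" is a common first token, no single "The ..." answer beats "David Bowie".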
One other thing to note is that when the AI chooses "The" as the next token, it has no idea at that point what comes next. It's only AFTER we've added 'The' to our token sequence, making the new input sequence "What is the next item in this sequence: The Sex Pistols, Rolling Stones, Kate Bush? The" that it comes up with "Beatles (61.99%) • Clash (24.43%) • Smith (3.23%) • Cure (3.03%) • Who (2.50%)" and (for response 3) chooses "Beatles" or (for response 4) chooses "Velvet" -- and that last one has a really low probability. If we lowered the temperature, we'd be hugely unlikely to see that chosen.
---------
So, not necessarily RPG-focused, but I hope this post helps you understand a little about how this new tech works, how you can use it, and why it does weird things. If you want to post in this thread, please keep it focused on the technology rather than the ethical usage of AI. That is a hugely important discussion, but I'm hoping to keep this thread focused on how it works, rather than how we should manage it.