Tuesday, August 26, 2008

Natural Language Barrier

This is going to get ugly...

I was thinking about social AI. Social characters that can comment on things in meaningful ways without being carefully scripted. Let me take you on a tour of my thought processes.

Okay, pretend you're Batman. What is your opinion of... bank robbers?

...

If you bothered to stop and answer the question, you probably came up with something like "I hate 'em. I'm gonna catch 'em and lock 'em up!"

Now, pretend you're Mario. What is your opinion of... bank robbers?

...

A little fuzzier? They're... bad people... not quite as crisp an opinion as Batman has.

Now, pretend you're Frogger. What is your opinion of... bank robbers?

...

Frogger doesn't have an opinion of bank robbers. Frogger has opinions about trucks and logs and jumping from lily pad to lily pad. The only possible opinion he could have about bank robbers is that he doesn't like them because they drive too fast.

The Pong paddle doesn't even have opinions on that level of cultural depth. The Pong paddle's opinions would be so simplistic as to not even really count as opinions.



Well, obviously, Frogger and Mario and Batman don't actually have opinions. They aren't capable of judging things for themselves: they don't have any kind of algorithm to let them. But we can imagine what they would think if they did, and the basic idea holds up.

You can only have opinions on things that you know about.

For most characters, the things they can conceivably learn about on their own are extremely limited: they can hate goblins, maybe, or have an opinion on a sword or whether a girl is pretty. In the case of a chatbot, it would be even more severely limited, because they don't even have a world to fall back on.

Even young players bring in an expectation of depth. Even if we're interacting with Bugs Bunny, we expect Bugs to have a life: a history, a future beyond this two-minute sequence with Elmer Fudd, with all the complexities, changes, judgments, and opinions that entails. That's what makes him worth knowing.

So, if you try to build a social AI such as a chatbot or an RPG NPC, one of the things people routinely do is try to program those opinions in directly. This character... doesn't like bank robbers. Okay. Now, if the question comes up, the character will say, "I hate bank robbers!"

...

"Why?"

Oops. No answer. We broke it.

Adding information in this way is the shallowest, most brittle method there is. The illusion of depth we gain is painfully bad. It only works when we carefully restrict the player's topics of conversation. Everyone's familiar with this: you can't ask why unless the programmer has added that option to the menu. Otherwise, you're stuck with asking about, say, goblins. Or where the magic sword is.

Chatbots tend to be extremely clumsy because this brittleness is faced head on, along with the natural clumsiness of trying to interpret player input, which has the same basic issues. Fundamentally, we square the brittleness.

I don't think this is a very good way to do things. We seem to be climbing the steep side of the mountain. Every inch costs us another hour of painful scrabbling.

Is there a way around?

...

Okay, the problem, at its core, is that you can only have an opinion on things you know about. And the world we build is pretty limited: we only include the things we want the player to be able to interact with. This means that our NPCs will never experience building a house, getting married, dying, electing a new president, learning a secret... not unless we specifically script them in, creating a very fragile, shallow experience. They can have opinions on how to fight orcs, what's the best way to level up, and so forth... but those opinions aren't terribly interesting...

The solution would seem to be to get our NPCs to experience many things that are outside the scope of the game. If we want them to have actual opinions, we have to give them the experiences. Then they will be able to comment on them in depth.

However, creating a world where all of these experiences are algorithmically emergent would be... well, I'm not sure there's a word for it. "So close to impossible you might as well spend your time looking for wild polar bears in Florida."

The world is an extremely complex place, and the computation required to simulate something even 1/100,000,000 as complex is prohibitive.

The only real option is to glean experiences from the real world somehow.

How about we use a handy-dandy filter? One that's very common?

How about we use you?

What if we built a chatbot that learns what things are via stories? This isn't a chatbot that asks you how you are, or tells you that it's a chatbot. This is a chatbot that listens to you tell stories... and maybe tells stories back.

Humans tell stories to children all the time. Not simply for entertainment, but to teach. "Don't touch that, it'll burn you!" is a story. "Your face will stick like that!" is a story. So is Sleeping Beauty.

The chatbot has to come pre-equipped with a lot of basic tools for learning and interpreting. In order to be deep enough, the chatbot has to have an "experience" fairly similar to a human. That means that the chatbot has to be able to learn things like "thunder is loud", "lemons are sour", and "he felt sad".

Let's take a look at how it might work.

YOU: "There was a little boy..."
CB: (There: preprogrammed element to assign focus)
(Was: past tense be [complexity hidden])
(a little boy: unknown element: LITTLE BOY)
(LITTLE BOY is noun...)
CB: "Tell me about the little boy!"

YOU: "He was fast and clever."
CB: (He: re/little boy, now defined as MALE)
(Was: past tense be [complexity hidden])
(fast and clever)
(fast: basic adjective, preprogrammed)
(clever: unknown adjective)
CB: "Tell me how he was clever!"

YOU: "Well, he once outsmarted the goblin king..."
CB: (Well: response fluff [complexity hidden])
(he: re/little boy)
(once: time marker, preprogrammed)
(outsmarted: unknown verb)
(the goblin king: unknown element: GOBLIN KING)
(Plan to say: "tell me how he outsmarted the goblin king!")
(Compare to original, apply modifier only)
CB: "How?"

etc, etc.
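
To make that loop a little more concrete, here's a toy sketch in Python. The tiny lexicon, the crude tagging, and the canned follow-up question are all invented stand-ins for the much smarter parser this assumes:

import re

# Preprogrammed lexicon; anything not in it counts as "unknown".
KNOWN = {
    "there": "focus", "was": "verb", "a": "det", "the": "det",
    "he": "pronoun", "and": "conj", "once": "time", "well": "fluff",
    "little": "adjective", "fast": "adjective",
}

def parse(sentence):
    # Split into words and tag each one against the lexicon.
    words = re.findall(r"[a-z']+", sentence.lower())
    return [(w, KNOWN.get(w, "unknown")) for w in words]

def follow_up(sentence):
    # Ask about the first unknown word, then mark it as seen so we move on.
    for word, tag in parse(sentence):
        if tag == "unknown":
            KNOWN[word] = "learned"
            return f"Tell me more about '{word}'!"
    return "Go on..."

for line in ["There was a little boy.",
             "He was fast and clever.",
             "Well, he once outsmarted the goblin king."]:
    print("YOU:", line)
    print("CB: ", follow_up(line))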

There is a lot of hand-waving here, obviously. We're presuming a pretty advanced parser with a strong understanding of basic linguistics combined with a strong ability to link things to a human-like experience.

To give you an example of the complexity, let's look back on our Batman/Bank Robber example in a new light.

"I hate 'em. I'm gonna catch 'em and lock 'em up!" is what I said. But Batman would be more likely to say something like, "If you rob a bank, I'll make sure you end up rotting in prison."

These both say "the same thing", because we're used to thinking about the "logical content" of a phrase. To a programmer, used to computers, that's all that matters.

But in truth, there is a world of complexity between the two sayings. Let me show you.

"I hate 'em" is a value judgment that isn't even brought up in the second one. Even though we specifically ask what Batman thinks of bank robbers, the more canon Batman doesn't say "I think X".

This is because canon Batman tries to keep his emotions out of it. To him, it hardly matters whether he personally likes or hates bank robbers. They are objectively criminals, so his opinion is pointless.

This is a subtle point, and it's easy to wave it away as projection or overanalysis. Except that these subtle differences are the meat. The logical content of the phrases is almost unimportant - our valuation is what matters. In this case, off-the-cuff Batman is saying "I think bank robbers are bad" and canon Batman is saying "bank robbers are bad" by simply taking it for granted that his opinion isn't even worth mentioning.

There are other subtleties. Here's another: canon Batman says "If you rob a bank, I...", while off-the-cuff Batman jumps straight into the "I..."

Canon Batman once again shows a different mindset. We asked what he thinks about bank robbers. He redirects the question to be about bank robbery. He's not talking about the people that rob banks. He's talking about the act of robbing a bank, which happens to be attached to a person.

Again, he's not judging the person, he's judging the activity. He's basically saying "Robbing banks is bad", as opposed to off-the-cuff Batman, who's saying "Bank robbers are bad". They're very different values.

These complexities add up pretty quickly. If our little chatbot later decides to tell a story about bank robbery, which way he was taught will matter. If he learned from canon Batman, he can tell a story about everyday people who get caught up in the need to rob a bank, and suffer the consequences. If he learned from off-the-cuff Batman, he'll probably tell a story about ne'er-do-wells who rob banks and are generally bad people. (Of course, by the time he can tell a story about bank robbery, he'll have to have heard a few stories about bank robbery. At this point, he doesn't even know what "robbery" means...)

"That level of complexity is impossible!"

Hmmm...

What you really need is a carefully chosen fuzzy semantics system, and then you build it step by step by step, naturally.

In order to comprehend the story, you need to understand a huge amount of basic experiences that humans learn in their infancy. Things like: time moves forward. Stuff doesn't just vanish. Some things smell bad. Big people are usually stronger. People like being happy.

These same bits of understanding should be able to form the basics of the semantics system, as well.

How would you store "bank robbers are bad" as opposed to "robbing banks is bad"? Well, the first is a person who, in their past, robbed a bank. The second is the act of robbing a bank. They're fundamentally very different, and the first one contains the second. It's not usually so simple.
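
As a very rough sketch, with invented field names, those two judgments might be stored something like this. The point is only that one concept is an act and the other is a person whose history contains that act:

rob_a_bank = {
    "kind": "act",
    "verb": "rob",
    "object": "bank",
    "judgment": "bad",        # "robbing banks is bad"
}

bank_robber = {
    "kind": "person",
    "history": [rob_a_bank],  # the person-concept contains the act-concept
    "judgment": "bad",        # "bank robbers are bad"
}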

Here's a really subtle example to chew on:

"The baby grew up clever and strong" as opposed to
"The baby grew up to be clever and strong"

What a tiny difference. Surely there's no difference?

Actually, there's a huge difference.

In the second case, the act of "growing up" has a purpose: to be clever and strong. In the first case, the act of "growing up" just happens, and the baby is clever and strong while growing up.

The difference is the distance between "The Goonies" and "Stand by Me". The Goonies is about a bunch of kids who run around being kids. They all have their personalities and shticks, and there is some growing up, but by and large it's about kids being kids. The movie ends with them still being kids, and the whole point was actually to maintain the status quo.

Stand by Me is about kids going out and growing up. They have personalities and an adventure, but the whole point is to grow up. Their childhood is a transitional phase. It's the point of the whole thing.

That's the difference adding two little words makes, when you are thinking in terms of stories!

How would you represent this in your semantic web?

Whoa-oh, now you're getting into a messy situation!

Classically, we'd build our semantic web with some kind of connective system. "Grew up clever and strong" would link "grew up" to "clever" and "strong" directly. "Grew up to be clever and strong" would have the same links, but with some kind of qualifier. "Purpose" links, perhaps.
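
For concreteness, that classical link-with-qualifier approach might be stored something like this (the tuple format is made up, purely to show the shape of it):

links_plain   = [("grew up", "clever", None), ("grew up", "strong", None)]
links_purpose = [("grew up", "clever", "purpose"), ("grew up", "strong", "purpose")]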

The problem with this method is that it's more or less one-way. We're not looking for a book report: we don't care that the kid grew up clever and strong. We just care that kids grow up, and whether they grow up with attributes or for attributes.

So I've started to compile a list of... I don't have a word, really. We'll call them cogshazams. Cogshazams are mental pigeonholes (or, perhaps, pidginholes, ar-har) that concepts can fill. They are the basic mental responses someone can have.

One example is "security". A concept might be about security - giving more security or taking security away. Getting married is usually slotted strongly into the security cogshazam. Having a kid is generally full of insecurity - it's a big responsibility that changes your life - but being a kid is generally pretty secure.

Another example might be "competition". The Cold War was anti-security, pro-competition.

The idea is that our little listener will build a semantic net based mostly on these kinds of judgments, rather than on a really complicated web of explicit links.
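
As a rough sketch, with invented names and numbers, that store of judgments might look something like this (positive means a concept gives more of the cogshazam, negative means it takes it away):

cogshazams = {
    "getting married": {"security": +0.8},
    "having a kid":    {"security": -0.6},   # big responsibility, changes your life
    "being a kid":     {"security": +0.5},
    "the cold war":    {"security": -0.7, "competition": +0.9},
}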

In the case of growing up: if growing up is for the purpose of ending up an adult (i.e., growing up to be clever and strong), then we can label "growing up" with the "destiny" cogshazam. When we think about it in the future, we'll keep in mind that people who are growing up are marching towards their destiny.

If growing up is just something you do, and you can be all clever and strong while you do it, then we would use the "building" cogshazam. We would keep in mind that someone who is growing up is improving, growing... but they are who they are, not necessarily marching towards being some specific "final form".

"Growing up" can have a lot of other labels attached to it, depending on the stories you tell. Friendship is a common one, as is security, but anti-control...

Anyway, once we've labeled "growing up" as building or destiny, we might later wish to reconstitute that knowledge. How would you do that?

Well, if you wanted to talk about someone growing up, you would call it up. You would see it has, say, destiny attached to it. You would find something else you like that fills the proper linguistic slot and has destiny attached to it. You would combine them. To simplify grossly.

So you might say, "When he grew up, he conquered the world!"
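
Grossly simplified, that lookup-and-combine step might look something like this in code. The concept store, the slot names, and the sentence template are all invented for illustration:

concepts = {
    "grew up":             {"slot": "life-verb", "tags": {"destiny"}},
    "conquered the world": {"slot": "deed",      "tags": {"destiny"}},
    "played in the mud":   {"slot": "deed",      "tags": {"building"}},
}

def tell_about(topic):
    # Find another concept that fills the "deed" slot and shares a cogshazam,
    # then plug both into a template sentence.
    tags = concepts[topic]["tags"]
    for name, info in concepts.items():
        if name != topic and info["slot"] == "deed" and tags & info["tags"]:
            return f"When he {topic}, he {name}!"
    return f"He {topic}."

print(tell_about("grew up"))   # -> When he grew up, he conquered the world!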

There is still quite a lot of handwaving... for example, we'd need to keep track of amounts. Something that is only a little bit destined should probably not be so strongly combined with something that is strongly destined. For example, "He was born on a dark and stormy night, so he grew up to conquer the world!" is a little awkward...

Also, this doesn't cover things like twists, and I'm flat-out leaving out the parsing part...

But I think this is plenty long as it is.

If you get this far, you have a lot of time on your hands. Might as well leave an insightful comment.

Sunday, January 20, 2008

Save the Chatbots, Part 2!

Electric Boogaloo!

My last post on chatbot-driven games got me thinking, and I see a limitation I don't like.

A big part of the feedback in a game comes from feedback loops. You do something, something happens, you do something based on that, something happens based on that.

The issue is that most games use a recursive loop, but these games wouldn't. Let's see if I can say what I mean.

In an RPG, you walk around a world, and what you are near depends on where you have walked before. Every step subtly changes your location, bringing you closer to some things and further from others. Similarly, when you kill an enemy, it gives you points and gold and so forth. Although the victory itself is win/lose, the side effects in terms of expended and gained resources are very muddy, and can be anywhere on the scale of goodness. Winning a fight but using up all your magic is almost a loss, even though you won. Moreover, all of these feedback loops - fighting, walking - are more or less unlimited. Save for unusual restrictions, you're allowed to walk around and fight as much as you like, "cycling" the loop at your pleasure.

On the other hand, in this chatbot game, everything is binary. You either uncover the next bit of information or you don't. You either convince the chatbot of something or you don't. It's impossible to cycle this in an unlimited fashion without creating some kind of unusual pseudo-AI, and there are really no variable side effects because there is no engine to track them.

Even if you aren't specifically stuck to a single linear story, you're still going through this only marginally interactive set of scripted rails.

I'm not saying this is at its heart a bad thing, but it is a very limiting thing.

With this, you cannot realistically allow the players to just dick around. Either it has no effect at all, or they're moving forward. You have to script every possibility, which means that the players are more exploring your story and less exploring your world. They might as well be reading a book that only lets them turn the page if they answer a riddle.

I'm not saying this is an innate restriction of chatbots. I'm saying that it's an innate restriction of games without recursive algorithms. Because the content is not implemented in a fashion that can be unlocked in tiny portions in many different ways over many different times, the content is grotesquely inefficient.

Creating a dungeon is a lot more work than writing up a description of a dungeon. But the implemented dungeon can be explored by players in many different fashions at many different speeds, and there can be many different progressions of fights and treasure. Moreover, depending on how the player explores the dungeon, exploring the dungeon gets easier or harder.

So, a player will read a description of a dungeon and think, "okay, cool, a dungeon". Two minutes later, you had better have another description of something and it had better make sense. Even then, the player has less of a feeling of agency. It's an inferior solution - I would guess your time is spent at maybe 1% efficiency when creating non-recursive content.

This is actually the fundamental problem with adventure games in general, and is probably why they are not as popular. While there is something very juicy about the fact that every obstacle has a unique solution, the fact is that there are only maybe 1/50th the number of obstacles that you'd find in a recursive game of the same length. For every unique obstacle in an adventure game, you've fought four battles and gotten an upgrade in an RPG. Each battle is not simply an obstacle, but a complex set of interlocked obstacles. Same with upgrades.

This is probably why I preferred Quest for Glory to King's Quest: Quest for Glory contained a number of recursive, interlocked systems in addition to the juicy unique puzzles.

Now, it might be possible to create a chatbot game that has recursive systems, but the fundamental issue here is that chatbots are essentially just memory banks with confusing UI. No chatbot on the market has the ability to create meaningful content or adapt to changes in the world on any interesting level. You would have to create a backbone that somehow determined what changes needed to happen and then modified the chatbot's memory banks. This would be difficult even without the complex world engine, because generating English that is fun to read is right up there on the list of unsolved problems.

This is the big reason that games with adaptive/generative worlds don't have talking NPCs in their generated parts. Any talking NPCs they have are back in the part of the game that can't be significantly altered by the player's recursive play.

This is why when you talk to, say, characters in Animal Crossing, they always seem so self-obsessed and oblivious. It's because they actually cannot notice when you change the world, except as they are scripted to. They cannot look at what you have done and say, "wait, in order to get to your door I need to wade through a river, what's up with that?" They are not only incapable of that level of logic, they are incapable of generating that kind of text.

This is why graphics are so popular: we have, over the decades, figured out a lot of nifty ways to recombine and adjust graphics to a recursive situation. With some newer games, you can even create completely new graphics inside the game itself - a completely unique face, most commonly.

In some respects, I think it's because graphics is easier. Graphics is simply N-dimensional bits that are linked and moved around algorithmically. Wide cheekbones? Alter the cheek bits a bit. Green skin? Change the color of the skin.
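
By way of a toy example of what I mean by "bits you can alter": a face as a handful of parameters you can nudge and recombine. The fields here are invented, but the point is that the recombination is just arithmetic on numbers:

from dataclasses import dataclass, replace

@dataclass
class Face:
    cheekbone_width: float   # 0.0 .. 1.0
    skin_hue: float          # 0.0 .. 1.0 around the color wheel

base = Face(cheekbone_width=0.5, skin_hue=0.08)

wide_cheeks = replace(base, cheekbone_width=0.8)   # "alter the cheek bits a bit"
green_skin  = replace(base, skin_hue=0.33)         # "change the color of the skin"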

What is language? Written language is a (maybe) low-dimensional construct representing a (maybe) medium-dimensional construct (spoken language), which in turn represents some theoretical reality!

But I don't actually think that's any harder. Graphics are just as steeped in cultural references and represent theoretical reality, and graphics are 2D representations of 3D representations. (You can count color as another dimension, I guess.)

Unfortunately, that doesn't mean it's easy. After all, computer graphics aren't easy.

But maybe the same approaches could be taken...

I'll have to think about that; I've gotten off track. What I'm saying is that it's very hard to use chatbots in a recursive game, and that's a restriction I can't bear.