If you've been reading for a while, you may remember my analysis of Pandora and other internet radio things. The idea here is they allow you to like or dislike certain songs, and they'll feed you more of what you like. There are many applications for this kind of system: Amazon, MySpace, DeviantART - anything where there are a lot of people and a lot of content.
Pandora - and many others - handle this by clumping things into genres. Then they try to decide what genres you like or dislike, and whether a given song is good or bad within a genre.
That's a rotten way to do it. A) it's got a lot of overhead, B) it assumes all things fall into specific, pre-existing genres.
I spent a lot of time struggling to figure out a way to do it better. To do it using the power of other people's favorites. Today, here is an idea for a solution.
Let's say you're playing an MMOG like SecondLife. There's a lot of player content. You have two friends. One is a hardcore combat hog, player-killer extreme. The other is an RP sexaholic. You don't have much interest in either of those things.
They both use a lot of player-generated content, but as you might suspect, there's not a ton of overlap. However, you notice that they both have really cool houses full of really cool knick-knacks. A lot of them by the same creators!
In SecondLife, you'd probably make note of the creators, then go and track down their shops or some such. In our game, the game already knows what the players like/have bought. It says, "Oh, you like that lamp? Both of your friends like that lamp. It's a genuine DeathKnight Bloodwine Lamp - here's his catalog. Other things both your friends like are..."
That example is taking something trivial and making it even more trivial, and it is also probably a bit... exposing. Let's take it to another level entirely.
You're playing in a game where players can create content. Not just hats and houses and dildos, but stories and adventures and NPCs and histories.
If you were to wander this universe, you would find a lot of really terrible content. It's the way the world works.
Let's say that, due to some staggeringly bad luck, the first thing you encounter is Harry Potter slash fic. You hurriedly vote it down, and it vanishes.
The computer doesn't know that it's slash fic. It's not in a category as fanfiction or even porn (although it may be marked "adult").
Instead, the computer queries for everyone who liked it. Everyone who voted it up. And then, it looks at their shared favorites and bans the most common ones from your sight. If 30 people liked what you're seeing now, and 16 of them also liked something over the next hill... you probably don't want to see it.
On the other hand, if you had voted the slash fic up, it would have done the opposite: that thing over the next hill would now be flagged for your attention.
That's the basic idea.
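Here's a rough sketch of that basic idea in Python. Nothing here is from a real game: the names (likes, co_liked, apply_vote), the "at least half of the fans" cutoff and the top-3 limit are all my own assumptions, picked to roughly match the 16-out-of-30 example.

```python
from collections import Counter

def co_liked(item, likes, top_n=3, min_share=0.5):
    """Return the things most often liked by the people who liked `item`.

    `likes` maps a user name to the set of things that user voted up.
    `top_n` and `min_share` are assumed tuning knobs, not part of the idea.
    """
    fans = [user for user, liked in likes.items() if item in liked]
    if not fans:
        return []
    counts = Counter()
    for user in fans:
        counts.update(likes[user] - {item})
    # Keep only the shared favorites that enough of the fans agree on.
    return [other for other, n in counts.most_common(top_n)
            if n / len(fans) >= min_share]

def apply_vote(item, voted_up, likes, hidden, flagged):
    """Update one player's `hidden` and `flagged` sets after a vote."""
    related = co_liked(item, likes)
    if voted_up:
        flagged.update(related)   # show more of what the fans like
    else:
        hidden.add(item)          # the thing itself vanishes
        hidden.update(related)    # and so do the fans' shared favorites

# Hypothetical vote data, just for illustration.
likes = {
    "ann": {"slash_fic", "next_hill", "tavern"},
    "bob": {"slash_fic", "next_hill"},
    "cat": {"slash_fic", "castle"},
    "dan": {"castle", "tavern"},
}

hidden, flagged = set(), set()
apply_vote("slash_fic", False, likes, hidden, flagged)
print(sorted(hidden))  # ['next_hill', 'slash_fic']
```

Two of the three slash fic fans also liked the thing over the next hill, so it gets hidden along with the slash fic. The tavern and the castle each have only one fan in common, so they stay visible.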
So far, it's basically an inverted version of Amazon's method. But let's go a bit further.
What if you like fantasy adventures, but there's one you can't stand? You want the game to keep giving you fantasy adventures, just fewer shitty ones. Or fewer ones involving talking mascot characters. You hate mascot characters.
If you just follow the above algorithm, it will actually prefer to ban the GOOD ones, because they have more shared favorites.
If the system tries to ban something you've already favorited, it nullifies the whole thing and moves on to a quality-level analysis, which is something I don't think Amazon does:
Instead of pulling "people who like this thing I hate", you have to pull "people who hate this thing I hate and LIKE my favorites that the first level of analysis tried to ban."
From here you can't just ban shared dislikes, because they'll probably end up being that slash fic, which is no help. Instead, you have to limit your ban-stick specifically to things that were on the list for banning on the first level. Basically, you're still using that original list from the first analysis, you're just keeping the good stuff off the chopping block.
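The two levels together might look something like the sketch below. Again, this is only my guess at one way to do it. The names (first_pass, refined_bans, likes, dislikes) are made up, and I'm assuming a "peer" only needs to like at least one of the favorites the first level tried to ban, and that at least half the peers have to dislike something before it gets banned.

```python
from collections import Counter

def first_pass(item, likes):
    """Level one: things liked by at least half the people who liked `item`."""
    fans = [u for u, liked in likes.items() if item in liked]
    counts = Counter()
    for u in fans:
        counts.update(likes[u] - {item})
    return {other for other, n in counts.items() if 2 * n >= len(fans)}

def refined_bans(me, item, likes, dislikes):
    """Level two: only runs if level one tried to ban one of my favorites."""
    candidates = first_pass(item, likes)
    rescued = candidates & likes[me]
    if not rescued:
        return candidates  # no conflict, so the simple ban list stands
    # Peers hate the thing I hate AND like a favorite that level one hit.
    peers = [u for u in likes
             if u != me
             and item in dislikes.get(u, set())
             and rescued & likes[u]]
    votes = Counter()
    for u in peers:
        # Only count dislikes that were already on the level-one list.
        votes.update(dislikes[u] & candidates)
    return {c for c, n in votes.items()
            if 2 * n >= len(peers) and c not in likes[me]}

# Hypothetical vote data, just for illustration.
likes = {
    "me":    {"good_quest"},
    "fan1":  {"mascot_quest", "good_quest", "mascot_sequel"},
    "fan2":  {"mascot_quest", "good_quest", "mascot_sequel"},
    "peer1": {"good_quest"},
    "peer2": {"good_quest"},
}
dislikes = {
    "me":    {"mascot_quest"},
    "peer1": {"mascot_quest", "mascot_sequel", "slash_fic"},
    "peer2": {"mascot_quest", "mascot_sequel"},
}

print(refined_bans("me", "mascot_quest", likes, dislikes))  # {'mascot_sequel'}
```

Level one wanted to ban both good_quest and mascot_sequel. Level two notices good_quest is one of my favorites, finds the two peers who share my taste, and bans only mascot_sequel. The slash fic that peer1 hates never enters into it, because it wasn't on the level-one list.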
That's some tasty tasty algorithm, there!
Do you understand it? Obviously, it can be further optimized, but do you get the idea?
This is theoretically better than the current system used by Amazon, because although they use favorites, they sometimes get my awesome sub-genre preferences mixed up with some major genre preferences, and send me ads for crap I wouldn't buy for free. Banning it doesn't do much good...
With this method, it would quickly isolate the people who like my sub-genre but hate the major genre, just like me. It would be able to identify these emergent little sub-genres. For free. Instantly. Automatically.
We can also include an automated weighting scheme. If your likes and dislikes tend to be very similar to another person's, and there are no clashes, then he'll be weighted as a more valuable measure of your favorites. If you two are precisely reversed, you'll automatically avoid his likes preferentially.
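One simple way to get that kind of weighting, sketched below, is to score each other player from -1 (precisely reversed) to +1 (same taste, no clashes) based on the things you've both voted on. The formula and the names (taste_weight, weighted_score) are my own assumptions, and the sketch assumes every player appears in both the likes and dislikes tables.

```python
def taste_weight(me, other, likes, dislikes):
    """Score from -1.0 (reversed tastes) to +1.0 (identical tastes)."""
    agree = (len(likes[me] & likes[other])
             + len(dislikes[me] & dislikes[other]))
    clash = (len(likes[me] & dislikes[other])
             + len(dislikes[me] & likes[other]))
    rated_by_both = agree + clash
    if rated_by_both == 0:
        return 0.0  # no overlap, so this person tells us nothing yet
    return (agree - clash) / rated_by_both

def weighted_score(me, item, likes, dislikes):
    """Positive means 'probably show it', negative means 'probably hide it'."""
    score = 0.0
    for other in likes:
        if other == me:
            continue
        weight = taste_weight(me, other, likes, dislikes)
        if item in likes[other]:
            score += weight   # a reversed player's like counts against
        elif item in dislikes[other]:
            score -= weight   # a reversed player's dislike counts in favor
    return score

# Hypothetical vote data, just for illustration.
likes = {
    "me":       {"a", "b"},
    "twin":     {"a", "b", "x"},
    "opposite": {"c", "y"},
}
dislikes = {
    "me":       {"c"},
    "twin":     {"c"},
    "opposite": {"a", "b"},
}

print(weighted_score("me", "x", likes, dislikes))  # 1.0
print(weighted_score("me", "y", likes, dislikes))  # -1.0
```

Here "x" is liked by the player who matches me exactly, so it scores +1. "y" is liked by the player who is my exact opposite, so it scores -1 and would be avoided.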