There's been a steady trickle of people discovering Pandora, an automated internet radio station which plays songs like the ones you like. I'd say that, on average, one person a month comments (either on their blog or directly to me) about Pandora.
I discovered it a long time ago, but allow me to recap. I'd like to think I've improved in the past nine months.
Pandora is a good idea, but the database is crap.
That's probably being a bit harsh. For many people, I'm sure it serves quite well. But us picky bastards leave unsatisfied, because what Pandora thinks we'll like is picked randomly out of the same... not "genre", but "theme".
(I seriously doubt their method has changed in the past nine months, but let me know if this is wrong, and I'll re-investigate.)
(Edit: Apparently, Pandora's database isn't third-party: it just looked that way nine months ago. That doesn't change much of my commentary, although, given how their database is apparently set up, they don't have bad data so much as spotty data. I'm not sure which is worse...)
Pandora uses a very large third-party database to figure out things you might like. You type in, say, the name of a band. The third-party database specifies things about that band. Like, say, "minor harmonies" or "alt rock". Then Pandora pulls songs out of their library which match some of those specifics. As you rate songs, it weights the elements that song contains.
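For the curious, here's a rough Python sketch of the attribute-matching scheme described above. To be clear, this is my guess at the general shape, not Pandora's actual code (which, again, I can't see): the catalog, the attribute tags, and the `AttributeStation` class with its 0.5 weight step are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical catalog: each song is tagged with a set of descriptive attributes,
# standing in for the attribute database described above.
CATALOG = {
    "Song A": {"minor harmonies", "alt rock"},
    "Song B": {"alt rock", "distorted guitar"},
    "Song C": {"synth bass", "four-on-the-floor"},
    "Song D": {"minor harmonies", "synth bass"},
}


class AttributeStation:
    """Hypothetical attribute-matching radio: score songs by weighted attribute overlap."""

    def __init__(self, seed_attributes, step=0.5):
        # Every attribute of the seed band starts with weight 1.0; all others start at 0.
        self.weights = defaultdict(float, {a: 1.0 for a in seed_attributes})
        self.step = step

    def score(self, song):
        # A song's score is the sum of the weights of the attributes it carries.
        return sum(self.weights[a] for a in CATALOG[song])

    def next_song(self, exclude=()):
        # Pick the highest-scoring song not yet played (ties go to catalog order).
        candidates = [s for s in CATALOG if s not in exclude]
        return max(candidates, key=self.score)

    def rate(self, song, thumbs_up):
        # A rating nudges the weight of EVERY attribute the rated song contains.
        delta = self.step if thumbs_up else -self.step
        for a in CATALOG[song]:
            self.weights[a] += delta
```

Note what a single thumbs-down does here: seeded with "minor harmonies" and "alt rock", the station picks Song A first, and rating it down lowers the weight of both of those attributes, whether or not either one is why you hated the song.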
The problem here is twofold. First, they are measuring the wrong things. Second, they are measuring unreliably.
The third-party database isn't clean. Do you think all songs with "minor harmonies" were labeled as such? That's pretty high-level knowledge - chances are, whatever goon is entering the newest Rammstein song data doesn't know jack shit about minor harmonies.
This means that the measurement is unreliable. However, even if it were reliable, it would still be measuring the wrong things.
For example, I love rock. Good rock. Ooooooh yeah. And techno. When it's good.
But most rock is terrible, and even more techno is terrible. When I rate these songs negatively, it either has (A) no effect or (B) makes the algorithm think I don't like techno. (I don't know which is true: I can't see their code.) Perhaps it has some magic way of discounting songs that lots of people think are bad, but I doubt it. That would require them to keep data on every song rather than just every user.
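Here's roughly what that "magic handwave" might look like, assuming the station does keep per-song tallies across all users. Everything here (`SongStats`, `genre_penalty`, the 0.5 step, the neutral 0.5 default) is hypothetical: the idea is just that a downvote on a song nearly everyone hates should mean "bad song", not "this user hates techno".

```python
from collections import defaultdict


class SongStats:
    """Hypothetical per-song thumbs-up/down tallies, pooled across all users."""

    def __init__(self):
        self.ups = defaultdict(int)
        self.downs = defaultdict(int)

    def record(self, song, thumbs_up):
        if thumbs_up:
            self.ups[song] += 1
        else:
            self.downs[song] += 1

    def approval(self, song):
        # Fraction of listeners who liked the song; neutral 0.5 when there's no data yet.
        total = self.ups[song] + self.downs[song]
        return 0.5 if total == 0 else self.ups[song] / total


def genre_penalty(stats, song, base_step=0.5):
    """How hard one user's thumbs-down should count against the song's genre.

    Scaled by the song's overall approval: if nearly everyone dislikes the song,
    the downvote mostly means "bad song", so the genre barely gets penalized.
    """
    return base_step * stats.approval(song)
```

So a thumbs-down on a track that 9 out of 10 listeners also hated barely dents techno as a whole, while a thumbs-down on a widely loved techno track counts for much more.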
Calling a song "rock and roll" helps not at all. Some people consider more recent country songs "rock and roll". I don't. Oh, and I like western, but not country. Can it tell those apart?
The answer is no. Not even close. It just sits there and throws random crap at you. Because it's measuring something that's almost (but not completely) unlike user preference.
User preference is what songs the users like to listen to.
So, here's an incredible new idea: why don't you let the algorithm automatically generate "pseudo-genres" based on what songs get rated high or low by any given user?
For example, you have one user, Alex. Alex likes filk and pop. But your algorithm doesn't know that. All it knows is that Alex likes these songs, hates those songs. It's the "Alex" pseudo-genre.
Then Bob joins up. Bob likes filk and polka. Originally, the algorithm thinks Bob shares Alex's genre, but Bob starts crushing pop music with the might of his two-thumbs-down. The algorithm quickly realizes there are three pseudo-genres at work here: stuff Alex likes, stuff Bob likes, and stuff they both like.
When Caroline joins, let's say she only listens to pop. The algorithm quickly realizes that she likes the "Alex-only" genre, but dislikes both the "Bob-only" genre and the "Bob & Alex" genre.
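A toy Python version of the Alex/Bob/Caroline example follows. It's a hypothetical sketch, not the math I mentioned: it groups songs by the exact set of users who gave them a thumbs-up, which works for three users but would fall apart with real, noisy data, where you'd need something fuzzier (clustering the rating patterns, say). The song titles include genre words only so you can follow along; the code never looks at them.

```python
from collections import defaultdict

# Toy ratings: +1 is thumbs-up, -1 is thumbs-down. The genre words in the
# titles are for the reader only; the code never inspects them.
RATINGS = {
    "Alex": {"Filk Song 1": 1, "Filk Song 2": 1, "Pop Song 1": 1, "Pop Song 2": 1},
    "Bob": {"Filk Song 1": 1, "Filk Song 2": 1, "Pop Song 1": -1, "Pop Song 2": -1,
            "Polka Song 1": 1},
}

# Caroline only likes pop.
CAROLINE = {"Pop Song 1": 1, "Polka Song 1": -1, "Filk Song 1": -1}


def pseudo_genres(ratings):
    """Group songs by the exact set of users who gave them a thumbs-up.

    Returns {frozenset_of_fans: set_of_songs}. Songs nobody has liked yet are left out.
    """
    fans = defaultdict(set)
    for user, user_ratings in ratings.items():
        for song, rating in user_ratings.items():
            if rating > 0:
                fans[song].add(user)
    groups = defaultdict(set)
    for song, who in fans.items():
        groups[frozenset(who)].add(song)
    return dict(groups)


def affinities(groups, new_ratings):
    """Average a newcomer's ratings within each pseudo-genre she has rated at all."""
    result = {}
    for fan_set, songs in groups.items():
        rated = [new_ratings[s] for s in songs if s in new_ratings]
        if rated:
            result[fan_set] = sum(rated) / len(rated)
    return result
```

On the toy data, `pseudo_genres(RATINGS)` finds exactly three groups (liked by Alex only, by Bob only, and by both), and `affinities` gives Caroline +1.0 for the Alex-only group and -1.0 for the other two, which is the story above.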
This sounds complex, but I've worked on the math, and I think it's totally plausible. It does require a rather significant database, though.
Oh, and it doesn't make it easy to play a song more often just because someone paid you to.
But what it does is create pseudo-genres which are actually related to what people prefer to listen to, instead of meaningless user-assigned "genres" like "filk".