Friday, June 02, 2006

Journey into the Amazon

(This is a repost; if you see it twice, sorry. This one has comments enabled. The other one, for reasons beyond me, did not.)

I made a post about Pandora's song selection methods, and it was pointed out that Amazon does something similar to what I suggested.

Let me explain what Amazon does, first, then explain the differences.

Imagine that for each book it offers, Amazon puts a circle on a page. Whenever anyone buys a book, Amazon draws a line from all the books they bought before to that book.

So, if five people buy "Zozo's Tea Adventures" and "Pandas on Iceskates Photo Album", then there are five lines between the two.

Now, if someone buys (or adds to their wishlist) one of these books, Amazon sees what other books it is connected to by the most lines and recommends them.

In fact, you can see this in action: if you go to your Amazon recommendations page, each recommendation has a definite source. So, when you buy "Zozo's Tea Adventures", Amazon would recommend "Pandas on Iceskates" and specifically tell you it was because you bought "Zozo's Tea Adventures".

Of course, if "Pandas on Iceskates" had five lines and "Teabags from Hell" was connected by fifty, Amazon would recommend "Teabags from Hell" rather than "Pandas on Iceskates".
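The circles-and-lines picture above can be sketched in a few lines of Python. This is only an illustration of the idea as described here, not Amazon's actual code: the function names `build_lines` and `recommend` are made up, and each order is assumed to be a plain list of titles bought by one customer.

```python
from collections import defaultdict
from itertools import combinations

def build_lines(orders):
    """Count one 'line' between two books for every customer who bought both."""
    lines = defaultdict(lambda: defaultdict(int))
    for books in orders:
        # Each unordered pair of distinct books in one customer's history gets a line.
        for a, b in combinations(sorted(set(books)), 2):
            lines[a][b] += 1
            lines[b][a] += 1
    return lines

def recommend(lines, book, limit=3):
    """Return the books joined to `book` by the most lines, strongest first."""
    neighbours = lines.get(book, {})
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(neighbours, key=lambda other: (-neighbours[other], other))[:limit]

# Five customers bought Zozo + Pandas, fifty bought Zozo + Teabags.
orders = (
    [["Zozo's Tea Adventures", "Pandas on Iceskates"]] * 5
    + [["Zozo's Tea Adventures", "Teabags from Hell"]] * 50
)
lines = build_lines(orders)
print(recommend(lines, "Zozo's Tea Adventures"))
# ['Teabags from Hell', 'Pandas on Iceskates']
```

The fifty-line connection wins over the five-line one, which is the whole algorithm: count, sort, recommend.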

This works okay, but its primary strong point is simplicity.

It has a few major flaws.

One of the flaws I continually run into is that I buy books to research a specific topic, and then Amazon insists that I'm interested in that topic forever. For example, I did some research into the Steampunk genre. Now half my recommendations are for Steampunk books, even though I haven't bought any (or shown any interest in them) for more than six months.

This is relatively easy to fix: every time you visit your recommendations page and don't buy anything, that should count as a strike against that particular network. Since I've visited my recommendations page half a dozen times and never even clicked on a link to Steampunk books, it should realize that those carrots aren't catching me, and move on to offering different carrots.
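One way the strike idea could look in code is sketched below. Everything here is an assumption made for illustration: the `StrikeTracker` class, the choice to halve a network's weight per strike, and the decision to reset strikes on a click are not from any real system, and the network labels are just strings.

```python
from collections import defaultdict

class StrikeTracker:
    """Down-weights recommendation networks the customer keeps ignoring."""

    def __init__(self, penalty=0.5):
        # Assumed rule: each strike multiplies the network's weight by `penalty`.
        self.penalty = penalty
        self.strikes = defaultdict(int)

    def record_visit(self, shown_networks, clicked_networks=()):
        """Record one visit to the recommendations page."""
        clicked = set(clicked_networks)
        for network in set(shown_networks):
            if network in clicked:
                self.strikes[network] = 0   # interest shown, forgive old strikes
            else:
                self.strikes[network] += 1  # shown but ignored: one strike

    def weight(self, network):
        """Multiplier to apply to recommendations coming from this network."""
        return self.penalty ** self.strikes.get(network, 0)

tracker = StrikeTracker()
# Six visits where Steampunk suggestions were shown and never clicked.
for _ in range(6):
    tracker.record_visit(["Steampunk", "Game AI"], clicked_networks=["Game AI"])

print(tracker.weight("Steampunk"))  # 0.015625
print(tracker.weight("Game AI"))    # 1.0
```

After half a dozen ignored visits the Steampunk network is weighted at under 2% of its original strength, so other carrots get a turn.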

A dramatically worse flaw is the "mob rule" mentality that these links provide.

For example, there are evidently ten million people who simply buy everything on the New York Times' best seller list. So, when I buy a book which happens to be on it, I get suggestions that I should buy other NYT BS. After all, everyone else buys them.

Of course, my only interest in these other books is to mock them. "YOU: The Owner's Manual" is just about the polar opposite of something I would buy.

Sure, when I look at the recommendations, I see, "yeah, those are popular because they are connected to these." That's fine, but it's not: "Oh, that book looks interesting." It doesn't sell me any books. No money goes to Amazon for these recommendations.

Okay, sure, it pleases the majority to some extent. But to what extent? How often does someone buy something Amazon recommended? I doubt more than 1% of their sales come from recommendations.

That's because even the ultimate generic customers have niche interests and different purchase patterns.

So, what I recommended for a Pandora-like system, I'll now recommend for an Amazon-like system.

In Amazon's case, it's really simple.

Make recommendations additive.

I bought Freakonomics and Blink. Amazon passes recommendations on that are echoed from the two of those. However, the connections from Freakonomics run off into bestseller land, and the connections from Blink seem to run off into crazed lunatic land.

Presumably, the two networks touch. It would be hard to believe that I'm the only person who's bought both of these books. In fact, chances are very high that the networks are intermingled in several locations.

Books which are connected to more than one of my purchases should be suggested more often - depending on the strengths and disparities of the connections.

For example, if Blink and Freakonomics are very tightly connected, then it's not very useful to try and triangulate what books I'm likely to like from those two. But if I buy, say, Freakonomics and "Cloud Captains of Mars", those are disparate enough that triangulated books are probably quite interesting to me. (Well, no, actually, because "Cloud Captains" was for research purposes, but add in what I talked about earlier...)

This might be kind of hard to see, so let me explain:

I might be the only person who bought "Cloud Captains of Mars" and Freakonomics. However, someone else bought books close to those books. Say, someone bought "The Undercover Economist" and "GURPS Steam-Tech". Those books are tightly connected to Freakonomics and "Cloud Captains" by other purchasers who have a general interest in those areas.

Therefore, there is a networked superstructure; a chain running from "Cloud Captains" to Freakonomics via user preference.
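A rough sketch of additive recommendations follows. The scoring rule is an assumption chosen to keep the example short: a candidate's line counts are summed across all of my purchases, then multiplied by the number of distinct purchases it connects to. It does not try to measure how disparate the purchases are. The titles "Bestseller A" and "Steam Economics" and all the line counts are invented for the example.

```python
from collections import defaultdict

def additive_scores(lines, purchases):
    """Score each candidate by combining its links to ALL of the purchases."""
    owned = set(purchases)
    totals = defaultdict(int)    # summed line counts per candidate
    sources = defaultdict(set)   # which purchases link to each candidate
    for bought in owned:
        for other, count in lines.get(bought, {}).items():
            if other in owned:
                continue  # never recommend something already bought
            totals[other] += count
            sources[other].add(bought)
    # Assumed rule: reward candidates reached from more than one purchase.
    return {book: totals[book] * len(sources[book]) for book in totals}

def recommend_additive(lines, purchases, limit=3):
    scores = additive_scores(lines, purchases)
    return sorted(scores, key=lambda book: (-scores[book], book))[:limit]

# Hypothetical line counts; a real table would come from purchase histories.
lines = {
    "Freakonomics": {
        "Bestseller A": 40,
        "The Undercover Economist": 15,
        "Steam Economics": 12,
    },
    "Cloud Captains of Mars": {
        "GURPS Steam-Tech": 20,
        "Steam Economics": 10,
    },
}

print(recommend_additive(lines, ["Freakonomics", "Cloud Captains of Mars"]))
# ['Steam Economics', 'Bestseller A', 'GURPS Steam-Tech']
```

"Steam Economics" has weaker individual links than "Bestseller A", but it touches both purchases, so it comes out on top. That is the triangulation effect in miniature.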

Now, you could even go better than this by "color coding" your links based on their sprawl areas. For example, if a huge number of people buy five particular books, those five books are linked very strongly. But chances are that many other people won't want to buy the other four when they buy one. So you "color code" the most common links and then do a few suggestions to see whether the buyer in question is part of that "color".

For example, the NYT BS list. Chances are, they are all heavily linked. But those links should all be considered "one subnet". I have no interest in that subnet, so this super-Amazon should ignore those links when it comes to determining what books I'll like.
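Here is a minimal sketch of ignoring a "color" once the buyer has shown no interest in it. It assumes the colors have already been worked out somehow and are handed in as named sets of books; how to detect them is not covered. The function names, the rule that a link belongs to a color when both of its ends are in that color's set, and the line counts are all assumptions for illustration.

```python
def link_color(colors, a, b):
    """Return the name of the color whose book set contains both ends of the link."""
    for name, books in colors.items():
        if a in books and b in books:
            return name
    return None  # the link is not part of any known color

def recommend_ignoring_colors(lines, colors, purchases, ignored, limit=3):
    """Rank candidates by summed line counts, skipping links in ignored colors."""
    owned = set(purchases)
    scores = {}
    for bought in owned:
        for other, count in lines.get(bought, {}).items():
            if other in owned:
                continue
            if link_color(colors, bought, other) in ignored:
                continue  # this buyer is not part of that color
            scores[other] = scores.get(other, 0) + count
    return sorted(scores, key=lambda book: (-scores[book], book))[:limit]

# Hypothetical data: one big bestseller subnet and one niche link.
colors = {
    "bestseller list": {"Freakonomics", "Blink", "YOU: The Owner's Manual"},
}
lines = {
    "Freakonomics": {
        "YOU: The Owner's Manual": 900,
        "Blink": 800,
        "The Undercover Economist": 30,
    },
}

print(recommend_ignoring_colors(lines, colors, ["Freakonomics"], ignored=set()))
# ["YOU: The Owner's Manual", 'Blink', 'The Undercover Economist']
print(recommend_ignoring_colors(lines, colors, ["Freakonomics"], ignored={"bestseller list"}))
# ['The Undercover Economist']
```

With the bestseller color switched off, the thirty-line niche link is the only thing left, which is exactly the recommendation that had been drowned out.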

Clear?

Edit: Amusingly, Amazon sent me a new recommendation as I was typing this. It's for "Artificial Intelligence for Games" - a topic in which I'm very much interested. I don't much care for what I see of the book, but at least the topic was right.

The reason for this success is that this is such a niche audience that there is only one "color" of line connecting these books. So, functionally, it's as if, for this tiny little section of the network, they were running with the method I explained above.
