Wednesday, July 27, 2011

Degradation of Privacy

I guess this stance is an unpopular one - or, rather, an unfashionable one - but I really hate it when companies gather data about me and my friends.

I'm bringing this up here because I catch a lot of flak about it from even fairly geeky friends. So let me explain why I am against corporations collecting and retaining information. More specifically, why I think it should be illegal for corporations to use facial recognition software or otherwise "deanonymize" information.

Let's start small.

Point one: corporations cannot be trusted to keep data safe. Corporate IT practices are notoriously poor, and there have been hundreds of cases of accounts (even ones with financial information such as credit card numbers) being stolen by the hundreds of thousands. Sony is the most recent loud example, but Citibank and others show it is hardly a black swan.

So, even before we get to things like privacy, corporations can't even be trusted to safeguard the data they actually need to operate day to day. There are plenty of practices that can keep user data safe even if the corporate database is hacked or leaked. These all involve not keeping data: storing only hashes, discarding the credit card number except perhaps the last four digits for identification, and so on.
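To make "not keeping data" concrete, here is a minimal sketch in Python. The function names, the record layout, and the iteration count are my own assumptions for illustration, not any particular company's practice. It keeps a salted password hash and the last four card digits, and nothing else.

```python
import hashlib
import hmac
import secrets

# Assumed work factor for this sketch; tune it for real hardware.
ITERATIONS = 200_000


def make_record(password: str, card_number: str) -> dict:
    """Build the minimal record to store; the raw inputs are never kept."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    digits = "".join(ch for ch in card_number if ch.isdigit())
    return {"salt": salt, "password_hash": digest, "card_last4": digits[-4:]}


def check_password(record: dict, attempt: str) -> bool:
    """Re-derive the hash from the attempt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", attempt.encode("utf-8"), record["salt"], ITERATIONS
    )
    return hmac.compare_digest(candidate, record["password_hash"])
```

If a database of records like this leaks, the thief gets a pile of salted hashes and four digits per customer, which is a lot less useful than passwords and full card numbers.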

Some data are less critical. It's a pain when someone steals 500,000 client email addresses, but it's probably not going to result in your clients actually being harmed significantly - just an uptick in the amount of useless spam being caught by the filters. Except that's not actually true: that data can help deanonymize other data, which is a problem most people don't bother considering.

Point two: Data give corporations advantages. Corporations are businesses, 99.99999% of which are out to get as much money as possible. Even without any information about you, corporations build their products to lock you in and drill your pockets as much as possible.

While I don't like this much, I understand that it's not feasible to magically make corporations stop doing that. So let's proceed with the idea that any advantage the corporation can get will go to mining its consumers as much as possible.

Normally, this is moderated by competition. If one company is too abusive, you can switch to a competitor that offers a very similar product.

However, data are nontransferable value. It's best to think of data as the on-line equivalent of "location, location, location!" The reason eBay is popular is because eBay is popular: the mass of data it has - the number of transactions, the ratings history, and so on - makes it more valuable to post your stuff to eBay than a smaller competitor. The only competitors likely to succeed are those specializing in very limited fields where the noise on eBay is actually a downside.

This is true of social data as well. A big difficulty for most people moving from Facebook to Google+ was the need to recreate their social network. Google+ reduced this difficulty by offering up suggestions based on your email history. Google was able to (somewhat) overcome the mass of Facebook's data by leveraging its own, similar data. However, a social site such as Appleseed does not have data to leverage, and is therefore at a tremendous disadvantage.

This is not simply value-add data, either. Having information about your users allows you to advertise to them as well as actually make their experience better. Amazon is a great example of this: it will throw hundreds of targeted ads at you on every page - "also bought X", "if you like Y, try Z", "EVERYONE IS BUYING A KINDLE OH GOD WHY WON'T YOU BUY A KINDLE YOU BASTARD LOOK HERE IS EVERY KINDLE EVER JUST PICK ONE BUY IT PLEEEEEEEEAAAASE!"

And so on.

This is a business advantage. Better advertising of related services and products is an advantage. It can generate revenue via ad fees or via higher conversion rates on direct sales.

Summary: think of data as location. The more data someone has, the closer they are to your house. That means that you're more likely to shop there, and if they open another store in their mall, you'll be more likely to shop at that new store. Data are a direct business advantage, and you're not likely to drive an hour to go to some interesting new place no matter how fancy-pants it is in comparison.

Point three: data can be combined, and it is easier to do so if you have more data. Some of you may have noticed I've been sticking to "data as plural", normally a pedantic and irritating choice. This is because I think one detail most people miss about data is that it is plural. Data is not like a bouncy ball. Data are like water in a cup.

Companies can combine data. This is one reason why companies frequently sell data to each other. To date, 99% of the data on the web about you has been more or less the same: your name and email address are the details that can get sold, your home address and credit card information are the details that cannot be sold.

But these days, there are a lot more details out there than you might think. What movies you like to rent, where you shop, who you phoned, what topics of conversation are common in your emails and social network chats, what kind of porn you surf for, which aliases are yours and not somebody else's, what your political preferences are, what kind of stupid shit you said ten years ago.

Most of this data is pretty useless to most people. Most corporations don't even care about it. But it is out there. It is really easy to trawl your Facebook or Twitter or Google+ account to collect a list of everyone you talk to and who talks to you.

Right now, your account is tied to an email address. Your Google+ address, probably. So Amazon can automatically, with no humans involved, look to see who is in your circles and vice versa. And, next time you go to Amazon, it'll say "Hey, Greg bought this book, you should buy it too!" Of course, Greg bought "Animal Sex and You: A Practical Guide", so it might be a popup you wish you'd never seen...

Think it's outlandish? Here's a fun experiment! Go to Amazon and search for wishlists. Just randomly punch in people's Gmail accounts. I got a hit rate of around 20%.

Point four: no, really, data can be combined. I'm not joking. I don't really think I stressed this enough. Data can be automatically aggregated and combined. Even if you're not involved.

For example, if one of my friends posts a picture of me and then labels my face, I am now in the datastream. Especially if he labels me by email address or other unique identifier. Moreover, with fun facial recognition software, it's possible to then go and find out other pictures that are likely to be about me. Even if I'm not in the system as a user, the system has information about me.

Ever link to someone? "Oh, my friend Jerry posted this on Facebook: kalinkylinky". Congrats, your friend Jerry has now been linked to you, even if he's not even on the same service as you. Even if he has you blocked because you're a creepy stalker.

Think making your privacy settings strict will save you? Nope, it's pretty easy for me to reconstruct your social network using your friends who do NOT have strict privacy settings.

Let me make that clear: even if you set your account to strict privacy or don't participate at all, if you have any connections to other people who aren't quite so strict, your data can be easily reconstructed.
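Here is roughly how little work that reconstruction takes. This is a toy sketch in Python with made-up usernames and a made-up data shape (a dict of public friend lists); a real crawler would have to fetch those lists from somewhere, which I'm skipping.

```python
def infer_friends(target, public_friend_lists):
    """Return every user whose public friend list mentions the target.

    `public_friend_lists` maps a username to the friends that user
    shows publicly. The target's own list is never consulted.
    """
    return sorted(
        user
        for user, friends in public_friend_lists.items()
        if target in friends
    )


# Hypothetical data: "you" have strict privacy settings and expose nothing.
public_friend_lists = {
    "alice": ["bob", "carol", "you"],
    "bob": ["alice"],
    "dave": ["you", "alice"],
}

print(infer_friends("you", public_friend_lists))
```

The target never appears as a key in the data, yet two of their connections fall right out of other people's lists.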

This is the same concept as "deanonymization". It's easy to take data that is supposed to be private or anonymous and link it up to a particular person using data from another source (or from the same source but another vector - i.e., your friends' accounts).
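A toy version of that linking step, sketched in Python. Everything here is invented for illustration: the field names, the records, and the choice of zip code plus birth year as the shared details. The idea is just a join between an "anonymous" dataset and a public one on whatever fields they have in common.

```python
def link_records(anonymous_rows, public_rows, keys):
    """Attach a name to each anonymous row that matches exactly one public row.

    Rows are matched on the shared fields listed in `keys`. Ambiguous
    matches (more than one candidate name) are left unlinked.
    """
    index = {}
    for row in public_rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row["name"])

    linked = []
    for row in anonymous_rows:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:
            linked.append({**row, "name": names[0]})
    return linked


# Hypothetical "anonymized" rental history: no names, just a few details.
anonymous_rentals = [
    {"zip": "20001", "birth_year": 1980, "rented": "Movie A"},
    {"zip": "20002", "birth_year": 1975, "rented": "Movie B"},
]

# Hypothetical public profiles scraped from somewhere else.
public_profiles = [
    {"name": "Greg", "zip": "20001", "birth_year": 1980},
    {"name": "Jerry", "zip": "20002", "birth_year": 1975},
    {"name": "Pat", "zip": "20002", "birth_year": 1975},
]

matches = link_records(anonymous_rentals, public_profiles, ["zip", "birth_year"])
```

In this made-up data, the first rental links to exactly one person, while the second stays ambiguous between two. The more fields the two datasets share, the fewer rows stay ambiguous, which is why every extra leaked detail matters.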

Point five: data may be used to discriminate against you. Assuming that you don't care about all that above, let me remind you that data can be used against you, and already probably is.

Most employers will at least perform a simple search on your name when you interview with them. Many employ a "drilling" service that will trawl for all accounts that can be linked to you and then trawl through their posts, looking for references to things like drugs.

These are services which already exist and are used reasonably frequently. It's only half a step to finding your circle of friends and, even if your account posts are private, finding what THEIR posts are about and what kind of comments you've left on their posts.

I use this as an introductory problem because it is one most people realize exists already. However, it is flat-out minor in comparison to other discriminatory activities. And here I don't mean racial discrimination, but the more general term meaning "judgment based on details or categories".

For example, we've already seen a few cases where people have gotten in trouble for Facebook pictures showing them carousing when they should be unable to work. Posts about your health are apparently court-worthy evidence when it comes to not paying out health insurance.

You have a side of you that employers, parents, or government officials shouldn't know about? Well, your aliases are a fragile anonymity. Once broken, pseudonyms disintegrate. If you posted pornographic Twilight fanfic as a high school student under a pseudonym, the minute that pseudonym is linked to your adult identity, you are forever labeled as someone who has really, really shitty taste.

To those of us in a pretty swank position of privilege, this seems like a rather minor inconvenience. A) It's not, it's a major lifestyle change to give up all privacy to everyone who might want to track you for any reason. B) People who aren't rich white guys are much more subject to problems due to this kind of privacy invasion, so their issues will be 10x worse.

Anyway, the only solution I can see is to make it illegal for companies to deanonymize data or leverage publicly available data on their consumers. Otherwise, this stuff will happen if even a few of your friends are lax in their privacy settings.

Of course, that illegality doesn't spread to things like foreign companies and governments. But it should slow the spread if most of the major players that normally accrue data (such as Google, Amazon, etc) aren't allowed to accrue it in ways that endanger you. Having to compile all that data themselves is a significant stumbling block to any oppressive foreign power seeking to crack down on, say, demonstrations against them.

Most of us live in a very cushy world where we can't imagine data being used against us in any significant way. "It's just ads!" That's nearsighted and egotistical.

Friday, July 08, 2011

The Replacement Web

I've gotten a fair number of comments asking why I think Google+ is not competing with Facebook, but with the internet. So I'll go into detail now.

Let's think about how people use the internet.


Search

You need to look something up, find something. A reference picture of a cat. A house painter in DC. How AC/DC conversion works.

Right now, there are two ways to do this. Google search and Wikipedia. Google+ offers to radically enhance Google search: your connections and interactions in your Google+ network will validate you, so your travels carry much more authority than any astroturfer. Moreover, your connections will allow Google to guide you to exactly what you're looking for because your friends' friends' friends already went there.

As for Wikipedia, I'm sure Google's coming up with an alternative. Probably +iPedia.

Oh, and some of you might use CraigsList for the more local stuff. Thanks!

News and Updates

I suppose a lot of users still go to, say, Wall Street Journal's or New York Post's site for news. Just browse on over at 9:30 AM and see what Murdoch wants you to know about.

However, a lot of us get our news via less restricted feeders. Things like Twitter, various filter sources such as Gawker, and thousands of specialty blogs that cover any kind of news you like.

You can argue that these don't have the gravitas of a major corporation with political aspirations, but that doesn't slow them down any. Already Google is a pretty important player here, not just passively through their search rankings, but actively through their news search, "top stories" section, and so on.

Not just news, of course, but also un-newsworthy things that matter to me, such as whether a friend's startup succeeds, or whether Germany is increasing or decreasing investment in solar power, or whether somebody's getting married. These also come to me through feeds, but mostly through different ones.

Google+ offers to take this and centralize it. Through Google+ circles and the I'm-absolutely-sure-it's-coming "extended circles", Google can easily aggregate posts and links and topics right to your Google+ page. It's probably similar to the iGoogle home in nature, but much more contextual and intelligent, so it can give you a wider variety of The News You Want without exposing you to The News That Makes You Uncomfortable, while still delivering The Irrelevant Crap That Your Friends and Celebrities Like.

I.e., Google+ is Twitter, Gawker, and Facebook all rolled into one, plus a few more things too.

Videos, TV, and Entertainment

A lot of people use the internet as a TV replacement. Just cruise over to Hulu and see what Murdoch wants you to watch. What, you didn't know News Corp has fingers in Hulu?

Let's go to YouTube instead. Oh, Google already owns YouTube.

Outside of actual hosting, most of your entertainment links come from filters and friends. These are the same sources that pass you news and updates, so the same things that will allow Google+ to dominate that arena will allow it to dominate this one.


Shopping

A lot of people use the internet to go shopping. Buying books from Amazon, doodads from Etsy, shirts from Think Geek, and so on. Right now, a lot of us pop over to the store that we know carries what we want, and simply click "buy". The times we want a comparison on prices, we go to an aggregator site that trawls through a variety of stores, finds the pricing, and maybe rates them by reliability.

Google probably won't ever have warehouses full of shirts and books to sell you, but they do already have the "shopping search" which does all the rest. Combined with Google Checkout and, probably, a new service with a catchy name like "Google+ iNetCash Turbo", Google is perfectly capable of becoming the go-to for internet shopping, especially if they can break Amazon's grip on the books market by offering alternate sources.


Anyone who thinks Google+ is fighting Facebook is thinking way too small.

Hell, Google may even physically replace the internet, putting up fiber optics between Google-owned locations.

All of this is going to result in a much smoother, easier to use internet.

Owned by Google.

Tuesday, July 05, 2011

The New Internet Economy

Like everyone, I'm in a bit of a state over the release of Google+. I don't use Facebook, because it's a nightmarish piece of social spyware. Instead of infecting your computer, it infects your life.

The lists comparing Google+ to Facebook are endless, and they always come out in Google's favor. This makes sense: Facebook is ancient and cobbled together out of random crap, while Google+ is a polished, modern, unified piece of code.

But in those lists somewhere is one "advantage to Google+" that really bothers me. It is this: "Google makes a better steward for your personal data than Facebook."

People are talking about "the attention economy" as if it's some far-distant future thing. But that economy is already here. Anyone who creates web content for a living already knows that. Google certainly knows it, better than anyone.

Google+ is a weapon in this economy. It is used to leverage attention: while it may not increase the amount of attention poured out by Google's userbase, it can deploy it far more effectively. Facebook could be said to be part of this "attention economy", but it wasn't weaponized. Google+ is weaponized Facebook.

This bothers me. The utter lack of privacy isn't really what bothers me; it's the utter lack of concern over it. Everyone's rushing to Google+ with glee. "What's the problem?" they say, "it's basically just Facebook, and I already used that!"

Google+ is weaponized. Google+ is not Facebook. It is a new layer of internet.

Google already tracks your searches, your YouTube video views, your installed Chrome apps, your email buddies, the contents of your emails... it does this to better serve you. Ads.

While it is possible to block or ignore the resulting ads, you cannot block the monitoring. If you go out of your way, you can browse in privacy mode or such, but then you can't participate in the many kinds of content that rely on you having a valid (monitored) login. For example, YouTube won't allow you to view any videos with higher than G-rated content unless you let them (and Google) monitor you.

Google+ is simply the next step, helpfully allowing the users to build a context web. The contents of the internet as well as the individual Google users will be put into a vast and highly detailed web. Perfect for pushing ads, sure, and I think most people are thinking that. They go, "Okay, I don't really care, serve me some ads."

But the problem is the context web. This is an extremely valuable web of connections and preferences that can make your internet experience much more fluid and enjoyable. Unfortunately, the web of connections is wholly owned by Google.

Exporting your data won't help much, and Google knows it. It's not just about who your friends are, any more than your personality is about what genes you have. It's about the billions of links and cross-posts and retweets and conversations thrown about and followed.

That's the problem I have with Google+: it is an effort to build a new kind of social internet. I wouldn't mind if that internet were public - I think it's a fantastic idea. But they are aiming to build a Google-owned social internet.

Nobody seems to care.