Sitemeter and Privacy

Dan Solove brings up some privacy issues with using sitemeter on blogs:

But Site Meter also lists the IP address of each visitor, something that the public really doesn't need to see. An IP address is a unique numerical identifier that is assigned to every computer connected to the Web. It doesn't reveal your name, but it can be used to trace back to the specific computer you used or be linked to your account with an ISP. In other words, your IP address can be used to find out who you are.
...
So all this made me realize that we do have some data about you and we need to construct a privacy policy. Regarding Site Meter, bloggers who use the premium Site Meter service (which we use) display full IP addresses in their public stats. Those who use the free version of Site Meter have the IP addresses partially blocked out in their public stats. Site Meter has an option to conceal all the stats, but it doesn't allow for only concealing or partially blocking IP addresses. The choices are to publicly display everything or conceal nearly everything.

Dan then asks some questions:

I. YOUR ATTITUDES TOWARD PUBLIC STATS

1. Do you find our public visitor stats via Site Meter to be useful? If so, why?
2. Do you find it problematic for your IP addresses to be publicly displayed in Site Meter and other visitor tracking services?

II. YOUR KNOWLEDGE ABOUT WHAT INFORMATION WE HAVE ABOUT YOU

3. Did you realize that when you visit our blog and others, that your IP address and other information are publicly available in our Site Meter logs?
4. Did you realize that when you make an anonymous comment on our blog, it is possible to link up your IP address with your comment via Site Meter stats?
5. Did you realize that when you make an anonymous comment on our blog, our blogging software records your IP address, which could be subpoenaed?

III. YOUR THOUGHTS ABOUT POLICY

6. Should we continue on as usual (public Site Meter stats with full IP addresses)? Or should we block full IP addresses from public view?
7. If there's a tradeoff between having public stats with full IP addresses and no public stats at all, which of these options would you prefer?
8. What should our policy be if we are requested by others or subpoenaed to provide identifying information (an IP address or email address) for an anonymous or pseudonymous commenter?

I would like to hear some answers to these as well, because at least for denialism blog, I would like people to feel comfortable leaving anonymous comments. I, unlike cranks who go insane over anonymity, am quite happy to have people protect their identity. I would rather have input (non-cranky input at least) from as many people as possible, and I understand that many people are not in positions where they would like their opinions tracked back to them for a variety of good reasons.

I will also make the following points in terms of our informal privacy policy.
1. I will never release email addresses or IP numbers for any reason.
2. Even if I remove sitemeter from the blog, I will still see IPs etc. through MT's comment system, and I will only use them when sockpuppetry or some other malfeasance is at issue.
3. I will never attempt to break someone's anonymity using IP or private email address, except of course, in the case of sockpuppetry or other misbehavior in which someone is abusing comments in multiple names, assuming false identities, or generally causing confusion by misidentifying themselves.
4. If you do leave a link or enough information in the post or the name field for me to figure out who you are in a google search, you are fair game for identification.

I would encourage other sciencebloggers to discuss their privacy rules. I can't recall a single instance of abuse by my sciblings (although I have seen some on other blogs like Huffpo) and for the most part this stuff is common sense. But people may feel more comfortable, and the conversations more open if they know we won't actively try to "out" anyone just for pissing us off. In the meantime I've upped the privacy on sitemeter as well so people can't view the IP reports (or anything else really, I wish it were more customizable).
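
For the curious, here is a rough sketch of what partially blocking an IP address in a public report could look like. It is not Site Meter's actual code (how Site Meter does it is not documented here); the function name `mask_ip`, the `hidden_octets` parameter, and the `#` placeholder are made up for illustration, and the sketch handles IPv4 addresses only.

```python
import ipaddress


def mask_ip(ip_text, hidden_octets=1):
    """Return an IPv4 address with its trailing octets replaced by '#'.

    Hypothetical helper: hides the last `hidden_octets` parts of the
    address so a public report shows the network but not the exact host.
    """
    if not 0 <= hidden_octets <= 4:
        raise ValueError("hidden_octets must be between 0 and 4")

    # IPv4Address validates the input and raises a ValueError subclass
    # if the text is not a well-formed IPv4 address.
    addr = ipaddress.IPv4Address(ip_text)

    octets = str(addr).split(".")
    kept = 4 - hidden_octets
    return ".".join(octets[:kept] + ["#"] * hidden_octets)


if __name__ == "__main__":
    # 192.0.2.0/24 is a range reserved for documentation examples.
    print(mask_ip("192.0.2.57"))
    print(mask_ip("192.0.2.57", hidden_octets=2))
```

Hiding only the last octet still narrows a visitor down to a small block of addresses, so this reduces exposure rather than removing it.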

Bravo; those are pretty good rules.

Just for the record, a few months ago prosecutors in Arizona requested the following information from a progressive newspaper that was following up on issues of integrity by public figures:

Energized, perhaps, by this mugging of Constitutional safeguards, Arpaio, Thomas, and Wilenchik then shot the moon. The grand jury subpoena also demands Web site profiles of anyone and everyone who visited New Times online over the past two and a half years, not merely readers who viewed articles on the sheriff.

The subpoena demands: "Any and all documents containing a compilation of aggregate information about the Phoenix New Times Web site created or prepared from January 1, 2004 to the present, including but not limited to :

A) which pages visitors access or visit on the Phoenix New Times website;

B) the total number of visitors to the Phoenix New Times website;

C) information obtained from 'cookies,' including, but not limited to, authentication, tracking, and maintaining specific information about users (site preferences, contents of electronic shopping carts, etc.);

D) the Internet Protocol address of anyone that accesses the Phoenix New Times website from January 1, 2004 to the present;

E) the domain name of anyone that has accessed the Phoenix New Times website from January 1, 2004 to the present;

F) the website a user visited prior to coming to the Phoenix New Times website;

G) the date and time of a visit by a user to the Phoenix New Times website;

H) the type of browser used by each visitor (Internet Explorer, Mozilla, Netscape Navigator, Firefox, etc.) to the Phoenix New Times website; and

I) the type of operating system used by each visitor to the Phoenix New Times website."

Special prosecutor Wilenchik wants this information on each and every New Times reader online since 2004.

The information should also be scrubbed on a routine basis, as soon as it no longer serves a direct requirement. Partition the database on a time boundary, and do a wholesale purge of the IP fields at that point.
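
A minimal sketch of that kind of routine purge, assuming the comments live in a SQL table. The table name `comments`, its columns, and the 30-day retention window are all assumptions for illustration; this is not the schema of MT or of any particular blogging software, and SQLite is used only because it ships with Python.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def purge_old_ips(conn, now, retention_days=30):
    """Blank the ip field of every comment older than the retention window.

    Returns the number of rows that were scrubbed. The comment text itself
    is kept; only the identifying field is removed.
    """
    cutoff = (now - timedelta(days=retention_days)).isoformat()
    cur = conn.execute(
        "UPDATE comments SET ip = NULL "
        "WHERE created_at < ? AND ip IS NOT NULL",
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount


def demo():
    """Build a throwaway in-memory table and run one purge over it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE comments ("
        "id INTEGER PRIMARY KEY, created_at TEXT, ip TEXT, body TEXT)"
    )
    now = datetime(2008, 1, 1, tzinfo=timezone.utc)
    rows = [
        # Timestamps are ISO 8601 strings in UTC, so they sort as text.
        ((now - timedelta(days=90)).isoformat(), "192.0.2.10", "old comment"),
        ((now - timedelta(days=1)).isoformat(), "192.0.2.20", "recent comment"),
    ]
    conn.executemany(
        "INSERT INTO comments (created_at, ip, body) VALUES (?, ?, ?)", rows
    )
    purged = purge_old_ips(conn, now)
    remaining = conn.execute("SELECT ip FROM comments ORDER BY id").fetchall()
    return purged, remaining


if __name__ == "__main__":
    print(demo())
```

Run on a schedule, something like this keeps recent IPs available for dealing with trolls while leaving nothing older to hand over.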

Here's my crankery -- we are close to the system of systems and additional accumulation of datapoints to model us just isn't a great idea.

Mark,

Thanks for a very thoughtful discussion of my post. One quick point. You write in your "informal privacy policy": "I will never release email addresses or IP numbers for any reason." I don't think you can really promise this. You might be forced, via a court order or subpoena, to release this information. When you say you won't release data for "any reason," do you really mean any reason? The "reason" might be that you don't want to spend time in jail on a contempt of court charge. The problem with the law is that it allows information to be obtained way too readily with subpoenas (especially by the government). So despite the best of intentions, those holding personal data are always at risk of having to disclose it in response to a subpoena regardless of what promises are made. The law of subpoenas doesn't respect contracts either -- so even if it's part of a contract not to disclose information to others, a subpoena can allow others to get the information. Welcome to the happy world of privacy law! Many companies make broad promises of not sharing their information with third parties, and they rarely mention subpoenas. But these promises can all be trumped by a subpoena request or a court order (or, in some circumstances, a National Security Letter) . . . or perhaps a threat by the Bush Administration to secretly kidnap and torture you unless you turn over the information!

So I think the best approach is to tell your readers and commenters that there are risks to their privacy, despite your best intentions . . . unless you're willing to spend a fortune defending against a subpoena or court order or spend some time in jail for not complying.

Dan

I'm really torn on this issue. On the one hand, as a blogger and the moderator of a message board, I like having the IP addresses since trolls can be quickly and permanently tossed if necessary. As a woman on the net, stalking is a real concern; it happens too often.

On the other hand, I wouldn't want to be put in the position Dan mentions above--having a subpoena to deal with and then having to make the decision to either throw people under a bus or go to jail.

Dan, at that point it's Seed's problem more than mine. It would be unlikely that I would be directly contacted for IP information, and it's not like I have a database of it other than email. I think people realize that if I have been subpoenaed there is little I can do, but the point I want to make is I won't use any email or IP information maliciously just because some crank pisses me off.

Mark and others, if you go way down this page and click on Sb's "Privacy Policy" you'll get a huge amount of text that includes:

Tracking technologies may record information such as Internet domain and host names; Internet protocol (IP) addresses; browser software and operating system types; clickstream patterns; and dates and times that our Site is accessed. An IP address is a number that is automatically assigned to your computer whenever you are surfing the web. Web servers, the big computers that 'serve up' webpages, automatically identify your computer by its IP address. It is not our practice to link the information we record using tracking technologies to any Personal Information you submit while on the Site. However, we reserve the right to use IP addresses and other tracking technologies to identify a visitor only when we feel it is necessary to enforce compliance with the Site's policies, to protect the Service, our customers or others, or when we believe in good faith that the law requires it.

So, I agree with Mark that any law enforcement action would go through Seed rather than through us individual bloggers (we are officially 'independent contractors' and not employees of Seed). However, I agree with Mark that each blogger should make an explicit statement that our personal tracking software also records IP addresses.

In my 'About' page, I refer readers to this lengthy policy, but this discussion from you and Dan makes me think I should be more explicit. In fact, I recall worrying about this issue when I recently posted on my 100,000th visitor and their institution, cautioning readers not to be alarmed, in that I know very little more than that from SiteMeter.

Finally, I do like Mark's rules quite a bit. The only thing I would add is to inform readers that SiteMeter allows me to learn what pages were read from each IP address, what website/page referred you to me, and, if using a search engine, what terms led you to my blog.

Great discussion, all.

Personally identifying someone from an IP requires a subpoena. Depending on the type of service your ISP provides, your IP may change each time the modem is cycled. From a personal privacy PoV the IP tells next to nothing (geographical area and provider). Only *your* ISP can link your IP to any personal information. Mark can't. Seed can't. Rackspace can't.

The IP is not the issue, the issue is whether your ISP will roll over and provide your information without a proper subpoena.

On the other side, in order to issue a subpoena to your ISP they need your IP in order to find them and issue that subpoena, so keeping the IP private adds a layer of legal paperwork that may help to discourage wankery.

Given the issues involved, I don't feel that it's a major issue... the major issue is whether judges will issue subpoenas without cause, and whether ISPs will release the information too easily.

If you are worried about your usage being tracked, it isn't done by IP, it's done by things like tracking cookies and back channel parasites. Clean your system.

Actually Graculus, you'd be surprised what I can figure out from an IP, especially if accompanied by an email, and a little bit of text. You have to take into consideration secondary identifying information makes it pretty easy for me to figure out who people are.

One has to consider that the combination of geographic information and a tiny bit of personal information from posts allows one to at the very least speculate about identities, and often point to specific people. This has happened on the internet before. Someone will say, "I'm the CFO of a company and ...(dishes dirt)." An IP and about 10 seconds of googling and you'll know exactly who you're talking to.

It's not a concern for every person. It's also not a concern for people who are very cautious. However, not everyone is as savvy as the most technically proficient of my commenters, and "common sense" doesn't apply, especially to those who don't get how technology works, which could be roughly defined as the entire adult population over 50. It may be obvious to you what steps to take, but not to everybody.

In the meantime, I will just make it explicit that I won't use personal information against people that I can glean from the comments unless there is malfeasance on the part of some troll. That is all.

Actually Graculus, you'd be surprised what I can figure out from an IP, especially if accompanied by an email, and a little bit of text. You have to take into consideration secondary identifying information makes it pretty easy for me to figure out who people are.

Well, email and secondary information are the killers, not the IP in itself. That was my point, sorry to be unclear.

those who don't get how technology works, which could be roughly defined as the entire adult population over 50.

Damn kids, get off my lawn!

will just make it explicit that I won't use personal information against people that I can glean from the comments unless there is malfeasance on the part of some troll. That is all.

As is right and proper.

I guess I was just trying to address unfounded concerns about exactly what IPs in themselves reveal. I've seen some mighty hysterics in this area.