David Blei points me to this report by Lars Backstrom, Jonathan Chang, Cameron Marlow, and Itamar Rosenn on an estimate of the proportion of Facebook users who are white, black, hispanic, and asian (or, should I say, White, Black, Hispanic, and Asian).
Facebook users don't specify race/ethnicity, but they do give their last name, and Backstrom et al. use Census data on the ethnic breakdowns of last names to estimate the proportion of Facebook users in each of several Census-defined ethnic categories. They present their results for several snapshots of Facebook from 2006 through 2009.
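For readers who want to see how surname shares can turn into a population estimate, here's a minimal sketch of one simple approach. It treats each user's surname as giving a probability distribution over groups and averages those distributions across users. I don't know that this is exactly what Backstrom et al. did. The table `SURNAME_PROBS` and the function `estimate_group_shares` are hypothetical, and the numbers in the table are made up rather than taken from the Census. The sketch also assumes that, within each surname, Facebook users have the same ethnic mix as everyone else with that name, which is the assumption I question below.

```python
from collections import Counter

# Hypothetical surname table giving P(group | surname). A real analysis would
# load these shares from the Census surname files; these numbers are made up.
SURNAME_PROBS = {
    "jones": {"white": 0.6, "black": 0.4},
    "nguyen": {"asian": 1.0},
    "garcia": {"hispanic": 0.9, "white": 0.1},
}


def estimate_group_shares(last_names, surname_probs):
    """Estimate each group's share of users by averaging P(group | surname).

    Users whose surname is missing from the table are skipped, so the result
    describes only the matched users.
    """
    totals = Counter()
    matched = 0
    for name in last_names:
        probs = surname_probs.get(name.lower())
        if probs is None:
            continue  # unmatched surname: no information about group
        matched += 1
        for group, p in probs.items():
            totals[group] += p  # expected number of matched users in this group
    if matched == 0:
        return {}
    return {group: total / matched for group, total in totals.items()}


if __name__ == "__main__":
    # "Okafor" is not in the toy table, so it is skipped.
    users = ["Jones", "Nguyen", "Garcia", "Jones", "Okafor"]
    for group, share in sorted(estimate_group_shares(users, SURNAME_PROBS).items()):
        print(f"{group}: {share:.3f}")
```

Averaging probabilities rather than assigning each user to a single most likely group keeps mixed surnames like the toy "jones" from being counted entirely in one category.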
Their analysis seems reasonable enough to me, even if it won't be exactly right since the Facebook population is not a random sample of Americans within each ethnic category. The next step is to break things down by other variables, most obviously age, sex, education, and state of residence. Does the Census give last name data for any of these subcategories of the population?
My main comment is that there's lots more you can do once you have these numbers; for example, you can estimate how often people in different groups (categorized by age, sex, ethnicity, etc.) log into Facebook, how many Facebook friends they have, and so forth. You can get all sorts of details, far beyond anything my collaborators and I have learned about social connections.
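As a rough illustration of the kind of group-level estimate I have in mind, here's a sketch that computes a per-group mean of some user-level quantity (say, friend count) by weighting each user by P(group | surname). The table and the function name `weighted_group_means` are hypothetical and the numbers are made up. This is a crude first pass, not a method from the report: it ignores the fact that a user's value is itself informative about group, so when surnames are ethnically mixed it pulls the group means toward each other.

```python
# Hypothetical surname table, P(group | surname); numbers are made up.
SURNAME_PROBS = {
    "jones": {"white": 0.6, "black": 0.4},
    "nguyen": {"asian": 1.0},
}


def weighted_group_means(users, surname_probs):
    """Estimate a per-group mean of some user-level quantity.

    `users` is a list of (last_name, value) pairs, where value might be a
    friend count or logins per week. Each matched user contributes to every
    group, weighted by P(group | surname).
    """
    weighted_sum = {}
    weight_total = {}
    for name, value in users:
        probs = surname_probs.get(name.lower())
        if probs is None:
            continue  # skip surnames missing from the table
        for group, p in probs.items():
            weighted_sum[group] = weighted_sum.get(group, 0.0) + p * value
            weight_total[group] = weight_total.get(group, 0.0) + p
    return {
        group: weighted_sum[group] / weight_total[group]
        for group in weighted_sum
        if weight_total[group] > 0
    }


if __name__ == "__main__":
    # Hypothetical (surname, friend count) pairs.
    users = [("Nguyen", 100), ("Jones", 200), ("Jones", 300)]
    print(weighted_group_means(users, SURNAME_PROBS))
```

In this toy data the white and black means come out identical, because the only users contributing to either group share the same surname: surname weights alone can't separate groups that are mixed within a name.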
Also, a few minor comments:
1. Backstrom et al. appear to use the terms "white" and "Caucasian" interchangeably, which, as I've noted before, isn't quite right, since most South Asians are "Caucasian" but not white. It's not clear whether South Asians fall in the "Caucasian" or the "Asian/Pacific Islander" category in this analysis.
2. The dotted lines in their very first graph are labeled as "the proportion of the Internet population" for each ethnic group. I'm just wondering: where did they get these numbers?
3. Also, along the same lines, could they give the link to the public data they used? I followed the link they did give, but it was a general Census website, and I wasn't sure where one would go to find the full tables.
It seems to me that this Census-based racial breakdown doesn't work particularly well for answering this type of question. If you are trying to ask whether people of a certain race are proportionally represented on Facebook, you can't just plug in the surname proportions as baselines. For example, let's say 60% of people named Jones are white while 40% are black. Just plugging in those percentages would hide any difference between the racial makeup of the Joneses on Facebook and that of all Joneses. One solution would be to exclude all names that aren't at least 90% from one race. Another option would be to include first names in the analysis (though that could over-sample minority groups known for distinctive naming and leave common majority names unclassifiable).
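The 90% threshold idea can be sketched in a few lines. The table, the numbers in it, and the function names `confident_surnames` and `shares_from_confident_surnames` are all hypothetical. Each surname is assigned to its dominant group only if that group's share meets the threshold, and shares are then computed over the users with such surnames. The likely cost is that people with unambiguous surnames may not be representative of everyone in their group.

```python
from collections import Counter

# Hypothetical surname table, P(group | surname); numbers are made up.
SURNAME_PROBS = {
    "jones": {"white": 0.6, "black": 0.4},
    "nguyen": {"asian": 1.0},
    "garcia": {"hispanic": 0.9, "white": 0.1},
}


def confident_surnames(surname_probs, threshold=0.9):
    """Map each surname to its dominant group, keeping only surnames where
    that group's share is at least `threshold`."""
    assigned = {}
    for name, probs in surname_probs.items():
        group = max(probs, key=probs.get)
        if probs[group] >= threshold:
            assigned[name] = group
    return assigned


def shares_from_confident_surnames(last_names, surname_probs, threshold=0.9):
    """Group shares computed only over users with a high-confidence surname."""
    assigned = confident_surnames(surname_probs, threshold)
    counts = Counter(
        assigned[name.lower()] for name in last_names if name.lower() in assigned
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {group: count / total for group, count in counts.items()}


if __name__ == "__main__":
    # "Jones" falls below the threshold and is dropped.
    users = ["Jones", "Nguyen", "Garcia", "Garcia"]
    print(shares_from_confident_surnames(users, SURNAME_PROBS))
```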