Scaling digital ethnography: the paper is out

The OpenCare conversation has grown and grown. Last time I checked, it stood at 3,082 contributions by 282 different contributors. It contains 615,000 words, somewhat more than Tolstoy’s War and Peace, with its 587,000. The flow of experiences and creativity has been, and still is, incredible. As all this content poured in, the heroic @Amelia kept annotating it with ethnographic codes. As I write this, the OpenCare corpus has been enriched with 6,221 annotations, using 1,362 distinct codes.

About six months ago, we realised something wonderful was happening. Despite the sheer size of the corpus, we could still make sense of it as a whole. Using Open Ethnographer and GraphRyder, we have been encoding the conversation as a network of people, their contributions, and the ethnographic codes associated with those contributions. We can then use this network to produce various representations of the conversation. My favourite one is the code co-occurrence network, in which two ethnographic codes are connected by an edge whenever they appear together in the same contribution. When we look at it, we see a sort of map of the whole conversation: 600,000 words boil down to a single image. Like every map, the co-occurrence network discards a lot of detail, but that’s a feature, not a bug. What remains captures something important about the architecture of collective intelligence in OpenCare.
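To make the co-occurrence idea concrete, here is a minimal Python sketch. It is not the code behind Open Ethnographer or GraphRyder; it assumes a hypothetical input format in which each contribution id maps to the list of codes annotating it, and it weights each edge by the number of contributions the two codes share. The `filter_edges` step is also an assumption, added only to show one way a map can drop detail by keeping the better-supported links; the code names in the toy data are invented.

```python
from collections import Counter
from itertools import combinations


def co_occurrence_edges(annotations):
    """Build a weighted code co-occurrence network.

    `annotations` maps a contribution id to the ethnographic codes
    attached to it (hypothetical input format). Two codes are linked
    when they annotate the same contribution; the edge weight counts
    how many contributions they share.
    """
    edges = Counter()
    for codes in annotations.values():
        # Deduplicate and sort so each unordered pair has one canonical key.
        for a, b in combinations(sorted(set(codes)), 2):
            edges[(a, b)] += 1
    return edges


def filter_edges(edges, min_weight):
    """Keep only edges supported by at least `min_weight` contributions.

    Hypothetical pruning step, not taken from the original method.
    """
    return {pair: w for pair, w in edges.items() if w >= min_weight}


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    annotations = {
        "post-1": ["care", "community", "mental health"],
        "post-2": ["care", "community"],
        "post-3": ["community", "diy", "care"],
        "post-4": ["diy", "open hardware"],
    }
    edges = co_occurrence_edges(annotations)
    # Print the strongest links first, ties broken alphabetically.
    for (a, b), w in sorted(edges.items(), key=lambda kv: (-kv[1], kv[0])):
        print(f"{a} -- {b}: {w}")
```

On the toy data, "care" and "community" co-occur in three contributions and come out as the strongest link. Every other pair appears only once, so `filter_edges(edges, 2)` keeps just that one edge. Keying the counter on a sorted tuple means the pair (a, b) and the pair (b, a) count as the same edge.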

This method turns out to be scalable. Today’s network is just as comprehensible as those we made nine months ago, yet it encodes twice as much content as those older versions. Our hunch is turning out to be right.

To me, this is beautiful. I watch these networks come together, I look at them, and I see new patterns in OpenCare. These patterns are undiscoverable otherwise: I spend a lot of time reading and participating, yet I had not seen some of them. And they are verifiable: when you go and look at the actual content generating the patterns, you see that they are solid.

A scalable method for ethnography! Think about the implications. It could be a new instrument for qualitative research, one that complements surveys at scale. Call it Massive Open Online Ethnography. Surveys scale very well, but they are otherwise not very sophisticated, and they suffer from serious issues. Electoral polls, for example, are surveys, and they are flimsy at best. Ethnography does not have those issues: by not asking direct questions, it does not bias informants. By working with open questions, it can capture novelty and weak signals. By letting people interact, it enables them to use collective intelligence rather than just adding up individual opinions. But ethnography scales very poorly. Or rather, it did, until now.

What if we could learn how citizens think about the state of their country by analysing an open conversation online, rather than through a closed questionnaire? What if we could have a conversation-based version of Eurobarometer?

So, together with Amelia, @Jason_Vallet and @melancon, we wrote a short paper explaining our method. Here it is. We will be presenting it at the Internet Science Conference 2017 and at our own OpenVillage Festival.

You too can explore the network: GraphRyder is open to all (no login needed). We will be giving an online tutorial for anyone interested during the next community call (info and signup).