Open ethno data: towards a new Masters of Network?

Today I had a bit of time, and re-exported the OpenCare dataset to Zenodo using the new export code. This was necessary, since I had found out that the Zenodo data previously published by @melancon and @jason_vallet had not been pseudonymized.

But it has an additional advantage: now we have published three datasets, each relating to an ethno project (OpenCare, POPREBEL, NGI Forward) with exactly the same structure. This makes it in principle possible, and even easy, to treat them as one single dataset. A large one: we are looking at some 8K posts by about 700 participants, with close to 10K annotations. I think it is safe to say there has never been an open ethnographic dataset of comparable size.

Should we do something with it? Should we try to go deeper into abstraction? By this I mean analysis of the structure, rather than the semantics. It would be about trying to recognize patterns in how collective intelligence works in large conversations. Example questions:

  • Is a post more likely to be annotated if it is in a topic with many replies? Sexier formulation: does interaction lead to more interesting insights for the analysts?
  • Is a post more likely to be annotated depending on the social network metrics of its author (e.g. centrality)?
  • Can we estimate the likelihood that a post will be found interesting by ethnographers? Application: when there is a lot of content, can we algorithmically build a queue with the most promising posts at the top?

And so on.
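To make the first question concrete, here is a minimal sketch of how it could be checked. It is only an illustration: the records are toy data, the field names (`post_id`, `topic_id`, `annotated`) are assumptions and not the schema of the Zenodo export, and the number of posts in a topic stands in for its number of replies.

```python
from collections import Counter

# Toy records. Field names are hypothetical, not the actual export schema.
posts = [
    {"post_id": 1, "topic_id": "a", "annotated": True},
    {"post_id": 2, "topic_id": "a", "annotated": True},
    {"post_id": 3, "topic_id": "a", "annotated": False},
    {"post_id": 4, "topic_id": "a", "annotated": True},
    {"post_id": 5, "topic_id": "b", "annotated": False},
    {"post_id": 6, "topic_id": "b", "annotated": True},
    {"post_id": 7, "topic_id": "c", "annotated": False},
]


def share_annotated(flags):
    """Fraction of True values in a list of booleans, or None if it is empty."""
    return sum(flags) / len(flags) if flags else None


def annotation_rate_by_topic_size(posts, threshold):
    """Return (rate in topics with >= threshold posts, rate in smaller topics)."""
    # Topic size (posts per topic) is used as a proxy for the number of replies.
    topic_sizes = Counter(p["topic_id"] for p in posts)
    big = [p["annotated"] for p in posts if topic_sizes[p["topic_id"]] >= threshold]
    small = [p["annotated"] for p in posts if topic_sizes[p["topic_id"]] < threshold]
    return share_annotated(big), share_annotated(small)


big_rate, small_rate = annotation_rate_by_topic_size(posts, threshold=3)
print(big_rate, small_rate)
```

On the real data the threshold would have to be chosen with care, and a difference between the two rates would still need a proper statistical test before reading anything into it.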

It could be a Masters of Networks at some point – provided we are allowed to travel.

Ping @amelia, @ccs, @leonie, @Jirka_Kocian, @jan, @Wojt, @rebecca, @sander, @melancon, @brenoust, @bpinaud


I train our ethnographers to go where the community points us, so we almost always code the posts with the most comments first (the exception being planning posts authored by community managers, which are not ‘content’ posts).

It would be interesting to see what our metric is beyond the number of comments (e.g. is there something common to what we consider ‘content-ful’?).

I hypothesise that it’s not only/less about the social network metrics of its author and more about the diversity of the people commenting — I usually end up coding posts/finding them interesting when there are at least 3 people actively talking.

I’d definitely be up for comparison – I do think we should get slightly further along on POPREBEL, since it’s not a great example just yet. When we have greater parity between English, Czech, and Polish (which should happen in the next month) it’ll be worth doing.

They also give us 3 different looks in terms of ethnographers, which could be interesting (especially since I have intimate knowledge of all three contexts and could help contextualise any confounding variables). The first is a lone ethnographer (OpenCare). The second (POPREBEL) is one lone ethnographer and two pairs of ethnographers working across 3 languages (maybe we should add another ethnographer to English just for the sake of having 2 2 and 2…). The final (NGI) is 3 ethnographers working on a single language.


Ahh, but a post could be uninteresting, and in this case it will still not be annotated.

This is a testable hypothesis. Yay for conversation-as-data. :slight_smile:
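A rough sketch of how the diversity hypothesis could be tested: count the distinct authors in each topic and check whether topics with at least three of them are coded more often. The records are toy data and the field names (`topic_id`, `author`, `annotated`) are assumptions, not the real export schema. Counting distinct authors is also only a crude stand-in for "people actively talking".

```python
from collections import defaultdict

# Toy records. Field names are hypothetical, not the actual export schema.
posts = [
    {"topic_id": "a", "author": "u1", "annotated": True},
    {"topic_id": "a", "author": "u2", "annotated": False},
    {"topic_id": "a", "author": "u3", "annotated": True},
    {"topic_id": "b", "author": "u1", "annotated": False},
    {"topic_id": "b", "author": "u1", "annotated": False},
    {"topic_id": "c", "author": "u2", "annotated": True},
    {"topic_id": "c", "author": "u4", "annotated": False},
]


def split_topics_by_diversity(posts, min_authors=3):
    """Map each topic to whether any of its posts was coded, split by author count."""
    authors = defaultdict(set)  # topic_id -> distinct authors in that topic
    coded = defaultdict(bool)   # topic_id -> True if at least one post is annotated
    for p in posts:
        authors[p["topic_id"]].add(p["author"])
        coded[p["topic_id"]] = coded[p["topic_id"]] or p["annotated"]
    diverse = {t: coded[t] for t in authors if len(authors[t]) >= min_authors}
    others = {t: coded[t] for t in authors if len(authors[t]) < min_authors}
    return diverse, others


diverse, others = split_topics_by_diversity(posts)
print(diverse, others)
```

Comparing the share of coded topics in the two groups would give a first indication. Any real conclusion would need the confounders mentioned above, such as planning posts by community managers, to be filtered out first.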


Ah ah! This is an extremely interesting topic. Comparing different networks in search of similarities or distinctive properties (for each of them) is a question that often pops up – colleagues investigating human trafficking networks are looking at that same issue to figure out whether different networks operate based on the same “logic” (see here and here e.g.).

There are several questions hidden in there, some of which relate to the way people interact and reply to one another, and how topics develop; and others on how content is annotated.

I am definitely in for a hackathon. Is there a way we could have such an event even though we would not be able to get all together in the same room? Couldn’t we have a virtual get-together between groups located in Bx, Brussels, X, Y, Z, etc. (and meet in a Zoom or Google Meet room instead)?


I think this would be possible, but not an equilibrium.

In my experience of hackathons, the actual work is anyway done with everyone looking at their screen. The motivation for devoting 8 hours solid on a Saturday comes from the fact that people are getting together physically, and hanging out during extended breaks. It takes a very disciplined person indeed to do that on Zoom.

When they are not participating in hackathons, hackers are, well, hacking. All tools are asynchronous and very efficient, interaction is minimal, but there is very little control on what any open source project will get done, and by when.

It would be quite an achievement to redesign the hackathon format so that it can be faster, more social and more interactive than normal open source projects, while still maintaining some of the efficiency of the latter. I imagine such a hackathon lasting 4-5 days, with frequent team check-ins and quite a visible, coherent leadership so that participants can enjoy the feeling of knowing why they are doing what they are doing.

Would it be an option to address it to BOTH ethno-anthro and compsci people? @melancon, @amelia, @akmunk, @jan: would you be open to experiment with something like this?



Still interested, just as @brenoust and @bpinaud I guess. Reserving 4-5 days in a row (although not full days) might be more problematic. Designing the hackathon so that participants commit to tasks and deliverables might be a solution. I understand this somehow throws away the spontaneity we appreciate in such gatherings … but just for once …

