Today I had a bit of time and re-exported the OpenCare dataset to Zenodo using the new export code. This was necessary, since I had found out that the Zenodo data previously published by @melancon and @jason_vallet had not been pseudonymized.
But it has an additional advantage: we have now published three datasets, one for each ethno project (OpenCare, POPREBEL, NGI Forward), all with exactly the same structure. This makes it possible in principle, and even easy, to treat them as one single dataset. A large one: we are looking at some 8K posts by about 700 participants, with close to 10K annotations. I think it is safe to say there has never been an open ethnographic dataset of comparable size.
Should we do something with it? Should we try to go deeper into abstraction? By this I mean analysis of the structure, rather than the semantics. It would be about trying to recognize patterns in how collective intelligence works in large conversations. Example questions:
- Is a post more likely to be annotated if it is in a topic with many replies? Sexier formulation: does interaction lead to more interesting insights for the analysts?
- Is a post more likely to be annotated depending on the social network metrics of its author (e.g. centrality)?
- Can we estimate the likelihood that a post will be found interesting by ethnographers? Application: when there is a lot of content, can we algorithmically build a queue with the most promising posts at the top?
And so on.
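To make the second question concrete, here is a minimal sketch of what such an analysis could look like. Everything below is hypothetical: the toy reply network and annotation counts are made up for illustration and do not come from the actual exported datasets; I'm just assuming a graph where participants are nodes, replies are edges, and each author has a count of annotated posts.

```python
# Hypothetical sketch: toy data, not the real Zenodo export schema.
import networkx as nx

# Toy reply network: nodes are participants, edges mean "replied to".
G = nx.Graph()
G.add_edges_from([
    ("alice", "bob"), ("alice", "carol"),
    ("bob", "carol"), ("dave", "alice"),
])

# Made-up annotation counts per author.
annotations = {"alice": 12, "bob": 3, "carol": 5, "dave": 1}

# Degree centrality as a first, simple social network metric.
centrality = nx.degree_centrality(G)

# Rank authors by centrality and set it side by side with annotations,
# to eyeball whether the two quantities move together.
for author in sorted(centrality, key=centrality.get, reverse=True):
    print(author, round(centrality[author], 2), annotations[author])
```

On real data one would of course use the actual reply structure, try several centrality measures, and run a proper correlation or regression rather than eyeballing a ranking.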
It could be a Masters of Networks at some point – provided we are allowed to travel.