Codebook as a wiki: first iteration

Written after the first periodic meeting of the ethnography team. It was dedicated to reviewing the very first iteration of the codebook for the NGI Forward study.

Ladies, color me impressed. It was fascinating to watch you work. In my interpretation, what’s happening is:

  1. The rationale for coding choices becomes explicit. Accountability of the research work increases.
  2. You are now building small networks of codes in your heads, and using them as information structures to store ethnographic evidence in. The lovely bit is that these networks are open: they encode the possibility that parts of the corpus you have not yet coded might connect to them, thereby generating additional meaning by triangulation. Example: you said that the code regulating 5G should be split into regulation co-occuring with 5G, because other parts of the conversation might evoke regulating something else, for example recommendation algorithms. If this is indeed the case, you are able to see the discussion on regulation on a broader level, as regulation will then link to both 5G and with recommendation algorithms in the semantic network. At a high level of interpretation, this ends up allowing the space for regulation, as the community sees it, to emerge.
  3. The Codebook is now on a GDoc. By doing this shoot-from-the-hip prototyping you are zeroing in on functionalities that future iterations of Open Ethnographer needs. The fact that you use them is a strong indication that you need them, as in really need them.

And speaking of this: @matthias, it seems the ethno team uses comments a lot. They took the trouble of rebuilding the whole codebook on GDocs, and it looks like this:


I think, however, that the codebook should be automatically generated by OE. The codebook proper is a list of codes (that are all in the database) followed by their respective descriptions (also in the database). Can we make them commentable? Or maybe just generate codebooks that can be exported?

@amelia @Leonie @CCS


How about a button Discuss this code on a code’s page that would create a new Discourse topic and link it to the code? Then discussion can happen on Discourse as usual, and people would be notified as usual. (We don’t want to re-implement discussions, as it would not become as good as the Discourse discussions.)


Hi Alberto, great to e-meet you today. And very fun to read your reflections and encouragements of our work.

Always here to chat more about how we can translate some of our current mechanisms to fit within the OE platform! Cheers, CCS

1 Like

This sounds like a great idea. Let’s hammer it into a proper metadata strategy for the data of SSNA studies.

This ties into this issue: I hope to convince you, @matthias, that we do need a database entity to refer to a whole SSNA study. The reason is metadata, which are essential to understanding and reusing the data. Some metadata can be automatically generated from the data: for example, the duration of the conversation can be inferred from the timestamps of the posts; the duration of the coding from the timestamps of the annotations; the number of researchers involved from the names of the annotations’ authors. But others cannot, specifically methodological choices informing the coding (“we use a grounded theory approach…”, “we chose to use names of individual technologies, and even of specific applications, as in vivo codes. This is because the study’s ultimate purpose is to contribute to shaping technology policy…”.

I find it useful to think about these things in terms of exporting a whole project for safekeeping on a repository such as Zenodo. What would a complete export look like? The ethnography proper would have:

  • pseudonymized topics and posts
  • annotations thereof, which include the associated codes
  • the codebook, automatically generated from the annotations and codes. It includes the discussion part as suggested by Matt.
  • methodological notes.

@amelia, @ccs, @Leonie, am I missing anything?

For the SSN proper, we could take two roads.

The first is this: there is no need to export anything, because the SSN adds no information to the coded data. It is simply a network representation thereof. Interested researchers can rebuild it from the ethno data.

Alternatively, we could add to our data either the Neo4j file powering Graphryder (after pseudonymization), or a docker file containing the whole Graphryder shebang.

I think I am going to fork the final part of this discussion and move it to #ioh:workspace, so other partners in the consortium are more exposed to it.

Hi Alberto,

Thanks for starting this conversation. I am not (yet) familiar enough with the specific functioning of OE to comment on the suggestions. But I am very interested in the process of capturing and analysing the meta-data. I was wondering how meta-data is currently collected and used? And how it plays into the creation of the semantic network, if at all?



1 Like

Can be done. There could be another menu item “Studies” in the Open Ethnographer interface, at the top. So: “Overview, Studies, Codes, Topics, Annotations, …”. A study record would be quite simple:

  1. Field for the name of a Discourse tag that defines the study content.

  2. Field for a link to a wiki with methodological notes.

  3. Export button.

“Study” would then also be available as a filter in the Codes, Topics and Annotations views of Open Ethnographer. Which sounds useful.

If that sounds reasonable, make a Github issue for this feature request. We’ll see when it can be implemented (related to available budget).

I’m not too much in favor of this feature as I’d rather like to slowly transform Open Ethnographer into a generic annotation plugin for Discourse (“Annotator.js for Discourse”). That would make it useful beyond the narrow scope of online ethnography. “Study” is rather only useful for online ethnography, but this screen could simply be hidden in the more generic annotation solution that I have in mind …

1 Like

Corinne, AFAIK your field (like most of the humanities) has a very casual attitude with respect to documenting research. Exact reproducibility in a quantitative sense is obviously out of the question, so many scholars decide they have no duty to make the reasoning leading to their papers transparent. So, we have no best practice to fall on. The main epistemological move behind SSNA is to treat conversational exchanges as data. That makes it natural to adopt the culture of data stewardship that comes from different communities, in my case the open data community.

When we documented OpenCare, it worked like this.

  1. At the beginning of the project, we wrote a data management plan. This type of document is now a requirement for EU-funded research.
  2. Graphryder fetches the data from the database via API. So, we documented the API itself. This is our main metadata repository, in that database entity are exemplified, witjh the indication of the fields they contain and an explanation thereof (unless the name of the field is itself explanatory).
  3. We then exported the data in JSON format, uploaded them on Zenodo. The Zenodo record contains links to the API documentation:

Do you think it is adequate?

Hi Alberto,

Thank you for explaining. That makes a lot of sense and I agree it is important to document the research process. In our case, we have started to do that by writing down our coding memos in which we explain why and how we make decisions, as well as ask each other questions. As @amelia mentioned we plan to publish these regularly on the platform.

One of the things I was struck by in your initial email was the capture of data with regards to time spent on the platform (“for example, the duration of the conversation can be inferred from the timestamps of the posts; the duration of the coding from the timestamps of the annotations; the number of researchers involved from the names of the annotations’ authors”).

I was wondering to what extend there had been an epistemic-level conversation about what this kind of data does and does not capture? Because time-stamps are only partial data, in the sense that a 2 hour coding session on one thread could tell you: 1. it was long, 2. it was complicated, or 3. it was interrupted by, the need to read up, (grab a coffee), go back to some literature, 4. some other reason entirely etc. etc.

I think this might be an interesting conversation to circle back to once we get to a stage of (meta) analysis of such data.

anyways, just my .2 cents! Have a wonderful Sunday!



1 Like

None at all. It is my experience with data analysis that there are rapidly decreasing returns to model sophistication. And anyway, we have no theory to interpret exact information on the length of writing a contribution, even if there were a cheap way to obtain this information. I would simply use the timestamps to compute the time period in which the conversation took place, and use it as a descriptor of the corpus, not of each individual contribution. “The conversation took place from timestamp(earliest post) to timestamp(latest post).”

1 Like

Got it! That makes sense. I was interested because Mia and I have a good friend who does great PhD research on the dangers of conflating timestamps with credibility (in her case she researches harassment reporting apps in the context of US campus rape culture). So my question was more aimed at avoiding that conflation pitfall rather than complicating our current data analysis model. Hope that makes sense!


OK… that’s a context in which an investment in more precise modelling actually makes plenty of sense.

We do low stakes stuff (praise the gods). Whether the online conversation started in January or March 2019 does not really affect the validity of our results. So we can afford not to care too much.

1 Like

Well - the interesting thing about her research results is that such precise modelling can actually be harmful to survivors, because it moves the credibility of their personal experience and story to the timing of their timestamps. So really the argument I am trying to make is that timestamps should be taken with a grain of salt, but I guess that worry is moot in this case :slight_smile: