Open Notebook Science for Ethnographic Coding: Open Codebooks

I’ve been thinking a lot about how to manage large-scale ethnographic coding, particularly in different languages. From thinking through my experiences with coding + analysis in Open Care and working with Nermine on Open Village, I want to propose we push further into the open notebook science terrain that we have been discussing since the beginning of Open Care. @Alberto and I had a discussion this morning to explore the idea further.

I think that the key to keeping the coding standardised, accessible, and transparent is to have an open codebook that all ethnographic coders commit to updating systematically and regularly. This 1) creates a way of ensuring rigorous coding and 2) keeps ethnographers working on the project accountable to one another.

Traditionally, all ethnographers coding their data keep a codebook. But this codebook is usually kept private, often in pen and paper, and doesn’t get published alongside the final product. In usual disciplinary practice, asking an anthropologist for her field notes or codebook would be strange. Further, as ethnography is usually a solo pursuit, the ethnographer only has to make the codebook make sense to herself. Instead, I propose a codebook that remains open throughout the coding process, is attentively updated by the ethnographic coders, and serves as an object around which frequent discussions of coding choices can occur.

So what is a codebook? It is a living document that helps the ethnographer document and keep track of her coding decisions.

For each code, the following should be included in the codebook:

  1. brief description - the name of the code itself
  2. detailed description - a 1-3 sentence description of the code’s qualities or properties
  3. inclusion criteria - conditions that merit the code
  4. exclusion criteria - exceptions or particular instances that do not merit the code
  5. typical exemplars - a few examples of data that best represent the code
  6. atypical exemplars - extreme or special examples of data that still represent the code
  7. "close, but no" - data examples that could mistakenly be assigned this particular code
  8. similar codes - if this code doesn’t fit, what other codes might be useful instead?

(5 and 6 are useful, but since OE can auto-list examples of a code, they may not be strictly necessary if we need to keep things simpler. At least 5 may not be.)

For our project, we can add:

  1. This code in other languages (linked)

Codebooks also include memos, which are brief descriptions capturing the ethnographer’s thought process in using the code. As Glaser (1978: 83) puts it:

‘(A memo is) the theorizing write-up of ideas about codes and their relationships as they strike the analyst while coding… it can be a sentence, a paragraph or a few pages… it exhausts the analyst’s momentary ideation based on data with perhaps a little conceptual elaboration’

Memos help the ethnographer keep track of her thoughts during the coding process. They are immensely useful in later stages of qualitative analysis when trying to theorize about the data, like when we will be writing up reports at the conclusion of the project. They help the ethnographer:

  • remember connections she made off the cuff,
  • formulate alternative hypotheses to those she made before,
  • propose new codes or question existing ones (especially when she is not confident enough to make a change then and there),
  • integrate thoughts notated in previous memos or field notes,
  • formulate or articulate concepts that are not yet fully formed in her mind,
  • make her choices intelligible to her future self or other researchers.

The last point is crucial for our purposes — keeping these kinds of memos in the codebook will help immensely when trying to undertake collective ethnographic coding.

Examples of memos:
memo 1: used ship speed since Luke was talking about outrunning the other ship in the Millennium Falcon
memo 2: changed ship speed to insufficient ship speed to be more precise, since there were complaints about the Millennium Falcon’s speed.
memo 3: I think it’s possible that Han is trying to establish dominance through sarcasm and bravado, but is actually intimidated by Luke’s undetermined powers. Luke seems oblivious to this dynamic but concerned with the group’s survival.

The Open Ethnographer course goes into a lot more detail on ethnographic coding, so do let me know if this doesn’t make enough sense and I can add information to explain further!

Looking forward to continuing the experiment :slight_smile:


Wonderful @amelia!

This will be so important for our multilingual and multimedia coding coming up with POPREBEL so thanks for kicking the thinking off.

Looking forward to sailing through this further with you too

Thanks, @amelia, very useful as always! I have three remarks.

1. Extending the codes data structure to support codebooks in OpenEthnographer

This seems not to be a problem. Right now, a code in OE looks like this:

    id: 13,
    name: "accessible laboratories",
    description: null,
    creator_id: 3323,
    created_at: "2017-09-05T16:09:52.870Z",
    updated_at: "2017-09-05T16:09:52.870Z",
    ancestry: null

Codes are stored in their own table in the Discourse database. Adding fields should be very simple – we already have a description field (corresponding to your detailed description), which was hardly ever used.
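As a rough illustration of what an extended record might look like, here is a minimal Python sketch. The first seven fields mirror the record shown above; every other field name is hypothetical, made up here to match the codebook items in Amelia's list, and is not part of OE's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Code:
    # Fields that exist in OE today, as in the record above.
    id: int
    name: str
    description: Optional[str] = None
    creator_id: Optional[int] = None
    created_at: Optional[str] = None
    updated_at: Optional[str] = None
    ancestry: Optional[str] = None
    # Hypothetical codebook fields (names are assumptions, not OE's schema).
    inclusion_criteria: str = ""
    exclusion_criteria: str = ""
    typical_exemplars: List[str] = field(default_factory=list)
    atypical_exemplars: List[str] = field(default_factory=list)
    close_but_no: List[str] = field(default_factory=list)
    similar_code_ids: List[int] = field(default_factory=list)
    # Language tag -> id of the linked code in that language.
    translations: Dict[str, int] = field(default_factory=dict)


# The existing record still fits: the new fields simply start out empty.
code = Code(
    id=13,
    name="accessible laboratories",
    creator_id=3323,
    created_at="2017-09-05T16:09:52.870Z",
    updated_at="2017-09-05T16:09:52.870Z",
)
```

Giving the new fields empty defaults means existing codes would not need to be touched when the fields are introduced.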

More problems arise when codes are modified. In your example, the memos are added with time, and the second one even refers to a change in the code’s denomination. This looks like a revision history to me. Might it make sense to “wikify” codes? Should we, like in wikis, allow people other than the author of a code to modify it? Should we stick to free text fields, or even just one that gets edited over time?

On the side: the second memo might also refer to a change not in the code, but in the annotation; the ship speed code lived on and was used elsewhere; but in the post where Luke is talking about distancing the other ship it was changed to insufficient ship speed. Of course the ethnographer knows this, but here we assume that the codebook is going to be read by people other than her, like me. Open codebook ethnography is meant to resolve these ambiguities in the interests of collaboration.
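One way to remove that ambiguity is to have each memo say what it is about: the code itself, or a single annotation. The following is only a sketch of the idea in Python. The class names, the `target` field and the annotation id 42 are all hypothetical and do not describe how OE works today.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Memo:
    author: str
    text: str
    target: str  # either "code" or "annotation"
    annotation_id: Optional[int] = None  # only set when target == "annotation"


@dataclass
class CodeHistory:
    name: str
    previous_names: List[str] = field(default_factory=list)
    memos: List[Memo] = field(default_factory=list)

    def rename(self, new_name: str, author: str, reason: str) -> None:
        """Rename the code everywhere and record why, wiki-style."""
        self.previous_names.append(self.name)
        self.name = new_name
        self.memos.append(Memo(author, reason, target="code"))

    def note_annotation_change(self, annotation_id: int, author: str, reason: str) -> None:
        """Record a memo about one annotation; the code itself is unchanged."""
        self.memos.append(Memo(author, reason, "annotation", annotation_id))


# Reading 1 of memo 2: the code itself was renamed.
renamed = CodeHistory("ship speed")
renamed.rename("insufficient ship speed", "amelia", "complaints about the ship's speed")

# Reading 2 of memo 2: the code lives on, only one annotation was re-coded.
kept = CodeHistory("ship speed")
kept.note_annotation_change(42, "amelia", "re-coded this post as insufficient ship speed")
```

With the target recorded, a reader other than the ethnographer can tell the two readings apart without asking her.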

2. The codebook

Should the codebook be itself a database object? Once upon a time, @matthias had proposed something called a “collection”, which would be a set of codes used for the purpose of one study. Then, I think, he changed his mind. You could also define a codebook indirectly, by running a database query like this:

give me all codes that
      are contained in posts bearing the tag: "#ethno-poprebel",
      and whose authors are: ("amelia" and "ivan")

This is elegant, but it has two problems:

  1. Not clear how the codebook tab would be created in the user interface, because maybe I want to see Amelia’s codes but not Ivan’s. You would have to have a kind of wizard to generate the codebook. There is a hint of this with the “toggle view other people’s codes” right now.

  2. I may want to include some, but not all, of Amelia’s posts in my own study. This would call for a codebook to be indeed associated with a study. Then, codes and content can be associated with it. This is a much more substantial modification to OE, either way.
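The query-defined codebook described above can be sketched over in-memory data. This is a toy Python version, not OE's real database layout: the dictionary keys, the `codebook` function and the tag `some-other-study` are assumptions made for illustration.

```python
def codebook(annotations, tag, authors):
    """Return the sorted names of codes used by any of `authors`
    on posts bearing `tag`."""
    return sorted(
        {
            a["code"]
            for a in annotations
            if tag in a["post_tags"] and a["author"] in authors
        }
    )


# Toy data: one entry per annotation (hypothetical structure).
annotations = [
    {"code": "ship speed", "author": "amelia", "post_tags": ["ethno-poprebel"]},
    {"code": "bravado", "author": "ivan", "post_tags": ["ethno-poprebel"]},
    {"code": "accessible laboratories", "author": "amelia", "post_tags": ["some-other-study"]},
]

both = codebook(annotations, "ethno-poprebel", {"amelia", "ivan"})
amelia_only = codebook(annotations, "ethno-poprebel", {"amelia"})
```

Passing the set of authors as a parameter covers the first problem (seeing Amelia's codes but not Ivan's). It does nothing for the second: there is still no way to cherry-pick individual posts.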

3. Network-based similarity between codes

Your point 7 made me think of a neat trick we might use to compute code similarity. Just think of the semantic network as an n-dimensional space, with one dimension per code. Each code is then a vector whose coordinate along every other code's dimension is either zero (not connected) or one (connected). Taking any two codes, you can then compute the cosine similarity of their vectors. This works best with highly connected codes in large networks, though. Also, it might be more appropriate for a data analysis phase than for the coding phase.
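For 0/1 vectors, cosine similarity reduces to the number of shared neighbours divided by the square root of the product of the two neighbour counts. Here is a minimal Python sketch under that reading. The edge list is invented for illustration, and treating connections as unweighted is an assumption.

```python
import math


def neighbours(edges):
    """Build an undirected adjacency map: code -> set of connected codes."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def cosine_similarity(adj, a, b):
    """Cosine similarity of the 0/1 connection vectors of codes a and b."""
    na, nb = adj.get(a, set()), adj.get(b, set())
    if not na or not nb:
        return 0.0  # a code with no connections is similar to nothing
    return len(na & nb) / math.sqrt(len(na) * len(nb))


# Invented co-occurrence edges between codes.
edges = [
    ("ship speed", "escape"),
    ("ship speed", "piloting"),
    ("insufficient ship speed", "escape"),
    ("insufficient ship speed", "complaint"),
    ("bravado", "dominance"),
]
adj = neighbours(edges)

# One shared neighbour ("escape") out of two each: 1 / sqrt(2 * 2) = 0.5
close = cosine_similarity(adj, "ship speed", "insufficient ship speed")
far = cosine_similarity(adj, "ship speed", "bravado")
```

Two codes score high when they connect to the same other codes, even if they are never connected to each other, which is what makes this a candidate for suggesting similar codes.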

I love your example based on Star Wars! :smile:

Changed my mind because filtering by Discourse tag (such as #ethno-poprebel) to define a study is more flexible. It allows code re-use in other studies, and partially overlapping studies.

No, wait: code reuse and partially overlapping studies are supported by the collection idea. Codes, annotations and posts can be assigned to multiple collections. At least, this is how I understood it.

We can do most of it with Discourse tags. Except one thing: as things stand, we have to include all codes and annotations in the content bearing that tag. It is possible (at least conceptually) to filter by author of the codes/annotations, but not to cherry-pick them.

Anyway, POPREBEL demands that we overhaul OE to support multi-lingual coding. We will also add other features as needed. It’s not like we don’t have the money.

If you like the Star Wars example, you’ll love the OE course :smile:


I like the collaborative nature of the open codebook, if the ethnographers would go for it.
In a wiki often someone can go in and change something someone else wrote.
How would the inevitable disputes get handled?

Open codebooks are shared within research groups. Usual rules apply. Maybe there is a lead researcher, and others defer. Maybe it’s only two people, and they can solve things on a case-by-case basis.

But you have a point. We need to pay some attention to people importing other researchers’ codes without respecting the codebooks. Perhaps “import” would be a fork…