Making Open Ethnographer code hierarchies useful

This is a discussion to find the right way to use Open Ethnographer code hierarchies. Once we reach a decision, it will go into a new issue on GitHub.

The outcome should also include a guideline for what makes a good code hierarchy, because the current state is a bit of a mess. The code hierarchy feature is hardly used at all, but it is potentially useful for grouping codes together (also for your co-occurrence analyses, where it would allow a coarser level of analysis).

Since changing the code hierarchy does not change what is coded how, I propose that the software should allow anyone to edit the code hierarchy, and that we remove the current restriction that parent and child codes have to be by the same author. This would allow staff users to “tidy up” ethnographic coding work long after it has been done, and to re-use codings for analysis by placing them into a different hierarchy (perhaps temporarily). It would also allow ethnographers to effectively re-use the work of prior ethnographers by grouping their codes below their own, currently used codes, like this:

own code
  ├── code by other ethnographer #1
  ├── code by other ethnographer #2
  ├── ……

We would, however, not have “parallel hierarchies” as once proposed, because that seems like a total mess in UI terms and an overcomplication in the database, given that ethnographers have hardly used the hierarchy feature so far. The downside is that reproducing research results becomes difficult, as one’s code hierarchy might be re-arranged by ethnographers who come later …
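To make the proposal concrete, here is a minimal sketch of re-parenting without the same-author restriction. The `Code` class, its fields and the `set_parent` function are hypothetical names for illustration and do not reflect the actual Open Ethnographer schema. The only check kept is that a code cannot become its own ancestor.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Code:
    # Hypothetical model, not the real Open Ethnographer table layout.
    id: int
    name: str
    author: str
    parent_id: Optional[int] = None


def set_parent(codes: Dict[int, Code], child_id: int, parent_id: Optional[int]) -> None:
    """Re-parent a code regardless of authorship, but refuse to create a cycle."""
    node = parent_id
    # Walk up from the proposed parent; meeting the child means a cycle.
    while node is not None:
        if node == child_id:
            raise ValueError("a code cannot become its own ancestor")
        node = codes[node].parent_id
    codes[child_id].parent_id = parent_id


codes = {
    1: Code(1, "own code", author="alice"),
    2: Code(2, "code by other ethnographer #1", author="bob"),
    3: Code(3, "code by other ethnographer #2", author="carol"),
}
set_parent(codes, 2, 1)  # authors differ, which the proposal would allow
set_parent(codes, 3, 1)
```

Since annotations only reference codes and not their parents, re-parenting like this leaves all existing coding untouched.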

Interesting thoughts.

I need to understand how to operate the hierarchy. Our wiki does not cover this, and I cannot find a way to do it in the Discourse environment. Can you help me, @matthias?

There are three main functions in code management:

  1. merge two codes into one
  2. split one code into two
  3. rename a code

2 is done by creating a new code and then going through one’s annotations with the old code, assigning the new one when appropriate. 3 is simply Edit. 1 is intrinsically a database operation.
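As a rough illustration of these three operations, here is an in-memory sketch. All names (`Annotation`, `rename_code`, `split_code`, `merge_codes`) are made up for this example. The real merge would run against the database, and the manual review step of a split is stood in for by a predicate function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Annotation:
    text: str
    code_id: int


def rename_code(codes: Dict[int, str], code_id: int, new_name: str) -> None:
    """Renaming only touches the code itself, never its annotations."""
    codes[code_id] = new_name


def split_code(codes: Dict[int, str], annotations: List[Annotation], old_id: int,
               new_id: int, new_name: str,
               belongs_to_new: Callable[[Annotation], bool]) -> None:
    """Create a new code and move the annotations the reviewer picks."""
    codes[new_id] = new_name
    for a in annotations:
        if a.code_id == old_id and belongs_to_new(a):
            a.code_id = new_id


def merge_codes(codes: Dict[int, str], annotations: List[Annotation],
                keep_id: int, drop_id: int) -> None:
    """Reassign all annotations of drop_id to keep_id, then delete drop_id."""
    for a in annotations:
        if a.code_id == drop_id:
            a.code_id = keep_id
    del codes[drop_id]


codes = {1: "hacking", 2: "diy"}
annotations = [
    Annotation("soldering a sensor board", 1),
    Annotation("growing bacteria at home", 1),
    Annotation("building my own furniture", 2),
]
rename_code(codes, 2, "diy culture")
split_code(codes, annotations, 1, 3, "biohacking", lambda a: "bacteria" in a.text)
merge_codes(codes, annotations, keep_id=2, drop_id=1)
```

Note that the merge is the only operation that loses information: after it, nothing records which annotations originally used the dropped code.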


We think of a parent code as a “soft merge”. Original information is preserved, but now two or more codes have the same parent (for example “hardware hacking” and “biohacking” can have “diy culture” as a parent). The issue there is telling Graphryder which level it should build the network at!

When creating or editing a code in the backend, you can select one of your own codes as a parent code in a dropdown. That’s all there is currently.

Yes, that’s what I proposed for cases where the two codes do not belong to the same ethnographer.

For its default way of operating, nothing would change, as it counts only those annotations to which a code is being assigned directly (just as right now). That way, the hierarchy provides simply a navigation structure for codes.

For the new “soft merge” feature, we could have a checkbox “aggregate code” in the code creation form. If checked, the annotations the code returns will include those of all its descendant codes, and Graphryder can exclude these descendant codes from the network. Also, maybe it should be impossible to use an aggregate code directly on text, but I’m not sure about that.
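A minimal sketch of how such an “aggregate code” flag could behave, assuming codes are identified by name and annotations are simple (post, code) pairs. The `aggregate` field and the functions `descendants`, `annotations_for` and `network_codes` are hypothetical and not part of the current software.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Code:
    name: str
    parent: Optional[str] = None
    aggregate: bool = False  # the proposed checkbox


def descendants(codes: Dict[str, Code], name: str) -> Set[str]:
    """All codes below `name` in the hierarchy, at any depth."""
    found: Set[str] = set()
    stack = [name]
    while stack:
        current = stack.pop()
        for code in codes.values():
            if code.parent == current and code.name not in found:
                found.add(code.name)
                stack.append(code.name)
    return found


def annotations_for(codes: Dict[str, Code], annotations: List[Tuple[str, str]],
                    name: str) -> List[Tuple[str, str]]:
    """Direct annotations only, unless the code is flagged as aggregate."""
    wanted = {name}
    if codes[name].aggregate:
        wanted |= descendants(codes, name)
    return [a for a in annotations if a[1] in wanted]


def network_codes(codes: Dict[str, Code]) -> Set[str]:
    """Codes to show in the network: descendants of aggregates are hidden."""
    hidden: Set[str] = set()
    for code in codes.values():
        if code.aggregate:
            hidden |= descendants(codes, code.name)
    return set(codes) - hidden


codes = {
    "diy culture": Code("diy culture", aggregate=True),
    "hardware hacking": Code("hardware hacking", parent="diy culture"),
    "biohacking": Code("biohacking", parent="diy culture"),
    "funding": Code("funding"),
}
annotations = [("post 1", "hardware hacking"), ("post 2", "biohacking"), ("post 3", "funding")]
```

With this data, asking for “diy culture” returns the annotations of both child codes, while asking for “hardware hacking” still returns only its own annotation, which matches the unchanged default behaviour.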

As far as GraphRyder is concerned, the best approach is not to decide in advance how codes should be used to compute a co-occurrence structure, but simply to allow exporting the whole dataset (users / posts / topics / codes) together with the code hierarchy. This would allow GraphRyder to run analyses at varying levels of code granularity.
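To illustrate what “varying levels of granularity” could mean on the analysis side, here is a small sketch working on an exported hierarchy. The data layout (`parent`, `post_codes`) and the function names are assumptions for this example, as is the choice to count two codes as co-occurring when they appear on the same post. The actual GraphRyder import format and co-occurrence unit may differ.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Optional

# Hypothetical export: each code with its parent, and the codes used per post.
parent: Dict[str, Optional[str]] = {
    "diy culture": None,
    "hardware hacking": "diy culture",
    "biohacking": "diy culture",
    "funding": None,
    "crowdfunding": "funding",
}
post_codes: Dict[str, List[str]] = {
    "post 1": ["hardware hacking", "crowdfunding"],
    "post 2": ["biohacking", "crowdfunding"],
    "post 3": ["hardware hacking", "biohacking"],
}


def path_from_root(code: str) -> List[str]:
    """Ancestors of a code, root first, ending with the code itself."""
    path = [code]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]


def roll_up(code: str, level: int) -> str:
    """Replace a code by its ancestor at `level` (0 = root), if it is deeper."""
    path = path_from_root(code)
    return path[min(level, len(path) - 1)]


def cooccurrence(level: int) -> Counter:
    """Count code pairs per post after rolling every code up to `level`."""
    counts: Counter = Counter()
    for codes_in_post in post_codes.values():
        rolled = sorted({roll_up(c, level) for c in codes_in_post})
        counts.update(combinations(rolled, 2))
    return counts


fine = cooccurrence(level=1)    # network on the specific codes
coarse = cooccurrence(level=0)  # network on the top-level codes
```

At level 0, post 3 contributes no edge because both of its codes roll up to the same parent, which is the kind of decision the analysis side would have to make explicitly.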

Seems reasonable. This also implies that a separation into normal and aggregating codes is probably the way to go, with the latter not selectable for coding in Open Ethnographer. That avoids having to decide, when running an analysis on (say) the first level of aggregation, what to do with annotations that directly use codes which group together other codes on a second level of aggregation.

Directly coding with a code that is meant for aggregation of other codes would have to be substituted with a code on the lower level meant for “other” / “misc” items related to that higher-level concept. Everything else is misusing abstraction levels (“category mistake”).
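One possible way to enforce this, sketched under assumptions: when somebody tries to code text with an aggregate code, the annotation is redirected to an “other” child of that code. The `Code` class, the `codable_code` function and the `"<name> (other)"` naming convention are all invented for this example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Code:
    name: str
    parent: Optional[str] = None
    aggregate: bool = False


def codable_code(codes: Dict[str, Code], name: str) -> str:
    """Return the code that may actually be applied to text.

    Normal codes are returned unchanged. Aggregate codes are replaced by
    their "other" child, which is created on first use (hypothetical rule).
    """
    if not codes[name].aggregate:
        return name
    fallback = f"{name} (other)"
    if fallback not in codes:
        codes[fallback] = Code(fallback, parent=name)
    return fallback


codes = {
    "diy culture": Code("diy culture", aggregate=True),
    "biohacking": Code("biohacking", parent="diy culture"),
}
applied = codable_code(codes, "diy culture")
```

A stricter variant would simply refuse the annotation and ask the ethnographer to pick or create a child code by hand.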

Hmm … let’s take another perspective on the issue. Although higher-level codes in the hierarchy indeed “aggregate” lower-level codes, they may carry semantics of their own, where higher-level codes are generalisations of lower-level, more specific ones. In that case, it makes sense to use them as part of the analysis. Moreover, during the analysis, it may be useful to just substitute a code with a higher-level, more general code. Playing with the level of detail of codes might prove useful when navigating a large corpus.
I base this comment on my experience working with people in the media, where documents are indexed using entries from a large, hierarchical thesaurus (categories, sub-categories, down to fine-grained keywords). Thesaurus entries are quite similar to ER codes; they form precisely a generalisation/specialisation indexing structure. Having these semantics allows annotators to use more specific codes knowing they also encode more general notions.
That being said, ethnographic coding may follow other principles, and there might be reasons why ER would not want to use such a generalisation/specialisation indexing structure, but would instead force lower-level codes.
