Ethnographic Coding Wiki

Big Picture

When we are coding, we need to think about the rigour of the coding system so that others can easily understand and use our codes and the data structure we are producing in the SSNA. This means:

  • creating codes that carry meaning, are salient and are essence-capturing when viewed on their own
  • defining everything in enough detail and documenting why you chose to use a specific code
  • creating consistent and clear categories so that other ethnographers can easily navigate a large codebook
  • thinking about how someone who has never read any of the underlying data would read and understand the code if they saw it
  • thinking about what meaning the code will carry when it co-occurs with other codes in a visual network

Coding Conventions

Spelling and Formatting

Use British spelling for English-language codes.

Use lowercase letters (unless a capital letter has semantic meaning, as in a proper noun).

Use accented letters as normal in all languages.

Avoid compound codes

Because the SSNA detects co-occurrences, it is important that each code carries one meaning that can then be linked to others.
For example, apply two separate codes, homosexuality and discrimination, rather than one compound code (whether written as homosexuality and discrimination or as homosexuality:discrimination).

As @alberto aptly puts it:

Specificity vs Generality

If the code does not carry any real meaning on its own (e.g. approaches ), it is too general.

If the code is too granular to ever be reused (e.g. hyperactive mosquitos ) then it is likely too specific.

Sometimes this requires creative thinking. If you’re trying to capture the idea that informants are expressing that Romania is more awesome than Poland, the code Romania is more awesome than Poland is not likely to co-occur in many places (though @noemi might disagree :smiley:). However, we could use the code country comparison and co-code it with Romania and Poland. If we felt strongly that we didn’t want to lose that specific comparison as tied to the country comparison code, we could use Romania-Poland instead, though see the compound codes section above for why we should hesitate to do this. If we later found Romania-Poland too specific, we could always copy it and assign one instance of all the annotations to Romania and the other to Poland, so no harm is done as long as we keep the more broadly useful country comparison code there.

How many codes to assign?

It is much easier to merge (and copy) codes than it is to fork them. As a result, aim for greater granularity and merge into a more general concept if you find upon review that the granularity is too small.

If we code everything discrimination and realise that we wish we’d coded homophobia and racism and sexism differently, we have to go back and re-read and recode all the annotations assigned to discrimination . If instead we code the other three and decide they are too granular, merging them all into the code discrimination is easy. If we decide homophobia is occurring on its own broadly enough but the other two are too granular, we can always merge racism and sexism into discrimination but leave homophobia alone. If we want all of the instances of homophobia to also be co-coded with the higher-level code discrimination , we can easily copy the code homophobia and merge that second instance into discrimination so that all its annotations are also coded with the code discrimination. This is what we use hierarchies for in the backend, to more easily keep track of such concepts, as I will return to in the next section.
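To make the asymmetry between merging and forking concrete, here is a minimal Python sketch. It assumes annotations are stored as simple (post, code) pairs; the functions merge_codes and copy_code are hypothetical names for illustration and are not the Open Ethnographer backend's actual interface.

```python
# Hypothetical data model: each annotation is a (post_id, code) pair.
# Illustration only; this is not the Open Ethnographer backend.

def merge_codes(annotations, sources, target):
    """Relabel every annotation whose code is in `sources` as `target`."""
    return [(post, target if code in sources else code)
            for post, code in annotations]

def copy_code(annotations, source, new_name):
    """Duplicate every annotation of `source` under the name `new_name`."""
    copies = [(post, new_name) for post, code in annotations if code == source]
    return annotations + copies

annotations = [
    ("post-1", "homophobia"),
    ("post-2", "racism"),
    ("post-3", "sexism"),
]

# Merge the two codes judged too granular; leave homophobia alone.
merged = merge_codes(annotations, {"racism", "sexism"}, "discrimination")

# Copy homophobia, then merge the copy into discrimination, so that
# homophobia annotations are also coded with discrimination.
copied = copy_code(merged, "homophobia", "homophobia (copy)")
co_coded = merge_codes(copied, {"homophobia (copy)"}, "discrimination")
```

Both operations are mechanical relabellings. Going the other way (splitting discrimination into finer codes) cannot be written as a function of the existing labels, because it requires re-reading each annotated passage.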

Code 'that which goes without saying'

Part of our job as interpretive analysts is to use our sociocultural understandings and our training to read between the lines. If community members are talking about two concepts explicitly (say remote working and e-learning), Covid-19 is the context from which this conversation emerges, and the community members are clearly assuming that shared context in their conversations without stating it explicitly, be sure to code it.

Culture is often termed ‘that which goes without saying’, and part of our job as ethnographers is to explicitly say it. This is especially important in the context of populism, where family values traditionalism and housing policy might be used to speak about something like homophobia in subtext.

Code only what has meaning

Back to ethnography as an interpretive method. One way not to overproliferate codes is to make sure that a code is only applied if the community member uses the concept meaningfully.

For example, if an event takes place in the United States, but the activities that happened at the event are not, in your interpretive assessment, meaningfully connected to the fact that the event took place in the United States, do not apply that code. If, however, the fact that a certain activity happened in Prague (a major city) rather than a rural area is meaningful, apply the code Prague . Use SSNA thinking here: a co-occurrence network around cities might differ substantially from one around a rural area (certain ideas might be more widely held and repeated in the capital city than in a rural town, for example), which we would want to capture in the SSNA. Code with intention and interpretively.

As another example, a community member might mention that they are 50 years old. You would only apply the code age if age was a meaningful frame used by the community member – if the story was about growing older and life changing, for example. But if the fact is incidental and they go on to talk about monster truck racing , do not apply the code age .

Three Tiered Invivo System

If a code is descriptive (your term for what informants are describing, or a word that is used in ordinary parlance to refer to the thing you are referring to), use unmarked text. Example: sustainability or mental health

If a code is invivo (a directly quoted word or phrase that your informants used that is unique, interesting, or salient as a concept, or does not necessarily fit the ‘normal’ use of that word) use double quotation marks. Example: "witch" , "the East", "punk"

If a code is in-between (a conceptual category used by informants that you are aggregating into a term yourself and/or that does not fit the dictionary or academic use of that term), use single quotes. Example: 'communism' or 'patriotism'
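As a rough illustration of how the three tiers can be read back off a code name, the following Python sketch classifies a code by its quotation marks. The function name code_tier is hypothetical, and the sketch assumes straight ASCII quotes are used in code names; curly quotes would need extra handling.

```python
def code_tier(code: str) -> str:
    """Classify a code name as 'invivo', 'in-between' or 'descriptive'.

    Assumes straight ASCII quote marks wrap the whole code name.
    """
    text = code.strip()
    if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
        return "invivo"        # directly quoted informant term
    if len(text) >= 2 and text[0] == "'" and text[-1] == "'":
        return "in-between"    # informants' category, aggregated by the coder
    return "descriptive"       # the coder's own ordinary-language term
```

For example, code_tier('"witch"') gives "invivo", code_tier("'communism'") gives "in-between", and code_tier("mental health") gives "descriptive".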


Hierarchies

Hierarchies do not appear in the SSNA itself, but we use them to enhance our ethnographic practice. Here are some ground rules.

Every code in a hierarchy must make sense on its own. Discrimination could be the parent of homophobia or sexism , but creating a code like approaches and nesting it under discrimination is a no-go.

You can create null codes to organise codes in the backend, which will not affect the SSNA. For example, we created the code geographical location (with no annotations assigned to it, by using the “new code” function in the backend) to nest locations like Prague.

Wait to assign hierarchies / parent-child relations in the backend until we do this together in calls where we analyse the SSNA visualisation alongside this practice, since our hierarchical relationships affect each other. Instead, create categories in your own codebook. We will then discuss them and apply them in the backend together.

Use hierarchies to toggle specificity and generality.
Let’s return to an example we used above. If we want all of the instances of homophobia to also be co-coded with the higher-level code discrimination , we can easily copy the code homophobia and merge that second instance into discrimination so that all its annotations are also coded with the code discrimination. We might do this if we decide that discrimination as a code by itself would co-occur meaningfully with other codes in the SSNA.

However, we don’t want this to happen automatically, because excessively higher-level codes can end up dominating the graph too much, and may not carry enough meaning on their own to be represented (like geographical location, which is a useful organising category but not a very useful SSNA category). We use hierarchies in the backend for different purposes, and not all are worth representing in the SSNA.

Codebook Structure

The codebook should have two tabs.

Tab 1

A list of the codes, ordered either alphabetically or linearly (in the order that you coded them) with definitions.

A minimum viable codebook entry includes:

  • Column 1: Code name in source language
  • Column 2: Code name in English
  • Column 3: Definition of the code

Additional information can be added:

  • Column 4: Memos about the code (questions for other ethnographers, notes to self)
  • Column 5: Links to related codes (in own codebook or in other ethnographers’ codebooks, to suggest merging in future)

Tab 2

A list of categories and/or hierarchies that organises the codebook. Important for 2 reasons:

  1. So that other ethnographers can easily find similar codes in your codebook to ones in theirs, to suggest merging or to use your code to apply to their dataset.
  2. So that when we come together to create hierarchies in the backend every month, the task is more streamlined.

Document Everything

Define your codes immediately.

Define every code. I mean everything. Even if it seems self-evident. See above on saying “what goes without saying”. This applies to our own frames as well – what seems self-evident to one of us will not be self-evident to another one of us.

Assign both English and source language translations in the backend, so the codes are connected.

If you’re not sure about a term or code, note it down and explain why, so that others can help you hone it or you can return and refine it.

Create categories in your own codebook as often as possible to help you structure and streamline your codes. I recommend creating these as you code, or at least frequently, to avoid having to do this in a big batch. Doing so makes you less likely to assign different codes to the same concept and have to merge later, since you can see your existing codes more clearly.

Interacting with Other Ethnographers’ Codebooks

Review other ethnographers’ codebooks at minimum monthly.

Check if:

  • the code name accurately expresses the definition (is everything in the definition captured by the term used? Is there a more salient or accurate term for what the ethnographer is trying to express in the definition?)
  • the code is too general to carry meaning on its own
  • the code should be broken up into two separate codes
  • the definition/concept is already expressed by another code used in the codebook
  • the ethnographer has asked any questions that you can answer
  • the hierarchies and categories they use in Tab 2 make sense

Leave comments attached to particular cells using the comment function on Google Sheets, so that the ethnographer receives a notification when comments have been made.

Note any related codes you have in the “related codes” tab and make a hyperlink to the cell in your own codebook where the related code is.

Merging codes. If you think your code means the same thing as someone else’s (or close enough that you should seek to align them), make a note of it in the related codes tab. Once you discuss the merge with the other ethnographer, merge the codes. Hyperlink the merged code in English to indicate that it is shared across codebooks (on Tab 1 Column 2). Remember to check with the other ethnographer if you want to change the code or its definition.


Still to do:

Add pictures to this.

Think about what other information is needed – @Jirka_Kocian @Richard
@Jan @wojt @SZdenek

This is adaptable to NGI @Leonie @katejsim @CCS – I’ll probably end up putting it in the main workspace and then just have little project specific sections for anything that’s different

Magnificent job, @amelia! Thank you soooo much!

This is super super useful!

We should be merging this with last year’s Coding Standards for Open Ethnographer, and generalize to any OE project.

Thinking in network when coding: mapping coding to network structure

Basic concepts

When you code with Open Ethnographer, you are implicitly arranging codes in a graph. The basic graph structure is something like this:


Entities in the graph are:

  1. authors (participants)
  2. their posts
  3. ethnographic annotations
  4. ethnographic codes.

The types of relationships involved are:

  1. Authors write posts.
  2. Posts may reply to other posts.
  3. Annotations annotate posts.
  4. Annotations invoke codes.

These relationships are fundamental: we cannot deduce them from the data, only create them when authors write posts, or ethnographers code them. But there are other types of relationships that we can deduce from the fundamental ones, through a technique called projection. The most important ones are:

  1. A social relationship between authors: author Alice is talking to (or engaging with) author Bob when Alice writes a post that is a reply to a post written by Bob.
  2. A semantic relationship between codes: code C1 co-occurs with C2 when there exist two annotations A1 and A2 where A1 invokes C1, A2 invokes C2, and A1 and A2 annotate the same post.

In the case above, these relationships do not appear. There is only one author, Alice, and only one code, C1. But imagine now that the ethnographer creates a second annotation on the same post, and invokes a new code C2. Like this:

Now, the codes co-occurrence network shows that C1 and C2 co-occur once, because they are both invoked by annotations to post P1. The number of co-occurrences is represented by the weight of the co-occurrences edge. The interaction network shows only Alice, interacting with no one.

Suppose now that Alice’s post was in fact a reply to a post written by Bob. The situation is now this:

With no more annotations, the codes co-occurrence network is unchanged. But the interaction network now shows a link from Alice to Bob, symbolizing engagement. This edge, too, is weighted: the more replies Alice writes to Bob, the heavier the edge.

Every addition to the conversation database (authors writing posts, ethnographers adding annotations and codes) is encoded this way. So, you can think of coding as drawing networks. With every annotation, researchers using Open Ethnographer are adding nodes and edges to the conversation’s semantic social network, more specifically to its semantic part, the codes co-occurrence network.
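The two projections can be sketched in a few lines of Python. This is a toy model of the Alice and Bob example, not Open Ethnographer's own code: the dictionaries, tuple layout and function names are all assumptions made for illustration, and the sketch assumes a co-occurrence is counted once per post in which both codes appear.

```python
from collections import Counter
from itertools import combinations

# Fundamental relationships (toy data mirroring the example above).
post_author = {"P0": "Bob", "P1": "Alice"}               # authors write posts
reply_to = {"P1": "P0"}                                  # P1 replies to P0
annotations = [("A1", "P1", "C1"), ("A2", "P1", "C2")]   # (annotation, post, code)

def interaction_network(post_author, reply_to):
    """Project replies onto authors: (replier, replied-to) -> edge weight."""
    edges = Counter()
    for post, parent in reply_to.items():
        edges[(post_author[post], post_author[parent])] += 1
    return edges

def cooccurrence_network(annotations):
    """Project annotations onto codes: unordered code pair -> edge weight."""
    codes_by_post = {}
    for _annotation, post, code in annotations:
        codes_by_post.setdefault(post, set()).add(code)
    edges = Counter()
    for codes in codes_by_post.values():
        # Assumption: one co-occurrence per post in which both codes appear.
        for pair in combinations(sorted(codes), 2):
            edges[pair] += 1
    return edges

social = interaction_network(post_author, reply_to)
semantic = cooccurrence_network(annotations)
```

Here social holds a single directed edge from Alice to Bob, and semantic holds a single undirected edge between C1 and C2, each with weight 1.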

Multiple annotations on a single post induce a clique

When a researcher adds an annotation to a post in Open Ethnographer, the code invoked by it will by construction co-occur with all the codes invoked by the other annotations on the same post. So, any post whose annotations invoke two or more codes gives rise to a clique of codes – a completely connected network, or part thereof. The number of edges in a clique depends on the number of nodes. In an undirected network like the codes co-occurrence network:

  • With 2 annotations, you get 1 co-occurrence edge.
  • With 3 annotations, you get 3 co-occurrence edges.
  • With 4 annotations, you get 6 co-occurrence edges.
  • With n annotations you get (n * (n - 1)) / 2 edges.
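The formula can be checked by enumerating the pairs directly. In this short Python sketch the function name clique_edge_count and the code labels are illustrative, and each annotation is assumed to invoke a different code.

```python
from itertools import combinations

def clique_edge_count(n: int) -> int:
    """Number of edges in an undirected clique of n nodes: n * (n - 1) / 2."""
    return n * (n - 1) // 2

# Enumerate the co-occurrence edges for a post whose annotations
# invoke four distinct codes (labels are placeholders).
codes = ["C1", "C2", "C3", "C4"]
edges = list(combinations(codes, 2))
```

The enumeration yields one edge per unordered pair of codes, so len(edges) agrees with clique_edge_count(len(codes)).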

If you visualize the full co-occurrences network (including edges of weight 1), rich posts are easy to spot as very dense cliques, often connected to the rest of the graph by only one or few codes:

Interpreting repeated co-occurrences

When doing SSNA, i.e. the analysis of semantic social networks, you would not attribute a great deal of importance to co-occurrence edges of weight 1. There are two reasons for this, one conceptual and one network-structural.

The conceptual reason is this. SSNA is a quest for collective intelligence. It aims to capture how a group in conversation, not a single individual or a collection thereof, sees something. By construction, an edge of weight 1 in the codes co-occurrence graph means that the two codes in question occurred together in the same post only once; posts can only have one author, so only one individual has made that association explicitly, only once. This does not qualify as collective intelligence. When the same co-occurrence repeats itself across multiple posts, it is likely to encode an association supported by the collective. We treat repeated co-occurrence as a signature of collective intelligence.

The network-structural reason is that a rich post might well have 20 annotations with 20 different codes. This means 190 edges. The number of edges in the graph can easily become dominated by a few rich posts. There is no elegant solution for this.

Some network scientists dealing with interconnected cliques like to assume that all cliques (not all edges) have the same weight, equal to 1. They, then, rescale the weight of the edges by the inverse of the number of edges therein. In our case, this would mean assuming each post has one “vote” to spend. If a post is annotated invoking four codes, each of its 4 * 3 / 2 = 6 edges would have weight 1/6. A post with only two codes invoked would give rise to a single edge of weight 1, and so on.

We do not consider this to be a good solution for online ethnography. A large number of annotations on a post tends to mean that that post is indeed very rich in meaning (and often longer than average). It is by no means clear that the connections across these codes would be of less value than those stemming from posts with only 2 or 3 codes.
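For comparison, the "one vote per post" rescaling described above (the approach we do not adopt) could be sketched as follows. The function name, the input layout (a dictionary from post to the codes invoked on it) and the use of exact fractions are assumptions for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

def rescaled_cooccurrences(codes_by_post):
    """Give each post one 'vote', split equally among its clique's edges."""
    edges = Counter()
    for codes in codes_by_post.values():
        pairs = list(combinations(sorted(set(codes)), 2))
        for pair in pairs:
            # Each edge gets the inverse of the number of edges in the clique.
            edges[pair] += Fraction(1, len(pairs))
    return edges

# Placeholder codes: one post invoking four codes, one invoking two.
codes_by_post = {
    "P1": ["a", "b", "c", "d"],   # 6 edges, each weighted 1/6
    "P2": ["a", "b"],             # 1 edge, weighted 1
}
weights = rescaled_cooccurrences(codes_by_post)
```

In this toy corpus the edge between a and b ends up with weight 1/6 + 1, while every other edge from the richer post P1 carries only 1/6, which is the down-weighting of rich posts that the paragraph above argues against.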

Instead, we filter out all one-off connections, and consider only the co-occurrences that appear at least twice in the corpus. This:

  • Anchors more firmly our claim that the codes co-occurrence network has something to do with collective intelligence.
  • Gets rid of the cliques.
  • Simplifies dramatically the graph: in our studies so far one-off co-occurrences make up about 90% of all co-occurrences.
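The filtering step can be sketched in Python as follows. The function names and the input layout are hypothetical, the example codes are made up, and the sketch assumes a co-occurrence is counted once per post in which both codes appear.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_weights(codes_by_post):
    """Count, for each unordered code pair, the posts in which both occur."""
    edges = Counter()
    for codes in codes_by_post.values():
        for pair in combinations(sorted(set(codes)), 2):
            edges[pair] += 1
    return edges

def drop_one_offs(edges, min_weight=2):
    """Keep only co-occurrences seen at least `min_weight` times."""
    return {pair: w for pair, w in edges.items() if w >= min_weight}

# Made-up corpus of three posts.
codes_by_post = {
    "P1": ["populism", "fear", "migration"],
    "P2": ["fear", "migration"],
    "P3": ["populism", "housing policy"],
}
all_edges = cooccurrence_weights(codes_by_post)
kept = drop_one_offs(all_edges)
```

Of the four co-occurrence edges in this toy corpus, only the one between fear and migration appears in two posts, so it is the only edge that survives the filter.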

Done. Our codebook looks different in NGI but I’d like all future codebooks to take this form, so this is generalisable. Where do you recommend moving it (it is currently in Wellbeing in Europe)?

Misunderstanding maybe. I think this Ethnographic Coding Wiki should merge with Coding Standards for Open Ethnographer, and both of them should be sections of the 📗 Open Ethnographer Manual. So, everything should be moved out of the POPREBEL cat, and live in the Collaboration cat instead.

I already added to this wiki what was relevant from the latter – most is now obsolete.

This makes sense in terms of having everything in one place, but it’s very overwhelming informationally. I’m OK to do it but need to go through the existing manual and make the language more accessible if it’s aimed at new ethnographers, and re-arrange the information a bit, if that’s OK.

Yeah, no rush. Let’s first put together the info, then we will re-arrange it.

I deleted the previous post, because I realized I need to go through Amelia’s entire codebook anyway. I will create a list of all terms related to emotions in a separate document for now, so we can thoroughly discuss this issue. It is very central to our work, as populism is a very emotional phenomenon.

There is a categories tab in my codebook in which I have a section just for emotions.

The first tab is just a list of codes in the order that I coded them with definitions attached. The second tab is those codes organised into categories. See wiki above which lays this structure out (your codebook should take this form too).

Got it. See it now. Let’s talk about it, as you have there a lot of “thingies” that I would not classify as emotions. Here is my analysis of Amelia’s and Wojtek/Jan codebooks. My goal was to fish out all “emotional” codes. I tried to stay close to the standard (more or less) classification of emotions (see some explanations below).

Emotions and related codes (let’s discuss both lists):
Jan (started July 3, 2020)

Amelia’s codes

42 – intense feelings
48 – feeling at home
54 – emotional distance
69 – happiness
78 – emotional distress
103 – emotional distress
105 – loneliness (this is a state, not emotion – my therapist friends remind me though)
114 – empathy
182 – hate
183 – fear
191 – anxiety
266 – eliciting emotions
269 – feeling of safety (Is it an emotion or rather attitude/personal assessment/view?)
270 – feeling unsafe
274 – fear (repeat of 183)
276 – unfunded fear
277 – sense of danger (Amelia: potentially merge with 270 “feeling unsafe”)
295 – feeling of missing out (“FOMO”)
319 – emotional attachment
340 – grief
341 – empathy (repeat of 114)

Additional remarks

168 – rituals – refine description? Not just repetitive (as in Goffman) but set apart from the everyday (as in van Gennep and Turner)
302 – rituals (again)

Wojtek and Jan’s codes

12 – justified anger [Should it be just “anger”?]
16 – aversion (to flaunting homosexuality) [in Plutchik’s wheel (PW) below – disgust, I suppose].
35 – despondency
36 – disappointment
50 – gratitude
51 – (growing) empathy [BUT: It is a method of obtaining knowledge, not emotion]
55 – hatred [contempt in Plutchik’s wheel?]
76 – liking (job) [in Plutchik: admiration + trust = acceptance]
77 – little happiness [Plutchik: joy]
115 – positive sentiment [Plutchik: serenity + interest = optimism]
157 – toxic masculinity (Is there any emotion here or just/mostly listing a specific form of masculinity?)

In my studies on emotions, I have come across the diagram I copy below. Also, I have the full e-version of a handbook of psychological anthropology, written by one of my favourite teachers at Columbia, Chuck Lindholm, Culture and Identity. The History, Theory and Practice of Psychological Anthropology. I will gladly share, should you be interested. I take the diagram below (Plutchik’s wheel) from that book (2007:275)

[Image: Plutchik’s wheel of emotions, from Lindholm 2007:275]

See also: (and another version of this diagram below):

[Image: another version of the same diagram]

Summary of the section on emotions (Lindholm 2007:291)

Emotions are, as any number of theorists have noted, extremely difficult to study. Compelling, ambiguous, and subjective, they have served in the West as the epitome of the irrational. But they are also perhaps the most powerful motivating factors in our lives, and so have been the object of intellectual discourse for a very long time indeed. Many theories of emotion have been proposed, some of them concerned primarily with evaluation, others with typology. Emotions have also been the object of rational control, and have been apostrophized as the seat of true humanity. They have been seen as internally generated and as completely reflective of context.

Despite controversy, it is now recognized that emotions do have distinctive and universal physiological content, that they serve as a biological system of motivation. It is also recognized that some emotions are fundamental and powerful, while others are peripheral and less gripping. But exactly what these fundamental emotions are remains controversial, though there seems to be general agreement that fear, anger, sadness, and happiness ought to be included (emphasis - JK).

Some anthropologists have been slow to accept these findings and have tended instead to argue that emotions are completely culturally constructed. The Balinese, for example, have been said to have no feelings at all, while the Inuit feel no anger and the Ilongot can throw anger away. Yet restudies have discovered that these are overstatements that do not do justice to the complexity of the emotional experience of these peoples. It is clear, though, that emotion is regulated differently in various cultures, with some strongly favoring masking or even simulating emotions for pragmatic purposes, while others alter their expression of feeling in the belief that this will change their inner reality. These differences are related to differences in social structure: Tight yet competitive structures where people have a known status and role are more likely to manipulate or mask feeling; looser and more ambiguous systems lead members to define themselves through the public expression of appropriate emotional states.

But if different societies have different concepts of emotion, does this mean that feelings themselves differ? Many anthropologists now argue that some cultures do not see emotion in Western fashion as internal, personal, and powerful. Instead, for them, emotion is public, relational, and controllable—an embodied form of thought. However, there are contradictions in this argument—not the least of which is the difficulty of accounting for the motivations of others in ways that do not simply reduce all interactions to quests for power. Also, conflating thought with emotion has the unintended consequence of reducing the autonomy of emotion. Feeling becomes another mode of discourse and thus all but disappears.

What is needed instead is a more dialectical view of emotion and culture as realms of being that are intertwined and mutually interrelated, but do not wholly overlap. Some emotions may be hypercognized, others may be hardly spoken of at all, yet the latter do not vanish and may appear in symbolic or somatic forms. Certainly, culturally specific emotional forms do indeed exist, as blends or as modifications of deeper drives. Nonetheless, for the anthropological study of emotion to go forward, it must be admitted that all human beings share a common heritage. To argue that fundamental emotional impulses exist and are engaged in a dialectic with cultural constraints does not undermine anthropological analysis. Instead, this premise provides a better basis for comparative work and, perhaps more importantly, gives a basis for the humane anthropological claim that others are not so different from ourselves. They too are driven by contradictory desires for attachment and for autonomy; they too are subject to fears, anxieties, and grief; they too are transported by love and communion.

This dialectical view of emotion and culture is the one I choose to take myself, and suits our approach very well, I think. Glad we are on the same page and thank you for taking the time to write out this reflection! I look forward to having this conversation as one of our agenda items on the call – it will definitely help us make the hierarchies together, since @Jirka_Kocian also has a “Feelings and Emotions” category. It is certainly emerging as one of the important topics in our analysis thus far!

In analysis of emotions, we also see a dialectical relationship between feeling and response (with a variable temporal space between, in which the feeling can be either analysed or not). How we learn to interpret and handle those emotional responses is very socioculturally mediated – a good example of this comes from anthropological studies of masculinity: in some sociocultural contexts a man is meant to process and display emotion in very different ways from other contexts, and how he does this carries different symbolic and social meaning as well. How we feel emotional impulses may be highly similar and often in large part biologically based (especially more “creature-like” emotions emerging from evolutionary response mechanisms) but how we process, interpret, and externalise these emotions (and how we are taught to do so) remains an excellent object of anthropological study. (Political science, of course, also has a lot to offer by way of emotional analysis in some of the realms we are studying – capturing hearts and minds, etc.)

This, by the way, is why I think all anthropological departments should be four fields :wink: You need the biological anthropology (and linguistic anthropology) to understand the sociocultural anthropology properly!