This is the manual for Open Ethnographer, our open source, custom software application for ethnographic coding. We use it on this very forum. As usual, this manual is a wiki: update and extend it as needed.
Content
1. Introduction
2. Getting access
3. Coding with Open Ethnographer
4. Managing codes, annotations and settings
5. Using the API
6. Process for a coding project
7. Best-practice conventions for coding
- 7.1. Big Picture
- 7.2. Coding Conventions
- 7.3. Codebook Structure
- 7.4. Thinking in network when coding: mapping coding to network structure
8. Archiving ethnographic data
- 8.1. Machine readable: data export and long-term archival
- 8.2. Human- and machine readable: Codebook export
9. Contributing to development
10. Getting ethnographic data onto the Edgeryders platform
11. Using the Open Ethnography platform
1. Introduction
Open Ethnographer is an open source tool for Qualitative Data Analysis (QDA), a way to systematize, understand and make sense of long passages of text.
If you came to find the manual for a tool to annotate Discourse forum content, you also came to the right place. While "Open Ethnographer" is the name under which this tool is known around Edgeryders, its technical name is "Discourse Annotator".
Usually, there are four main steps in QDA:
- Preliminary coding. A thorough reading of the content that leads to a draft list of codes, or tags, for certain portions of text.
- Coding. A careful reading, this time assigning a "code" (a "kind of hashtag without a hash") to a relevant portion of a given text.
- Building categories. Group codes into categories for further analysis.
- Analysis. The content of the categories is described with references back to the text and the original quotations.
Open Ethnographer (OE) is used for coding the content of a webpage without downloading it first. It is simple to use and intuitive; the instructions are below.
There is also another, legacy version now called "Open Ethnographer for Drupal", documented here. All relevant parts of its documentation have been incorporated into this manual.
2. Getting access
To use Open Ethnographer, you must be a member of the `annotator` Discourse user group.
To become a member of that group, visit the page of the group and click the button to apply for membership. You will be able to write a bit about why you're applying. The `annotator` group owners will then see your request, and one of them can approve it. The owners are currently @nica, @matthias and @alberto; if you experience a problem with this process, you can message them directly.
In addition, all administrator users of the Discourse platform automatically have access to Open Ethnographer.
3. Coding with Open Ethnographer
The basic steps are the same regardless of the type of material you are coding:
- Log in to edgeryders.eu as normal, with an account that can access Open Ethnographer.
- Visit the Discourse topic for which you want to see or add ethnographic coding.
- Click on the button "Code with Open Ethnographer" below the topic's title. You will see a simple HTML page of the same topic, paginated in case there are many comments. For example, if the original topic URL was `https://edgeryders.eu/t/title-here/1234` then you would now be on `https://edgeryders.eu/annotator?select-topic=1234`
- Select a project. A "project" is the same as an ethnographic corpus: here, a set of Discourse topics, the annotations in them and the associated codes. Annotations in the same topic can belong to different projects, allowing users to annotate the same content without interfering with each other. You will know which project to select based on your work instructions. (If there is no suitable project, you can create a new one under "Settings → Projects → New Project".)
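The URL mapping in the example above can be sketched in a few lines of Python. This is only an illustration built from the two example URLs in this manual: the function name `annotator_url` is hypothetical, and the sketch assumes topic URLs of the form `/t/<slug>/<topic id>`.

```python
import re
from urllib.parse import urlencode, urlsplit


def annotator_url(topic_url: str) -> str:
    """Derive the coding view URL from a Discourse topic URL.

    Assumption: the topic URL has the form /t/<slug>/<topic id>,
    as in the example given in this manual.
    """
    parts = urlsplit(topic_url)
    match = re.match(r"^/t/[^/]+/(\d+)", parts.path)
    if match is None:
        raise ValueError(f"not a topic URL of the expected form: {topic_url}")
    query = urlencode({"select-topic": match.group(1)})
    return f"{parts.scheme}://{parts.netloc}/annotator?{query}"


print(annotator_url("https://edgeryders.eu/t/title-here/1234"))
```

In daily use the "Code with Open Ethnographer" button does this for you; the sketch is mainly useful when preparing lists of links for collaborators.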
3.1. Coding of text
- To create a new annotation (aka "ethnographic coding"):
  - Select some text and click the annotation button that will appear.
  - Type a substring of the name of the code you want to tag with. Whatever you type, including spaces, will be searched for in all code names. In practice, it works best to start with the first character of the code's name, omitting all its ancestor codes, and not to skip any characters in the sequence.
    Multi-substring search in code names will soon be available (status). Once it is, just type the first few letters of any word in the code path, separated by spaces, to find your desired code fast.
  - Choose the desired code name. Either select it from the proposed completions, or type any code name that does not result in a proposed completion. In the latter case, that code will be created on-the-fly when you save this annotation.
    As a special feature, it is also possible to create codes on-the-fly and sort them into the hierarchy in one step. Usually you would create codes on the first level only and later sort them into a hierarchy using the management functions in the "Codes" section. But if you want, you can also create them directly in the hierarchy by typing a code name that has `" → "` (space, arrow, space) between hierarchy levels, just the way such codes are displayed in the auto-completion proposals.
  - Repeat the last two steps if you want to add more codes to the same selection of text.
  - Click "Save".
- To edit an existing annotation, hover over text with yellow background, then click the pen icon in the popup that will appear.
- To delete an existing annotation, hover over text with yellow background, then click the cross icon in the popup that will appear.
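To make the code search and the hierarchy notation described above more concrete, here is a minimal Python sketch. It is not the tool's implementation: the function names are hypothetical, and it assumes a plain, case-sensitive substring match over full code paths (the actual matching rules may differ).

```python
SEPARATOR = " → "  # space, arrow, space between hierarchy levels


def split_code_path(path: str) -> list[str]:
    """Split a code path such as 'parent → child' into its hierarchy levels."""
    return [level.strip() for level in path.split(SEPARATOR)]


def suggest_codes(typed: str, code_paths: list[str]) -> list[str]:
    """Return all code paths that contain the typed text, spaces included.

    Assumption: plain, case-sensitive substring matching.
    """
    return [path for path in code_paths if typed in path]


codes = ["discrimination → homophobia", "discrimination → sexism", "mental health"]
print(suggest_codes("homo", codes))
print(split_code_path("discrimination → homophobia"))
```

Typing `homo` finds the nested code without spelling out its parent, which is why starting with the first characters of the code's own name works well in practice.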
3.2. Coding of images
- To create a new image annotation, draw a rectangle around the image section you want to annotate by clicking and dragging with the mouse pointer. Then select a code just as when coding text.
  You can create as many annotations on one image as you want, and their rectangles can intersect and contain one another.
- To edit an existing image annotation, hover over the rectangle that represents the annotation to edit, then click the pen icon in the popover that will appear. Remove the existing code and select a new one. It is not possible to edit the size and position of the associated image selection; to change those, you have to delete the annotation and re-create it with a different image section.
- To delete an existing image annotation, hover over the rectangle that represents the annotation to delete, then click the cross icon in the popover that will appear.
3.3. Coding of videos
How to code a video. In the coding interface, you will see a link "See annotations / add annotations" below each video that can be coded. Click on it to go to the video coding interface, then click the top-left play button in the video to start it. Then:
- To create a new video annotation:
  - Wait for the position in the video where you want to start your annotation.
  - Click the "New Annotation" button in the player's toolbar. The player will pause and a popover window will appear.
  - Enter a tag just as when coding text.
  - Move the yellow triangle markers to adjust the start and end position of your annotation. Note that annotations are shown in yellow while in edit mode, and the markers can only be moved then.
  - Click "Save".
- To view existing annotations:
  - Click on the "Show Annotations" button. Bars will appear as an overlay on the video, representing annotations with their start and end times. They appear as a stack in the order they were created, with the newest at the top and one annotation per row. Timestamps in the right corners indicate the creation times of the top and bottom visible annotations.
  - Hover over the bars to see information about each annotation.
  - Click on a bar to play the part of the video to which the respective annotation belongs.
- To filter existing annotations by video time range, in cases where the list is otherwise too long to work with:
  - Click on the "Show Annotations" button. Again, bars represent annotations.
  - Move the orange triangles in the top left and right corners to define the time range of the video slider that should be used as a filter to show annotations.
  - Now, only the annotations that at least partially overlap with the selected time range are shown.
- To edit an existing video annotation:
  - Click on the "Show Annotations" button. Bars will appear as an overlay on the video, representing annotations with their start and end times.
  - Hover over the bars to show the popover window with information about the annotation.
    (You can also click on one such annotation to play its associated video snippet, and then hover over the bar in its new position near the video player timeline. The same popover with information will show.)
  - Move the cursor into this popover window and click the pen icon that will appear.
  - You can now remove the annotation's code and add a new one, and also adjust the start and end position by moving the yellow triangles.
  - Click "Save".
- To delete an existing video annotation, proceed as for editing an annotation but click on the cross icon instead of the pen icon.
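The time range filter described above keeps every annotation that at least partially overlaps the selected range. A minimal sketch of that rule in Python follows; the class and function names are hypothetical, times are assumed to be in seconds, and treating an annotation that merely touches the range boundary as overlapping is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoAnnotation:
    code: str
    start: float  # seconds from the beginning of the video
    end: float


def overlaps(annotation: VideoAnnotation, range_start: float, range_end: float) -> bool:
    """True if the annotation overlaps the range at least partially.

    Assumption: touching the boundary counts as overlapping.
    """
    return annotation.start <= range_end and annotation.end >= range_start


def filter_by_time_range(annotations, range_start, range_end):
    """Keep only the annotations that overlap the selected time range."""
    return [a for a in annotations if overlaps(a, range_start, range_end)]


annotations = [
    VideoAnnotation("legacy", 5.0, 20.0),
    VideoAnnotation("built to last", 30.0, 45.0),
    VideoAnnotation("face-to-face", 70.0, 90.0),
]
for a in filter_by_time_range(annotations, 15.0, 40.0):
    print(a.code)
```

An annotation does not have to lie fully inside the range: one that starts before the range and ends within it is still shown.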
Types of videos. Videos can be added to Discourse in several ways, and the process to code them differs for legal and technical reasons:
- Videos uploaded to Discourse. When a user uploads a video as an `.mp4` file while creating a Discourse post, it will be shown in the browser's default player embedded into the post. Such videos can be coded directly. Currently, the upload file size limit is 100 MiB to keep our backups manageable, so choose your MP4 video quality and encoding well!
- Videos uploaded elsewhere and referenced as a file. Instead of uploading a video directly to edgeryders.eu, you can also upload it to a different platform like Imgur, as long as that platform provides a direct link to the uploaded `.mp4` file. When such a link is placed on its own line in a Discourse post, it will result in an embedded video as well, and such a video can also be coded directly.
- Videos uploaded to video platforms. You cannot code videos embedded from YouTube, Vimeo etc. This is due to legal reasons: YouTube's terms and conditions, for example, allow showing the videos from their platform only with their player, but we need a special player to create the annotations. So before a coding project can begin, an administrator would have to edit these posts and transform such videos to one of the other two types listed above. Instead of replacing the original embedded video, the administrator can also add the codeable version below, hidden inside a foldable `[details="Summary"] … [/details]` element. If the video was created for the user's post, this re-uploading will be covered by the Creative Commons licence that users grant for content they post on edgeryders.eu.
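Before a coding project starts, it can help to check quickly whether a video can be coded as-is. The Python sketch below encodes only the two rules stated above (the 100 MiB upload limit and the need for a direct `.mp4` link); the function names are hypothetical, and reading the limit as "at most 100 MiB" is an assumption.

```python
from urllib.parse import urlsplit

UPLOAD_LIMIT_BYTES = 100 * 1024 * 1024  # 100 MiB, as stated in this manual


def fits_upload_limit(size_bytes: int) -> bool:
    """True if a file of this size is within the upload limit (assumed inclusive)."""
    return 0 < size_bytes <= UPLOAD_LIMIT_BYTES


def is_direct_mp4_link(url: str) -> bool:
    """True if the URL points directly at an .mp4 file rather than a player page."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and parts.path.lower().endswith(".mp4")


print(fits_upload_limit(80 * 1024 * 1024))
print(is_direct_mp4_link("https://example.com/videos/interview.mp4"))
print(is_direct_mp4_link("https://www.youtube.com/watch?v=abc"))
```

A link that fails the second check, such as a YouTube page, corresponds to the third type above and needs to be converted by an administrator first.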
3.4. Coding of audio
Currently, coding of audio is covered by coding a video made from the audio and a still image.
At the start of a coding project, a platform admin might have to bring the audio into this format. This applies to cases where users added the audio by embedding a SoundCloud track or similar.
4. Managing codes, annotations and settings
The Open Ethnographer interface lets you administer codes, existing annotations and your user settings. Here's how:
- Log in to edgeryders.eu as normal, with an account that has access to Open Ethnographer.
- Click on "Open Ethnographer" in the top menu bar to visit the Open Ethnographer interface. You can also bookmark the direct link for later use: `edgeryders.eu/annotator`.
How to use the different sections of the interface:
- Current project. A dropdown to select the coding project you are working on. No other functionality in Open Ethnographer, except "Settings", will be accessible until you select a project. If you want to create a project first, you can do so under "Settings → Projects → New Project".
- Codes. Here, you can create, show, edit and delete ethnographic codes and their translations into various languages.
  Codes form one global hierarchy per project, for all authors combined. After each code in the list, the number of annotations using it in this project is shown in parentheses. To delete a code, you have to manually delete its sub-codes first, or move them to other parent codes. (This is a measure against accidentally deleting too much.)
  On the screen to edit a code, you can add translations of the code's name into other languages defined in section "Languages". In the top right under "View", there are three other ways to view the codes list: a view with a bulk translation form, and two views that let you copy and paste the content with minimal formatting, in order to create a codebook document in external software.
- Topics. A list of all topics with annotation statistics about your own and the total project annotations in each topic. You can choose to see topics with at least one annotation in this project (filter "With Annotations"), or simply a list of all Discourse topics (filter "Any Number of Annotations"). The list can be filtered by annotation author.
  Clicking on a list entry will bring you to the coding view for that topic so you can continue coding there. To start coding in an uncoded topic, search in the list for the topic title, with filter "Any Number of Annotations". Or simply use the "Code with Open Ethnographer" button on the Discourse topic page.
- Annotations. Shows the existing annotations of this project, created by any ethnographer. You can see their data, filter by creator, and change the creator (but this will only be needed during imports and other administrative changes). All other changes to annotations are done in the coding view.
- Settings.
  - General. Lets you configure the setting "Public Codes List Api Endpoint", which controls whether the names of codes are available publicly via the API (see "5. Using the API"). If set to `false`, this API endpoint is access-protected like the other Open Ethnographer API endpoints.
  - User Preferences. Lets you configure some aspects of the Open Ethnographer behavior per user. Currently the only option is "Language", which sets a user's standard coding language. After changing this, all codes you create afterwards will by default assume that the code name is in the language you chose here. Existing codes are not affected. (You can change the language of any existing code in section "Codes".) This setting also chooses which target language column to use in the Translate View of codes. If you can't find your preferred coding language in the list, create it first in section "Languages".
  - Discourse tag. Set or change the coding project ("ethnographic corpus") you're currently working on, as represented by its `ethno-*` Discourse tag. It is important to keep this setting up to date, as it enables collaborative coding. When set to any `ethno-*` value, the code name auto-complete list will suggest all codes used in that coding project, independent of code author. When set to the empty value, Open Ethnographer will use the default behavior of showing a user all the user's own codes, independent of coding project.
  - Projects. Lets you rename existing projects and define new ones.
  - Languages. Lets you define the set of languages that can be used in code names.
5. Using the API
We created an access protected custom API extension of the Discourse API that gives access to Open Ethnographer codes, annotations and ethical consent data. For the API documentation, refer to this topic:
6. Process for a coding project
Before coding can start, you should define which topics will belong to your coding project, so that your collaborators can find and code them. There is no direct way to associate a topic with a coding project, but we can use the fact that Open Ethnographer considers every topic that has at least one annotation belonging to a project to be part of that project.
So our convention is to create a pseudo-code like `selected topic` and to create an annotation with it on the first word of every topic that should be part of the coding project. Additionally, you may want to use a pseudo-code `selected priority topic` in the same way to signal to ethnographers which topics to code first.
Once coding is finished, every topic will be coded, with multiple real annotations in it. The pseudo-codes created above are then no longer needed to associate topics with projects. You can safely delete them together with their annotations.
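The membership rule behind this convention (a topic belongs to a project as soon as it has one annotation in that project) can be sketched as follows. The `Annotation` record and its field names are hypothetical and do not reflect the actual database or API schema; the last function illustrates a check one might run before deleting the pseudo-codes.

```python
from dataclasses import dataclass

PSEUDO_CODES = {"selected topic", "selected priority topic"}


@dataclass(frozen=True)
class Annotation:
    topic_id: int
    project: str
    code: str


def topics_in_project(annotations, project):
    """A topic is part of a project if it has at least one annotation in it."""
    return {a.topic_id for a in annotations if a.project == project}


def priority_topics(annotations, project):
    """Topics flagged with the 'selected priority topic' pseudo-code."""
    return {
        a.topic_id
        for a in annotations
        if a.project == project and a.code == "selected priority topic"
    }


def safe_to_drop_pseudo_codes(annotations, project):
    """True if every project topic also has at least one real annotation."""
    real = {
        a.topic_id
        for a in annotations
        if a.project == project and a.code not in PSEUDO_CODES
    }
    return topics_in_project(annotations, project) <= real


annotations = [
    Annotation(1234, "demo", "selected topic"),
    Annotation(1234, "demo", "legacy"),
    Annotation(5678, "demo", "selected priority topic"),
]
print(sorted(topics_in_project(annotations, "demo")))
print(safe_to_drop_pseudo_codes(annotations, "demo"))
```

In this example topic 5678 has no real annotation yet, so deleting the pseudo-codes now would silently drop it from the project.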
7. Best-practice conventions for coding
7.1. Big Picture
When we are coding, we need to think about the rigour of the coding system so that others can easily understand and use our codes and the data structure we are producing in the SSNA (semantic social network analysis). This means:
- creating codes that carry meaning, are salient and are essence-capturing when viewed on their own
- defining everything in enough detail and documenting why you chose to use a specific code
- creating consistent and clear categories so that other ethnographers can easily navigate a large codebook
- thinking about how someone who has never read any of the underlying data would read and understand the code if they saw it
- thinking about what meaning the code will carry when it co-occurs with other codes in a visual network
Do code review after every coding session and clean up your codes: make sure they fit the coding conventions both technically (in terms of case, invivo designation, etc) and semantically (that they aren't a synonym of existing codes, that they aren't compound codes, etc). We will save ourselves a lot of headache going forward if we do this while we work instead of trying to go through 1000s of codes retroactively.
Code descriptively. When you code, don't just code for "content". If a machine could assign the same code as you're assigning, that's a good sign that you need to rethink how you're coding. As an example of this distinction, ethnographers sometimes make the mistake of using invivo codes (where you use the same terminology as the participant) when the term itself isn't special, rather than thinking about what the participant means and finding the right code to describe that meaning, in line with the codes you and your team have used previously if the concept has come up before. Treat your desire to automatically assign an invivo code as a red flag going forward, and ask yourself if you're using their exact word because it's the word that saliently captures the concept (and the one you've been using thus far to capture that same concept) or because it's the easiest thing to do in the moment. We code for meaning: what is the point of what this person is trying to say? What are they trying to get across? What values and worldview are they putting forward?
Ask yourself: if the codes I assigned to this post or comment were assembled together by themselves, would the viewer be able to tell the story of what this person is thinking or feeling? This means assigning more descriptive codes (like `seeking purpose` or `defining justice` or `building inclusivity`; `access` and `education`; `complexity` and `asking experts`) rather than codes like `limiting` or `involvement`, which have no real meaning. This also helps you not overassign codes or create overly compound or vague codes.
Once you've assigned a code for a concept, be consistent and stick with it (or edit/update it across the entire corpus). The reason we call our codebook an "ontology" is that it's more than a list of codes: it's a compendium of participant meaning. If I call in-person human interaction `face-to-face` then I need to continue to use that phrase, rather than also assigning codes `physical human interaction`, `real-life contact`, and so on.
Consistency requires being consistent with the co-occurrences you use to capture when people express the same sentiment; it goes beyond not assigning single codes with the same meaning. For example, imagine participants are expressing that they want to leave something material behind that outlives them, that lasts beyond their lifetime. If I use the connection between the codes `built to last` and `legacy`, I must not then also assign the codes `longevity` and `making one's mark`, or even `longevity` and `legacy`, to a different story later in the month. Note that any of these combinations would be legitimate ways to capture this sentiment: you just have to pick one way of doing it and consistently apply it going forward.
7.2. Coding Conventions
Spelling and Formatting
Use British spelling for English-language codes.
Use lowercase letters (unless a capitalised letter has semantic meaning, as in a proper noun).
Use accented letters as normal in all languages.
Avoid compound codes
Because the SSNA detects co-occurrences, it is important that each code carries one meaning, which can then be linked to others.
For example, code `homosexuality` and `discrimination` rather than `homosexuality and discrimination` or `homosexuality:discrimination`.
As @alberto aptly puts it:
Specificity vs Generality
If the code does not carry any real meaning on its own (e.g. `approaches`), it is too general.
If the code is too granular to ever be reused (e.g. `hyperactive mosquitos`) then it is likely too specific.
Sometimes this requires creative thinking. If you're trying to capture the idea that informants are expressing that Romania is more awesome than Poland, the code `Romania is more awesome than Poland` is likely not going to co-occur in many places (though @noemi might disagree). However, we could use the code `country comparison` and co-code it with `Romania` and `Poland` (or `Romania-Poland` if we really felt strongly that we didn't want to lose that specific country comparison as tied to the country comparison code, though see why we should hesitate to do this in the compound codes section above. We could always go back and copy the code `Romania-Poland` if we found it was too specific, assigning one instance of all the annotations to `Romania` and the other to `Poland`, so no harm done as long as we keep that more broadly useful `country comparison` there.)
How many codes to assign?
It is much easier to merge (and copy) codes than it is to fork them. As a result, aim for greater granularity and merge into a more general concept if you find upon review that the granularity is too small.
If we code everything `discrimination` and realise that we wish we'd coded `homophobia` and `racism` and `sexism` differently, we have to go back and re-read and recode all the annotations assigned to `discrimination`. If instead we code the other three and decide they are too granular, merging them all into the code `discrimination` is easy. If we decide `homophobia` is occurring on its own broadly enough but the other two are too granular, we can always merge `racism` and `sexism` into `discrimination` but leave `homophobia` alone. If we want all of the instances of `homophobia` to also be co-coded with the higher-level code `discrimination`, we can easily copy the code `homophobia` and merge that second instance into `discrimination` so that all its annotations are also coded with the code `discrimination`. This is what we use hierarchies for in the backend, to more easily keep track of such concepts, as I will return to in the next section.
Code "that which goes without saying"
Part of our job as interpretive analysts is to use our sociocultural understandings and our training to read between the lines. If community members are talking about two concepts explicitly (say `remote working` and `e-learning`), `Covid-19` is the context from which this conversation emerges, and the community members are clearly assuming that shared context in their conversations without stating it explicitly, be sure to code it.
Culture is often termed "that which goes without saying", and part of our job as ethnographers is to explicitly say it. This is especially important in the context of populism, where `family values`, `traditionalism` and `housing policy` might be used to speak about something like `homophobia` in subtext.
Code only what has meaning
Back to ethnography as an interpretive method. One way not to overproliferate codes is to make sure that a code is only applied if the community member uses the concept meaningfully.
For example, if an event takes place in the United States, but the activities that happened at the event are not meaningfully connected to the fact that the event took place in the United States, in your interpretive assessment, do not apply that code. If, however, the fact that a certain activity happened in Prague (a major city) rather than a rural area is meaningful, apply the code `Prague`. Use SSNA thinking here: a co-occurrence network around cities might differ substantially from one around a rural area (certain ideas might be more widely held and repeated in the capital city than in a rural town, for example), which we would want to capture in the SSNA. Code with intention and interpretively.
As another example, a community member might mention that they are 50 years old. You would only apply the code `age` if age was a meaningful frame used by the community member, for example if the story was about growing older and life changing. But if the fact is incidental and they go on to talk about `monster truck racing`, do not apply the code `age`.
Three Tiered Invivo System
If a code is descriptive (your term for what informants are describing, or a word that is used in ordinary parlance to refer to the thing you are referring to), use unmarked text. Example: `sustainability` or `mental health`
If a code is invivo (a directly quoted word or phrase that your informants used that is unique, interesting, or salient as a concept, or does not necessarily fit the "normal" use of that word) use double quotation marks. Example: `"witch"`, `"the East"`, `"punk"`
If a code is in-between (a conceptual category used by informants that you are aggregating into a term yourself and/or that does not fit the dictionary or academic use of that term), use single quotes. Example: `'communism'` or `'patriotism'`
Reason:
Hierarchies
Hierarchies do not appear in the SSNA itself, but we use them to enhance our ethnographic practice. Here are some ground rules.
Every code in a hierarchy must make sense on its own. `discrimination` could be the parent of `homophobia` or `sexism`, but creating a code like `approaches` and nesting it under `discrimination` is a no-go.
Use hierarchies to toggle specificity and generality. Let's return to an example we used above. If we want all of the instances of `homophobia` to also be co-coded with the higher-level code `discrimination`, we can easily copy the code `homophobia` and merge that second instance into `discrimination` so that all its annotations are also coded with the code `discrimination`. We might do this if we decide that discrimination as a code by itself would co-occur meaningfully with other codes in the SSNA.
However, we don't want this to happen automatically, because excessively higher-level codes can end up dominating the graph too much, and may not carry enough meaning on their own to be represented (like `geographical location`, which is a useful organising category but not a very useful SSNA category). We use hierarchies in the backend for different purposes, and not all are worth representing in the SSNA.
7.3. Codebook Structure
Each codebook entry consists of a code plus its definition ("code description" in the backend).
You can easily edit code names and descriptions in this view:
To get here, click on "Codes" and then in the "View" drop-down in the top right corner, select "Translate" (the default is "Tree"). You can filter by project tag to select your project, and you can filter by name to select codes you have assigned.
To add definitions to the codes you assigned most recently, select âNewestâ in the âSort Byâ Dropdown. You can also use this view to review the codes assigned most recently by other ethnographers.
For now, to do code review, we can create a Discourse thread to discuss larger changes and leave comments with our initials in the code description field itself. You can also keep any memos about the code in the code description field.
To flag that a code needs review and to call other ethnographers' attention to it, add a * sign at the front of the code name so it appears first in the code list. When you see that asterisk, resolve the comment that the other ethnographer has left and remove the comment as well as the asterisk.
Codebook Style Guide:
- British English spelling and grammar conventions.
- Code: lowercase unless a proper noun. This is very important as codes are case sensitive!
- Definitions: initial letter is lower case unless it is a proper noun or name. Definitions end with a full stop.
- Mark in vivo terms using double quotes, mark conceptual categories (see "three-tiered system" above) in single quotes.
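Parts of this style guide can be checked mechanically during code review. The Python sketch below is an illustration only, with hypothetical function names: it reads the tier of a code from its quotation marks and flags names that deserve a second look. It cannot decide whether a capital letter belongs to a proper noun, or whether "and" joins two concepts, so a human still makes the call.

```python
def invivo_tier(name: str) -> str:
    """Classify a code name by the quoting convention of the three-tiered system."""
    if len(name) >= 2 and name[0] == '"' and name[-1] == '"':
        return "invivo"
    if len(name) >= 2 and name[0] == "'" and name[-1] == "'":
        return "in-between"
    return "descriptive"


def style_warnings(name: str) -> list[str]:
    """Return review hints for a code name; an empty list means nothing was flagged."""
    warnings = []
    if name != name.strip():
        warnings.append("leading or trailing whitespace")
    if any(ch.isupper() for ch in name):
        warnings.append("contains capitals: keep only for proper nouns")
    if " and " in name or ":" in name:
        warnings.append("possible compound code: consider splitting it")
    return warnings


for code in ['"witch"', "'communism'", "mental health", "homosexuality and discrimination"]:
    print(code, invivo_tier(code), style_warnings(code))
```

Because codes are case sensitive, catching a stray capital early avoids ending up with two codes that differ only in case.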
Document Everything
Define your codes immediately.
Define every code. I mean everything. Even if it seems self-evident. See above on saying "what goes without saying". This applies to our own frames as well: what seems self-evident to one of us will not be self-evident to another one of us.
Assign both English and source language translations in the backend, so the codes are connected.
If you're not sure about a term or code, note it down and explain why so that others can help you hone it / you can return and refine it. Add an asterisk so it's clear it needs work.
Create categories in your own codebook as often as possible to help you structure and streamline your codes. I recommend creating these as you code, or at least frequently, to avoid having to do this in a big batch. Doing so makes you less likely to assign different codes to the same concept and have to merge later, since you can see your existing codes more clearly.
Interacting with Other Ethnographers' Codes
Review other ethnographers' codes frequently. Ideally every time you start coding, do a quick pass of what's been added since you last checked (using the Sort by: Newest function).
Check if:
- the code name accurately expresses the definition (is everything in the definition captured by the term used? Is there a more salient or accurate term for what the ethnographer is trying to express in the definition?)
- the code is too general to carry meaning on its own
- the code should be broken up into two separate codes
- the definition/concept is already expressed by another code used in the codebook
- the ethnographer has asked any questions that you can answer
- the hierarchies and categories they use make sense
Merging codes. If you think your code means the same thing as someone else's (or close enough that you should seek to align them), make a note of it in the related codes tab. Once you discuss the merge with the other ethnographer, merge the codes. Remember to check with the other ethnographer if you want to change the code or its definition.
7.4. Thinking in network when coding: mapping coding to network structure
Basic concepts
When you code with Open Ethnographer, you are implicitly arranging codes in a graph. The basic graph structure is something like this:
Entities in the graph are:
- authors (participants)
- their posts
- ethnographic annotations
- ethnographic codes.
The types of relationships involved are:
- Authors write posts.
- Posts may reply to other posts.
- Annotations annotate posts.
- Annotations invoke codes.
These relationships are fundamental: we cannot deduce them from the data, only create them when authors write posts, or ethnographers code them. But there are other types of relationships that we can deduce from the fundamental ones, through a technique called projection. The most important ones are:
- A social relationship between authors: author Alice is talking to (or engaging with) author Bob when Alice writes a post that is a reply to a post written by Bob.
- A semantic relationship between codes: code C1 co-occurs with C2 when there exist two annotations A1 and A2 where A1 invokes C1, A2 invokes C2, and A1 and A2 annotate the same post.
In the case above, these relationships do not appear. There is only one author, Alice, and only one code, C1. But imagine now the ethnographers creates a second annotation on the same post, and invokes a new code C2. Like this:
Now, the codes co-occurrence network shows that C1 and C2 co-occur once, because they are both invoked by annotations to post P1. The number of co-occurrences is represented by the weight of the co-occurrences edge. The interaction network shows only Alice, interacting with no one.
Suppose now that Alice's post was in fact a reply to a post written by Bob. The situation is now this:
With no more annotations, the codes co-occurrence network is unchanged. But the interaction network now shows a link from Alice to Bob, symbolizing engagement. This edge, too, is weighted: the more replies Alice writes to Bob, the heavier the edge.
Every addition to the conversation database (authors writing posts, ethnographers adding annotations and codes) is encoded this way. So, you can think of coding as drawing networks. With every annotation, researchers using Open Ethnographer are adding nodes and edges to the conversation's semantic social network, more specifically to its semantic part, the codes co-occurrence network.
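The two projections described above can be sketched in a few lines of Python. This is only an illustration on toy data, not the code Open Ethnographer itself runs: the dictionaries and function names are hypothetical, and the sketch assumes that a pair of codes counts once per post in which both are invoked.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data mirroring the Alice and Bob example.
post_author = {"P0": "Bob", "P1": "Alice"}    # authors write posts
reply_to = {"P1": "P0"}                       # posts may reply to other posts
annotation_post = {"A1": "P1", "A2": "P1"}    # annotations annotate posts
annotation_code = {"A1": "C1", "A2": "C2"}    # annotations invoke codes


def interaction_network(post_author, reply_to):
    """Project replies onto authors: (replier, replied-to) -> number of replies."""
    edges = Counter()
    for post, parent in reply_to.items():
        edges[(post_author[post], post_author[parent])] += 1
    return edges


def cooccurrence_network(annotation_post, annotation_code):
    """Project annotations onto codes: (code, code) -> number of shared posts."""
    codes_by_post = {}
    for annotation, post in annotation_post.items():
        codes_by_post.setdefault(post, set()).add(annotation_code[annotation])
    edges = Counter()
    for codes in codes_by_post.values():
        # Assumption: a pair is counted once per post, as an undirected edge.
        for pair in combinations(sorted(codes), 2):
            edges[pair] += 1
    return edges


print(dict(interaction_network(post_author, reply_to)))
print(dict(cooccurrence_network(annotation_post, annotation_code)))
```

The interaction network is directed (Alice replies to Bob), while the co-occurrence network is undirected, which is why each pair of codes is stored in sorted order.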
Multiple annotations on a single post induce a clique
When a researcher adds an annotation to a post in Open Ethnographer, the code invoked by it will by construction co-occur with all the codes invoked by the other annotations on the same post. So, any post whose annotations invoke two or more codes gives rise to a clique of codes, that is, a completely connected network, or part thereof. The number of edges in a clique depends on the number of nodes. In an undirected network like the codes co-occurrence network:
- With 2 annotations, you get 1 co-occurrence edge.
- With 3 annotations, you get 3 co-occurrence edges.
- With 4 annotations, you get 6 co-occurrence edges.
- With `n` annotations, you get `(n * (n - 1)) / 2` edges.
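As a quick check of the formula, the following Python sketch enumerates the edges of a clique directly and compares the count with `n * (n - 1) / 2`. It assumes that each annotation on the post invokes a different code; the code names are made up for illustration.

```python
from itertools import combinations


def clique_edges(codes):
    """Return every undirected pair among the distinct codes on one post."""
    return list(combinations(sorted(set(codes)), 2))


for n in (2, 3, 4, 20):
    codes = [f"C{i}" for i in range(1, n + 1)]  # hypothetical code names
    edges = clique_edges(codes)
    assert len(edges) == n * (n - 1) // 2
    print(n, "codes ->", len(edges), "edges")
```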
If you visualize the full co-occurrences network (including edges of weight 1), rich posts are easy to spot as very dense cliques, often connected to the rest of the graph by only one or few codes:
Interpreting repeated co-occurrences
When doing SSNA, i.e. the analysis of semantic social networks, you would not attribute a great deal of importance to co-occurrence edges of weight 1. There are two reasons for this, one conceptual and one network-structural.
The conceptual reason is this. SSNA is a quest for collective intelligence. It aims to capture how a group in conversation, not a single individual or a collection thereof, sees something. By construction, an edge of weight 1 in the codes co-occurrence graph means that the two codes in question occurred together in the same post only once; posts can only have one author, so only one individual has made that association explicitly, only once. This does not qualify as collective intelligence. When the same co-occurrence repeats itself across multiple posts, it is likely to encode an association supported by the collective. We treat repeated co-occurrence as a signature of collective intelligence.
The network-structural reason is that a rich post might well have 20 annotations with 20 different codes. This means 190 edges. The number of edges in the graph can easily become dominated by a few rich posts. There is no elegant solution for this.
Some network scientists dealing with interconnected cliques like to assume that all cliques (not all edges) have the same weight, equal to 1. They then rescale the weight of the edges in each clique by the inverse of the number of edges therein. In our case, this would mean assuming each post has one "vote" to spend. If a post is annotated invoking four codes, each of its 4 * 3 / 2 = 6 edges would have weight 1/6. A post with only two codes invoked would give rise to a single edge of weight 1, and so on.
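To make the rescaling concrete, here is a minimal Python sketch of the "one vote per post" scheme. The function name and the input layout (a mapping from post to the codes invoked on it) are assumptions made for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def rescaled_cooccurrences(codes_by_post):
    """Give each post a total weight of 1, split evenly across its clique edges."""
    weights = Counter()
    for codes in codes_by_post.values():
        pairs = list(combinations(sorted(set(codes)), 2))
        for pair in pairs:
            weights[pair] += Fraction(1, len(pairs))
    return weights


# Hypothetical corpus: one post invoking four codes, one invoking two.
corpus = {"P1": ["C1", "C2", "C3", "C4"], "P2": ["C1", "C2"]}
for pair, weight in sorted(rescaled_cooccurrences(corpus).items()):
    print(pair, weight)
```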
We do not consider this to be a good solution for online ethnography. A large number of annotations on a post tends to mean that that post is indeed very rich in meaning (and often longer than average). It is by no means clear that the connections across these codes would be of less value than those stemming from posts with only 2 or 3 codes.
Instead, we filter out all one-off connections, and consider only the co-occurrences that appear at least twice in the corpus. This:
- Anchors more firmly our claim that the codes co-occurrence network has something to do with collective intelligence.
- Gets rid of the cliques.
- Simplifies dramatically the graph: in our studies so far one-off co-occurrences make up about 90% of all co-occurrences.
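The filter itself is simple. A minimal sketch in Python, assuming the co-occurrence network is held as a mapping from code pairs to integer weights (the names and numbers here are hypothetical):

```python
def filter_one_offs(weights, min_weight=2):
    """Keep only the co-occurrence edges seen at least `min_weight` times."""
    return {pair: w for pair, w in weights.items() if w >= min_weight}


# Hypothetical weighted edges: pair of codes -> number of co-occurrences.
weights = {("C1", "C2"): 3, ("C1", "C3"): 1, ("C2", "C3"): 2, ("C3", "C4"): 1}
kept = filter_one_offs(weights)
print(kept)
print(f"dropped {len(weights) - len(kept)} of {len(weights)} edges")
```

Raising `min_weight` gives a stricter reading of what counts as an association supported by the collective.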
The discussion on this Ethnographic Coding Wiki is treated in its own topic:
8. Archiving ethnographic data
In Edgeryders, we believe that scientific knowledge should be free and accountable. This includes research data. Data should be FAIR (Findable, Accessible, Interoperable and Reusable). For properly archiving the results of online ethnographic research, Open Ethnographer data have to be transformed, packed together with relevant metadata and uploaded somewhere where they can be found, accessed and understood, both by the researchers themselves and by third parties, like other researchers.
Open Ethnographer includes some facilities that help you make your data FAIR. There are two main things that you can do to make a corpus coded with Open Ethnographer FAIR: export and archive the corpus as a dataset, and export and archive the codebook of your ethnography.
8.1. Machine readable: data export and long-term archival
Exporting the corpus as a machine-readable dataset is the most complete and accountable way to make your ethnographic research data FAIR. With it, other researchers can in theory reproduce your results. It consists of the following steps:
- Export proper. This requires API access, and happens by running a Python script. You feed it the name of your project (currently identified as the set of topics tagged with the `ethno-PROJECTNAME` Discourse tag), and it exports four CSV files called `annotations.csv`, `codes.csv`, `participants.csv` and `posts.csv`. The usernames used by participants on the Edgeryders platform are pseudonymized as `@anon12345` to protect their right to withdraw their consent to participating in research. This set of files allows researchers to rebuild the full semantic social network, as the rows in each file contain IDs for database entities and pointers to the IDs of rows in other files. For example, each row in the `annotations.csv` file contains a column with the ID of the code used in that annotation: you can then retrieve that code by its ID in the `codes.csv` file. You can download the script from its GitHub repository.
- Add metadata. We strongly recommend adding the metadata in a standard format. We like to use the Data Package standard. This requires adding, to the four data files described above and in the same directory as them, a file called `datapackage.json` that contains a description of each file and each column in each file. Our repository contains a template `datapackage.json` which already describes all the columns of each file: you will still need to describe your own project.
- Archive for permanence and findability. Your dataset, consisting now of the four CSV files plus the `datapackage.json`, should now be archived in a place where it has a high probability of remaining findable and reusable for a long time. We recommend using Zenodo, CERN's repository, used for CERN's own data but also open for use by anyone. If your project is funded by the European Union, Zenodo has an added bonus: it gives you a field for the EU grant number. This means that your dataset will now be automatically indexed by OpenAIRE, the EU's portal for research data.
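As an illustration of how the exported files point to each other, the sketch below joins annotations to codes by ID using only the Python standard library. The column names (`id`, `name`, `post_id`, `code_id`) and the sample rows are assumptions made for this example; check the `datapackage.json` of your own export for the actual column names.

```python
import csv
import io

# Stand-ins for codes.csv and annotations.csv. Column names are hypothetical.
CODES_CSV = """id,name
1,C1
2,C2
"""
ANNOTATIONS_CSV = """id,post_id,code_id
10,100,1
11,100,2
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def codes_by_post(annotation_rows, code_rows):
    """Resolve each annotation's code ID and group code names by post ID."""
    code_names = {row["id"]: row["name"] for row in code_rows}
    result = {}
    for row in annotation_rows:
        result.setdefault(row["post_id"], set()).add(code_names[row["code_id"]])
    return result


posts = codes_by_post(read_rows(ANNOTATIONS_CSV), read_rows(CODES_CSV))
print(posts)
```

With real files, you would replace `io.StringIO(text)` with a file opened using `open(path, newline="", encoding="utf-8")`.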
8.2. Human- and machine readable: Codebook export
Codebooks are ordered lists of codes. Each entry contains relevant information about one code used in the corpus, like a description, its use cases, its parent code and so on. They are a standard-issue work instrument for many ethnographers, and are sometimes used as deliverables of projects.
Open Ethnographer can automatically generate an attractive codebook in table form for your project. To do it, start by calling the codes view:
https://edgeryders.eu/annotators/codes
Next, select the Discourse tag `ethno-PROJECTNAME` identifying your project from the drop-down menu on the top left. Finally, select `Plain table` from the `View` drop-down menu on the top right.
If you want to create an attractive Codebook deliverable, we recommend copy-pasting this page into a spreadsheet (tested with Google Sheets). At this point, you can format the spreadsheet, delete columns that you do not need, change the font, paste it into a document containing a foreword, etc.
If your codes are arranged into a hierarchy, the `Path` column contains all the lineage of each code (the code's parent, its parent's parent and so on). It is rendered in such a way that you can sort your spreadsheet on it, and you will get codes arranged by hierarchical level and alphabetically within each level.
9. Contributing to development
Open Ethnographer is open source software and you're welcome to contribute:
- Code. The code is hosted as project edgeryders/annotator_store-gem on GitHub. (The repo name is legacy and will change soon; it comes from an unmaintained project that we forked and used as our base software.)
- Documentation. This document that you're reading now is the documentation. It is provided as a wiki and you can edit it after opening an account on edgeryders.eu.
- Issue tracker. Please contribute issue reports and feature requests in the project's issue tracker on GitHub.
- Discussion forum. If you have a feature request or idea but are not too sure about it and want to gather others' input about it first, you're welcome to do so in the Software (SSNA) forum category of our Discourse forum.
- Support forum. If you need help with the installation or usage of Open Ethnographer, please post on the Open Ethnographer Manual topic of our Discourse forum: that is, simply comment below.
9.1. Vocabulary for issue reports and feature requests
Open Ethnographer: the software as a whole used for coding.
Coding View: the view in Open Ethnographer where you assign codes to text by highlighting sentences, accessed by clicking the "Coding View" link at the top of any post.
Codebook: the list of codes in the backend. It can be viewed as a list or tree of codes, which shows the code names and hierarchies, or as a translate view, which shows the codes and their definitions in editable form.
Annotation: A snippet of text that has a code assigned to it.
Thread: The original post on Edgeryders plus all the comments on it (thread title is called the Topic)
Post: the original conversation-starting contribution. Comments are the user replies to this post and to other comments. Note that Discourse treats posts and comments the same for the purposes of the SSNA.
Discourse tag: The tag (assigned at the top of any thread) that indicates what category the thread is in. E.g. "ethno-ngi-forward".
Code suggestions list: the list of codes that appears while you are coding in the coding view as possible suggestions to assign to text.
Addendum: to make a code look `like this` in a post on Edgeryders, use a ` on either side of the word (usually found under the F1 key).
10. Getting ethnographic data onto the Edgeryders platform
There are several ways to elicit contributions from community members (as posts and comments on the platform) so that they can be coded by ethnographers and show up accurately in the SSNA. They are listed below in order from most to least ideal.
- Community member makes an account on Edgeryders and posts their own story or comment under their own pseudonym.
- Someone creates an account for the community member and posts their story or comment on the platform, with their permission.
- Another community member posts the story or comment using their own account but makes clear that these are another community member's views.
For audio-recorded events and interviews:
(See the Consent Process Manual for the ethical framework)
- Each participant of the event creates an account and posts their inputs as comments (difficult, but ideal).
- ER member transcribes the audio event, creates different accounts to represent each speaker (with permission, and pseudonymised), and posts their contributions as separate contributions.
- ER member transcribes the audio event and ethnographers code that transcription. If possible, separate into different comments based upon topic (so it's not just one giant block of text).
- Ethnographer codes the event audio directly using the audio coding function described in Sections 3.3 and 3.4. This is not ideal unless the event is audiovisual and the visual elements are themselves significant (e.g. there is action going on besides just speakers talking, and that action is meaningful ethnographically).
For field notes and event notes:
- As much as possible, have participants themselves write their reflections on the platform by making an account and writing a post.
- Import ethnographer field notes as different posts and comments attached to pseudonym accounts based on different participants' contributions/reflections, to capture them as different people for SSNA.
- ER member (community manager, ethnographer) posts field and/or event notes to platform in text form so they can be coded.
11. Using the Open Ethnography platform
We use a separate Discourse site openethnography.net to offer other researchers a low-barrier way to use our Open Ethnographer software. This is simply one of our Edgeryders Communities sites, listed in the top-right "Communities" menu on edgeryders.eu and all its associated other Communities sites.
To set up openethnography.net so that users do not interfere with each other's privacy and coding practice, some of the configuration is done differently than here on edgeryders.eu. These configuration processes are described in the following sections.
11.1. Creating a workspace
A workspace is simply a top-level category in which a group of researchers collaborate. It can be split into the category proper, accessible to all researchers in that workspace, and sub-categories for each of the researchers, accessible only to them. The top-level category representing the workspace can be public or protected. If you want to protect it, create a new Discourse group and give only that group (plus mods and admins) access to the new category.
11.2. Adding a new user
This involves granting the user access to Open Ethnographer, adding the user to the workspace, and providing them with a personal workspace inside that workspace. In detail:
Assuming that the shortname of the Discourse group associated with the workspace category is `workspacename` and the given user's first name is `firstname`:
- Create a new Discourse user group with name `workspacename_firstname` and full name "Workspace Full Name: Firstname". While doing so, (1) add the workspace manager as the group owner and the new user as a group member, and (2) set group visibility and member visibility to "Group owner, members".
- Edit the "Workspace Full Name" category and add the new group in the Security tab to give it access. (This is needed for technical reasons when we want that group to have access to one of its sub-categories.)
- Create a category "Firstname" as a sub-category of category "Workspace Full Name".
- In the Security tab while creating that category, remove the "everyone" access line and add the group "workspacename_firstname" instead.
This way, the personal workspace categories and their associated groups are only visible to the workspace manager and the person whose personal workspace that is, not to all members of the superordinate workspace. So it looks tidy to everyone except for the workspace manager. As a benefit, the workspace manager can easily follow the work done in all personal workspaces by looking at the "New" and "Latest" tabs in the workspace category.