Ethnographic coding practices and SSN structure: what we are learning

alberto · December 2, 2017, 12:04pm

Split into its own topic from Understanding API calls for building SSNA - #8 by alberto

@melancon, it turns out it was not an import issue. 696 of the 697 codeless annotations were made by Shweta aka @Digitalanthropology, who has only been working in the past week. This situation was created by her choice to treat poignant words in the text as codes, and by Open Ethnographer itself not forbidding users to do so. I have since decided this is a bug and not a feature.

However, @jason_vallet has taken the initiative of assigning the value of the note field of the codeless annotation to their tag_id field. So the graph is still solid, in the sense of reflecting what Shweta did. The problem is, by reusing the words of the informants, her coding drifts towards a folksonomy instead of inducing an ontology.

The trouble with folksonomies

For example, we have six different codes around “workshop”:

workshop
workshop.
workshops
open village workshop
mena workshop
edgeryders workshop

At a first glance, it looks like these are really six ways to say the same thing, and should be merged.

Look at “technology”:

multinational technology and media companies
geo-localization technologies
technology of design
awe technology
self-supply technologies
it/digital technologies
technology
zewail city university of science and technology
freedom enabling technology
technological experimental labs
open knowledge and technologies

There is more diversity here, but again some might be merged. Also, it is interesting to note the use of “and”. If you are working with semantic networks, codes should perhaps not contain “and”. If a post mentions a connection between widgets and gadgets, the ethnographer might create an annotation with widgets as the code, and then a second one with gadgets as the code. The network will then show the codes widgets and gadgets as connected, as they co-occur. On the other hand, open knowledge and technologies seems to say “this post is about the relationship between open knowledge and technologies”. This is fair enough, but it risks missing out, because open knowledge might come up elsewhere in association to, say ethics. In this case, we would have two disconnected codes, open knowledge and technologies and open knowledge and ethics, instead of three connected ones, ethics connected to open knowledge connected to technology.

Maybe this can be taken care of by a second pass.

As a result, the structure of the graph is not as connected as the ones that Amelia was producing for OpenCare. When k = 1 (nodes= 645, edges =1,720):

Filtering for k= 2 (nodes = 63, edges = 109) we even lose the giant component:

What we do now

Immediately:

@daniel goes through the codeless annotations, and copy the quote field onto the tag_id field.

Later:

Shweta herself, helped by @amelia, may reconsider some of those annotation (second pass).
Jason or I re-extract the graph, etc.

What we have learned

There are huge and tacit differences between the coding styles of different ethnographers. Despite being trained to use OE by Amelia, Shweta adopted a completely different interpretation of how the software should be used. We were not aware of the issue until we actually looked at the graph. Ethnographers do not appear to feel a strong need to document their field practices, perhaps because in general they work alone, so the only thing they share is the finished paper. @akmunk rightly pointed out that some ethno research projects have a strong need to protect the privacy of informants, so that open data and open notebook science might be an issue.
Different styles of coding induce different network structures. Yet another reason to have clear and explicit guidelines for what constitutes “good coding”.

johncoate · December 7, 2017, 7:12pm

I discussed this some with Amelia in Brussels. She agreed that ethnography, as practiced, is strongly subjective. And that she did not know how to change that. But yes clear and explicit guidelines are in order, although it is still to be determined how explicit in order to get fairly uniform interpretations. I would think preparatory conversations are a must with frequent review. After a period of time and experience working with the same people, I am sure things would even out some, though not completely.