Split into its own topic from https://edgeryders.eu/t/understanding-api-calls-for-building-ssna/7764/8
@melancon, it turns out it was not an import issue. 696 of the 697 codeless annotations were made by Shweta aka @Digitalanthropology, who has only been working in the past week. This situation was created by her choice to treat poignant words in the text as codes, and by Open Ethnographer itself not forbidding users to do so. I have since decided this is a bug and not a feature.
However, @jason_vallet has taken the initiative of assigning the value of the
note field of the codeless annotation to their
tag_id field. So the graph is still solid, in the sense of reflecting what Shweta did. The problem is, by reusing the words of the informants, her coding drifts towards a folksonomy instead of inducing an ontology.
The trouble with folksonomies
For example, we have six different codes around "workshop":
open village workshop
At a first glance, it looks like these are really six ways to say the same thing, and should be merged.
Look at "technology":
multinational technology and media companies
technology of design
zewail city university of science and technology
freedom enabling technology
technological experimental labs
open knowledge and technologies
There is more diversity here, but again some might be merged. Also, it is interesting to note the use of "and". If you are working with semantic networks, codes should perhaps not contain "and". If a post mentions a connection between widgets and gadgets, the ethnographer might create an annotation with
widgets as the code, and then a second one with
gadgets as the code. The network will then show the codes
gadgets as connected, as they co-occur. On the other hand, open knowledge and technologies seems to say "this post is about the relationship between open knowledge and technologies". This is fair enough, but it risks missing out, because open knowledge might come up elsewhere in association to, say ethics. In this case, we would have two disconnected codes,
open knowledge and technologies and
open knowledge and ethics, instead of three connected ones,
ethics connected to
open knowledge connected to
Maybe this can be taken care of by a second pass.
As a result, the structure of the graph is not as connected as the ones that Amelia was producing for OpenCare. When k = 1 (nodes= 645, edges =1,720):
Filtering for k= 2 (nodes = 63, edges = 109) we even lose the giant component:
What we do now
@daniel goes through the codeless annotations, and copy the
quote field onto the
- Shweta herself, helped by @amelia, may reconsider some of those annotation (second pass).
- Jason or I re-extract the graph, etc.
What we have learned
There are huge and tacit differences between the coding styles of different ethnographers. Despite being trained to use OE by Amelia, Shweta adopted a completely different interpretation of how the software should be used. We were not aware of the issue until we actually looked at the graph. Ethnographers do not appear to feel a strong need to document their field practices, perhaps because in general they work alone, so the only thing they share is the finished paper. @akmunk rightly pointed out that some ethno research projects have a strong need to protect the privacy of informants, so that open data and open notebook science might be an issue.
Different styles of coding induce different network structures. Yet another reason to have clear and explicit guidelines for what constitutes "good coding".