Would ethnographers do open notebook science?

Edgeryders is currently developing software provisionally called OpenEthnographer, with the support of the Rockefeller Foundation. The idea is that online conversations, rather than being exported (often via copy-paste) as text to some external ethnographic software, are coded on the platform itself. We believe this move has potentially disruptive applications for how ethnography is done and what role it plays in societal advancement (here is a more detailed explanation). The very rough prototype we are using looks like the screenshot below. Color-coded triangles identify coded snippets of text; ordinary users do not see them, but researchers are given a special set of permissions (very easy with Drupal).

However, we have a major architectural choice to make. We want any piece of text on Edgeryders to be usable by as many ethnographers as want to do research on it. Now the question is: should each researcher be able to keep her coding hidden from other researchers? The way it is now, OpenEthnographer is powerful, but you can only do open notebook science with it; other researchers will be able to see your codings as you work. Some people think this is the highest, noblest, most accountable form of doing science; others are very uncomfortable with it. So we need to decide whether to continue developing the existing architecture or to make significant changes to enable each researcher to make her codings invisible to everyone else.

If you are an ethnographer, know one, or otherwise have an opinion to share on this, please let us know: we cannot proceed without making this decision.

One possible tweak: pseudonymous coding

An idea that just came to my mind and could be combined with the current “open notebook science by default” software architecture of Open Ethnographer: creating the codings with a pseudonymous author.

This means your codings are still visible, right in the live material, to other ethnographers and to web visitors, who are free to reuse and remix them (depending on the licence, of course). But they are not attributed to a specific person (or rather, a specific Drupal user account). Only the coder herself can make that connection, because only she knows her coder pseudonym (which is different from her username).

I don’t think that would work

The typical concern with ONS is not privacy. It is: people are going to use my work to publish before I do! My own research is going to be put down by reviewers as derivative, because someone else takes the credit for my work! Pseudonyms are not going to help a lot with these concerns. On the contrary.

Ah ok, the academic imperative …

This might be the main concern with open notebook ethnography. A possible motivator for researchers to share their coding work would be the “open data” approach: raw data is more valuable to everyone when shared and aggregated (counting the coding as “data” for a moment), while the original value of a piece of research would lie in the analysis. This would allow researchers to base their analysis on a massive amount of coded material, more than they could code on their own. A further motivator would be an open data licence that enforces re-sharing of derivative work, as the GPL or even the AGPL does for software.

Maybe the ethnographic researchers who come across the above post would also care to comment on the perceived attractiveness and feasibility of such an open data approach in ethnography, because that is also relevant for the Open Ethnographer software design. Reuse of others’ coding work could, for example, be supported by letting researchers assemble ontologies of “virtual” codes, each of which aggregates some of the codes created by others.
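To make the “virtual” codes idea a bit more tangible, here is a minimal Python sketch. Everything in it (the `Code` and `VirtualCode` classes, the snippet IDs, the researcher names) is made up for illustration and is not part of the Open Ethnographer design; it only shows how one researcher could group other people's codes under her own label and query the shared material through it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Code:
    """A code created by one researcher (hypothetical structure)."""
    owner: str
    name: str


@dataclass
class VirtualCode:
    """A label that aggregates codes created by other researchers."""
    name: str
    members: set = field(default_factory=set)

    def matches(self, code):
        return code in self.members


# Shared codings: (snippet ID, code applied to that snippet). Illustrative data only.
codings = [
    ("snippet-1", Code("alice", "trust")),
    ("snippet-2", Code("bob", "confidence")),
    ("snippet-3", Code("bob", "money")),
]


def snippets_for(virtual, codings):
    """Return the snippets coded with any member code of the virtual code."""
    return [snippet for snippet, code in codings if virtual.matches(code)]


# A third researcher treats Alice's "trust" and Bob's "confidence" as one concept.
reliance = VirtualCode("reliance", {Code("alice", "trust"), Code("bob", "confidence")})
print(snippets_for(reliance, codings))  # ['snippet-1', 'snippet-2']
```

The original codes stay untouched and attributed to their owners; the virtual code is only a view on top of them.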

visible to web visitors

“your codings are still visible…to web visitors”

I have two conflicting opinions about this. I could imagine ethnographers being inhibited from coding accurately if their codings are immediately visible to the participants. On the other hand, on ethical grounds it seems clear that you should be able to see what the ethnographers are saying about you.

It’s also about giving back.

Thanks for identifying these difficult points in public coding! Having research subjects and researchers on the same platform, even interacting there, with some being a subject in one context and a researcher in another, does indeed create an “interesting” new social context. It moves people away from being the disempowered object of a study (that’s the effect I like).

One more thing I like specifically about public coding: it’s semantic markup added to a user’s texts “for free”, something that’s not economically possible anywhere else on the web. This markup could then serve a double purpose by also helping users navigate their platform and interact with each other. Think of tools like basic semantic search. Maybe these benefits can help people over the slightly awkward situation of being a research subject …

how many ethnographers do we expect?

Is this issue likely to occur? It seems unlikely to me that we’ll have multiple ethnographers in such fierce competition that social norms won’t hold them back from stealing one another’s work.


I wouldn’t expect too many, but you never know, and I’m guessing Open Ethnographer will be here to stay, so designing it for the future as a stable tool seems appropriate. I wouldn’t worry much about stealing each other’s work. In the end, coding doesn’t make for good research; analysis, interpretation and providing context do. If I were an ethnographer trained in academia, where walls are everywhere and openness less so, I’d be afraid of being vulnerable to criticism rather than thinking my work is too precious… a piece of work gone bad discrediting your future… In a way, more openness means more ability to stand for your results, and at the same time more accountability and more risk of prejudice. Anyway, that’s just me.

Answer from Cluj university

Hi, so I asked around and here’s what an ethnographer told me: “ethnography is quite difficult to replicate, I’d say it’s not at all replicable, and I don’t see why anyone would replicate another person’s coding, which is a matter of a very specific topic that a researcher wants to explore. Unsecretizing coding seems fine to me, but the respondents’ info should definitely be anonymized.”

Can also be added later, but deserves a push still

I think I found a software design where the feature of having personal, non-shared codings of live content can be added later on as an add-on, without having to tear down the part of the software built until then. So the project will proceed now, even though we do not yet have many opinions on this matter.

That said, that might also be due to us having forgotten to make the Open Ethnographer group public. So this post was only visible to group members. I have made the group and its posts public now (quite an issue!), so we might give this post a new social media push now that it’s public. @Noemi, @Alberto (… and thanks, you two, for raking in the feedback we have already!)

Some feedback

I got feedback from a few people, which can be summarized as “privacy before open data”, i.e. they would prefer to have their coding anonymous. Some others opted for “at least have an option of making it not visible to others”.

(got this from various FB groups designated for social researchers interested in the Caucasus)

Noted, and thanks.

Thanks for the feedback! So in total, it seems that researchers at least want the option to keep codings private, while not being opposed to selectively making them public as well.

After much thinking, I believe I found a good solution that respects this requirement. The coding interface would not be in the editor, but would rather work with Annotator (see its site for a live demo by selecting some text there; it’s quite nice). The storage would use a stand-off annotation technique: giving IDs to all words, and storing codings externally from the text by referencing the coded words’ IDs. (More in the software design wiki.) Still work in progress. For now, I’d like to know from you whether word-level coding is sufficient. That is, do ethnographers require the option to code parts of words, or not?
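For readers who have not met stand-off annotation before, here is a minimal Python sketch of the storage idea. It is an illustration under my own assumptions, not the actual Open Ethnographer code: the whitespace tokenizer, the `w0`, `w1`, … ID format, the sample sentence and the function names are all placeholders.

```python
def assign_word_ids(text):
    """Split on whitespace and pair every word with an ID (placeholder ID format)."""
    return [(f"w{i}", word) for i, word in enumerate(text.split())]


# The text itself stays free of markup; it is only broken into identified words.
words = assign_word_ids("I felt welcome in the community from day one")

# Stand-off store: codings live outside the text and only reference word IDs.
codings = {"belonging": ["w2", "w3", "w4", "w5"]}


def coded_text(code, words, codings):
    """Reassemble the words that a given code refers to, in document order."""
    ids = set(codings[code])
    return " ".join(word for word_id, word in words if word_id in ids)


print(coded_text("belonging", words, codings))  # welcome in the community
```

Because the smallest addressable unit is a word, a coding can never start or end in the middle of one, which is why the word-level question matters for this design.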

Options are good

If we have a good solution for offering different sharing options for the codes, I would definitely go ahead and implement it. It makes our work much less controversial, and takeup easier.

Yes, we’ll do it with options

With the new idea from above, making sharing optional is possible now, because the idea allows both features that I previously assumed to be mutually exclusive: coding the live version (allowing code sharing and live version edits), and keeping codings private where required.
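As a rough illustration of what optional sharing could mean for the data, here is a small Python sketch. The `Coding` structure, the `shared` flag and the user names are assumptions of mine for the example; the real permission handling would presumably sit in Drupal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Coding:
    """One coding on the live text (hypothetical structure)."""
    owner: str
    code: str
    shared: bool  # True: visible to everyone; False: visible to the owner only


def visible_to(viewer, codings):
    """Return the codings a given viewer may see: shared ones plus her own."""
    return [c for c in codings if c.shared or c.owner == viewer]


# Illustrative data only.
codings = [
    Coding("alice", "trust", shared=True),
    Coding("alice", "fear", shared=False),
    Coding("bob", "money", shared=False),
]

print([c.code for c in visible_to("alice", codings)])    # ['trust', 'fear']
print([c.code for c in visible_to("bob", codings)])      # ['trust', 'money']
print([c.code for c in visible_to("visitor", codings)])  # ['trust']
```

All codings refer to the same live text, so a private coding can later be made public simply by flipping the flag.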

This idea works by combining stand-off annotation with random, non-consecutive word IDs. Usually, stand-off annotation just means storing codings externally as (start word number, stop word number, code ID); in Open Ethnographer, we will store them as a set of (word ID, code ID) tuples instead. So if words are inserted or moved in the live version, the codings still point at the same words, because the IDs do not depend on a word’s position. This seems to be new; I did not find anyone else who has tried this before :)
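The difference between the two storage schemes can be sketched in a few lines of Python. This is only an illustration under assumptions: the tokenizer, the use of shortened `uuid4` values as random word IDs, and the sample sentence are mine, not the actual implementation.

```python
import uuid


def tokenize(text):
    """Give every word a random ID that does not depend on its position."""
    return [(uuid.uuid4().hex[:8], word) for word in text.split()]


words = tokenize("the platform felt safe to me")

# The same coding of "felt safe", stored in both ways.
positional = (2, 3, "safety")                               # (start word no., stop word no., code ID)
by_id = {(word_id, "safety") for word_id, _ in words[2:4]}  # set of (word ID, code ID)

# Live edit: two words are inserted at the start. Existing words keep their IDs.
edited = tokenize("Honestly, overall") + words


def text_by_position(words, coding):
    start, stop, _ = coding
    return " ".join(word for _, word in words[start:stop + 1])


def text_by_id(words, codings, code):
    ids = {word_id for word_id, c in codings if c == code}
    return " ".join(word for word_id, word in words if word_id in ids)


print(text_by_position(edited, positional))  # the platform  (coding has drifted)
print(text_by_id(edited, by_id, "safety"))   # felt safe     (coding still correct)
```

The price of the ID-based scheme is one stored tuple per coded word rather than one per coded range.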

I don’t think there is a big need amongst ethnographers to be able to code parts of words. So word-level coding is quite sufficient.