This is what Open Ethnographer will be like

matthias · November 19, 2014, 1:34am

So I just finished the software design for Open Ethnographer. It's the basic architecture, which still allows features to be added and removed as the requirements become clearer.

But you can already get an idea of what Open Ethnographer will be like. It will be based on eComma, which is a great find: an open source “commentary machine” module for Drupal, released just recently. This is how eComma looks right now:

And here’s the rest of the sneak preview – no screencasts, sorry, you'll have to use some imagination:

Look through the eComma mini-manual (section "User Manual").
Then try the eComma live demo yourself. Tagging does not work there as you have no login, but that does not matter as we'll do it a bit differently anyway:
Try out the Annotator live demo to see how tagging will work. Just select some text and click the button that appears. Tag selection and some other things will work differently though, so:
Finally, read through the implementation tasks to get an idea of how we will extend the eComma software to also be a code manager and coder for open ethnography. By making it both, we will share the maintenance effort with the eComma developers. Beauty of open source!

Did you get the idea? Do you have feedback or criticism for the basic architecture and feature set? (It’s really just about the basics for now, details are better handled on the go.)

alberto · November 19, 2014, 1:24pm

Annotator is clear, eComma less so

Great work as usual! Let’s go through it.

Annotator looks like a great discovery, not much to say. Where does it store its data? And more importantly: are you planning to use it, or is it just a way to show people how the user interface will be implemented, as you plan to use eComma’s annotating functionality instead? Or is it the case that Annotator is eComma’s annotating technology?

eComma is less clear to me. The main idea seems to be having the text and the ethnographic metadata on the same web page: so far so good. But what good is a word cloud without some kind of filter or normalization (show words whose frequency is high relative to the English language average)? What does the comment cloud exactly describe? What are you thinking of preserving, of the eComma functionalities?

@Inga_Popovaite, what do you think?

matthias · November 19, 2014, 3:00pm

Some more details

Where does [Annotator] store its data? – It’s just a JavaScript UI widget, it does not store its own data. eComma has the storage code, and we’ll adapt that to cope with edits in the live version etc…

[A]re you planning to use it, or […] you plan to use eComma’s annotating functionality instead? – I plan to use it within eComma. Currently, eComma comes with its own custom 20 kB of JavaScript that does basically the same as Annotator, so it can be exchanged for Annotator quite easily. Annotator is much more flexible in architecture (has 20 plugins etc.), which is good for our purposes of adding custom tag selection mechanisms.

eComma is less clear to me. […] What are you thinking of preserving, of the eComma functionalities? – Since ethnographers don’t do comments, this part will be disabled. Same for the word clouds. So we keep the “Tag view” and “List of User Annotations” tabs. The “Tag view” tab could be extended to also allow reorganizing the tag hierarchy and adding tags on the fly, as required by Inga. Another tab could be added for the quotation manager (“search by tag”), but that probably fits better into a global “advanced search” page. The main benefits of eComma are: providing much of the basic architecture we’d have to create anyway (taxonomy interfacing, theming, storage) and providing an organizational structure that reduces our share of later maintenance and issue fixing costs (they are an institutionally funded open source project from the higher education area).

inga_popovaite · November 21, 2014, 10:37am

A question

Both eComma and Annotator seem user friendly and intuitive tools for tagging. As I understand, there will be an option to create hierarchies as you go, right? Will it be possible to have the same code/tag assigned under several broader categories?

matthias · November 21, 2014, 5:19pm

Depends, but we can try

The data model that I originally planned would allow this (though the user interface would then have to be more complex than drag & drop in a hierarchical list … we’d have to see). Now that I found eComma, I also want to look at their data model first (which does not allow multiple parent codes) and see if a big rework makes sense. Tell me if the feature is really important for ethnographers and I’ll include it if possible …

In any case, only the leaf nodes of the coding hierarchy would be used for coding (and thus for sharing codes), not any parent node or higher. In the exports, one’s own parent tags will appear as well, so they can be used in analysis (like for grouping together several tags of other researchers that are too detailed for one’s own purposes). They cannot be used for coding though, as then reordering the hierarchy would change what is coded how, which would also affect other researchers’ codings if they use shared codes. That would be confusing and unexpected. Does that make sense, or do ethnographers have any strong reason to also assign higher-level codes to portions of text?
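To make the leaf-only rule concrete, here is a minimal Python sketch. The code names and the `PARENTS` dict are hypothetical, not eComma's data model. Codings only ever store leaf codes, so moving `widgets` under a different parent would change what the exports show, but not which text is coded with what.

```python
# Hypothetical sketch: code -> parent code. Only leaf codes may be applied
# to text; parent codes show up in exports for analysis only.
PARENTS = {
    "widgets": "technology",
    "gadgets": "technology",
    "technology": "topics",
}


def is_leaf(code):
    """A code is a leaf if no other code names it as its parent."""
    return code not in PARENTS.values()


def apply_code(codings, snippet, code):
    """Record a coding, refusing parent (grouping) codes."""
    if not is_leaf(code):
        raise ValueError(f"{code!r} is a parent code and cannot be applied to text")
    codings.append((snippet, code))


def ancestors(code):
    """Walk up the hierarchy from a code to its root."""
    chain = []
    while code in PARENTS:
        code = PARENTS[code]
        chain.append(code)
    return chain


def export(codings):
    """Export rows: snippet, leaf code, and that code's ancestors."""
    return [(snippet, code, ancestors(code)) for snippet, code in codings]
```

Applying `widgets` to a snippet and exporting yields the row `("snippet 1", "widgets", ["technology", "topics"])`, while trying to apply `technology` directly raises a `ValueError`.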

alberto · November 21, 2014, 10:10pm

Are we sure we need hierarchies?

As Shirky famously argued, ontologies are overrated. We could, instead, build a system that goes from the particular to the general:

"A snippet of text" refers to a tag.
A tag refers to another tag.

That’s it. There’s no other level. You can build levels concatenating tags. Let’s explore some use cases.

Use case 1: generalizing/making hierarchies

The researcher is tagging snippets that refer to widgets and gadgets. At some point, she realizes that widgets and gadgets have something in common: they are both expressions of technology. She creates a tag technology, and makes the widgets and gadgets tags refer to it. Now we have a tag that both widgets and gadgets refer to – in this sense, a meta-tag. But in fact, it is simply a tag: you could just as well assign snippets directly to technology, if they refer to expressions of technology other than widgets or gadgets. There are no meta-tags, just tags and relationships.

Use case 2: branching out

The researcher finds out that there are several snippets about technology that all refer to the same expression of tech: robots. She creates a robots tag, and makes it refer back to technology. This way, the coding file knows that all snippets coded widgets, gadgets or robots inherit technology.
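A minimal sketch of this model in Python, assuming just two hypothetical relations held in plain dicts: snippet to tags, and tag to tags. "Inheriting" a tag is then simply following tag-to-tag references until nothing new turns up:

```python
# Hypothetical sketch of the "tags refer to tags" model: snippets refer to
# tags, tags refer to other tags, and nothing else.
snippet_tags = {}  # snippet -> set of tags applied directly
tag_refs = {}      # tag -> set of tags it refers to


def refer(mapping, key, tag):
    """Add a reference from `key` (a snippet or a tag) to `tag`."""
    mapping.setdefault(key, set()).add(tag)


def inherited_tags(tag):
    """The tag itself plus every tag reachable through tag-to-tag references.
    Uses a visited set, so accidental cycles cannot loop forever."""
    seen, stack = set(), [tag]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(tag_refs.get(current, ()))
    return seen


def tags_of_snippet(snippet):
    """All tags a snippet carries, directly or by inheritance."""
    result = set()
    for tag in snippet_tags.get(snippet, ()):
        result |= inherited_tags(tag)
    return result


# Use cases 1 and 2: widgets, gadgets and robots all refer to technology.
for narrower in ("widgets", "gadgets", "robots"):
    refer(tag_refs, narrower, "technology")
refer(snippet_tags, "snippet about a drone", "robots")
refer(snippet_tags, "snippet about a 3D printer", "technology")
```

Both snippets end up carrying technology, one directly and one through robots, which is the point: there is no separate meta-tag level, only references.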

Use case 3: collapsing

The researcher decides that, after all, there is really no difference between how respondents treat widgets and gadgets. Distinguishing the two does not really add any explanatory power to the study. So, she merges widgets, gadgets and technology, and deletes the first two tags. At this point, our tags branch consists of technology and a subset of it, robots.
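Collapsing can be sketched the same way. This is a hypothetical helper over plain dicts (snippet to tags, tag to the tags it refers to), not an existing eComma or Drupal function:

```python
def merge_tags(snippet_tags, tag_refs, sources, target):
    """Merge the `sources` tags into `target` and delete the sources.

    snippet_tags: dict mapping snippet -> set of tags applied to it
    tag_refs:     dict mapping tag -> set of tags it refers to
    """
    sources = set(sources) - {target}
    target_refs = tag_refs.setdefault(target, set())
    # Snippets coded with a merged tag are now coded with the target.
    for tags in snippet_tags.values():
        if tags & sources:
            tags.difference_update(sources)
            tags.add(target)
    # Tags that referred to a merged tag now refer to the target.
    for refs in tag_refs.values():
        if refs & sources:
            refs.difference_update(sources)
            refs.add(target)
    # The target inherits whatever the merged tags referred to, then they go.
    for source in sources:
        target_refs.update(tag_refs.pop(source, set()))
    target_refs.discard(target)  # a tag never needs to refer to itself
```

For use case 3, `merge_tags(snippet_tags, tag_refs, ["widgets", "gadgets"], "technology")` recodes all widgets and gadgets snippets as technology, removes the two tags, and leaves robots referring to technology.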

Does this make sense?

matthias · November 23, 2014, 2:45am

Seems fine! (with a few caveats)

Thanks for the detailed thoughts! I guess (without reading the linked piece, I admit …) that Shirky is right about ontologies for all datasets that are big enough to be “unbrowsable”: if you cannot browse through the whole dataset in your lifetime, chances are you can’t develop a complete, good-quality ontology for it either. Then it’s better to rely on heuristics for both. However, ethnography is usually exactly about a dataset not exceeding “browsable” size, as the researcher first has to read and tag it. You might argue that massive open online ethnography has the potential to transcend that: instead of one researcher doing their own research on one self-tagged dataset, several researchers can collaboratively tag a dataset and do research on it even though it’s no longer browsable by any single one of them …

Ok, so let’s welcome this disruption potential of open online ethnography and see if we can do without hierarchies. Their usual function for ethnographers is not to represent an ontology but simply to be a pragmatic means for quickly navigating potentially hundreds of tags. The “tagging of tags” feature you propose can take over the same function, as the chains of tags can still be represented in hierarchical fashion (probably sorted by chain length, to push all the “small” untagged tags to the back). It would even fit better into the eComma data structure and user interface (which do not use hierarchical tags at all), so it would have a better chance of making it into their official releases.
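One possible way to show tag chains as a hierarchy, sketched in Python under the assumption that every tag is a key in a dict mapping it to the (possibly empty) set of tags it refers to. Reading “chain length” as the longest chain of narrower tags below a top-level tag is my assumption; tags caught in a pure cycle with no top-level tag would not be listed by this version.

```python
def narrower_tags(tag_refs):
    """Invert tag -> broader tags into broader tag -> narrower tags."""
    children = {}
    for tag, broader in tag_refs.items():
        for b in broader:
            children.setdefault(b, set()).add(tag)
    return children


def chain_length(tag, children, seen=frozenset()):
    """Length of the longest chain of narrower tags starting at `tag`."""
    if tag in seen:  # guard against cycles
        return 0
    below = (chain_length(c, children, seen | {tag}) for c in children.get(tag, ()))
    return 1 + max(below, default=0)


def outline(tag_refs):
    """Indented tag list: top-level tags with long chains first, lone tags last.
    A tag with several broader tags appears under each of them."""
    children = narrower_tags(tag_refs)
    roots = [t for t in tag_refs if not tag_refs[t]]
    roots.sort(key=lambda t: (-chain_length(t, children), t))
    lines = []

    def walk(tag, depth, path):
        lines.append("  " * depth + tag)
        for child in sorted(children.get(tag, ())):
            if child not in path:
                walk(child, depth + 1, path | {child})

    for root in roots:
        walk(root, 0, {root})
    return lines
```

With widgets and gadgets referring to technology and an untagged lunch tag, the outline lists technology first with its two narrower tags indented beneath it, and lunch at the end.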

I’m still somewhat concerned about the other applications of this approach:

Changing the tag structure changes the shared taggings. Say, a researcher subscribes to a tag technology that is used by its author to tag a lot of other tags. Later, the author removes these taggings, and technology is only used to tag content. Due to the transitive nature of "tagging a tag", this dramatically affects what is tagged with technology, potentially making that tag useless. Should the subscribing researcher be affected like that just because the original author chose to reorder the tag network? (In my original proposal, I tried to avoid this with a separation between physical tags for content and logical tags for grouping them together into hierarchies, and only the former ones would be sharable.)
Or maybe deprecate tags instead? Changes in the tag structure as discussed in the last point would not be disorienting for a subscribing researcher if they stay within the original definition of the tag (here, technology). That means the definition could not be changed after creating a tag, but a tag could be marked as deprecated and deleted once nobody subscribes to it any more. The researcher would just start a new tag if the definition of the former one turns out to be flawed in some way.
The category problem. I have a gut feeling that by allowing one tag to tag both content and other tags, we’re running into problems later on. I can’t say exactly what problems (will sleep on it), but it seems to me like what philosophers call a “category error”. Namely, we’re treating tags and meta-tags alike, similar to treating sets and sets of sets alike, which is how one runs into Russell’s paradox. The solution would simply be to give each tag a type: either tagging content, or tagging other tags. To express your case of tagging content with technology directly, one would first have to create a tag other technology, tag it with technology, and then tag content with other technology. This also simplifies the implementation of tag splitting.
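The typed-tag idea from the last point could look roughly like this. `Tag`, `CONTENT` and `GROUP` are hypothetical names for illustration; the point is that the type check happens when a tag is applied, so the “other technology” workaround is needed to code text under technology:

```python
from dataclasses import dataclass, field

CONTENT, GROUP = "content", "group"


@dataclass
class Tag:
    name: str
    kind: str                                 # CONTENT codes text, GROUP tags other tags
    groups: set = field(default_factory=set)  # names of GROUP tags applied to this tag


def tag_content(codings, snippet, tag):
    """Apply a tag to a snippet; only content tags are allowed."""
    if tag.kind != CONTENT:
        raise TypeError(f"{tag.name!r} groups other tags and cannot code text")
    codings.append((snippet, tag.name))


def tag_tag(tag, group):
    """Apply a grouping tag to another tag; only group tags are allowed."""
    if group.kind != GROUP:
        raise TypeError(f"{group.name!r} codes text and cannot group tags")
    tag.groups.add(group.name)
```

Creating `Tag("other technology", CONTENT)`, grouping it under `Tag("technology", GROUP)` and coding text with it works; coding text with technology itself raises a `TypeError`.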

With these considerations in place, I think we could try with non-ontological / non-hierarchical tagging. Which would also implement @Inga_Popovaite's idea of having one tag / code under multiple broader concepts.

alberto · November 23, 2014, 11:26am

Huh

Distinguishing between tags and meta-tags means bringing back ontology. Not recommended, also because somebody might want to do a three- or four-layer hierarchy, and then you would have to decide whether technology is a tag, a meta-tag, a meta-meta-tag or a meta-meta-meta-tag.

I think the problems you pose disappear once you allow each researcher to save their own work. Researcher A might consider technology a tag, while researcher B considers it a meta-tag. Researcher C does not use it at all, and instead tags some of the content that A has coded with technology as, say, machines. When researcher D comes along, she can:

start a fresh coding
fork A's, B's or C's coding and then make any changes to the forked coding file.

This solves any problem of “breaking” dependencies. We could even dream up a GitHub-like system of pull requests later on!

By the way: do yourselves a favour and read Shirky’s essay.

matthias · November 24, 2014, 3:59pm

Forking is a good point; the rest can be decided later on

Previously I had intended collaboration to happen using the tag hierarchies: one would be able to incorporate a code from another researcher and add to it by creating another code, using it to code what is missing from the code one takes over, and creating a logical code that combines both (seen in exports for analysis only). The problems with this solution are that (1) the semantics of adding to someone’s code are not expressed as a concept in the software, (2) one cannot “uncode” from the coded work of a researcher, and (3) we might end up not using code hierarchies at all.

Forking seems to solve these. If we want this, it has to be prepared now in the way coding data is stored, but can be implemented in future versions if it can’t happen now. So I will tentatively include it into the data model (the change simply means storing “code ID, user ID, word ID” triples rather than “code ID, word ID”).
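Why the extra user ID matters, as a small Python sketch with made-up IDs: with triples, two researchers’ versions of the same code can coexist, while with plain (code ID, word ID) pairs they would collapse into a single set.

```python
from typing import NamedTuple


class Coding(NamedTuple):
    """One coding: researcher `user_id` applied code `code_id` to word `word_id`."""
    code_id: int
    user_id: int
    word_id: int


def words_coded(codings, code_id, user_id):
    """Words that one researcher has coded with one code, i.e. their version of it."""
    return {c.word_id for c in codings if c.code_id == code_id and c.user_id == user_id}


# Hypothetical data: researcher 1 created code 7; researcher 2 forked it,
# dropped word 12 and added word 15.
codings = {
    Coding(7, 1, 10), Coding(7, 1, 11), Coding(7, 1, 12),
    Coding(7, 2, 10), Coding(7, 2, 11), Coding(7, 2, 15),
}
```

Here `words_coded(codings, 7, 1)` and `words_coded(codings, 7, 2)` return the two diverging versions of code 7 side by side.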

This will not be “git style” collaboration with changesets and commits, but simpler (which is good). And I think forking should not work on the basis of coding files (all codings of one researcher, together) but on the basis of individual codes. This makes data storage, diff-ing etc. quite elegant to implement, and serves the disruptive potential of Open Ethnographer better (as I see it at least): being able to create public semantic metadata in a collaborative and aggregative way, which creates much more of this for later evaluation and research than we could create with current “one researcher, one coding file” methods.

So at every moment, a researcher will be able to compare the coding status of one of their codes with all the forked versions that are around. The software will indicate the differences (both added and removed codings of words) with each forked version, excluding only changes that one has manually rejected earlier. These differences can then be reviewed and taken over selectively. For the second use, rendering codings as linked data (RDFa markup) into the public website output, Drupal could then use the union of all publicly shared tags in a “fork set”.
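The comparison described here boils down to set differences. A minimal sketch, assuming each version of a code is available as a set of word IDs and that rejected changes are remembered as hypothetical `("add", word_id)` / `("remove", word_id)` pairs:

```python
def fork_diff(mine, theirs, rejected=frozenset()):
    """Compare my version of a code with a forked version.

    mine, theirs: sets of word IDs coded with the code in each version
    rejected:     set of ("add", word_id) / ("remove", word_id) changes
                  that I already dismissed and don't want to see again
    Returns (added, removed): words the fork codes that I don't, and vice versa.
    """
    added = {w for w in theirs - mine if ("add", w) not in rejected}
    removed = {w for w in mine - theirs if ("remove", w) not in rejected}
    return added, removed


def fork_set_union(versions):
    """Union of all publicly shared versions of a code, e.g. for RDFa output."""
    return set().union(*versions)
```

For example, comparing `{1, 2, 3}` with a fork `{2, 3, 4, 5}` after rejecting `("add", 5)` reports word 4 as added and word 1 as removed.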

As for the whole code hierarchy discussion, I think we have to leave it for later, when it becomes possible to work with the new Open Ethnographer and see what’s needed. (Because essentially, hierarchies on the coding side are just meant to help find a code faster, so after implementing auto-suggest for codes it could already be fast enough.) Keeping the decision for later does not hurt, as eComma’s tags currently use neither hierarchies nor tags of tags, so both would be additions anyway.

matthias · January 8, 2015, 5:46pm

Some new input for the hierarchy discussion

Now that I have “worked” a bit with the upcoming Open Ethnographer tagging interface, it seems indeed that code hierarchies are not needed to find a tag faster. Its fast auto-complete, plus some upcoming fine-tuning of it, seems to be a nicely working solution. You basically find a tag by what you remember about its name. This means that ethnographers can still use an informal system of tag name prefixes and postfixes as a mnemotechnique (like letting tags for specific initiatives begin with "project: "). The advantage over a formal hierarchy is that restructuring one’s tags does not mean having to re-learn their names / how to find them.

This leaves analysis as the realm where tag structures might still be needed. “Tagging a tag” as you proposed earlier is possible with a term reference field in the Drupal taxonomy that stores our tags. It is exportable to RQDA (which supports “tag groups” but not hierarchies), and it makes sense as “assigning a broader concept” to a tag, just as a tag “assigns a concept” to a piece of text, following the skos:Concept linked data interpretation.
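As an illustration of the skos:Concept reading, tags and their “tags of tags” could be serialized as SKOS concepts linked by `skos:broader`. This sketch writes Turtle by hand; the `https://example.org/tags/` namespace is a placeholder, and it does not cover the RQDA export:

```python
from urllib.parse import quote

SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "https://example.org/tags/"  # hypothetical namespace for our tags


def tag_uri(name):
    """Turn a tag name into an IRI under the hypothetical BASE namespace."""
    return f"<{BASE}{quote(name.replace(' ', '_'))}>"


def to_turtle(tag_refs):
    """Serialize tags as skos:Concepts and tag-of-tag links as skos:broader.
    tag_refs maps each tag to the set of tags it refers to.
    Assumes tag names contain no double quotes (no escaping is done)."""
    lines = [f"@prefix skos: <{SKOS}> ."]
    for tag in sorted(tag_refs):
        lines.append(f'{tag_uri(tag)} a skos:Concept ; skos:prefLabel "{tag}" .')
        for broader in sorted(tag_refs[tag]):
            lines.append(f"{tag_uri(tag)} skos:broader {tag_uri(broader)} .")
    return "\n".join(lines)
```

For `{"DIY fuels": {"topic"}, "topic": set()}` this produces two `skos:Concept` statements plus one line stating that DIY fuels has topic as its broader concept.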

Update: The disadvantage of a prefix / postfix mnemotechnique as proposed above is of course the manual maintenance effort whenever prefixes / postfixes need to change, due to the implied redundancy. If that later turns out to be annoying and to slow down work significantly, the “tags of tags” can also be included in a way that lets them be found via the auto-complete function. For example:

tag name "DIY fuels"
tagged itself with "topic", "grassroots innovation"
appears rendered as "DIY fuels (#topic, #grassroots_innovation)" in the auto-complete list
can be found in the tagger auto-complete field with, for example, a "fuel #topic" search string
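A rough Python sketch of that rendering and search behaviour. The matching rule (every query word must occur somewhere in the rendered label, ignoring case) is an assumption for illustration, not how the actual auto-complete works:

```python
def label(name, broader):
    """Render a tag for the auto-complete list, e.g. 'DIY fuels (#topic)'."""
    if not broader:
        return name
    hashtags = ", ".join("#" + b.replace(" ", "_") for b in broader)
    return f"{name} ({hashtags})"


def suggest(query, tags):
    """Labels whose text contains every whitespace-separated query term.
    Plain case-insensitive substring matching, so "#topic" also matches "#topics".
    tags maps each tag name to a list of the tags it is tagged with."""
    terms = query.lower().split()
    labels = [label(name, broader) for name, broader in tags.items()]
    return [text for text in labels if all(term in text.lower() for term in terms)]
```

With the tags `DIY fuels` (tagged topic and grassroots innovation), `fuel prices` (tagged economy) and an untagged `topic modelling`, the query "fuel #topic" suggests only the DIY fuels entry.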

alberto · January 8, 2015, 8:45pm

This all makes sense to me

I like this. I am no ethnographer, but I am a client of ethnographers. I do find aggregations interesting and useful (“Look, most tags are names of places!”), but not all aggregations are hierarchical. With this system, individual researchers could still implement hierarchies by simply deciding to assign one and only one metatag to each tag, one and only one meta-metatag to each metatag and so on. But others could decide to have multiple metatags to each tag, and so create non-hierarchical knowledge structures.
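Whether a given researcher’s structure is in fact such a strict hierarchy could be checked mechanically. A minimal sketch, assuming the structure is held as a hypothetical dict mapping each tag to the set of its metatags (top-level tags simply have none):

```python
def is_strict_hierarchy(tag_refs):
    """True if every tag has at most one broader tag and no chain loops back.

    tag_refs: dict mapping tag -> set of broader tags (metatags)
    """
    if any(len(broader) > 1 for broader in tag_refs.values()):
        return False
    for start in tag_refs:
        seen, current = {start}, start
        while tag_refs.get(current):
            (current,) = tag_refs[current]  # exactly one broader tag here
            if current in seen:
                return False
            seen.add(current)
    return True
```

A chain like widgets to technology to topics passes; giving widgets a second metatag such as places, or letting two tags refer to each other, makes it a non-hierarchical structure, which the system would equally allow.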