Implement storing annotations relative to content, not content display

danohu · December 7, 2014, 4:28pm

I’ve got annotator/annotation set up, and I like it.

I think we need to modify it, though, to make sure that annotations are stored relative to the content we are annotating, not the details of how it is presented on the ER website.

Currently, you annotate anything on the page, regardless of what node/field it is. The data stored looks like:

"ranges": [
  {
    "end": "/div[1]/div[1]/div[1]/div[1]/p[1]",
    "endOffset": 48,
    "start": "/div[1]/div[1]/div[1]/div[1]/p[1]",
    "startOffset": 37
  }
],
"uri": "http://localhost:81/node/2",

Note that the full URL is stored (not e.g. the node ID), and that the location of the selected text is stored as a selector against the HTML of the full page (/div[1]/div[1]/div[1]/div[1]/p[1]).

Consequently:

it is tied to specific URLs -- annotations made on an index page won't show up on the node detail page
all annotations could be broken if we changed the HTML structure of the site
it'll be painful to export annotations.
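
To make the second point concrete, here is a minimal sketch in Python (standard library only). The page markup is invented for illustration, and resolving the stored path against the body element is an assumption; the real ER templates and Annotator's root element are not shown in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical page markup, simplified to well-formed XML.
PAGE_V1 = """
<body>
  <div><div><div><div>
    <p>Some node text that a user has annotated.</p>
  </div></div></div></div>
</body>
"""

# The same content after a template change adds one more wrapper <div>.
PAGE_V2 = """
<body>
  <div><div><div><div><div>
    <p>Some node text that a user has annotated.</p>
  </div></div></div></div></div>
</body>
"""

# The path exactly as stored in the "ranges" record above.
STORED_PATH = "/div[1]/div[1]/div[1]/div[1]/p[1]"


def resolve(page_html, stored_path):
    """Return the element the stored path points at, or None if it is gone."""
    root = ET.fromstring(page_html.strip())
    # ElementTree only accepts relative paths, so prefix the stored one with "."
    return root.find("." + stored_path)


if __name__ == "__main__":
    print(resolve(PAGE_V1, STORED_PATH) is not None)  # path resolves on the old template
    print(resolve(PAGE_V2, STORED_PATH) is None)      # path no longer resolves
```

The content did not change between the two versions, only the wrapper markup, yet the annotation can no longer be located.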

danohu · December 7, 2014, 4:40pm

Actually, this isn’t so bad. You can configure which element annotator is attached to in admin/config/content/annotator

matthias · December 8, 2014, 2:15am

Yes but …

Glad you like Annotator. Basically this, our own tagging plugin for Annotator, annotator_view and search integration, and we have the basic Open Ethnographer thing together.

And yes right, one can configure which element Annotator should start from. But it only affects where an annotator button will be shown when selecting text. The annotation will still get stored with the full XPath starting from the document root. For example, in my case Annotator is configured to only allow annotations for the .node element and below, but a path stored with that setting is /div[1]/div[3]/div[1]/div[1]/p[1]. Changing anything about the node template would break it. Or anything about the URL, since it does not (yet) use the Drupal canonical URLs (/node/id). So yes, we need to rework it …

Since annotation ranges are simply stored with XPath expressions, the simplest thing would be to use those adapted for word IDs (//span[@id='w-7x64shgw']) or, if we don’t want that, at least those starting from the content elements’ enclosing tag. These are already nicely marked by Drupal, so we can use for example //*[@id='node-2345']/p[3] or //*[@id='comment-12345']/p[3]. Should work, but I still have to test it. And it would still be a hack. The clean solution is adding “entity type” (node, comment etc.) and “entity ID” columns to the annotation database table. Annotator would get these IDs from the HTML IDs shown above though – I think there’s no other way.
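
A rough sketch of both ideas in Python, not the actual Annotator or Drupal code: resolving a path anchored at the element carrying the Drupal HTML ID, and deriving entity type and entity ID from that HTML ID. The function names, the sample markup, and the assumption that IDs always look like "node-<number>" or "comment-<number>" are mine.

```python
import re
import xml.etree.ElementTree as ET

# Assumed shape of the HTML IDs Drupal puts on content wrappers.
ENTITY_ID_RE = re.compile(r"^(node|comment)-(\d+)$")

# Hypothetical page markup; the wrapper <div> stands in for any theme markup.
PAGE = """
<body>
  <div class="wrapper">
    <div id="node-2345">
      <p>First paragraph.</p>
      <p>Second paragraph.</p>
      <p>Third paragraph.</p>
    </div>
  </div>
</body>
"""


def parse_entity(html_id):
    """Split an HTML ID like 'node-2345' into ('node', 2345), else None."""
    match = ENTITY_ID_RE.match(html_id)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def resolve_in_entity(page_html, html_id, relative_path):
    """Resolve a path that starts at the element with the given HTML ID."""
    root = ET.fromstring(page_html.strip())
    # ElementTree needs a leading "."; in a browser this would be
    # //*[@id='node-2345']/p[3]
    return root.find(f".//*[@id='{html_id}']{relative_path}")


if __name__ == "__main__":
    print(parse_entity("node-2345"))
    print(resolve_in_entity(PAGE, "node-2345", "/p[3]").text)
```

With the entity type and ID stored in their own columns, only the short relative part (here /p[3]) would depend on the markup inside the content element.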

Another aspect of the reworking is to store tags, which is possible with some patches apparently. For marking annotations as public (“shared”) or private, we should look to the Authentication and Permissions plugins of Annotator. Or maybe not, as the public / private info should concern whole tags / codes, not individual annotations.

I have rethought the word ID tagging. Storing annotations as (word ID, tag ID) tuples would work, but creates some problems: Annotator expects subsequent words that are annotated with the same tag to be joined into a single annotation. The same goes for search result snippets in the quotation manager. Also, the storage would be more like (word ID, tag ID, user ID, created date), so it would contain quite a bit of redundant data that is shared by several annotated words the way Annotator stores it now. So: we could stay with how Annotator does it now (with the above XPath related changes). And then later, when we want to make Open Ethnographer change resistant, store the annotated text as both a string (like Annotator does now) and, in addition, a list of word IDs. Then, have a function hooking into node saving that adapts all annotations on the node based on the word IDs stored with the annotation. Since we use word IDs only to reposition an annotation, making them an optional add-on, we have some chance to get this added to the annotation Drupal module.
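
The repositioning step could look roughly like the following Python sketch. The Annotation class, the reposition function, and the representation of content as (word ID, text) pairs are all hypothetical; nothing like this exists in Annotator or the annotation module yet.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Annotation:
    quote: str            # annotated text as a string, as Annotator stores it now
    word_ids: List[str]   # proposed optional add-on: IDs of the annotated words


def reposition(annotation: Annotation,
               words: List[Tuple[str, str]]) -> Optional[Annotation]:
    """Re-anchor an annotation on edited content.

    `words` is the edited content as ordered (word ID, text) pairs. The new
    span runs from the first to the last surviving word of the annotation.
    """
    index = {word_id: i for i, (word_id, _) in enumerate(words)}
    positions = [index[w] for w in annotation.word_ids if w in index]
    if not positions:
        return None  # every annotated word was deleted; needs manual handling
    span = words[min(positions):max(positions) + 1]
    return Annotation(
        quote=" ".join(text for _, text in span),
        word_ids=[word_id for word_id, _ in span],
    )


if __name__ == "__main__":
    old = Annotation(quote="quick fox", word_ids=["w-2", "w-3"])
    edited = [("w-9", "Look:"), ("w-1", "The"), ("w-2", "quick"),
              ("w-8", "brown"), ("w-3", "fox")]
    print(reposition(old, edited))
```

A function like this would be called from the node-save hook for each annotation on the node. Whether words inserted inside the span should join the annotation, as they do here, is a design choice that would need deciding.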

Annotations on changing content are a major unsolved issue for Annotator (see here and here), and they intend to solve it with heuristic repositioning and manual confirmation. Open Ethnographer has the advantage that we can control the text and add word IDs, allowing for a deterministic solution. But as said, we can develop it later on with this new proposal.

danohu · December 8, 2014, 6:21pm

Thanks for this. Glad to see the tags aren’t really implemented; I’d been scratching my head looking for code that isn’t actually there.

matthias · December 8, 2014, 2:44am

Made this a task.

It was a natural candidate. As always, you are welcome to pick this up if you want, @danohu. You might have a look around first and propose a budget.

danohu · December 17, 2014, 1:04pm

I have written the code for this. I’m not sure what the best way is to share it. It involves:

a plugin for annotatorjs
a small modification to the annotatorjs Store plugin
a modification to the annotation Drupal module
a settings change for the annotator Drupal module

I expect lots of our future work will similarly involve modifications to several of these codebases. So maybe we should copy all of those upstreams into a single git repo? That makes it easier for us to manage our development, at the cost that pushing changes upstream becomes a bit harder.

As for budget…this took me an embarrassingly long time, but mainly just through fumbling to understand the annotator codebase and coffeescript-generated javascript. Would €200 be fair?

danohu · December 17, 2014, 1:13pm

The Right Way to handle this in version control is probably with git submodules. But I’ve not really worked with submodules before, and they seem kinda intimidating.

matthias · December 17, 2014, 3:52pm

Great work!

Hey danohu, this is great news, will be testing it out now. I’d say the budget should be about twice what you proposed … you got a major component done here!

If you have some key insights from understanding the Annotator code, welcome to put them into a wiki here.

(Making this your task, because it is now …)

matthias · December 22, 2014, 3:53am

Tested it, see your pull request

I tested it and put the results into a comment on your pull request. It works, and I like it. I did not quite expect this task to involve such complex messing around with all three building blocks / modules, plus writing your own Annotator plugin … glad you got that done so well, and we now have our framework to use for storage!

There are some issues left, most probably because I test with a local copy of the edgeryders.eu live site while you develop in vanilla Drupal (which makes total sense). See Github for details.

Another note: It seems a bit too cumbersome to write everything re. the software tasks redundantly on two platforms. Let’s use every platform where it’s best and do a cut like this:

everything until the task begins (budget, who wants it, what is it about etc.), on edgeryders.eu, and
everything else for implementing a task, on Github. This may include pull requests back and forth, using the bugtracker etc.
linking both by just putting the link to a single Github pull request on the edgeryders.eu task

matthias · January 15, 2015, 11:02pm

Done now.

The intended feature basically works now and this task is considered done. Some remaining work on this front has been put into enhancement issues (details).