Research methodology: the Spot The Future data model and workflow

alberto · May 7, 2014, 1:22pm

We had a hugely productive meeting of the research team today. This post documents what we agreed on.

Goals

Spot The Future research activities are about re-using the online conversation on the Edgeryders platform as research data. The interface between conversation and research is technological. The Edgeryders website runs on a technological stack, and we need to think about how to enrich that stack so that:

research can be conducted easily and cheaply
without being in the way of conversation
while being transparent about our research activities and methodology
with an eye on long-term stewardship both of primary (posts, comments) and secondary (ethnographic coding) data.

We came up with the following solution:

Ethnographic research

Coding on Edgeryders

Ethnographic coding happens on the platform itself. “Coding” in the ethnography context means assigning a keyword to a selection of text, typically a sentence or paragraph. For example, suppose you want to assign the tag “cookies” to the sentence “This is a sentence.”. Coding on Edgeryders would result in assigning RDFa attributes to the sentence, like this:

This is a sentence.
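As a rough sketch of what such a coded sentence could look like in the HTML source: the snippet below wraps text in a span with RDFa attributes. It assumes the tag type is "topic" and follows the "eoe:" conventions from the style form help text; the function name, the attribute order and the title wording are illustrative assumptions, not the exact output of the editor.

```python
from html import escape


def make_tag_span(text, tag_type, tag_name, slug):
    """Wrap text in a span carrying an ethnographic tag as RDFa attributes.

    tag_type, tag_name and slug are illustrative parameters (assumptions).
    """
    return (
        f'<span property="eoe:{escape(tag_type)}" '
        f'resource="[eoe:{escape(slug)}]" '
        f'title="{escape(tag_type.capitalize())}: {escape(tag_name)}">'
        f'{escape(text)}</span>'
    )


print(make_tag_span("This is a sentence.", "topic", "Cookies", "cookies"))
```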

This tagging system can be used by everyone with the “content manager” role, but for now it is meant to be used only by @Inga Popovaite. Usage goes like this:

Creating a new tag.
1. Go to the CKEditor styles list and click the "Clone" operation for a style whose name starts with "eoe_". (That stands for "Edgeryders Online Ethnography" and is a namespace we invented to mark these tags.)
2. In the style form, change the "Administrative Title" field so that it differs from the original you cloned. For consistency, usually keep it the same as the "Title" field further below.
3. Click "Edit" next to the "Machine name" field at the top of the form, and change it to start with "eoe_" followed by a short version of the Administrative Title field, using only lowercase letters, digits and "_".
4. Adapt the "Title", "Property" and "Resource" fields according to their field help texts.
5. Click "Save".
6. If needed, you can click "Edit" again to change any field of your tag except the "Machine name" one. To change that, you have to clone your style rule and afterwards delete the original. Also note that these changes only affect new tagging activity; existing taggings keep the old values.
Tagging some text.
1. Click "Edit" for the piece of content you want to tag or continue tagging (usually posts and comments).
2. Select "Text Format: Filtered HTML + Tagging" below the editor.
3. Select a piece of text that you want to tag.
4. Click the "Styles" dropdown on the left of the editor toolbar and there, click the tag you want to use.
5. When you hover over a tag in the "Styles" dropdown for a second, a tooltip with the full name appears. This helps with long names that are not completely visible. The same tooltip also appears when you hover over tagged text in your editor. This lets you know which tag you used there, as only types of tags (like "topic", "place" etc.) are visually distinguishable. (That is, the visual rendering of tags does not reflect differences in the "resource" attribute.)
6. Repeat as above, and when you're done select "Text Format: Filtered HTML" again and click "Save". Switching the text format back is important, because otherwise the original authors (who can't use the tagging text format) would be prevented from editing their own content. The tagging itself is kept, of course; the only difference is that the editor lacks the list of tagging styles, and tags are not rendered visually.
7. If you forgot to reset the text format before saving, you can of course edit the content again and fix this issue. It is also ok to keep the text format at "Filtered HTML + Tagging" as long as you are in your editing session (that is, using the "Save and Edit" button for saving your work every now and then).
Removing tags from text.
1. In the tagging editor, mark text from which you want to remove all tags.
2. Click the "Remove Format" button in the toolbar (looks like T_x). This will remove all formats from the text, both all tags and visual formatting.
3. Re-apply tags you want to keep. This is relevant in cases where multiple tags had been applied at once.
4. Re-apply visual formatting.
Power user mode. If anything goes really wrong with tagging, it can be fixed up again in the HTML source mode. For that click the "Switch to plain text editor" link below the tagging editor, and when done click "Switch to rich text editor". This mode is also great for removing tags while keeping other tags and visual formatting that is applied to the same text.
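To illustrate the machine name rule from the "Creating a new tag" steps (prefix "eoe_", then only lowercase letters, digits and "_"), here is a minimal sketch. The helper name is made up, and deriving the name mechanically from the full title is an assumption; in practice you would pick a shorter version by hand.

```python
import re


def machine_name(admin_title):
    """Derive an 'eoe_' machine name from an administrative title (illustrative helper)."""
    # Replace every run of characters outside a-z / 0-9 with a single underscore.
    slug = re.sub(r"[^a-z0-9]+", "_", admin_title.lower()).strip("_")
    return "eoe_" + slug


def is_valid_machine_name(name):
    """Check the rule: 'eoe_' followed by lowercase letters, digits and '_' only."""
    return re.fullmatch(r"eoe_[a-z0-9_]+", name) is not None


print(machine_name("Topic: Social media"))
print(is_valid_machine_name("eoe_Topic-Social"))
```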

Keep in mind that, for transparency reasons, users are allowed to see the tagging of their own content by navigating to their own posts and comments, clicking on “Edit” and then selecting “Switch to plain text editor”. They can also see individual tags when “accidentally” hovering over tagged text, which makes a tooltip appear, showing the tag’s name.

TODO. The following functionality is not yet implemented, but will be added soon:

A way to mark what nodes (including their comments) have been tagged already.

Analysis with a CAQDAS tool

@Matthias will write a script that exports STF content in a format that can be imported into an open source CAQDAS tool such as WEFT-QDA, though there are better alternatives (so far CATMA seems best, and it’s also web-based). Inga can then explore the data using that tool.
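The export script does not exist yet, so the following is only a sketch of one plausible core step: turning tagged HTML into plain text plus a list of codings with character offsets, which is roughly what a CAQDAS import needs. The class and function names are assumptions, and the actual target file format is not covered here.

```python
from html.parser import HTMLParser


class CodingExtractor(HTMLParser):
    """Collect plain text and (code, start, end) offsets for eoe: spans."""

    def __init__(self):
        super().__init__()
        self.text = ""
        self.codings = []  # (resource, start offset, end offset)
        self._open = []    # one entry per open span: (resource or None, start)

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        a = dict(attrs)
        is_code = (a.get("property") or "").startswith("eoe:")
        self._open.append((a.get("resource") if is_code else None, len(self.text)))

    def handle_endtag(self, tag):
        if tag != "span" or not self._open:
            return
        code, start = self._open.pop()
        if code is not None:
            self.codings.append((code, start, len(self.text)))

    def handle_data(self, data):
        self.text += data


def extract_codings(html):
    parser = CodingExtractor()
    parser.feed(html)
    parser.close()
    return parser.text, parser.codings


# Hypothetical tagged content, using the "cookies" example tag.
sample = ('<p>Intro. <span property="eoe:topic" resource="[eoe:cookies]">'
          'This is a sentence.</span></p>')
text, codings = extract_codings(sample)
for code, start, end in codings:
    print(code, repr(text[start:end]))
```

Nested tags simply produce overlapping offset ranges, since each open span remembers its own start position.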

Network analysis

Most of the network analysis will use tools external to Edgeryders. The interface is the Drupal module Views-Datasource (already installed and enabled). This module allows us to create queries against the Edgeryders database and export the results in JSON format. @Alberto and @brenoust will work together to create the queries relevant here. Benjamin will then write the scripts that transform the JSON exports into networks.
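As an illustration of the JSON-to-network step, here is a minimal sketch that counts directed "who replied to whom" edges. The JSON layout (a "nodes" list of "node" objects) and the field names "author" and "reply_to_author" are assumptions; the real ones depend on how the views get configured.

```python
import json
from collections import Counter

# Hypothetical export; structure and field names are placeholders.
SAMPLE = """
{"nodes": [
  {"node": {"comment_id": "1", "author": "user_a", "reply_to_author": "user_c"}},
  {"node": {"comment_id": "2", "author": "user_b", "reply_to_author": "user_c"}},
  {"node": {"comment_id": "3", "author": "user_a", "reply_to_author": "user_c"}}
]}
"""


def comment_edges(raw_json):
    """Count directed edges (commenter -> author being replied to)."""
    rows = [item["node"] for item in json.loads(raw_json)["nodes"]]
    return Counter((row["author"], row["reply_to_author"]) for row in rows)


edges = comment_edges(SAMPLE)
for (source, target), weight in sorted(edges.items()):
    print(source, "->", target, "weight", weight)
```

The resulting weighted edge list can be loaded into whatever network library is preferred.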

We wish to explore the relationships between social interaction (as represented by a network of comments) and semantics (as represented by ethnographic coding and perhaps output from natural language processing algorithms) in the STF conversation. To do this, Benjamin will write scripts to retrieve the tags from the full text of posts and comments.
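A minimal sketch of how tags could be retrieved from the full text and attached to interaction edges, assuming each exported comment carries its HTML body. The record fields ("author", "reply_to_author", "body") and the tag names are hypothetical.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collect (property, resource) pairs from spans in the eoe: namespace."""

    def __init__(self):
        super().__init__()
        self.tags = set()

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        prop = a.get("property") or ""
        if tag == "span" and prop.startswith("eoe:") and a.get("resource"):
            self.tags.add((prop, a["resource"]))


def tags_in(html):
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


def tags_by_edge(comments):
    """Map each (commenter, replied-to author) pair to the tags used in its comments."""
    result = {}
    for c in comments:
        edge = (c["author"], c["reply_to_author"])
        resources = {resource for _, resource in tags_in(c["body"])}
        result.setdefault(edge, set()).update(resources)
    return result


# Hypothetical records; field names and tag names are placeholders.
comments = [
    {"author": "user_a", "reply_to_author": "user_b",
     "body": '<p><span property="eoe:topic" resource="[eoe:cookies]">I bake.</span></p>'},
    {"author": "user_a", "reply_to_author": "user_b",
     "body": '<p>We met in <span property="eoe:place" resource="[eoe:africa]">Cairo</span>.</p>'},
]
print(tags_by_edge(comments))
```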

inga_popovaite · May 12, 2014, 7:47am

Sounds good. Also, as to the QDA software, I am leaning towards using RQDA, as WEFT crashes all the time and MINER freezes when I run them under Wine. Plus, I have worked a lot with R for quantitative analysis before.

matthias · May 13, 2014, 12:21am

Use the tool of your choice

I had similar experiences with WEFT under Linux, even when running it inside VirtualBox. So I guess it’s not due to Wine, or Linux, but WEFT itself. Welcome to use any tool you like – RQDA looks good since it’s free software, has an SQLite file format (so I can export to it relatively simply), and the combination with R seems interesting (have worked with R briefly myself; maybe we’re lucky and @brenoust knows R as well and would want to do network analysis with it? Ethnographically augmented network research sounds interesting … and we could even do with just one data export then, for both purposes …).

I was looking for a web-based CAQDAS tool because it would allow closer collaboration and better sharing with others in Edgeryders. But that seems far-fetched – let’s keep that for one day, to be done with our Drupal-integrated collaborative sensemaking software.

noemi · May 12, 2014, 3:09pm

Just to thank you

Guys this is great, the online ethnography will kick *** Reading and trying to follow, have tested the tagging of content, just to see if I can do it.

@Inga Popovaite I guess you’re still in the process of reading stuff? how do you work? do you start creating tags as you come across them, or do you get an overview first -then come up with preliminary list, then read again and refine it, add more etc? Looking forward to this!! If you need help with anything or find content, don’t hesitate to ask.

Well done everyone.

PS nice meeting you @brenoust!

brenoust · May 13, 2014, 3:02pm

Thanks

Excellent and efficient setting @Matthias,

Well, I am used to doing network analysis in Python and C++, which is why the JSON exports will be practical. We could do some R as well; the JSON format is flexible enough to be handled in R too (see RJSON package). I mostly use R when I need to do some regressions or other statistical reasoning which I can’t quickly prototype in Python. Anyway, it’s kind of a geeks’ debate: no matter the language, what matters is the questions we ask of the data and the answers we bring.

PS: Nice to meet you too @Noemi!

alberto · May 13, 2014, 3:21pm

No need for R now

@Inga Popovaite has her hands full with the ethno coding; @brenoust has a network analysis to deliver. If we need to do statistics along the way, we’ll cross the bridge when we find it. And yes, JSON is quite easy to handle.

inga_popovaite · May 13, 2014, 5:58pm

yes, i am still in the process of reading it first and getting preliminary tag set - it would save time and I will not have to add/delete tags all the time.

@Matthias, is it possible to keep 2 tags for the same content? For example, if the longer paragraph is coded as XYZ, but within it is a sentence or two that would go with a tag/code NVY, can I keep both?

matthias · May 13, 2014, 8:01pm

Yes, you can

Every tag translates to its own <span> element, so you can select some text, tag it, then select a part of that text again and tag it again. The result is that both tags are applied in a nested way, so that the middle part has two tags applied at the same time, like this:

<span ...>some text <span ...>part of that text</span> some text</span>

This works with any nesting level (applying 5, 10, 15 tags to the same piece of text). But it will quickly become hard to work with then. I will also have to see how to render tags visually in a way that nested tags are shown appropriately. Maybe background colors with transparency, which should work for about 3 tag levels.
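To make the nesting concrete, here is a small sketch that walks tagged HTML and reports which tags are active on each piece of text. The tag names "[eoe:xyz]" and "[eoe:nvy]" are placeholders borrowed from the question above, and the class name is made up.

```python
from html.parser import HTMLParser


class ActiveTags(HTMLParser):
    """Record, for every piece of text, the tags that enclose it."""

    def __init__(self):
        super().__init__()
        self.stack = []   # one entry per open span; None for non-tag spans
        self.chunks = []  # (text, [resources active on that text])

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            a = dict(attrs)
            is_tag = (a.get("property") or "").startswith("eoe:")
            self.stack.append(a.get("resource") if is_tag else None)

    def handle_endtag(self, tag):
        if tag == "span" and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        self.chunks.append((data, [r for r in self.stack if r]))


nested = ('<span property="eoe:topic" resource="[eoe:xyz]">some text '
          '<span property="eoe:topic" resource="[eoe:nvy]">part of that text</span>'
          ' some text</span>')
parser = ActiveTags()
parser.feed(nested)
parser.close()
for text, active in parser.chunks:
    print(repr(text), active)
```

Only the middle chunk carries both tags; the outer chunks carry just the outer one.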

inga_popovaite · June 1, 2014, 7:44pm

@Matthias, I have a couple of questions:

i guess this is probably for the future, but is it possible to make tags more visible, not just a plain small flag when you put the pointer over the text, but maybe different colors or…i mean, i can work (and am working) with what is here, however, when i apply several tags i need to be sure that all of them are there, because the “flag” displays just the last one i put on. how do I check that the others are still there?
how long will it take you to write a script to export tagged text for future analysis in RQDA? I plan to be done with tagging by middle of the week, if everything goes right.

matthias · June 3, 2014, 12:07am

All tags are triangles now.

Thanks for asking. As I had promised to do the tag rendering and it slipped off my list somehow … . Ok, so here it goes.

You (and all other content managers and admins) will now see little colored triangles at the start and at the end of a tag, both when viewing content and when editing it (in the “Filtered HTML + Tagging” mode only). For example I tagged myself here, or compare Noemi’s post here, which already has tagging applied. The color of the triangles indicates the tag type (topic, action, place etc.), and when hovering over one you again see the tooltip with the tag name. When you add several tags to the exact same piece of text now, several of these triangles will appear at the start and end. So by hovering over each of them, you can see the names of all the tags you applied.

If the colors-and-tooltip combination is not sufficient for visual indication of how you tagged things up, I can adapt that to your wishes easily now, including: (1) colors per tag instead of per tag type, (2) other symbols instead of just triangles, per tag type or tag, (3) image icons instead of the triangle symbols, (4) text color, background etc. of the tagged text itself, though I guess that will make it hard to read and can also lead to weird combinations when tagging text with multiple tags.

@Alberto, you gotta come over here and look at what your ethno-tagging editor prototype has become. I kinda surprised myself with this solution … it’s still a bit of a hack but it was simple and … umh … simple is beautiful in this case. You can see how it is done in this CSS file. (If you want to adapt the CSS, remember to flush Drupal’s caches after saving the CSS or you won’t see any effects.)

alberto · June 3, 2014, 1:25pm

Great work!

Very good work, @Matthias and @Inga. I am impressed.

brenoust · June 2, 2014, 9:36pm

Different taggings?

Hi @Matthias!

I’m progressing on the tagging so far, but I need to be sure that I’m on the right path.

Please let me know if my approach is right, I’m looking for spans within the text that have no particular class, but have the attribute “property” beginning with “eoe:”. What I find after that is namely the category of tagging? (I found so far in the posts ‘topic’ and ‘action’). Then the tagging is encoded within the attribute “resource”, after “eoe:”. I guess the attribute “Title” is only for interface usage purpose (maybe it does not belong in the data??)

Is it the right way?

The main question is nothing else is stored behind this “eoe” thing?

Now, is it possible for me to create a “view” in which I can associate “resource”, “property”, as I do with comments and posts, or should I recreate it on the fly from what I obtain? Just to be clear with what I have.

Thanks a lot!

Ben

matthias · June 2, 2014, 10:59pm

Yes, tag type and actual tag.

Your understanding of the RDFa tags is quite right. Specifically, the title attribute is just a human-readable short description of the tag and encodes no additional information (so, nothing that you need to care about in network analysis). I’ll copy in here the current attribute field help text from the CKEditor styles creation form so you better see the make-up / syntax of the different fields:

Element: For ethnographic tags, enter "span".
Title: HTML title attribute to apply to the element; will appear as tooltip when mousing over markup in the editor and in the page on the frontend.
For ethnographic tags, put in tag type, a colon, and the full tag name so you can recognize it later. For example, “Topic: International labor migration”.

Property: RDFa property attribute. For ethnographic tags, (1) let it contain the tag type, (2) use the "eoe:" prefix, (3) else use only lowercase characters and "-", for example: "eoe:topic" or "eoe:place"
Resource: RDFa resource attribute. For ethnographic tags, (1) let it contain a short / abbreviated name of a specific tag of the type you put in "property", (2) use the "eoe:" prefix, (3) use square brackets, (4) else use only lowercase characters and "-", for example: "[eoe:intl-labor-migration]" for a topic tag.
Class: [...] For ethnographic tags, you can leave this empty. Your tags will still be colored for you.
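The syntax rules in these help texts can be checked mechanically. Here is a minimal sketch that reads "lowercase characters" as a-z only; that reading, and the function name, are assumptions.

```python
import re

# Patterns derived from the help text: "eoe:" prefix, lowercase letters and "-",
# with square brackets around the resource value.
PROPERTY_RE = re.compile(r"eoe:[a-z-]+")
RESOURCE_RE = re.compile(r"\[eoe:[a-z-]+\]")


def check_tag(prop, resource):
    """Return a list of problems found in a property/resource pair."""
    problems = []
    if not PROPERTY_RE.fullmatch(prop):
        problems.append(f"property {prop!r} should look like 'eoe:topic'")
    if not RESOURCE_RE.fullmatch(resource):
        problems.append(
            f"resource {resource!r} should look like '[eoe:intl-labor-migration]'"
        )
    return problems


print(check_tag("eoe:topic", "[eoe:intl-labor-migration]"))
print(check_tag("eoe:Topic", "eoe:intl-labor-migration"))
```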

“The main question is nothing else is stored behind this “eoe” thing?”: Not sure I get this question. eoe is just a namespace I invented because it’s a technical requirement in RDFa attributes. It is just meant to be memorizable and means “Edgeryders Online Ethnography”.

“Now, is it possible for me to create a “view” in which I can associate “resource”, “property”, as I do with comments and posts, or should I recreate it on the fly from what I obtain?”: The resource attribute is the actual tag, meaning that the property attribute’s tag type information is not needed to identify a tag. Having both attributes in place is again just a technical detail to match the RDFa standard, which is IMHO a nice way for embedding the tagging in a web standards compliant way (making it safe to serve it as part of the regular content, as we do now).

So I understand your question as “Can I create a view that lists me all nodes where tag eoe:some-tag-name appears in?”. Unfortunately that seems not possible in Drupal without developing a custom module; Drupal is not well-suited to search based on DOM content, and the only thing coming close to what we need here (index_htmlattr) is only for Drupal 6. What seems potentially possible however (using views_xml_backend) is views like “list me all ethnographic tags appearing in node with ID <parameter here>”. I assume that this is not really helpful, since analyzing which tags appear in a certain XML piece is just a simple task (using XPath or whatever you like for that) in your script as well. But in case you want to experiment with views_xml_backend features, I’ll gladly install it for you here.
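For the script-side approach, here is a minimal XPath-style sketch using Python's standard library. It assumes the content of a node is well-formed XML once wrapped in a root element, which real post HTML may not be (unclosed tags or HTML entities would need a more tolerant parser). The function name and sample content are made up.

```python
import xml.etree.ElementTree as ET


def tags_via_xpath(fragment):
    """List (property, resource) pairs of all eoe: spans, in document order."""
    root = ET.fromstring(f"<root>{fragment}</root>")
    return [
        (el.get("property"), el.get("resource"))
        for el in root.findall(".//span[@property]")
        if el.get("property").startswith("eoe:")
    ]


# Hypothetical node content with one nested tag.
content = (
    '<p><span property="eoe:topic" resource="[eoe:intl-labor-migration]">'
    'People move to <span property="eoe:place" resource="[eoe:africa]">Africa</span>'
    ' for work.</span></p>'
)
for prop, resource in tags_via_xpath(content):
    print(prop, resource)
```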

brenoust · June 3, 2014, 8:42am

Thanks for this very detailed answer!

So there is no special span class attribute that identifies ethnographic coding spans but “eoe” means “Edgeryders Online Ethnography” and it is a marker sufficient enough to assess that it concerns only ethnographic coding.

Thanks to your system I will always find attributes property and resource to build the association groups/coding, so I will build the “view” on the fly

Alright, thanks again!

inga_popovaite · June 3, 2014, 10:57am

alphabetic order?

thank you, @Matthias, now it is way easier to see and make sure that I put all the tags I wanted on a piece of text. But I have another question: is it possible to have all the tags displayed in alphabetic order in that drop-down list? older tags are in alphabetic order, however, I am creating more as I go, and they all appear at the end of the list. and it is harder to find them fast afterwards.

matthias · June 3, 2014, 1:30pm

Ordered.

Ok, all content of the styles dropdown is now in alphabetic order. (Or rather, ordered by machine names, the stuff like eoe_place_africa … which seems to give the same order always but due to the eoe_ prefix also keeps all your tags together without getting interspersed with regular text styling tags).

Re. the script for exporting to RQDA, it will probably take 2-3 days. But I’ll have to do one other urgent Edgeryders task today / tomorrow first.

inga_popovaite · June 4, 2014, 8:51am

@Matthias Thank you for alphabetic order! another question: is it possible somehow to edit the machine name (eoe_something_something) when you want to edit a tag? when i clone manually, from time to time i forget to change the machine name and later there is no way of editing it, so several times i had to delete the tag altogether and start anew. it just takes extra time

also, for script for RQDA: do you need me to be completely done with tagging for writing it or it does not matter?

matthias · June 4, 2014, 10:38am

No, you’ll need to re-create tags, unfortunately.

I’m afraid there’s no way to edit the machine name; it’s intentionally this way in Drupal, since it’s used as a fixed reference point that other parts of the software can refer to. Since nothing refers to the tag machine names, I can safely rename them in the database, but it’s not possible in the edgeryders.eu user interface. So you can send me a bulk list of machine names to correct at times, but I guess it’s more direct to delete and re-create the tag as you do. If you re-create it with the exact same property and resource attribute values, it will result in the same tags in texts afterwards. And of course, if you already used it for tagging, deleting the tag in the CKEditor Style Rules section does not delete the tagging within your text, so there is no need to redo that. But I guess you figured that out already
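Since a tagging is identified by its property and resource values, while the title is only a human-readable label and later edits to a tag only affect new tagging activity, older and newer taggings of the same tag can carry different titles. Here is a rough sketch of how one could spot that in a piece of content; the tag name, titles and function name are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser


class TitleAudit(HTMLParser):
    """Group the title attributes seen for each resource value."""

    def __init__(self):
        super().__init__()
        self.titles = defaultdict(set)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        prop = a.get("property") or ""
        if tag == "span" and prop.startswith("eoe:") and a.get("resource"):
            self.titles[a["resource"]].add(a.get("title") or "")


def inconsistent_titles(html):
    """Return resources that appear with more than one title."""
    audit = TitleAudit()
    audit.feed(html)
    audit.close()
    return {res: titles for res, titles in audit.titles.items() if len(titles) > 1}


# Hypothetical content: the same tag applied before and after a title change.
body = (
    '<p><span property="eoe:topic" resource="[eoe:some-tag]" title="Topic: Old name">a</span> '
    '<span property="eoe:topic" resource="[eoe:some-tag]" title="Topic: New name">b</span></p>'
)
print(inconsistent_titles(body))
```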

For my work on the RQDA script, it does not matter if you’re done with tagging or not.

inga_popovaite · June 4, 2014, 11:39am

thanks for info! yes, i figured out most of things already. then my question - when will you have time to do the script for RQDA? would it be possible by this weekend? thank you very much for all your help!

inga_popovaite · June 6, 2014, 2:53pm

Another technical issue: I did not notice for quite a while that eoe_topic_social_media had the title attribute Environmental campaign - so I finally fixed it in CKEditor. But older text that was tagged with it still shows ‘environmental campaign’ when i hover over it. however, if i understand correctly, when exported all these tags will be under one name “social media”, right? or should i go and manually recode everything?