Implement basic exporting of codings to a CAQDAS tool

danohu · December 3, 2014, 3:48pm

Looking at this from the task list: “(6) Implement exporting of selected codings to a CATMA file. Doing so should allow to leave code hierarchies built inside CATMA intact, just adding more codings.”

And from the requirements:

Export to QDA software. Ideally, the codings of one ethnographer could be saved into a file (also including codings from others that this ethnographer did take over into her code hierarchy). The requirement is:
- Ability to filter which content will be in the export. Making a list of groups is granular enough. Non-relevant content (like administrative posts etc.) in these groups can be easily ignored as it will not be coded either, and can also be deleted in the export's target QDA software if needed.
- Ability to filter which codes will be in the export. This would default to "all own codes, including shared codes to which one has subscribed", but options would allow to limit this further. Since it is rarely needed that one wants to exclude own tags, implementing this part can wait or even be discarded.
- Export for download in a format of one open source QDA software.
- Real-time syncing to a web-based open source QDA software. [optional] This can be the same QDA software as the one for which a downloadable file is offered.

Installing CATMA

Download the zip of CATMA 3.2 from here. Unzip it. Run (java -jar catma/catma3.jar)

CATMA format

CATMA works across 3 files:

source file -- [docname].txt. May also support formats other than plain text (?)
annotation file -- [docname_structure.xml]. This contains the character offset
user file -- [docname_user.xml]. This seems to contain user preferences, such as the color in which each tag is displayed

You can see examples of this in the CATMA tests – try rose_for_emily.

I’m going to start working on a limited version of this. Namely, a function to generate CATMA-compatible XML from a piece of tagged content. Things I’m not going to do for now:

merging with existing CATMA tags
using only a selection of tags
providing more than a minimal UI

Once we get the basic export right then either I’ll come back to those additional requirements, or we can spin them off as separate tasks.
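For illustration, here is a minimal sketch of what "generate XML from a piece of tagged content" could look like, using character offsets into the source text. The element and attribute names (annotations, coding, tag, start, end) and the function name codings_to_xml are made up for this example; they are not the CATMA schema, which would need to be taken from CATMA's own test files.

```python
import xml.etree.ElementTree as ET


def codings_to_xml(text, codings):
    """Build a stand-off annotation document for `text`.

    `codings` is a list of (tag_name, start, end) tuples, where start and
    end are character offsets into `text` (end is exclusive).
    All element and attribute names here are hypothetical placeholders,
    NOT the real CATMA format.
    """
    root = ET.Element("annotations", {"length": str(len(text))})
    for tag_name, start, end in codings:
        # Reject offsets that do not describe a non-empty span inside the text.
        if not 0 <= start < end <= len(text):
            raise ValueError(f"offsets out of range: {start}-{end}")
        node = ET.SubElement(
            root,
            "coding",
            {"tag": tag_name, "start": str(start), "end": str(end)},
        )
        # Store the coded passage too, so the file is readable on its own.
        node.text = text[start:end]
    return ET.tostring(root, encoding="unicode")


sample_text = "We met to talk about trust."
sample_codings = [("values", 21, 26), ("event", 3, 6)]
xml_output = codings_to_xml(sample_text, sample_codings)
print(xml_output)
```

Keeping the codings as offsets rather than inline markup leaves the source text untouched, which matches the split between source file and annotation file described above.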

matthias · December 4, 2014, 3:30pm

All fine, go ahead

Welcome to work on this, @danohu! I just converted this to a task, and made it for implementing the basic exporting only. And as you proposed, I have added separate tasks for selecting which codes to export and merging into existing tag hierarchies in CATMA.

Budget-wise, would 400 EUR work for this? (A rough guess based on a comparable exporting job to RQDA which I did for the Open Ethnographer prototype back in summer; that one targeted SQLite, so it was more fiddly, while this XML format is more structurally complex.)

Note on the CATMA format: From looking through the CATMA test documents that you referenced, it seems that their TEI XML format contains both code definitions and codings in one XML file [example]. The rose_for_emily_structure.xml file seems to demonstrate a secondary use of TEI XML, namely to externally define what are the paragraphs of a text.

matthias · December 6, 2014, 1:49am

Changes! Changes!

Sorry @danohu for interrupting you, but I had to change the software design … I tried eComma thoroughly, and it does not seem to be the right base software for us. I will detail this in the software design wiki after sleep.

So, could you implement this based on Drupal module annotator and the associated storage module annotation? Tell us how much work you had to redo and we can find a solution re. the budget …

matthias · January 7, 2015, 2:10am

Nothing more constant than changes …

Hey again @danohu, I’d like to ask whether you have invested a lot of effort into this task already, and what the progress is?

Because after working in detail with CATMA today, I question the wisdom of exporting to CATMA at all. I looked again through all free software QDA applications I could find, installed and tested some, and I like RQDA best now. It’s the application that we also used as export target for the Open Ethnographer prototype, and I still have the script I used back then. So unless you’re already far along with this task, maybe I can take it over and implement an export to RQDA instead?

My reasons for RQDA:

Not web-based. In text analysis, the delays in CATMA are very annoying for productive work. I originally searched for a web-based tool to allow live syncing, but did not realize how annoying the delays are (and the inability to use it offline).
SQLite file format. Which is great to handle for exporting to / downloading / archiving. Unlike CATMA's multi-file XML stuff.
Single-user data structures. What annoyed me most in CATMA is its uber-complex UI and concept structuring, coming mostly from trying to support multi-user / collaborative work, and from bad UI design. We completely don't need these multi-user features, since in our case Open Ethnographer is for the collaborative part, and text analysis (everything after the exports) is every ethnographer's own business. RQDA has a much simpler UI because it targets such a single-user analysis scenario.
Better handling of query complexity. CATMA has a kind of proprietary query language, but it misses some basic features (for example, there is no way to search for all tags at once, so no quotation manager feature; and even if there were, the search result presentation does not provide enough context to read vertically).

I am not against having CATMA export at some point in the future, but for now I think RQDA is much more appropriate.
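To make the "SQLite file format" point concrete, here is a rough sketch of writing documents, codes and codings into a single SQLite database. The table and column names are loosely modelled on what an RQDA project file uses but heavily simplified, so treat them as assumptions to check against a real .rqda file; the function name export_codings is also made up for this example, and this is not the script mentioned above.

```python
import sqlite3

# Simplified, assumed schema: one table each for documents, codes and codings.
SCHEMA = """
CREATE TABLE IF NOT EXISTS source   (id INTEGER PRIMARY KEY, name TEXT UNIQUE, file TEXT);
CREATE TABLE IF NOT EXISTS freecode (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE IF NOT EXISTS coding   (cid INTEGER, fid INTEGER, seltext TEXT,
                                     selfirst INTEGER, selend INTEGER);
"""


def export_codings(conn, documents, codings):
    """Write documents and codings into the database behind `conn`.

    documents: dict mapping document name -> full text
    codings:   list of (document name, code name, start, end) tuples,
               with start/end as character offsets (end exclusive)
    """
    conn.executescript(SCHEMA)
    with conn:  # commit everything together, or roll back on error
        source_ids = {}
        for name, text in documents.items():
            cur = conn.execute(
                "INSERT INTO source (name, file) VALUES (?, ?)", (name, text)
            )
            source_ids[name] = cur.lastrowid
        code_ids = {}
        for doc_name, code_name, start, end in codings:
            # Create each code once, on first use.
            if code_name not in code_ids:
                cur = conn.execute(
                    "INSERT INTO freecode (name) VALUES (?)", (code_name,)
                )
                code_ids[code_name] = cur.lastrowid
            conn.execute(
                "INSERT INTO coding (cid, fid, seltext, selfirst, selend) "
                "VALUES (?, ?, ?, ?, ?)",
                (code_ids[code_name], source_ids[doc_name],
                 documents[doc_name][start:end], start, end),
            )


conn = sqlite3.connect(":memory:")  # use a file path to get a downloadable file
export_codings(
    conn,
    {"post-1": "We met to talk about trust."},
    [("post-1", "social-capital", 21, 26)],
)
rows = conn.execute(
    "SELECT freecode.name, coding.seltext FROM coding "
    "JOIN freecode ON freecode.id = coding.cid"
).fetchall()
print(rows)
```

Everything ends up in one file that can be offered for download or archived as-is, which is the practical advantage over a multi-file XML export.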

alberto · January 7, 2015, 8:45am

RQDA was used in Spot The Future

If memory serves, it is @Inga_Popovaite's weapon of choice.

matthias · January 7, 2015, 11:10pm

Yes, you’re right

RQDA was indeed proposed by Inga for Spot The Future. So I don’t expect that she’ll dislike this change, but I asked nonetheless to be polite. It would be a pity though if Dan had invested a lot in CATMA exporting already. But even then … CATMA is really not what I believed it to be, and software development is an iterative process anyway.

danohu · January 8, 2015, 10:33am

That’s fine, @matthias. Being willing to throw away a bit of work at this stage is a good thing; we’re all learning about the domain, and shouldn’t be constrained by what our less-well-informed past selves thought was a good idea.

My existing work on CATMA is on GitHub: danohu/openethnographer at _catma_wip

There’s probably not anything worth rescuing there, so please go ahead and set up your RQDA export process from scratch.

matthias · January 11, 2015, 1:04am

Taking over

Going to work on this now in the RQDA variant. You’re right, there are times when it’s good to waste some work, even though it’s still bad. It’s called software development after all, not production. Since I messed this up, be sure to invoice this work proportionally to completion status. And I’ll take a look at your code and see if it can inspire me somehow for the RQDA variant.

matthias · January 20, 2015, 3:29pm

Basic exporting to RQDA is done now.

See commit f7d15e1.