Ethno data in POPREBEL: which standards?

alberto · June 17, 2019, 10:53am

Hello all, I am working on the project’s data management plan (deliverable 2.3, in case you are wondering). For the sake of data re-usability and interoperability, the FAIR principles recommend storing data in a standard form; but, as we know, ethnography has not yet developed an open standard for data exchange. So we have to fall back on a generic standard.

To a first approximation, I see two candidates.

Open Knowledge’s Data Package. This has the advantage of being far and away the simplest.
The guidelines of the Text Encoding Initiative. It is an XML standard aiming to make any text interoperable with any other text. This is achieved by tagging parts of the text: for example, when the text contains parts that are in a language other than the main language of the text, these parts are bracketed by a foreign tag.

John eats a <foreign xml:lang="fr">croissant</foreign> every morning.
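To give an idea of what this tagging buys, here is a minimal sketch of how a script could pull the foreign-language spans back out of tagged text. It uses only Python's standard library. The `<p>` wrapper, the `SAMPLE` string and the `foreign_spans` helper are my own assumptions for illustration; a real TEI file would also put its elements in the TEI namespace, which this fragment leaves out.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is predefined by the XML specification and maps to this namespace.
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical fragment: a bare <p> element, not a complete TEI document.
SAMPLE = '<p>John eats a <foreign xml:lang="fr">croissant</foreign> every morning.</p>'


def foreign_spans(xml_text):
    """Return (language, text) pairs for every <foreign> element in the fragment."""
    root = ET.fromstring(xml_text)
    return [
        (el.get(f"{{{XML_NS}}}lang"), "".join(el.itertext()))
        for el in root.iter("foreign")
    ]


print(foreign_spans(SAMPLE))  # [('fr', 'croissant')]
```

Note that the parser rejects the fragment if the closing tag is missing its slash, so hand-tagged text would need validating too.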

I am tempted to discard the TEI as horribly expensive and irrelevant.

It is horribly expensive because someone would have to study closely all the text and mark it with all sorts of tags. For example, questions are to be marked with the q tag. There is some conversion software: the main one seems to be Oxgarage, which is actively maintained and can do conversions from certain types of files to TEI. It is, however, very far from intuitive. So, at the very least there would be quite some work to set Oxgarage up and shout at it until it works.

And it is irrelevant because it was started in 1994 (with the semantic web fad raging on), and the website seems to indicate that it never really caught on. So, I do not think we would get much extra interoperability for all that work.
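For comparison, the Data Package route needs little more than a JSON descriptor sitting next to the data files. Below is a rough sketch of generating one, assuming the data live in CSV files. The package name, resource names and file paths are hypothetical placeholders, not the project's actual layout, and `build_descriptor` is just an illustrative helper.

```python
import json


def build_descriptor(package_name, files):
    """Build a minimal Data Package descriptor as a plain dict.

    `files` is a list of (resource_name, relative_path) pairs.
    """
    return {
        "name": package_name,
        "resources": [
            # "format" is taken from the file extension, e.g. "csv".
            {"name": name, "path": path, "format": path.rsplit(".", 1)[-1]}
            for name, path in files
        ],
    }


# Hypothetical package and file names, for illustration only.
descriptor = build_descriptor(
    "poprebel-ethno-sample",
    [("posts", "data/posts.csv"), ("annotations", "data/annotations.csv")],
)

# The result would normally be saved as datapackage.json next to the data.
print(json.dumps(descriptor, indent=2))
```

The point is only that the descriptor describes files as they are, so nobody has to go through the text itself.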

@amelia, @Richard, @Jan: do you agree? Is TEI-compliance a big requirement in your fields?

Richard · June 19, 2019, 5:40pm

Hi @alberto. TEI-compliance is not an issue as far as I am aware.

amelia · June 20, 2019, 12:00pm

Not an issue afaik either

alberto · June 20, 2019, 12:53pm

Ok, TEI is officially dropped.