Preparing the export of opencare data

alberto · December 20, 2017, 10:37am

@melancon, @jason_vallet, as you know we committed to creating a tidy and open dataset to upload onto Zenodo at the end of the project. Time to do this. Since the data structure changed a lot in the move from Drupal to Discourse, we cannot simply repeat what we did at the end of 2016.

The main question is how we do that. Tabular data in JSON? Graph data in GraphML? What do you think?
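
To make the two options concrete, here is a minimal sketch that writes the same made-up pair of posts first as JSON records and then as a directed reply graph in GraphML. The sample data, the field names and the `to_graphml` helper are assumptions for illustration only, not an agreed export format.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample: two posts, the second replying to the author of the first.
posts = [
    {"id": 1, "topic_id": 10, "user": "u_a", "reply_to_user": None},
    {"id": 2, "topic_id": 10, "user": "u_b", "reply_to_user": "u_a"},
]

# Option 1: tabular data as a JSON array of records.
as_json = json.dumps(posts, indent=2)


def to_graphml(posts):
    """Option 2: serialize the reply relation as a minimal GraphML document."""
    root = ET.Element("graphml", xmlns="http://graphml.graphdrawing.org/xmlns")
    graph = ET.SubElement(root, "graph", id="replies", edgedefault="directed")
    # Every author and every reply target becomes a node.
    users = {p["user"] for p in posts}
    users |= {p["reply_to_user"] for p in posts if p["reply_to_user"] is not None}
    for user in sorted(users):
        ET.SubElement(graph, "node", id=user)
    # Each reply becomes an edge from the replying user to the user replied to.
    for p in posts:
        if p["reply_to_user"] is not None:
            ET.SubElement(graph, "edge", id=f"e{p['id']}",
                          source=p["user"], target=p["reply_to_user"])
    return ET.tostring(root, encoding="unicode")


as_graphml = to_graphml(posts)
```

The JSON form keeps every field of every post, while the GraphML form keeps only who replied to whom, so the two are probably complementary rather than alternatives.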

In a Discourse framework, we need to export:

1. A list of topics
2. A list of posts
3. A list of users
4. A list of annotations
5. A list of codes

Items 1-3 are opencare’s primary data, 4-5 are secondary data.

For topics, we need to export the following fields:

ID
title
created_at

For posts:

ID
topic_id
username (hashed)
Full text. I use the cooked field (rendered HTML) returned by API calls to the topic; to get the raw source we need to call each post individually.
reply_to_post_number
reply_to_user (hashed)

For users:

username (hashed)

For annotations:

ID
code_id
quote

For codes:

ID
label
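
The primary-data part of the field lists above could be sketched roughly as follows. This is only an illustration: the sample payload is made up, although it is shaped the way I understand a Discourse topic response to look (posts under `post_stream.posts`, `reply_to_user` as a nested object). The salt, `hash_username` and `tidy_topic` are hypothetical names, and salted SHA-256 is just one possible hashing choice.

```python
import hashlib

SALT = "replace-with-a-secret-salt"  # assumption: kept out of the published dataset


def hash_username(username, salt=SALT):
    """Return a salted SHA-256 hex digest standing in for a username."""
    if username is None:
        return None
    return hashlib.sha256((salt + username).encode("utf-8")).hexdigest()


def tidy_topic(topic):
    """Split one topic payload into a topic row and a list of post rows."""
    topic_row = {
        "id": topic["id"],
        "title": topic["title"],
        "created_at": topic["created_at"],
    }
    post_rows = []
    for post in topic["post_stream"]["posts"]:
        reply_to = post.get("reply_to_user") or {}  # absent for non-replies
        post_rows.append({
            "id": post["id"],
            "topic_id": post["topic_id"],
            "username": hash_username(post["username"]),
            "text": post["cooked"],
            "reply_to_post_number": post.get("reply_to_post_number"),
            "reply_to_user": hash_username(reply_to.get("username")),
        })
    return topic_row, post_rows


# Made-up sample payload, for illustration only.
sample = {
    "id": 10,
    "title": "Example topic",
    "created_at": "2017-12-20T10:37:00Z",
    "post_stream": {"posts": [
        {"id": 1, "topic_id": 10, "username": "alice", "cooked": "<p>Hello</p>",
         "reply_to_post_number": None},
        {"id": 2, "topic_id": 10, "username": "bob", "cooked": "<p>Hi</p>",
         "reply_to_post_number": 1, "reply_to_user": {"username": "alice"}},
    ]},
}

topic_row, post_rows = tidy_topic(sample)
# The users table is simply the set of hashed usernames seen in the posts.
users = sorted({row["username"] for row in post_rows})
```

Because the same salt is used everywhere, a hashed username in the posts table still matches the same person in the users table and in the reply_to_user column.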

melancon · December 30, 2017, 9:13am

@jason_vallet Where are we with the data export? Just asking so I know whether I should do anything to help this get to its conclusion.

federico_monaco · January 7, 2018, 9:45am

Hi @melancon @alberto @jason_vallet,
is there any news for me about updated data to be used by Tulip?
Have these data been uploaded as open data somewhere I can download them?
Thanks

alberto · January 7, 2018, 9:27pm

That depends.

The dashboard is up to date, and you can export from that in Tulip format. BUT five people have denied consent to their content being used in research. This information is stored in the edgeryders.eu database, but not yet available through APIs. Additionally, we probably need to hash the usernames.

So.

For the consent, we (Edgeryders) need to close this issue (ping @matthias and @daniel). Next, @jason_vallet needs to close this one.

For the hashing, we should either disable export of graphs from GraphRyder or hash usernames upon building the graph, so that usernames are neither visualized on GraphRyder nor exportable from it. Guy, Jason, do you want an issue on GitHub?
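
As a rough sketch of the second option (hashing upon building the graph), the edge list could be built from pseudonyms only, skipping anyone who denied consent. This is not GraphRyder's actual code: `pseudonym`, `build_reply_edges`, the `no_consent` set and the input shape are all assumptions, and dropping replies addressed to a non-consenting user is a design choice that would need to be agreed.

```python
import hashlib


def pseudonym(username, salt):
    """Salted SHA-256 hex digest used in place of the real username."""
    return hashlib.sha256((salt + username).encode("utf-8")).hexdigest()


def build_reply_edges(posts, no_consent, salt):
    """Return (source, target) pseudonym pairs, leaving out withheld content."""
    edges = []
    for post in posts:
        author = post["username"]
        target = post.get("reply_to_user")
        if author in no_consent:
            continue  # content by a non-consenting author is excluded
        if target is None or target in no_consent:
            continue  # not a reply, or a reply to a non-consenting user (assumption)
        edges.append((pseudonym(author, salt), pseudonym(target, salt)))
    return edges


# Made-up sample: carol has denied consent.
posts = [
    {"username": "alice", "reply_to_user": None},
    {"username": "bob", "reply_to_user": "alice"},
    {"username": "carol", "reply_to_user": "alice"},
    {"username": "alice", "reply_to_user": "carol"},
]
edges = build_reply_edges(posts, no_consent={"carol"}, salt="s")
```

With the real usernames never entering the graph, they can be neither visualized nor exported, whatever export format is used.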

matthias · January 10, 2018, 1:28pm

Daniel fixed that now, so @jason_vallet can go on with the next step of fixing this one as outlined by @alberto.