Preparing the export of opencare data

@melancon, @jason_vallet, as you know we committed to creating a tidy and open dataset to upload onto Zenodo at the end of the project. Time to do this. Since the data structure changed a lot in the move from Drupal to Discourse, we won’t do it the way we did at the end of 2016.

The main question is how we do that. Tabular data in JSON? Graph data in GraphML? What do you think?

In a Discourse framework, we need to export:

  1. A list of topics
  2. A list of posts
  3. A list of users
  4. A list of annotations
  5. A list of codes

Items 1-3 are opencare’s primary data; items 4-5 are secondary data. A sketch of the resulting export files follows the field lists below.

For topics, we need to export the following fields:

  • ID
  • title
  • created_at

For posts:

  • ID
  • topic_id
  • username (hashed)
  • Full text. I use the cooked field returned by the API call for the topic; to get raw we would need to call each post individually (see the sketch after this list).
  • reply_to_post_number
  • reply_to_user (hashed)
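
To make the cooked vs. raw point concrete, here is a minimal Python sketch of the two Discourse API calls involved. The base URL and credentials are placeholders, the header names match current Discourse (older versions used query parameters), and long topics would need pagination; treat this as an illustration, not the actual export script.

    # Minimal sketch of the cooked vs. raw distinction in the Discourse API.
    # Assumptions: a reachable instance and an API key with read access.
    import requests

    BASE = "https://edgeryders.eu"  # placeholder base URL
    HEADERS = {"Api-Key": "YOUR_KEY", "Api-Username": "system"}  # placeholder credentials

    def topic_posts_cooked(topic_id):
        """One call per topic: the first chunk of posts, each with rendered HTML ('cooked')."""
        r = requests.get(f"{BASE}/t/{topic_id}.json", headers=HEADERS)
        r.raise_for_status()
        return [(p["id"], p["cooked"]) for p in r.json()["post_stream"]["posts"]]

    def post_raw(post_id):
        """One call per post: the original Markdown source ('raw')."""
        r = requests.get(f"{BASE}/posts/{post_id}.json", headers=HEADERS)
        r.raise_for_status()
        return r.json()["raw"]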

For users:

  • username (hashed)

For annotations:

  • ID
  • code_id
  • quote

For codes:

  • ID
  • label
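
If we go for tabular data in JSON, the export could look something like the sketch below: one JSON file per entity, carrying exactly the fields listed above. This is only one possible shape; the file names and example values are mine, nothing we have agreed on yet.

    import json

    # One possible shape for the tidy export: one JSON array per entity,
    # with exactly the fields listed above. Names and values are illustrative.
    example_export = {
        "topics.json": [
            {"id": 101, "title": "Example topic", "created_at": "2017-03-01T10:00:00Z"},
        ],
        "posts.json": [
            {"id": 5001, "topic_id": 101, "username": "<hashed>",
             "text": "<cooked HTML>", "reply_to_post_number": None,
             "reply_to_user": "<hashed>"},
        ],
        "users.json": [
            {"username": "<hashed>"},
        ],
        "annotations.json": [
            {"id": 9001, "code_id": 42, "quote": "a quoted passage from a post"},
        ],
        "codes.json": [
            {"id": 42, "label": "an example code label"},
        ],
    }

    # Write one file per entity.
    for filename, records in example_export.items():
        with open(filename, "w") as f:
            json.dump(records, f, indent=2)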

@jason_vallet Where are we with the data export? Just asking so I know whether I should do anything to help this get to its conclusion.


Hi @melancon @alberto @jason_vallet,
is there any news for me about updated data to be used by Tulip?
Have the data been uploaded as open data somewhere where I can download them?
Thanks

That depends.

The dashboard is up to date, and you can export from that in Tulip format. BUT five people have denied consent to their content being used in research. This information is stored in the edgeryders.eu database, but not yet available through APIs. Additionally, we probably need to hash the usernames.

So.

For the consent, we (Edgeryders) need to close this issue (ping @matthias and @daniel). Next, @jason_vallet needs to close this one.

For the hashing, we should either disable export of graphs from GraphRyder or hash usernames upon building the graph, so that usernames are neither visualized on GraphRyder nor exportable from it. Guy, Jason, do you want an issue on GitHub?
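
As a sketch of what hashing upon building the graph could look like: a salted one-way hash gives every username a stable pseudonym, so the reply structure survives but the clear-text names never reach GraphRyder or its exports. The choice of SHA-256 and the salt handling below are my assumptions, not a decided policy.

    import hashlib
    import os

    # Salted SHA-256 of usernames, applied once when the graph is built,
    # so neither GraphRyder views nor exports contain clear-text usernames.
    # The salt source (an environment variable) is an assumption.
    SALT = os.environ.get("OPENCARE_HASH_SALT", "replace-me")  # keep secret and stable

    def hash_username(username: str) -> str:
        """Return a stable, non-reversible pseudonym for a username."""
        digest = hashlib.sha256((SALT + username).encode("utf-8")).hexdigest()
        return digest[:16]  # truncated for readability; keep the full digest if collisions are a concern

    # The same username always maps to the same pseudonym, so the graph structure is preserved.
    print(hash_username("some_username"))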


Daniel has fixed that now, so @jason_vallet can go on with the next step of fixing this one, as outlined by @alberto.