@melancon, @jason_vallet, as you know we committed to creating a tidy and open dataset to upload onto Zenodo at the end of the project. Time to do this. Since the data structure changed a lot in the move from Drupal to Discourse, we can’t do it the way we did at the end of 2016.
The main question is how we do that. Tabular data in JSON? Graph data in GraphML? What do you think?
In a Discourse framework, we need to export:
- A list of topics
- A list of posts
- A list of users
- A list of annotations
- A list of codes
Items 1-3 are opencare’s primary data; items 4-5 are secondary data.
For topics, we need to export the following fields:
- ID
- title
- created_at
For posts:
- ID
- topic_id
- username (hashed)
- Full text. I use `cooked` from API calls to the topic; to get `raw` we need to call each post individually.
- reply_to_post_number
- reply_to_user (hashed)
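To make the post fields above concrete, here is a minimal sketch of turning one post, as it appears in a topic's JSON, into a tidy record. Several things here are assumptions and not decisions: the salted SHA-256 hashing, the `SALT` value, the helper names `hash_username` and `tidy_post`, and the idea that `reply_to_user` arrives as a nested object with a `username` key. We would need to check all of this against what our Discourse instance actually returns.

```python
import hashlib

# Hypothetical secret salt; it would be kept private and never published.
SALT = "replace-with-a-secret-salt"


def hash_username(username, salt=SALT):
    """Return a salted SHA-256 hex digest of a username, or None if absent."""
    if username is None:
        return None
    return hashlib.sha256((salt + username).encode("utf-8")).hexdigest()


def tidy_post(post, salt=SALT):
    """Keep only the fields listed above, hashing both usernames.

    `post` is assumed to be one post dictionary taken from a topic's JSON.
    """
    reply_to = post.get("reply_to_user") or {}
    return {
        "id": post["id"],
        "topic_id": post["topic_id"],
        "username": hash_username(post["username"], salt),
        "text": post["cooked"],  # rendered HTML; `raw` needs a per-post call
        "reply_to_post_number": post.get("reply_to_post_number"),
        "reply_to_user": hash_username(reply_to.get("username"), salt),
    }
```

Using the same salt everywhere means the hashed `username` in the users list matches the one in the posts list, so the tables can still be joined.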
For users:
- username (hashed)
For annotations:
- ID
- code_id
- quote
For codes:
- ID
- label
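If we go for tabular data in JSON, one possible layout is one file per list, each holding an array of records. The sketch below only illustrates that option; the function name `export_tables`, the file naming, and the output directory are hypothetical, and it does not settle the JSON versus GraphML question.

```python
import json
from pathlib import Path


def export_tables(tables, out_dir):
    """Write each table (a list of dicts) to <out_dir>/<name>.json.

    `tables` maps a table name such as "topics" to its list of records.
    Returns the list of paths that were written.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for name, rows in tables.items():
        target = out_path / f"{name}.json"
        with target.open("w", encoding="utf-8") as handle:
            # ensure_ascii=False keeps non-ASCII characters readable.
            json.dump(rows, handle, ensure_ascii=False, indent=2)
        written.append(target)
    return written
```

Calling it with a dictionary keyed by `topics`, `posts`, `users`, `annotations` and `codes` would give five files, which can be uploaded to Zenodo together.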