As per our data management plan, we publish the raw OpenCare dataset on Zenodo. In light of the GDPR, @markomanka and I agreed that it would be a good idea to pseudonymize the updated dataset before we publish it for the second and final time. Handles on Edgeryders are not “pseudonymous enough”. Question to @markomanka: are Edgeryders user IDs good enough?
In practice, this will be a bunch of JSON files. @matthias, do you have suggestions?
The numeric user IDs, as used on Discourse, are not suitable, as the mapping between usernames and IDs is publicly available in the Discourse API (without login – example).
But you can just use randomly generated strings or numbers as IDs.
I’m not aware of any specialized tools for this. The fastest way to get it done depends on your export process: do you export directly from Discourse, or from a tool like Edgesense with its own secondary database?
So if you already have a script that creates the JSON output, you could add a function there. If you use a standard tool to export from your database to JSON, it may be faster to add a field with a random ID to the user records there.
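For the script route, here is a minimal Python sketch of what such a function could look like. It is not taken from any existing export script: the record layout, the `username` field name and the helper names `make_pseudonymizer` and `pseudonymize_posts` are all assumptions for illustration. It only replaces that one field, so handles that appear elsewhere (for example inside post text) would need separate handling.

```python
import json
import secrets


def make_pseudonymizer():
    """Return a function that maps each username to a stable random ID."""
    mapping = {}   # username -> random ID; keep private or discard after export
    used = set()   # IDs already handed out, to rule out collisions

    def pseudonym(username):
        if username not in mapping:
            new_id = secrets.token_hex(8)  # 16 random hex characters
            while new_id in used:
                new_id = secrets.token_hex(8)
            used.add(new_id)
            mapping[username] = new_id
        return mapping[username]

    return pseudonym


def pseudonymize_posts(posts, pseudonym, user_field="username"):
    """Return copies of the records with the user field replaced."""
    return [{**post, user_field: pseudonym(post[user_field])} for post in posts]


# Hypothetical export records; the real field names may differ.
posts = [
    {"id": 1, "username": "alice", "text": "Hello"},
    {"id": 2, "username": "bob", "text": "Hi"},
    {"id": 3, "username": "alice", "text": "Again"},
]

pseudonym = make_pseudonymizer()
clean = pseudonymize_posts(posts, pseudonym)
print(json.dumps(clean, indent=2))
```

The same user gets the same random ID across all records within one run, so the structure of the conversation is preserved. Because the IDs are random rather than derived from the handle or the Discourse ID, they cannot be looked up through the public API.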