Talking GDPR and data protection in research in Copenhagen

Great seminar by @akmunk on the GDPR. He speaks for TANT-Lab, but he is saying the same things we are saying in Edgeryders: embrace the GDPR, overshoot its requirements. The TANT-Lab (and Edgeryders) tradition is to be transparent about methods.

Some new things I am learning:

  • You need to establish not only who is responsible for the data, but also who is allowed to process them.
  • Reuse of datasets is now not automatically allowed; if you collect data for project A and want to use them for project B, you need to think through the data protection/user rights implications of that.
  • Anonymised data are not subject to the GDPR. But anonymising is harder than it seems. The text of a Facebook comment can reveal the identity of its author via a simple Google search.
  • It’s super-important to distinguish between direct and indirect collection. In anthropology, direct collection is interviews: they need consent. Indirect collection is Twitter mining: it does not need consent. But it does need information to the Twitter users, unless the data are for research and it is “unreasonably burdensome” to inform every single user. A nice discussion on what counts as a “reasonable effort” follows. For example, in projects where data is collected from Facebook groups, Anders writes to group admins and asks them for their opinion, and then posts on the group informing members about the study.
  • It’s also super-important to be able to say whether the data are self-published. If they are, no consent is needed. This covers blog posts etc.
  • Data security requirements always apply. But in practice you can be sued for the consequences of a data breach. The consequences of Edgeryders suffering a data breach on email addresses are unlikely to cause any major damage.

Some cases:

  • Facebook profiles are not callable through APIs. Maybe we should not consider this as a public data source. If we do not, we need informed consent. However, a court in Denmark has ruled that if you have more than 400 friends, you cannot reasonably expect your posts to stay private, even if you restrict them to “friends only”.
  • Facebook’s closed and secret groups are also not callable through APIs. Again, we could use API accessibility as a criterion for data being public. This would arguably exceed the GDPR’s requirements.

Edgeryders is a bit different from TANT-Lab in that we actually DO ask for consent. Our story is:

  1. Edgeryders is a collective blog. We are publishing text, and collecting otherwise only one piece of personal data: email addresses. We do not enforce a real name policy.

  2. To protect people from accidentally revealing stuff about themselves that they do not want to be public, we make them go through a consent funnel before we allow them to post.

With all this, how to make a data policy for Edgeryders?

I would say points 1 and 2 above are the core of the data policy. We also add:

  • Information on data storage and encryption methods. @matthias, can you please remind me how we use encryption, and for what?
  • A person responsible for data (Matt).
  • One or more people authorised to process the data in research projects.
  • A process to detect data breaches. I think with open source projects such as Discourse this will be: pay attention to signalled vulnerabilities. Do we have a way to detect we have been hacked?
  • A process to prevent data breaches. I guess this will be: promptly install security patches on our stack.

Notice that the GDPR applies to Edgeryders even if we do not do research.


SSL encryption (“HTTPS”, provided by Let’s Encrypt) is used for the transfer of all data (incl. the request URL) between our servers and web visitors. This is enforced, as the http:// version forwards to the https:// version.
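A quick way to see what “enforced” means here: the http:// version answers with a redirect whose target is the https:// URL. This is a minimal sketch of a check one could run against a response; the function and the example URL are illustrative, not part of our stack.

```python
from urllib.parse import urlparse

def is_forced_https_redirect(status_code: int, location: str) -> bool:
    """Return True if an HTTP response forwards the visitor to HTTPS.

    A permanent (301/308) or temporary (302/307) redirect whose
    Location header points at an https:// URL counts as enforcement.
    """
    if status_code not in (301, 302, 307, 308):
        return False
    return urlparse(location).scheme == "https"

# The kind of response an enforced redirect produces:
print(is_forced_https_redirect(301, "https://edgeryders.eu/"))  # True
# A plain 200 response over http:// would mean no enforcement:
print(is_forced_https_redirect(200, "http://edgeryders.eu/"))   # False
```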

Passwords are saved salted and hashed (“one-directional encryption”), so a data breach does not reveal passwords. This even applies if people use very simple passwords which have “well-known hashes”: because passwords are salted before hashing, a different hash than the well-known one is stored in our database.
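The effect of salting can be illustrated with Python’s standard library. This is a sketch with illustrative parameters, not Discourse’s actual implementation (Discourse’s algorithm and iteration count may differ):

```python
import hashlib
import os

def hash_password(password, salt=None):
    """Salt and hash a password with PBKDF2-HMAC-SHA256 (illustrative parameters)."""
    if salt is None:
        salt = os.urandom(16)  # a fresh random salt for each stored password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

# The same weak password stored for two users yields two different hashes,
# so a leaked database contains no "well-known hash" an attacker can look up.
salt_a, hash_a = hash_password("123456")
salt_b, hash_b = hash_password("123456")
print(hash_a != hash_b)  # True: different salts, different stored hashes
```

Verifying a login then means re-hashing the submitted password with the stored salt and comparing digests; the plaintext password is never stored.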

We also use SSL when sending e-mails where available, but since that is not end-to-end encryption (for protocol reasons) it is not worth mentioning.

In Discourse, technically any staff member (moderator or admin) is “allowed”, as they can look up people’s e-mail address in the profile.

No, how would that even work? There are only best-effort methods to detect that. Right now I’m not putting any effort into such checks, or tools to help with that.

This is done (fully automatically) for all software in operating system provided packages (all of Ubuntu).

For other software (basically Discourse, ISPConfig, Matrix server, WordPress, own tools) we have to install updates manually, and a better process for that would not hurt.

Thanks man, very useful.

A process could be like this:

  1. Find the mailing lists, RSS feeds or whatever the makers of Discourse, ISPConfig etc. use to inform users of the release of security patches. Subscribe to them and funnel them onto a fixed place, like an email address or RSS aggregator or whatever.

  2. Set up (strict) watch shifts. A watch shift means simply this: whoever is on shift must check the email address or RSS feed every 48 hours. Of course, it will almost always be empty.

  3. If a new security patch has been released, take steps. In most cases this will mean informing Matthias. Exceptionally (for example: Matthias is unavailable) it might mean reaching out to the community, explaining what the problem is and how we are dealing with it.
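The check in step 2 could be supported by a small script that remembers which advisory entries previous shifts have already seen and reports only the new ones. Everything here is hypothetical (the state file, the entry IDs); fetching and parsing the actual Discourse/ISPConfig announcement feeds is stubbed out:

```python
import json
from pathlib import Path

SEEN_FILE = Path("seen_advisories.json")  # hypothetical state file
SEEN_FILE.unlink(missing_ok=True)         # start fresh for this demo

def load_seen():
    """Load the set of advisory IDs recorded on previous shifts."""
    if SEEN_FILE.exists():
        return set(json.loads(SEEN_FILE.read_text()))
    return set()

def new_advisories(entry_ids):
    """Return advisory IDs not seen on a previous shift, and record them.

    In practice entry_ids would come from the announcement RSS feeds of
    Discourse, ISPConfig etc.; here they are passed in directly.
    """
    seen = load_seen()
    fresh = [e for e in entry_ids if e not in seen]
    SEEN_FILE.write_text(json.dumps(sorted(seen | set(entry_ids))))
    return fresh

# First shift: two advisories are new. Second shift: nothing new to report.
print(new_advisories(["discourse-2.1.3", "ispconfig-3.1"]))  # both IDs
print(new_advisories(["discourse-2.1.3", "ispconfig-3.1"]))  # []
```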

[quote="matthias, post:2, topic:7662"]
There are only best-effort methods to detect that.
[/quote]

GDPR compliance as a whole is best-effort. You need to have processes to deal with things; but it’s up to you to decide which processes, and they should be commensurate with the cost of implementing them and with the potential damage. In our case, the potential damage is very limited.

Thanks for this update @alberto and the clarifications @matthias. Super interesting and obviously really important. Thank goodness for @Geminiimatt’s presentation at OpenVillage and the session on the GDPR for context.

Two questions, perhaps naive:

  1. is this post enough in terms of beginning to log our best efforts and data security policy, or do we need to do more?
  2. do we already have an ethics process drafted/published for our research?

No. Writing a proper data protection policy is on my list.

There is a lot of ethics stuff, but I do not have it – you should ask @lucechiodelliub. What we do have is a data management plan. We also registered with the ICO, the British data protection regulator, and went through its self-assessment tool (results). I need to update this material in the wake of the change in our base technology (Discourse, not Drupal), country of incorporation (Estonia, not UK) and other minor changes.


Also on our data protection policy: we don’t use session replay services, which turn out not to be secure at all.

Great. I’ve asked @lucechiodelliub for ethics processes documentation etc.