AMA - Data Openness and Covid

Yep you are right!

Ciao Matteo, I love to see you here!
Simple questions are the most complex.

My steps:

  • Over the past year, here in Italy, I have too often heard decision makers say things that showed they lack the necessary basic knowledge of these issues. These issues are important; they must be part of their profession. Data literacy by default;
  • I repeat myself a little. Build processes where open data is one of 100 outcomes; do not build an isolated process designed only for publishing open data, because otherwise it is not sustainable and in most cases, sooner or later, it will stop or produce dirty data;
  • Build stable and clear procedures that enable and define how the parties talk to each other: decision makers, researchers, associations, companies, citizens. This past year it was very bad and limiting (in Italy) to be treated like a child whose grandfather explains how to walk. But I can walk, and I’m 47 years old;
  • Implement the rules and procedures that have existed for years in Europe and in many other countries: the legislation on open data and the re-use of public sector information. These laws, rules and procedures do not automatically produce outcomes, probably because open data is not considered strategic. How is it possible that the website of the Italian Ministry of Health has a license that does not allow the publication of any derivative work, and that all we can do is hope that someone changes it out of personal goodwill?

When the EC/JRC started putting together NUTS2 level COVID-19 maps, I thought that they would start releasing regional data as well. They clearly have it, and it would have made a lot of sense for research work. But suddenly they announced in October 2020 that they would actually scale down the whole thing and only release aggregated bi-weekly statistics. This was really surprising, and the opposite of the direction I thought they would go. I think that, besides the countries holding back on data, the EU is also not doing much. It might be on purpose.

Additionally, as I mention in the document, all countries now have fancy dashboards, but getting to the data behind them is a massive challenge.

For everybody: in Italy we decided to base decisions on COVID restrictions on data. That’s very important, by the way. The idea is that Italian regions are color-coded with three main colors: yellow (you can do some things but not others; e.g., you cannot go to restaurants in the evening); orange (there are some restrictions, but there is still some freedom of movement); red (you are essentially stuck at home, with some exceptions). Well, for a week one Italian region was in the red zone because, based on data that the central government collects, the resulting color was red. The region complained a lot, claiming it was not true. Nobody, especially not citizens, could verify that. Where is the truth? Opening up data allows us to better check these cases and probably avoid unnecessary controversies.

And, sorry to ask so many questions, but I am so exasperated… let’s talk about vaccination data.

In Belgium, where I live, the situation is this. The federal public health institute, called Sciensano, publishes data on vaccine doses administered (“jabs in arms”): Belgium COVID-19 Dashboard - Sciensano

But there is a problem. There are NO data on the vaccine doses delivered by the pharma companies. This is not good at all, because, as in much of Europe, the vaccination campaign in Belgium is going very slowly. A common explanation has been to blame the companies for not delivering fast enough. High-profile Belgian MEP Guy Verhofstadt has pointed out that the European Commission signed contracts that did not guarantee the speed of supplies.

But is it so? Without data on deliveries, we do not know. A Belgian hacker called Joris Vaesen has made his own dashboard, and that one does have data on doses delivered:

And it seems not: the curve of doses delivered rises faster than that of the jabs in arms. It seems Verhofstadt is wrong: the blame does not lie with the Commission, but with Belgian health authorities.

But the embarrassing bit is what Vaesen has to use as his sources: news articles and press releases from companies. Of course: there are no open data! And yet the data exist, because the government buys the damn things and keeps them in warehouses. Why can we not have them?

Another example I can give you is that Eurostat has a massive database on weekly deaths. It is available for most countries and is the main source of the excess death graphs one sees online. But Germany is missing from Eurostat, even though the German statistical agency publishes this information online. So I don’t get why this can’t be added to the centralized database. Unlike tracing app data, this information is already in the public domain. There is a need for data solidarity in Europe.
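
To make the excess-death idea concrete, here is a minimal sketch, assuming the weekly death counts have already been downloaded into plain Python dictionaries. The function name, the data layout and the numbers are all hypothetical placeholders, not Eurostat's API or real figures. The baseline used here (the mean of the same week in earlier years) is just one common, simple choice.

```python
from statistics import mean

def excess_deaths(deaths_by_year, target_year, baseline_years):
    """Return {week: observed - baseline} for target_year.

    deaths_by_year maps year -> {week_number: deaths}.
    The baseline for a week is the mean of that week over baseline_years.
    """
    excess = {}
    for week, observed in deaths_by_year[target_year].items():
        history = [
            deaths_by_year[year][week]
            for year in baseline_years
            if week in deaths_by_year.get(year, {})
        ]
        if not history:
            continue  # no baseline available for this week
        excess[week] = observed - mean(history)
    return excess

# Made-up placeholder numbers, for illustration only (NOT real statistics).
toy = {
    2018: {1: 100, 2: 110},
    2019: {1: 120, 2: 90},
    2020: {1: 150, 2: 100},
}
result = excess_deaths(toy, 2020, [2018, 2019])
print(result)
```

The point of the sketch is that the computation is trivial once the weekly series sit in one place; the hard part is getting every country's series into the same database.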

First of all, that data is not open data, and this is something we should tell people, because we need to help them understand what open data truly is! Secondly, yes, we asked formally and informally for that data, and other data they hold, to be released as open data, so that the decisions that are taken, and that so heavily affect our freedom, can be scrutinized. BTW: we asked for more detailed data at the province level, but after one year we still have just total cases. It is not enough :slight_smile:

Can I ask you to expand on this, Andrea?

You seem to be saying: Italian public administrations are bad at open data because they are bad at orderly processes. Which is to say… they are bad at mostly everything. A public administration is supposed to be a bureaucracy! If it cannot do process, then what is it that it can do?

If that is true, we have a deeper problem than open data: we have a capacity problem, a governance problem. We are completely helpless: the system kind of limps along when there are no external shocks, but it cannot manage anything like COVID, or climate change, or social upheaval. This would be very bad.

Is this what you are saying? @giorgia.lodi, you are a former civil servant. Any thoughts?

Haha, Andrea, Giorgia and I know each other from a mailing list called Spaghetti Open Data, where a recurring joke is “let’s take to the streets and break those dashboards!” (because people want the data behind them). We used to run tutorials on scraperwiki and such…

Uh? Did you ask why that is?

Who do I ask? :slight_smile: In any case this project I did was additional work on top of the existing research commitments. But since I have several projects dealing with NUTS level analysis, the main idea was to preserve the data before it disappears from the internet. Something I have seen happen many times in the past.

  1. Open data does not include personal data, with some exceptions to balance other rights (e.g., transparency). However, there are tons of data owned by PAs that can definitely be returned to society because they do not include personal data (e.g., environmental data: why are they still so closed?) :slight_smile: And personal data should be seamlessly shared among institutions for specific institutional purposes.

  2. From my perspective this is still an open issue. Finding the right balance is something we need to work on: all of us, researchers, PAs and the private sector as well. It is not easy: aggregating means losing information. We probably need to investigate further how to apply the strong privacy-preserving techniques available in the state of the art. I still see this as a very big obstacle because skills are missing, in particular in the public sector. That is why I keep saying that much broader collaboration between research institutions and PAs, for instance, is key to making progress and breaking down the barrier of fear that usually exists when opening data that may have some impact on privacy rights.

  3. Oh no no, I have been studying fully decentralized approaches for years. There is no need to centralize the data in 2021!! For instance, paradigms such as Linked Data and projects like Solid demonstrate that it is possible :slight_smile:

I read that this was based on that region’s (Lombardia: ~10 million residents) failure to update the clinical condition of infected people. They would heal, but still be counted as infected (and infectious) by the model, which consequently decided for the code red.
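
The mechanism described here is easy to see with a little arithmetic. This is a minimal sketch, assuming the usual bookkeeping identity (currently positive = cumulative cases minus recovered minus deaths). The function name and the figures are invented placeholders, not Lombardia's real numbers and not the actual model used by the government.

```python
def currently_positive(total_cases, recovered, deaths):
    """People still counted as infected under the simple bookkeeping identity."""
    return total_cases - recovered - deaths

# Invented placeholder figures, for illustration only.
total_cases = 1000
deaths = 50
true_recovered = 700       # what actually happened
reported_recovered = 400   # recoveries not yet entered in the system

actual = currently_positive(total_cases, true_recovered, deaths)
reported = currently_positive(total_cases, reported_recovered, deaths)
overcount = reported - actual  # people wrongly counted as still infected
print(actual, reported, overcount)
```

Any indicator built on the reported figure inherits the overcount, which is why outsiders need the underlying series to verify the resulting color.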

Yes, it beats Belgium, where we have no idea which data and which models are being taken into account by our Comité de Concertation. It feels very random, with decisions like “we are re-opening hairdressers, but trimming or shaving men’s beards is still forbidden”. Like Andrea said, it does feel like being infantilized.

I just tried Twitter:

image

This is something I have come to understand in recent months; before, I was not aware of it. It is the key issue:
Public data, epidemic governance and democracy.
It is a huge cultural and political issue. What we are talking about is not an obsession with data; it is an everyday issue that concerns everyone.
The effort we must make, each in our own context, is to emphasize how important it is.

It is a dog chasing its own tail. Even if science is not independent, when the data and the documentation of the processes that produce them are available, someone else can more easily scrutinize both institutions and research centers, as well as the barroom opinions that we too sometimes voice on highly specialized topics.
Civil society and NGOs that deal with these issues must commit to teaming up; they must be prepared and ready.
We have done a good job of keeping attention on the issue, creating a large and above all new network.
But it is still too little. We will dedicate the resources we have available to do better and more in 2021.

During the long first months of the pandemic, I saw people looking at the tables of numbers as if they were the Olympic medal table.
An unnecessary attention to reading numbers that in most cases added no information.
But I have also seen a lot of new attention and evolution. Friends, relatives, colleagues and journalists have engaged with the subject of data, and with the responsibility involved in managing and reporting them, in a way I had never seen.
For example, I have seen journalists asking for the publication of raw, high-quality data with a maturity that was not there in 2019.
I saw the data of the largest annual survey on the quality of life in Italy published in a GitHub repo.
I have seen the Italian civil protection publish data in ways that will leave a mark. From now on, nothing less than this will be acceptable.

That was a long answer. Overall maturity is low, but it is clear that this is a great opportunity. For everyone.
There is a lot of work to do.

I could tell you that if the data is scarce it is difficult to answer this question :wink:.

Transparency is a basic principle of democracies.
The availability of public data is necessary in order to involve the scientific community in managing the epidemic.
In the absence of transparency, every conclusion becomes contestable on the scientific level and, therefore, also on the political level.

Wow! What an important and controversial topic. Are we speaking about statistics? Or do you mean the data coming from the tracing app? To the best of my knowledge, the data of the app is under the control of the Italian Ministry of Health. The app in Italy was designed with privacy in mind from the start. I do not see that many issues, and honestly that data would be so beneficial for tracing how outbreaks form and where they are (in schools? It is a great debate in Italy, and we still do not know!). In the end, I am not sure the potential of the app was understood by either people or institutions. Indeed it did not work, partly because such an innovative element was put into a very old system that does not reason in terms of data. So, for the time being, both the app and its data are turning out to be useless.

Yes, of course. I would not want to open the personal data can of worms.

But even that is weird. The data from the Italian COVID app are on GitHub, but not on dati.gov.it. I guess this is because it is the company that made and maintains it that publishes the data, not the public sector. Capacity problems…

These are both important conclusions! Glad we could clear that up.

@aborruso, @giorgia.lodi, an hour has passed, and you have been taking a lot of questions. You too @Asjad, thanks! I think we can end here for now. Very likely some more questions and remarks will come in the next few days, maybe we will ask you to come back and address them.

Many, many thanks! This was fun :clap: :clap: :clap:

True, Alberto, that was the case. And BTW: after 1 year of pandemic, do we have a shared semantics in Italy and in Europe? Hmmm…

We need to counter these situations, and from my point of view open data could help a lot!

Thanks for inviting me to join this interesting discussion!

It’s not so simple. The license of the ISS website is wrong for this context: it is CC BY-NC-ND.
Here the basis must be open science, and instead derivative works are blocked.

Those data are too aggregated on many issues. We want them to be less aggregated, in compliance with the rules on the protection of privacy.
There are techniques to disaggregate and anonymize the data and whoever manages this must have staff who know these techniques.
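
One of the simplest such techniques is small-cell suppression: publish disaggregated counts, but mask any cell below a threshold so that individuals cannot be singled out. This is a minimal sketch, assuming counts keyed by (province, age band). The threshold of 5, the function name and the data are hypothetical, and a real release would need more than this (for instance, protection against recovering masked cells from published totals).

```python
def suppress_small_cells(counts, threshold=5):
    """Return a copy of counts where cells with 0 < value < threshold are masked.

    Masked cells are set to None; zeros and values >= threshold are kept.
    """
    return {
        key: (None if 0 < value < threshold else value)
        for key, value in counts.items()
    }

# Invented placeholder data, for illustration only.
cases = {
    ("Province A", "0-19"): 3,
    ("Province A", "20-39"): 42,
    ("Province B", "0-19"): 0,
    ("Province B", "20-39"): 5,
}
published = suppress_small_cells(cases)
print(published)
```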