AMA - Data Openness and Covid

Thanks! Yes I will listen in. Curious to see the discussion.

Ok, but the question is: why would that be unclear? If we are talking about apps, the raw data are timestamped to the millisecond. All you need to do is pick a method for aggregation (example: “daily, from 0:00:00 to 23:59:59”) and document it in the metadata.
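To make the point concrete, here is a minimal sketch of what I mean, not anybody's actual pipeline. It assumes the raw events are Unix timestamps in milliseconds and that days are cut in UTC; the sample values and the names `raw_events_ms`, `aggregate_daily` and `metadata` are made up for illustration.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical raw events: Unix timestamps in milliseconds (assumed UTC).
raw_events_ms = [
    1610668800000,  # 2021-01-15 00:00:00.000 UTC
    1610755199999,  # 2021-01-15 23:59:59.999 UTC
    1610755200000,  # 2021-01-16 00:00:00.000 UTC
]


def aggregate_daily(timestamps_ms, tz=timezone.utc):
    """Count events per calendar day, with days running 00:00:00 to 23:59:59 in tz."""
    counts = Counter(
        # Integer division drops the milliseconds; only the date matters here.
        datetime.fromtimestamp(ts // 1000, tz=tz).date().isoformat()
        for ts in timestamps_ms
    )
    return dict(sorted(counts.items()))


# The aggregation rule is written down next to the data, so nobody has to guess.
metadata = {
    "aggregation": "daily count",
    "window": "00:00:00.000 to 23:59:59.999",
    "timezone": "UTC",
}

daily_counts = aggregate_daily(raw_events_ms)
print(metadata)
print(daily_counts)
```

The only real decisions are the window and the time zone, and both fit in three lines of metadata.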

In general, in this story I am fascinated by how things that do not look like rocket science, like the aggregation, as you say, become seemingly impossible. In Italy there is quite some pressure to publish. @giorgia.lodi, @aborruso, why, oh why do they not publish?

And, more technically: which data are still not published, despite pressure? Here I am asking mostly Giorgia and Andrea, and referring to Italy, to begin with.

OK, Andrea, we agree on that. But the COVID mess really brought it home: we seem very, very far from this “natural” way of handling public data. COVID data are urgent; there is a high demand and a lot of attention (our mutual friend Piersoft made a dashboard that got tens of millions of views); no privacy problem at all, since all data are aggregated. And they are “classic” open data, data gathered for administrative reasons that can yield extra benefits when shared.

And yet, no joy. So yes, things should work just like you say… but what is your opinion on why they are not?

First of all, what an amazing story, one that opens up so many issues, ethical ones too, let me say. To answer this question: probably yes. Take into account that knowledge is power, and having informed citizens implies having less control over them. Open data enables wider knowledge and, possibly, citizens who can reason and develop their own opinions. It is exactly the opposite of any populism, which speaks to the first impressions of people who do not know the complexities of society’s problems.

Second question: yep, indeed. Opening up data allows more people to check that. These situations can be overcome. BTW: I am now in a research institute. If the government asks me for the catastrophic scenario, I might want to show that, but facts guide science, and only that.

Third question: because we have a lot of poor-quality data, and this prevents us from seeing a real impact from its reuse. In addition, a lot of important data with a strong and direct impact is still closed, I am afraid. This means that people cannot appreciate the potential of opening up data.

Many people share personal data for inconsequential services or comfort all the time, but it seems that for something like contact tracing for the virus, or in general anything that is not an app people choose for their own comfort, there is a lot more scrutiny.

Do you think it is important to always take these opportunities to have those conversations about privacy etc.? Or do you think that necessary data-sharing is sometimes inhibited because it turns into an exercise in the privacy discussion that people avoid in so many other parts of their lives, where the data-sharing is far less necessary but is done far more nonchalantly?

We discussed the German App for covid-tracing a bit more here: Germany Introduces Corona Virus Tracing-App: Why aren't people more sceptical?

@Asjad here is a civic hacker based in Vienna. I just met him yesterday on Twitter, but am in awe of his great dataset of EU COVID data at the NUTS2 and NUTS3 level.

Asjad, what is your Europe-wide impression? Do you see many disparities in open data strategies, or rather similarities?

Let me tell you the Italian story. For days some Italian regions complained about the government’s decision to put them in “red”. Not sure you know that the government decides on closing or opening up social and public activities based on some data (mostly closed to everybody). The regions claimed that they were orange or even yellow (better colors, because most activities stay open), and they asked to be moved at least to the orange zone. Well, we do not know precisely whether the first government decision, which limited the freedom of enterprises and citizens so much, was correct or not. With more open data, that strange situation with its continuous controversies could probably have been fixed easily and quickly!

Giorgia, I think this story needs to be told, most readers will not know it…

I would definitely reuse it to link it to other data! It is not the single dataset that unlocks the value, but the ability to link it to other data, which allows us to truly benefit from shared and open knowledge.

Yep you are right!

Ciao Matteo, I love to see you here!
Simple questions are the most complex.

My steps:

  • This year, here in Italy, I have too often heard decision makers say things that showed they lacked the basic knowledge these issues require. These issues are important; they must be part of their profession. Data literacy by default;
  • I repeat myself a little. Build processes where open data is one of 100 outcomes; do not build an isolated process designed only for publishing open data. Otherwise it is not sustainable and, in most cases, sooner or later it will stop or produce dirty data;
  • Build stable and clear procedures that enable and define the modes of interlocution between the parties: decision makers, researchers, associations, companies, citizens. This year it was very bad and limiting (in Italy) to be treated like a child to whom the grandfather explains how to walk. But I can walk and I’m 47 years old;
  • Implement the rules and procedures that have existed for years in Europe and in many other countries: the legislation on open data and the re-use of public sector information. These laws, rules and procedures cannot automatically produce outcomes, probably because the matter is not considered strategic. How is it possible that the website of the Italian Ministry of Health has a license that does not allow publishing any derivative work, and that all that remains is to hope someone changes it out of personal goodwill?

When the EC/JRC started putting together NUTS2 level COVID-19 maps, I thought that they would start releasing regional data as well. They clearly have it, and it would have made a lot of sense for research work. But in October 2020 they suddenly announced that they would actually scale down the whole thing and only release aggregated bi-weekly statistics. This was really surprising and the opposite of the direction I thought they would go. I think that, besides the countries holding back on data, the EU is also not doing much. It might be on purpose.

Additionally, as I mention in the document, all countries now have fancy dashboards, but getting to the data behind them is a massive challenge.

For everybody: in Italy we decided to take decisions on COVID restrictions based on data. That’s very important, btw. The idea is that Italian regions are color-coded according to three main colors. Yellow: you can do some things but not others (e.g., you cannot go to restaurants in the evening); orange: there are some restrictions, but still some freedom of movement; red: you are essentially stuck at home, with some exceptions. Well, for a week one Italian region was in the red zone because, based on some data that the central government collects, the resulting color was red. The region complained a lot, claiming it was not true. Nobody, especially not citizens, could verify that. Where is the truth? Opening up data allows us to better check these cases and probably avoid unnecessary controversies.

And, sorry to ask so many questions, but I am so exasperated… let’s talk about vaccination data.

In Belgium, where I live, the situation is this. The federal public health institute, called Sciensano, published data on vaccine doses administered (“jab in arms”): Belgium COVID-19 Dashboard - Sciensano

But there is a problem. There are NO data on the vaccine doses delivered by the Pharma companies. This is not good at all, because, as in much of Europe, the vaccination campaign in Belgium is going very slowly. A common explanation has been to blame the companies for not delivering fast enough. High-profile Belgian MEP Guy Verhofstadt has pointed out that the European Commission has signed contracts that did not guarantee the speed of its supplies.

But is it so? Without data on deliveries, we do not know. A Belgian hacker called Joris Vaesen has made his own dashboard, and that one does have data on doses delivered:

And it seems that no: the curve of doses delivered rises faster than that of the jabs in arms. It seems Verhofstadt is wrong: the blame does not lie with the Commission, but with Belgian health authorities.

But the embarrassing bit is what Vaesen has to use as his sources: news articles and press releases from companies. Of course: there are no open data! And yet the data exist, because the government buys the damn things, and keeps them in warehouses. Why can we not have them?

Another example I can give you is that Eurostat has a massive database on weekly deaths. It is available for most countries and is the main source of the excess death graphs one sees online. But Germany is missing from Eurostat, even though the German statistical agency publishes this information online. So I don’t get why this can’t be added to the centralized database. Unlike tracing app data, this information is already in the public domain. There is a need for data solidarity in Europe.

First of all, that data is not open data, and this is something we should tell people, because we need to help them understand what open data truly is! Secondly, yep, we asked formally and informally for that data, and the other data they have, to be released as open data, in order to check the decisions that are taken and that influence our freedom so much. BTW: we asked for more detailed data at the province level, but after one year we still have just total cases. It is not enough :slight_smile:

Can I ask you to expand on this, Andrea?

You seem to be saying: Italian public administrations are bad at open data because they are bad at orderly processes. Which is to say… they are bad at mostly everything. A public administration is supposed to be a bureaucracy! If it cannot do process, then what is it that it can do?

If that is true, we have a deeper problem than open data: we have a capacity problem, a governance problem. We are completely helpless, the system kind of limps along when there are no external shocks, but it cannot manage anything like COVID, or climate change, or social upheaval. This would be very bad.

Is this what you are saying? @giorgia.lodi, you are a former civil servant. Any thoughts?

Haha, Andrea, Giorgia and I know each other from a mailing list called Spaghetti Open Data, where a recurring joke is “let’s take to the streets and break those dashboards!” (because people want the data behind them). We used to run tutorials on scraperwiki and such…

Uh? Did you ask why that is?

Who do I ask? :slight_smile: In any case this project I did was additional work on top of the existing research commitments. But since I have several projects dealing with NUTS level analysis, the main idea was to preserve the data before it disappears from the internet. Something I have seen happen many times in the past.

  1. Open data does not include personal data, with some exceptions to balance other rights (e.g., transparency). However, there are tons of data owned by PAs that can definitely be returned to society because they do not include personal data (e.g., environmental data: why are they still so closed?) :slight_smile: And personal data should be seamlessly shared among institutions for specific institutional purposes.

  2. From my perspective this is still an open issue. Finding the right balance is something we all need to work on: researchers, PAs and also the private sector. It is not easy: aggregating means losing information. We probably need to investigate further how to apply the strong privacy-preserving techniques available at the state of the art. I still see this as a very big obstacle because skills are missing, in particular in the public sector. That’s why I keep saying that much broader collaboration, between research institutions and PAs for instance, is key to making progress and breaking the barrier of fear that usually exists when opening data that may have some impact on privacy rights.

  3. Oh no no, I have been studying fully decentralized approaches for years. There is no need to centralize the data in 2021!! For instance, paradigms such as Linked Data and projects like Solid demonstrate that it is possible :slight_smile: