Introducing Kaggle, an open data playground

Recently I found out about Kaggle, a platform that provides open datasets and a development environment (Python notebooks and API) as a kind of public playground for data science.

People seem to like it – for example, there are 508 user-contributed scripts analyzing the Open Food Facts dataset. So I had the idea that in the future, we might want to upload Edgeryders’ datasets there as well, not just to an archiving place like Zenodo. And then see what analysis people will do on them :slight_smile:

Not a bad idea!

For the record, I do not use Kaggle because I had a bad first experience. I wanted to use a nonstandard module (datapackage.py). In a Kaggle kernel you can install from the notebook environment via

!pip install datapackage

Unfortunately, if you then try to load a dataset that you uploaded from your hard drive, you get what looks like a permission issue:

import datapackage

# Infer a data package descriptor from the CSV files of the uploaded dataset
package = datapackage.Package()
package.infer('/kaggle/input/**/*.csv')
package.descriptor

[...]

DataPackageException: Local path "../input/poprebel-psuedonymized/annotations.csv" is not safe

And that was the end of that. :frowning:

Just a wild guess: relative paths that involve “..” (go up one directory) are not considered safe in server environments if they can include user input, because they might allow users to read any file they want, such as secrets stored on the server.

If that message appears without you using “..” anywhere in your paths, there is nothing you can do about it other than report it as a bug in their server software …
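
To illustrate the kind of check I am guessing at, here is a minimal sketch in plain Python. The function name `is_safe_relative_path` is made up for this example, and it is only an assumption about how such a check might work, not the actual code of datapackage or of Kaggle:

```python
import posixpath


def is_safe_relative_path(path: str) -> bool:
    """Hypothetical safety check: accept only relative paths that never go up a directory."""
    # Absolute paths and home-directory shortcuts could point anywhere on the server.
    if posixpath.isabs(path) or path.startswith("~"):
        return False
    # Any ".." component could escape the directory the data is supposed to live in.
    components = path.replace("\\", "/").split("/")
    return ".." not in components


# The path from the error message above contains a ".." component.
print(is_safe_relative_path("../input/poprebel-psuedonymized/annotations.csv"))  # False
print(is_safe_relative_path("annotations.csv"))  # True
```

A check like this rejects the path from the error message simply because it starts with “..”, regardless of whether the file it points to is actually harmless.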

@zmorda knows it well

@matthias It depends on the purpose behind sharing the data publicly:

  • If you share it without a specific purpose, I don’t think you will get the insights you may need. People will just download the data and do their exploratory data analysis on their local machines, or share it on GitLab.
  • To drive more interest in your datasets, you can launch a competition on Kaggle itself, either for learning or with some prizes. You simply give the competitors some guidelines, and you end up with the results of their analysis, which you can build on later.