Open ethno data: towards a new Masters of Network?

alberto · June 20, 2020 13:20

Today I had a bit of time, and re-exported the OpenCare dataset to Zenodo using the new export code. This was necessary, since I had found out that the Zenodo data previously published by @melancon and @jason_vallet had not been pseudonymized.

But it has an additional advantage: now we have published three datasets, each relating to an ethno project (OpenCare, POPREBEL, NGI Forward), with exactly the same structure. This makes it possible in principle, and even easy, to treat them as a single dataset. A large one: we are looking at some 8K posts by about 700 participants, with close to 10K annotations. I think it is safe to say there has never been an open ethnographic dataset of comparable size.
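A minimal sketch of what treating the three exports "as a single dataset" could look like, assuming each export can be read as a list of post records with `post_id`, `topic_id` and `author` fields (hypothetical field names, not necessarily the actual Zenodo schema). The main pitfall is that IDs can repeat across projects, so the sketch namespaces them with the project name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    post_id: str
    topic_id: str
    author: str   # pseudonymized participant id
    project: str


def merge_datasets(datasets: dict[str, list[dict]]) -> list[Post]:
    """Merge per-project post records into one list.

    Assumption: each record has 'post_id', 'topic_id' and 'author' keys
    (hypothetical schema). IDs are prefixed with the project name so that,
    e.g., post 12 in OpenCare and post 12 in NGI Forward stay distinct.
    Authors are also prefixed, i.e. treated as distinct per project; drop
    that prefix if pseudonyms are consistent across exports.
    """
    merged = []
    for project, rows in datasets.items():
        for row in rows:
            merged.append(Post(
                post_id=f"{project}:{row['post_id']}",
                topic_id=f"{project}:{row['topic_id']}",
                author=f"{project}:{row['author']}",
                project=project,
            ))
    return merged
```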

Should we do something with it? Should we try to go deeper into abstraction? By this I mean analysis of the structure, rather than the semantics. It would be about trying to recognize patterns in how collective intelligence works in large conversations. Example questions:

Is a post more likely to be annotated if it is in a topic with many replies? Sexier formulation: does interaction lead to more interesting insights for the analysts?
Is a post more likely to be annotated depending on the social network metrics of its author (eg. centrality)?
Can we estimate the likelihood that a post will be found interesting by ethnographers? Application: when there is a lot of content, can we algorithmically build a queue with the most promising posts at the top?
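The three questions above can be sketched as simple computations, assuming posts are available as `(post_id, topic_id, author)` tuples, the set of annotated post IDs is known, and a list of `(replying_author, replied_to_author)` pairs can be extracted from the reply structure (all hypothetical shapes). Topic size (number of posts) stands in for "many replies", and degree centrality is only one of several possible network metrics. The ranking score is a naive placeholder with arbitrary weights, not a fitted model; estimating an actual likelihood would mean training something like a logistic regression on the existing annotations.

```python
from collections import Counter, defaultdict


def annotation_rate_by_topic_size(posts, annotated, threshold=5):
    """Share of annotated posts in 'busy' vs 'quiet' topics.

    posts: list of (post_id, topic_id, author) tuples (hypothetical schema)
    annotated: set of post_ids with at least one annotation
    threshold: minimum posts for a topic to count as busy (arbitrary choice)
    """
    topic_size = Counter(topic for _, topic, _ in posts)
    counts = {"busy": [0, 0], "quiet": [0, 0]}  # [annotated, total]
    for post_id, topic, _ in posts:
        group = "busy" if topic_size[topic] >= threshold else "quiet"
        counts[group][1] += 1
        if post_id in annotated:
            counts[group][0] += 1
    return {g: (a / n if n else None) for g, (a, n) in counts.items()}


def degree_centrality(replies):
    """Normalized degree centrality on the undirected who-replies-to-whom graph.

    replies: iterable of (replying_author, replied_to_author) pairs
    """
    neighbours = defaultdict(set)
    for a, b in replies:
        if a != b:  # ignore people replying to themselves
            neighbours[a].add(b)
            neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbours.items()}


def rank_queue(posts, centrality, topic_weight=1.0, author_weight=1.0):
    """Order posts for coding by a naive linear score (placeholder weights)."""
    topic_size = Counter(topic for _, topic, _ in posts)

    def score(post):
        _, topic, author = post
        return (topic_weight * topic_size[topic]
                + author_weight * centrality.get(author, 0.0))

    return sorted(posts, key=score, reverse=True)
```

A difference between the "busy" and "quiet" rates is only a descriptive starting point; a real analysis would also need to control for project and ethnographer, since the three datasets were coded by differently organized teams.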

And so on.

It could be a Masters of Networks at some point – provided we are allowed to travel.

Ping @amelia, @ccs, @leonie, @Jirka_Kocian, @jan, @Wojt, @rebecca, @sander, @melancon, @brenoust, @bpinaud

amelia · June 20, 2020 19:47

I train our ethnographers to go where the community points us, so we almost always code posts with more comments first (the exception being planning posts authored by community managers, as opposed to ‘content’ posts).

It would be interesting to see what our metric is beyond the number of comments (e.g. is there something common to what we consider ‘content-ful’?).

I hypothesise that it’s less (or not only) about the social network metrics of the author and more about the diversity of the people commenting. I usually end up coding posts, or finding them interesting, when there are at least 3 people actively talking.

I’d definitely be up for comparison – I do think we should get slightly further along on POPREBEL, since it’s not a great example just yet. When we have greater parity between English, Czech, and Polish (which should happen in the next month) it’ll be worth doing.

They also give us 3 different ethnographer setups, which could be interesting (especially since I have intimate knowledge of all three contexts and could help contextualise any confounding variables). The first is a lone ethnographer (OpenCare). The second (POPREBEL) is one lone ethnographer and two pairs of ethnographers working across 3 languages (maybe we should add another ethnographer to English just for the sake of having 2, 2 and 2…). The final one (NGI) is 3 ethnographers working on a single language.

alberto · June 21, 2020 13:41

Ahh, but a post could be uninteresting, and in this case it will still not be annotated.

This is a testable hypothesis. Yay for conversation-as-data.
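One way the hypothesis could be tested is sketched below, assuming the same hypothetical `(post_id, topic_id, author)` tuples and a set of annotated post IDs. The threshold of 3 distinct people comes from Amelia's rule of thumb. The test is a one-sided permutation test: it keeps the number of annotations fixed, reassigns them to random posts, and counts how often the "diverse vs. other" difference is at least as large as the observed one. Note that it only checks association: as pointed out above, an uninteresting post will stay unannotated however many people comment on it.

```python
import random
from collections import defaultdict


def diversity_groups(posts, min_people=3):
    """Split post_ids by whether their topic has >= min_people distinct authors.

    posts: list of (post_id, topic_id, author) tuples (hypothetical schema)
    """
    people = defaultdict(set)
    for _, topic, author in posts:
        people[topic].add(author)
    diverse, other = [], []
    for post_id, topic, _ in posts:
        (diverse if len(people[topic]) >= min_people else other).append(post_id)
    return diverse, other


def rate(ids, annotated):
    """Fraction of ids that are in the annotated set."""
    return sum(i in annotated for i in ids) / len(ids) if ids else 0.0


def permutation_test(posts, annotated, min_people=3, n_iter=2000, seed=0):
    """One-sided permutation test: are posts in diverse topics annotated more?

    Returns (observed difference in annotation rate, estimated p-value).
    """
    diverse, other = diversity_groups(posts, min_people)
    observed = rate(diverse, annotated) - rate(other, annotated)
    ids = diverse + other
    n_annotated = sum(i in annotated for i in ids)
    rng = random.Random(seed)  # fixed seed for reproducibility
    hits = 0
    for _ in range(n_iter):
        # Reassign the same number of annotations to randomly chosen posts
        fake = set(rng.sample(ids, n_annotated))
        if rate(diverse, fake) - rate(other, fake) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_iter + 1)
```

Shuffling at the post level treats posts as exchangeable; since annotations probably cluster within topics, a stricter version would shuffle at the topic level.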

melancon · June 22, 2020 12:06

Ah ah! This is an extremely interesting topic. Comparing different networks in search of similarities or distinctive properties (for each of them) is a question that often pops up: colleagues investigating human trafficking networks are looking at that same issue to figure out whether different networks operate based on the same “logic” (see, e.g., here and here).

There are several questions hidden in there, some of which relate to the way people interact and reply to one another and how topics develop, and others to how content is annotated.

I am definitely in for a hackathon. Is there a way we could have such an event even if we cannot all get together in the same room? Couldn’t we have a virtual get-together between groups located in Bx, Brussels, X, Y, Z, etc. (meeting in a Zoom or Google Meet room instead)?

alberto · June 29, 2020 14:03

I think this would be possible, but not an equilibrium.

In my experience of hackathons, the actual work is done with everyone looking at their screen anyway. The motivation for devoting 8 solid hours on a Saturday comes from the fact that people are getting together physically and hanging out during extended breaks. It takes a very disciplined person indeed to do that on Zoom.

When they are not participating in hackathons, hackers are, well, hacking. All tools are asynchronous and very efficient, interaction is minimal, but there is very little control over what any open source project will get done, and by when.

It would be quite an achievement to redesign the hackathon format so that it can be faster, more social and more interactive than normal open source projects, while still maintaining some of the efficiency of the latter. I imagine such a hackathon lasting 4-5 days, with frequent team check-ins and quite a visible, coherent leadership so that participants can enjoy the feeling of knowing why they are doing what they are doing.

Would it be an option to open it to BOTH ethno/anthro and compsci people? @melancon, @amelia, @akmunk, @jan: would you be open to experimenting with something like this?

amelia · June 29, 2020 15:01

definitely!

melancon · July 03, 2020 10:04

Still interested, as are @brenoust and @bpinaud, I guess. Reserving 4-5 days in a row (even if not full days) might be more problematic. Designing the hackathon so that participants commit to tasks and deliverables might be a solution. I understand this somehow throws away the spontaneity we appreciate in such gatherings … but just for once …

alberto · January 25, 2021 14:50

Reviving this topic in 2021.

What happened: we thought we would do this as part of MozFest 2021, which takes place in Amsterdam, and the city of Amsterdam is a partner in NGI. To everyone’s surprise, MozFest turned us down. So now we are back to the drawing board.

Today we had a meeting of the NGI Forward research task force, and we decided:

Interdisciplinary, in the MoN tradition: ethno/anthro people on one side, datasci people on the other. EDGE to target mainly the anthro crowd; we could even involve @amelia’s students at OII, or ask the UCL crowd. DeLab to connect us to the datasci crowd: @krystof and @Michal teach their own datasci course.
Hackathon to be online, not physical. Also, to be spread over several days (one week? two?)
Kickoff in late March/early April

Next steps:

Format design (with @hugi, @amelia, @nadia and whoever wants).
Putting up a minisite (deadline: end of February). This contains the info for participants, including links to datasets; facilities for signups and team formation, etc.
Outreach and communication (everybody).

Works?

hugi · January 25, 2021 15:00

I want to link this to the new RyderEx endpoints: two birds, one stone. That makes all of our ethnographic data available, while also opening up the scope to technical experiments that can inform the future design of the stack itself.

@amelia, @nadia, @alberto - I propose we have a session soon to align and get started. How about Tuesday, February 9th?

alberto · January 25, 2021 15:01

Second that.

I would also love to involve somehow @melancon, @brenoust and @bpinaud. The Masters ride again!

melancon · January 25, 2021 15:21

Hi all, thanks for keeping me in the loop. I still tinker with research and data/code: I just finished implementing a scikit-learn SVM model this morning (!)

On February 9, I can only do 4:30pm or later. I guess Benjamin can only make it even later, after his workday. I do not know about Bruno.

Guy

hugi · January 27, 2021 17:45

Alright, I propose 16:30 on February 9th then.

@amelia, @alberto, works?

amelia · January 27, 2021 22:06

Should work!

nadia · January 28, 2021 07:58

can you send an event invitation?

hugi · January 28, 2021 10:05

Once everyone confirms the time, yes.

@alberto?

alberto · January 28, 2021 10:17

Sorry! Confirmed.

nadia · January 28, 2021 10:20

confirmed

MariaEuler · February 09, 2021 16:48

Notes:

April
Second dataset from DLab

What to analyse and why?

2 layers: Organisation + Hackathon tracks.
- Hackathon tracks are there to give people an idea. Usual format: you can either join or propose a track. The only requirement is that you do a project that is going to be represented around a graph. Example: “I want to form a team around ‘Can we see whether the view of technology is more dystopian or utopian?’” You just have to have a question, and then use the methodology to move from here to there.

NEEDS:

We need to define summaries of what the conversations are about, broadly.
We need to have people propose tracks.
We need to prepare the data.
We need to facilitate and connect people and groups running up to the hackathon.
We need to support people with specialised expertise to solve issues.
We need to facilitate the hackathon.

Hackathon Facilitation:

3 days 28th-30th April
Wednesday kickoff event (3 h online, synchronous, in the morning to set things up, 10:00-12:30) + breakout rooms stay open throughout the 3 days, with suggested working hours
Friday final presentation (3 h online, synchronous, in the afternoon + open end)
Small-group breakout sessions
Subcategory (see the Overweb challenge: The Overweb Challenge - Edgeryders)
Tell.form to introduce yourself and your question? (See Connect! - Introduce yourself and your project). Alberto writes the text and introductions; Maria sets up the subcategory and the Tell.form.
Use Gather? Hugi and Amelia explore it via the Babel meeting test.

Internal Research Question:

What should the “service for academic research” GraphRyder look like? (RezNet)

bpinaud · February 09, 2021 17:01

Wednesday kickoff event (April 28th) and Friday for final presentation (April 30th) instead?

MariaEuler · February 09, 2021 17:09

Of course, you are right. I had not adjusted that. Edited now.