Open ethno data: towards a new Masters of Network?

Today I had a bit of time, and re-exported the OpenCare dataset to Zenodo using the new export code. This was necessary, since I had found out that the Zenodo data previously published by @melancon and @jason_vallet had not been pseudonymized.

But it has an additional advantage: now we have published three datasets, each relating to an ethno project (OpenCare, POPREBEL, NGI Forward), with exactly the same structure. This makes it possible in principle, and even easy, to treat them as one single dataset. A large one: we are looking at some 8K posts by about 700 participants, with close to 10K annotations. I think it is safe to say there has never been an open ethnographic dataset of comparable size.

Should we do something with it? Should we try to go deeper into abstraction? By this I mean analysis of the structure, rather than the semantics. It would be about trying to recognize patterns in how collective intelligence works in large conversations. Example questions:

  • Is a post more likely to be annotated if it is in a topic with many replies? Sexier formulation: does interaction lead to more interesting insights for the analysts?
  • Is a post more likely to be annotated depending on the social network metrics of its author (e.g. centrality)?
  • Can we estimate the likelihood that a post will be found interesting by ethnographers? Application: when there is a lot of content, can we algorithmically build a queue with the most promising posts at the top?

And so on.
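
As a rough illustration of the first question, here is a minimal sketch that compares the share of annotated posts in small and large topics. Everything in it is an assumption made for the example: the toy records, the tuple layout, the `threshold` parameter, and the use of "number of posts in the topic" as a stand-in for "number of replies". The real exports would need to be mapped onto this shape first.

```python
from collections import defaultdict

# Hypothetical toy records: (post_id, topic_id, number_of_annotations).
posts = [
    (1, "a", 2), (2, "a", 0), (3, "a", 1), (4, "a", 3),
    (5, "b", 0), (6, "b", 0),
    (7, "c", 1),
]

def annotation_rate_by_topic_size(posts, threshold=3):
    """Share of annotated posts in small vs. large topics.

    A topic counts as 'large' when it holds at least `threshold` posts.
    """
    # First pass: count how many posts each topic contains.
    topic_sizes = defaultdict(int)
    for _, topic_id, _ in posts:
        topic_sizes[topic_id] += 1

    # Second pass: tally annotated and total posts per bucket.
    counts = {"small": [0, 0], "large": [0, 0]}  # [annotated, total]
    for _, topic_id, n_annotations in posts:
        bucket = "large" if topic_sizes[topic_id] >= threshold else "small"
        counts[bucket][1] += 1
        if n_annotations > 0:
            counts[bucket][0] += 1

    # None marks a bucket with no posts at all.
    return {b: (a / t if t else None) for b, (a, t) in counts.items()}

rates = annotation_rate_by_topic_size(posts)
print(rates)
```

A gap between the two rates would only be a first hint. A serious analysis would have to control for who wrote the post, which ethnographer coded it, and the project it comes from.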

It could be a Masters of Networks at some point – provided we are allowed to travel.

Ping @amelia, @ccs, @leonie, @Jirka_Kocian, @jan, @Wojt, @rebecca, @sander, @melancon, @brenoust, @bpinaud

I train our ethnographers to go where the community points us, so we almost always code posts with more comments first (the exception being planning posts authored by community managers, as opposed to ‘content’ posts).

It would be interesting to see what our metric is beyond the number of comments (e.g. is there something common to what we consider ‘content-ful’?).

I hypothesise that it’s less about the social network metrics of its author and more about the diversity of the people commenting: I usually end up coding posts, or finding them interesting, when there are at least 3 people actively talking.
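
A minimal sketch of how this hypothesis could be checked, assuming each post can be reduced to a topic, an author, and a flag saying whether it was annotated. The toy records, the function name, and the `min_authors` parameter are all hypothetical, and "diversity" is read here simply as the number of distinct authors in a topic.

```python
from collections import defaultdict

# Hypothetical toy records: (topic_id, author, was_annotated).
posts = [
    ("a", "ann", True), ("a", "bob", True), ("a", "cat", False), ("a", "ann", True),
    ("b", "dan", False), ("b", "dan", False), ("b", "eve", False),
]

def annotated_share_by_diversity(posts, min_authors=3):
    """Share of annotated posts in topics with at least `min_authors`
    distinct authors ("diverse") versus all other topics ("other")."""
    # Collect the set of distinct authors seen in each topic.
    authors = defaultdict(set)
    for topic_id, author, _ in posts:
        authors[topic_id].add(author)

    # Tally annotated and total posts per group.
    counts = {"diverse": [0, 0], "other": [0, 0]}  # [annotated, total]
    for topic_id, _, was_annotated in posts:
        group = "diverse" if len(authors[topic_id]) >= min_authors else "other"
        counts[group][1] += 1
        counts[group][0] += int(was_annotated)

    # None marks a group with no posts at all.
    return {g: (a / t if t else None) for g, (a, t) in counts.items()}

shares = annotated_share_by_diversity(posts)
print(shares)
```

Varying `min_authors` would show whether the threshold of 3 is where the share of annotated posts actually changes, or whether it rises gradually with the number of people in the conversation.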

I’d definitely be up for comparison – I do think we should get slightly further along on POPREBEL, since it’s not a great example just yet. When we have greater parity between English, Czech, and Polish (which should happen in the next month) it’ll be worth doing.

They also give us 3 different looks in terms of ethnographers, which could be interesting (especially since I have intimate knowledge of all three contexts and could help contextualise any confounding variables). The first (OpenCare) is a lone ethnographer. The second (POPREBEL) is one lone ethnographer and two pairs of ethnographers working across 3 languages (maybe we should add another ethnographer to English just for the sake of having 2, 2 and 2…). The final one (NGI) is 3 ethnographers working on a single language.

Ahh, but a post could be uninteresting, and in this case it will still not be annotated.

This is a testable hypothesis. Yay for conversation-as-data. :slight_smile:

Ah ah! This is an extremely interesting topic. Comparing different networks in search of similarities or distinctive properties (for each of them) is a question that often pops up – colleagues investigating human trafficking networks are looking at that same issue to figure out whether different networks operate based on the same “logic” (see here and here e.g.).

There are several questions hidden in there, some of which relate to the way people interact and reply to one another, and how topics develop; and others on how content is annotated.

I am definitely in for a hackathon. Is there a way we could have such an event even though we would not be able to get all together in the same room? Couldn’t we have a virtual get-together between groups located in Bx, Brussels, X, Y, Z, etc. (and meet in a Zoom or Google Meet room instead)?

I think this would be possible, but not an equilibrium.

In my experience of hackathons, the actual work is anyway done with everyone looking at their screen. The motivation for devoting 8 hours solid on a Saturday comes from the fact that people are getting together physically, and hanging out during extended breaks. It takes a very disciplined person indeed to do that on Zoom.

When they are not participating in hackathons, hackers are, well, hacking. All tools are asynchronous and very efficient, interaction is minimal, but there is very little control over what any open source project will get done, and by when.

It would be quite an achievement to redesign the hackathon format so that it can be faster, more social and more interactive than normal open source projects, while still maintaining some of the efficiency of the latter. I imagine such a hackathon lasting 4-5 days, with frequent team check-ins and quite a visible, coherent leadership so that participants can enjoy the feeling of knowing why they are doing what they are doing.

Would it be an option to address it to BOTH ethno-anthro and compsci people? @melancon, @amelia, @akmunk, @jan: would you be open to experiment with something like this?

definitely!

Still interested, just like @brenoust and @bpinaud I guess. Reserving 4-5 days in a row (although not full days) might be more problematic. Designing the hackathon so that participants commit to tasks and deliverables might be a solution. I understand this somehow throws away the spontaneity we appreciate in such gatherings … but just for once …

?

Reviving this topic in 2021.

What happened: we thought we would do this as a part of MozFest 2021, which happens in Amsterdam, and the city of Amsterdam is a partner in NGI. To everyone’s surprise, MozFest turned us down. So now we are back to the drawing board.

Today we had a meeting of the NGI Forward research task force, and we decided:

  1. Interdisciplinary, in the MoN tradition: ethno/anthro people on one side, datasci people on the other. EDGE to target more the anthro crowd; we could even involve @amelia’s students at OII, or ask the UCL crowd. DeLab to connect us to the datasci crowd, @krystof and @Michal have their own course in datasci.
  2. Hackathon to be online, not physical. Also, to be spread over several days (one week? two?)
  3. Kickoff in late March/early April

Next steps:

  1. Format design (with @hugi, @amelia, @nadia and whoever wants).
  2. Putting up a minisite (deadline: end of February). This contains the info for participants, including links to datasets; facilities for signups and team formation, etc.
  3. Outreach and communication (everybody).

Works?

I want to link this to the new RyderEx endpoints: two birds, one stone. That makes all of our ethnographic data available while also opening up the scope to technical experiments that can inform the future design of the stack itself.

@amelia, @nadia, @alberto - I propose we have a session to align and get started soon. I propose Tuesday, February 9th.

Second that.

I would also love to involve somehow @melancon, @brenoust and @bpinaud. The Masters ride again!

Hi all, thanks for keeping me in the loop. I still tinker with research and data/code – I just finished implementing a scikit-learn SVM model this morning (!)

February 9, I can only do 4:30pm or later. I guess Benjamin can only do even later after his workday. I do not know about Bruno.

Guy

Alright, I propose 16:30 on February 9th then.

@amelia, @alberto, works?

Should work!

can you send an event invitation?

Once everyone confirms the time, yes.

@alberto?

Sorry! Confirmed.

confirmed

Notes:

  • April
  • Second data set from DLab

What to analyse and why?

  • 2 layers: Organisation + Hackathon tracks.
    • Hackathon tracks are there to give people an idea. Usual format: you can either join or propose a track. The only requirement is that you do a project that is going to be represented around a graph. Example: “I want to form a team around ‘Can we see whether the view of technology is more dystopian or utopian?’” You just have to have a question, and then use the methodology to move from here to there.

NEEDS:

  • We need to define summaries of what the conversations are about, broadly.
  • We need to have people propose tracks.
  • We need to prepare the data.
  • We need to facilitate and connect people and groups running up to the hackathon.
  • We need to support people with specialised expertise to solve issues.
  • We need to facilitate the hackathon.

Hackathon Facilitation:

  • 3 days 28th-30th April
  • Wednesday: kickoff event (3 h online, synchronous, in the morning to set things up, starting 10:00-12:30; break-out rooms stay open throughout the 3 days, with suggested working hours)
  • Friday: final presentation (3 h online, synchronous, in the afternoon, with an open end)
  • Small groups break out sessions
  • Subcategory (see overweb challenge: The Overweb Challenge - Edgeryders)
  • Tell.form to introduce yourself and your question? (See Connect! - Introduce yourself and your project). Alberto writes text and introductions, Maria sets up subcategory and tell form.
  • Use Gather ? Hugi and Amelia explore via Babel meeting test.

Internal Research Question:

  • What should the “service for academic research” GraphRyder look like? (RezNet)

Wednesday kickoff event (April 28th) and Friday for final presentation (April 30th) instead?

Of course you are right. Had not adjusted that. Edited now :slight_smile: