Analysis by gender: need data

Hello all, I am working on the code to make co-occurrence networks by gender of the informants.

In order to test my code, I need the file we talked about, in one of these two formats:

name, gender

user_id, gender

This can be a spreadsheet with two columns, if it makes it easier for you.
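A minimal Python sketch of how such a two-column file could be read. It assumes the spreadsheet is exported as CSV with a header row of either `name, gender` or `user_id, gender`. The function name `read_gender_table` and the gender labels in the usage note are illustrative, not part of any existing script.

```python
import csv
import io


def read_gender_table(text):
    """Parse a two-column CSV (name or user_id, gender) into a dict.

    Accepts either a 'name' or a 'user_id' header in the first column.
    Whitespace around cells is stripped, so 'name, gender' (with a space
    after the comma) also works.
    """
    reader = csv.reader(io.StringIO(text))
    header = [cell.strip().lower() for cell in next(reader)]
    if header not in (["name", "gender"], ["user_id", "gender"]):
        raise ValueError(f"unexpected header: {header}")
    key_is_id = header[0] == "user_id"

    table = {}
    for row in reader:
        if not row or not any(cell.strip() for cell in row):
            continue  # skip blank lines in the export
        key, gender = row[0].strip(), row[1].strip()
        # User IDs are numeric on the platform; names stay as strings.
        table[int(key) if key_is_id else key] = gender
    return table
```

For example, `read_gender_table("name, gender\nuser_a, F\n")` returns `{'user_a': 'F'}`; `user_a` is a placeholder, not a real informant.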

For now, I have received the link to a Google Sheet from @Maniamana: I do not have access yet, but I have requested it. I imagine it only contains the Polish informants. This means I need @jitka.kralova and @Djan to supply me with the equivalent data for the Czech and German informants.


Hi Alberto,
Here is the link again, now with access granted. I am still creating memberships for the research participants who are not on the platform yet, because their interviews have not been uploaded. This will take a while: it is at least 3 hours of mundane work to set up the accounts and save all the login data for uploading the interviews in a conversation-like format later, so I will be able to do it this afternoon at the earliest.

The link: Polish Interviews Gender - Google Sheets

Thanks. It looks like, of these 59 informants, only 17 are already on the platform. That seems strange; can you check whether I am correct? The 42 I cannot find are listed below.

PL01Zuzanna20b not found.
PL06Marcel20b not found.
PL12Maks30a not found.
PL13Paweł30b not found.
PL15Dawid30b not found.
PL16Lena20b not found.
PL22Paulina50b not found.
PL23Krystyna50a not found.
PL24AnnaMaria30a not found.
PL26Ania30b not found.
PL28Nadia20a not found.
PL29Kamila20b not found.
PL30Bruno20b not found.
PL31Alicja30a not found.
PL32Marek30b not found.
PL33Irmina30a not found.
PL34Tadeusz40a not found.
PL35Konrad30b not found.
PL36Mateusz40b not found.
PL37Jola50a not found.
PL38Anita20b not found.
PL39Wiktoria30a not found.
PL40Izabela60a not found.
PL41Arek30a not found.
PL42Maciek30a not found.
PL43Klaudia30b not found.
PL44Martyna30b not found.
PL45Olek20b not found.
PL46Franek20a not found.
PL47Magda30a not found.
PL48Jacek30a not found.
PL49Maciej60a not found.
PL50Antoni40a not found.
PL51Jerzy20b not found.
PL52Halina30a not found.
PL53Ewa20b not found.
PL54Aga30a not found.
PL55Ola30a not found.
PL56Dominika30a not found.
PL57Sandra40a not found.
PL58Flora30b not found.
PL59Mirosław40b not found.
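A list like the one above is what a name lookup against the platform produces. A minimal sketch, assuming the platform exposes the standard Discourse user endpoint `/u/{username}.json` (a JSON object with a `user` field, or HTTP 404 for an unknown name) and that `https://edgeryders.eu` is the right base URL. `fetch_user` and `lookup_user_ids` are hypothetical names; the fetch function is a parameter so the logic can be tried without network access.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

BASE_URL = "https://edgeryders.eu"  # assumption: the Edgeryders Discourse instance


def fetch_user(username):
    """Return the Discourse user record for `username`, or None if not found."""
    url = f"{BASE_URL}/u/{urllib.parse.quote(username)}.json"
    try:
        with urllib.request.urlopen(url) as response:
            return json.load(response)["user"]
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # no such username on the platform
        raise


def lookup_user_ids(names, fetch=fetch_user):
    """Map each name to its platform user ID and report the ones not found."""
    found, missing = {}, []
    for name in names:
        user = fetch(name)
        if user is None:
            missing.append(name)
            print(f"{name} not found.")
        else:
            found[name] = user["id"]  # keep the ID for later use
    return found, missing
```

For a real run over dozens of names, a short pause between requests would be prudent, since Discourse instances may rate-limit API traffic.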

Are you using the tag ethno-rebelpop-polska-interviews?

Also, as I mentioned above, only the interviews that have been coded are on the platform. By Monday we will have over half of the interviews on the platform, BUT creating the users, as I mentioned above, is not done at all, because for practical reasons I have not been creating new member accounts without an interview to upload. That's why I mentioned last week that we need to wait for the Polish data to be on the platform anyway.

Anyway, I am at a conference right now, so that’s all I can do at the moment :confused:

Hi Alberto,
Will send you the spreadsheet next week.

No, there is no need to use a Discourse tag. My script takes your list and looks into Edgeryders for each name in the list. If it finds it, it retrieves the user ID for later use. I do it with API calls, but, given the time and patience, one could do it manually. For example, the first name in the above list of missing persons, PL01Zuzanna20b, yields:

Not a problem for me: even if the names were there, we could not do the analysis until the coded interviews are also there. Anyway, my script seems to work fine, so once all the data are available I can induce the networks quite fast.
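Inducing a co-occurrence network per gender can be sketched roughly as follows. This is an assumption about the method, not a description of the actual script: it treats two codes as co-occurring when they annotate the same post, and splits posts by the gender of their author. The inputs (`annotations`, `post_author`, `gender_of`) are hypothetical stand-ins for the coded interviews and the gender spreadsheet.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_by_gender(annotations, post_author, gender_of):
    """Build one weighted co-occurrence edge list per gender.

    annotations: iterable of (post_id, code) pairs
    post_author: dict mapping post_id -> author user ID
    gender_of:   dict mapping user ID -> gender label
    The weight of an edge is the number of posts in which both codes appear.
    """
    # Group the codes attached to each post (duplicates collapse in the set).
    codes_per_post = defaultdict(set)
    for post_id, code in annotations:
        codes_per_post[post_id].add(code)

    networks = defaultdict(Counter)
    for post_id, codes in codes_per_post.items():
        gender = gender_of.get(post_author.get(post_id))
        if gender is None:
            continue  # author missing from the gender table: skip the post
        # Sorting makes each undirected edge a single, stable key.
        for pair in combinations(sorted(codes), 2):
            networks[gender][pair] += 1
    return networks
```

Posts whose author has no gender entry are skipped rather than guessed, which mirrors the concern that gender is only reliable where it has been recorded explicitly.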

Ping @Nica: just checking that we are on schedule.

Thanks Djan!

Well, then I do not understand what the issue is, because that interview is there on the platform:

The issue is consistent naming. That interview is with a user called 01PLZuzanna20b, but in your spreadsheet you wrote instead PL01Zuzanna20b. Can I ask you to go over the spreadsheet and check the names?
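If the mismatch is systematic, the script could also normalize the names itself. A minimal sketch, assuming every spreadsheet name follows the pattern of PL01Zuzanna20b (two-letter country code, two-digit number, then the rest) and every platform username that of 01PLZuzanna20b (number first). Only this one pair is confirmed in the thread, so checking the spreadsheet remains the safer fix; `to_platform_username` is a hypothetical helper.

```python
import re

# Assumed spreadsheet form: country code, then number (e.g. PL01Zuzanna20b).
# Assumed platform form: number, then country code (e.g. 01PLZuzanna20b).
SPREADSHEET_NAME = re.compile(r"^([A-Z]{2})(\d{2})(.+)$")


def to_platform_username(name):
    """Swap a leading country code and interview number; leave other names as-is."""
    name = name.strip()
    match = SPREADSHEET_NAME.match(name)
    if match is None:
        return name  # already in platform form, or an unexpected pattern
    country, number, rest = match.groups()
    return f"{number}{country}{rest}"
```

Names already in platform form do not match the pattern and pass through unchanged, so the function can be applied to a mixed list.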

:sweat_smile: :sweat_smile: Oh God, sure thing!

On second thought, Richard just told me that he had already uploaded it :slight_smile:

Great, but where?

German genders.pdf (33.3 KB)

@alberto Can you please export the participants (names or IDs) of the Czech forum from before the interviews? I will make a spreadsheet and add their gender (based on the grammatical gender they use in Czech).
Or do I have to do it manually? Somewhere I have a spreadsheet of the contributions to the Czech forum, made by Jirka in 2021; I could use it, go through it contribution by contribution and extract the names.

Can do. What I need to know is how, precisely, you store the data. In other words, I need to know either:

  • the forum category, or categories (there are at least two that I can see), containing the corpus; or
  • the Discourse tag, or tags, associated with the corpus.
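Once the category or tag is known, the participants can be collected from the topics it contains. A minimal sketch, assuming each topic has already been fetched as the JSON returned by Discourse's `/t/{topic_id}.json` endpoint, whose `post_stream.posts` entries carry `username` and `created_at` fields. For long topics that field may hold only the first batch of posts, so a complete export would need further requests. `posters_before` is a hypothetical name, and `INTERVIEWS_START` is a placeholder, not the agreed start date of the interviews.

```python
from datetime import datetime, timezone

# Hypothetical placeholder: replace with the actual date the interviews started.
INTERVIEWS_START = datetime(2021, 1, 1, tzinfo=timezone.utc)


def posters_before(topics, cutoff=INTERVIEWS_START):
    """Return the sorted unique usernames of posts created before `cutoff`.

    topics: iterable of topic dicts shaped like Discourse /t/{id}.json output
    cutoff: timezone-aware datetime
    """
    usernames = set()
    for topic in topics:
        for post in topic["post_stream"]["posts"]:
            # Discourse timestamps are UTC, e.g. "2020-03-05T10:12:33.123Z";
            # replacing "Z" lets older Pythons' fromisoformat parse them.
            created = datetime.fromisoformat(post["created_at"].replace("Z", "+00:00"))
            if created < cutoff:
                usernames.add(post["username"])
    return sorted(usernames)
```

The resulting list of usernames could then go into a `name, gender` spreadsheet for the gender column to be filled in by hand.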

From the contributions, what I can see is:

“Wellbeing in Europe” / “V Česku” / ethno-poprebel

So, is everything in V Česku - Edgeryders? It does not look like it: there has been no activity there for over a year. Do you maybe mean

(see what I mean…)

The protected category contains the interviews from Jitka (data sets B1 and B2). She will make a list on her own.

I thought @Nica wanted us to make a similar list for the platform contributions to both the Czech and Polish forums PRIOR to the interviews (before Mannia, Jitka and Djan were on board), that is, posts from 2019 to 2020(1) = data A.

If I am wrong and we do not need that, then we can discard my request.

I think we do not. @Nica, is that your final word?

@SZdenek and @alberto: I only want the gender visualizations for the interviews, not for the platform posts, because it was my understanding that there was no fully reliable way to ascertain the gender of the platform participants.