Starting the cleanup of POPREBEL

So, I presented our paper at ICQE22, and I am happy to report it was well received. I was paired with David Shaffer, who is seen as the godfather of quantitative ethnography, so I had a very full room and lots of questions. I did miss having @Nica, @Jan and @Richard there to take the more anthropological questions (“wait, Gramsci was no anthropologist!”), but it went very well.

Now it’s time to clean up. On my end – besides, of course, the final report – the needed activities are:

  1. Create the Zenodo upload for the data (in Tulip form) used for the ICQE22 paper.
  2. Export all POPREBEL interview data and put them in a Zenodo upload. This is going to be a bit of a hassle, because my current export script does not cover gender (nor should it), and because we have three corpora. I think I will export each (single-language) corpus separately, and manually modify the datapackage.json metadata file so that it has resource entries for three annotation files, three codes files, and so on. It should not be a problem, because codes have unique IDs, so people could decide either to treat them as a unified corpus or to keep them separate.
  3. Export the codebook in human-readable form, and upload it to Zenodo.
  4. Submit the Applied Network Science paper. @bpinaud, have you finished revising the text?
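For the manual datapackage.json step in item 2, here is a minimal sketch of how the three per-language descriptors could be merged into one rather than edited by hand. It assumes each corpus was exported with its own datapackage.json following the Frictionless Data Package layout (a top-level `resources` list whose entries have a string `name` and a string `path`), and that each corpus's files end up in a subfolder named after its label. The function name, the corpus labels and the package name are hypothetical.

```python
import json


def merge_datapackages(packages: dict, name: str) -> dict:
    """Combine per-corpus Data Package descriptors into one descriptor.

    `packages` maps a corpus label (e.g. "polish") to the parsed
    datapackage.json exported for that corpus. Assumes each corpus's
    files sit in a subfolder named after its label, and that every
    resource has a string "name" and a string "path".
    """
    merged = {"name": name, "resources": []}
    for label, pkg in packages.items():
        for res in pkg.get("resources", []):
            entry = dict(res)  # shallow copy so the input descriptors are not modified
            # Prefix name and path so "annotations" from three corpora
            # become three distinct resources.
            entry["name"] = f"{label}-{res['name']}"
            entry["path"] = f"{label}/{res['path']}"
            merged["resources"].append(entry)
    return merged


if __name__ == "__main__":
    # Hypothetical minimal descriptor standing in for each of the three exports.
    corpus = {"resources": [
        {"name": "annotations", "path": "annotations.csv"},
        {"name": "codes", "path": "codes.csv"},
    ]}
    merged = merge_datapackages(
        {"polish": corpus, "czech": corpus, "german": corpus},
        name="poprebel-interviews",
    )
    print(json.dumps(merged, indent=2))
```

Prefixing the resource names keeps them unique within the merged package, which the Data Package spec requires; the code IDs inside the files need no renaming, since they are already unique across corpora.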

To do all this, I need you guys to stop tweaking the codes. Or, at least, to stop for now: you can resume in 2023, but by then we will already have consistency between the POPREBEL report and the POPREBEL data. We need to make sure that the former points to the version of the data I create now, and not to the dataset in general, since the latter will probably be updated later.

Does this work?


Yes. If you’d like, we can plan a Zoom call (even with @melancon) to do a review together.

Great. What about Monday?

after 4pm

I have assigned a common Discourse tag to all interviews: #ethno-poprebel-all-interviews. This way, we can query each separate language corpus (via #ethno-rebelpop-polska-interviews etc.) and also view all POPREBEL interviews as one large corpus. The former makes more sense for data analysis, but the latter is perhaps best for building the codebook, since many codes recur across the three corpora.

The POPREBEL codebook is, as you know, a deliverable. In order to build it, the process is:

  1. First, open the table rendering of the codebook for #ethno-poprebel-all-interviews, with any creator. (Note: for whatever reason, this link, though correct, returns a 404 error. So please copy-paste it into the browser’s address bar, or just head over to the Codes page in Annotator and manually select the ethno-poprebel-all-interviews tag, Creator => Any creator, Order => Name and View => Simple table.)

    We could also render it as a list, but the problem there is that child codes are nested under their parent codes.

  2. Copy the table into a Google Doc, or whatever word processor you like, and add the logos etc.
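To make the nested-list versus simple-table point concrete, here is a sketch of what a flat, name-ordered codebook amounts to: one row per code, sorted by name, with the parent chain shown as a path instead of as nesting. The `Code` structure and its fields are assumptions for illustration, not the Annotator's actual export format, and the sketch assumes the hierarchy has no cycles.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Code:
    # Hypothetical minimal record for one code in the codebook.
    id: int
    name: str
    parent_id: Optional[int] = None


def code_path(code: Code, by_id: dict) -> str:
    """Return the ancestor chain of a code, root first, e.g. "Z > Y > children".

    Walks up parent links; assumes every parent_id exists and there are no cycles.
    """
    parts = [code.name]
    current = code
    while current.parent_id is not None:
        current = by_id[current.parent_id]
        parts.append(current.name)
    return " > ".join(reversed(parts))


def flat_codebook(codes: list) -> list:
    """Flatten a code hierarchy into (name, id, path) rows sorted by name."""
    by_id = {c.id: c for c in codes}
    rows = [(c.name, c.id, code_path(c, by_id)) for c in codes]
    # Case-insensitive name order; ties broken by ID.
    return sorted(rows, key=lambda row: (row[0].lower(), row[1]))


if __name__ == "__main__":
    codes = [
        Code(1, "Z"),
        Code(2, "Y", parent_id=1),
        Code(3, "children", parent_id=2),
        Code(4, "civil rights"),
    ]
    for row in flat_codebook(codes):
        print(row)
```

Because every code gets its own row, a child code like "children" appears in alphabetical position rather than buried under its parent, while the path column still records where it sits in the hierarchy.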

@SantosCardonaPR, you have been most involved in this, so do you want to give it one last scan? Just follow the link above. And of course, @Nica’s nod of approval is important to me.

Once you give the nod, someone in EDGE will create the deliverable document proper.

@alberto – that page is actually not accessible to me. It says it either does not exist, or is private. As soon as you update the permission on that, I will take a look.

See my previous post, I corrected it.

Calendar invitation sent. @Jan, @Richard, you are invited too, but do not feel obliged: I think that @bpinaud, @melancon, @Nica and I can do this.

@alberto @bpinaud I just received the Zoom invite; I’ll be there. I teach until 4pm, so I might arrive a bit late.

Hello @rebelethno, I have now finished exporting, cleaning and uploading the data from our interviews. You should cite the dataset in your academic work: Please note that this is a different dataset from the one used for the ICQE paper.

I still need to add some authors, though: specifically, the ethnographers who did the interviews and the coding of the Czech and German parts of the corpus. This means @Djan, @SantosCardonaPR, @jitka.kralova, @SZdenek and @Jirka_Kocian … am I forgetting anyone?

For each of you, I need: full name, affiliation and ORCID number.


I have been online since 4pm.

Sorry, I was programming and lost myself! I am also online, and I see you logged in, but I guess you have (rightly) walked away from your computer.

Hi Alberto, I am including my information below:

Santos Rivera-Cardona
PhD Student
Rutgers University


@SantosCardonaPR thanks for that. Also, I found two more pairs of duplicate codes, created by @Richard and @Jirka_Kocian. I imagine you will want to merge them, right? Can you do it? The first pair in particular looks quite well connected and might show up on the radar.

  1. children
  2. civil rights

Hi Alberto,

I just merged them! Please let me know if there are any other codes we need to work on!

Have a lovely rest of your week!

OK, but now you assigned all POPREBEL annotations to the parentless code (9701), whereas it is 4285 that comes with the path from the Z category… is that intentional?

From what I understand (@Wojt or @Jan can correct me if I am wrong), the Z category is not supposed to be visualized; hence, we have been trying to eliminate it. I can erase the Z category and leave “children” on its own for visualization, though. What I did was basically leave the code “children” on its own, because ultimately the Z category would be deleted. Does that make sense?

Anyway, I just changed the parent code, so that the code “children” that contains the POPREBEL annotations now falls under the Z-Y-X category. That code is now 9701.

@alberto here it is:
Zdeněk Sloboda
Charles University in Prague, Institute of International Studies, Faculty of Social Sciences
research assistant
ORCID: 0000-0001-9721-7983

Ping this. Are we ready to submit? Needs to be done in the next three weeks, or not at all.

@alberto here is mine:

Jitka Králová
UCL School of Slavonic and East European Studies, PhD student
ORCID: 0000-0002-7513-6346