Understanding the overlap of coding ontologies in the different POPREBEL language fora (long, lots of pictures)

When I last attended an ethnographers’ meeting, I heard @Jan and others wondering about the similarities and differences between the codes co-occurrence subnetworks induced by the different language fora in POPREBEL. After all, most people do not speak Polish and Czech and Serbian and English, so those fora are populated by different participants. It would not be surprising if the debates in different languages went in different directions. Do they?

I can see two approaches to this question. One is mathematical; the other is visual.

Mathematically, it’s not rocket science. You can think of the codes co-occurrence graph as a vector. Every element of the vector is an edge between two codes. If the two codes are disconnected (there is no edge), the corresponding element of the vector is 0. If they are connected by co-occurrence in k posts, the corresponding element of the vector is k. Given this, you can compute a scalar measure of distance (Euclidean, or cosine) between two vectors representing two fora, for example the Polish and the Czech. That would give you a number. To know whether this number is “high” (very dissimilar networks) or “low” (very similar networks), you can then compare the distance between the two observed vectors with the distance between either of them and a random network with the same degree distribution. If the former distance is much smaller than the latter, then you have a proximity which is difficult to explain by random chance. This could be a sign of convergence between the two debates, but also a signature of the coding team’s work.
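To make the vector idea concrete, here is a minimal sketch in plain Python. It assumes each forum's network is available as a dictionary mapping a pair of codes to its co-occurrence count k; the code names and counts below are invented toy data, not POPREBEL results, and the helper names are hypothetical. It covers only the distance computation, not the comparison against a random network.

```python
import math

def edge_key(code_a, code_b):
    """Order-independent key for an undirected edge between two codes."""
    return tuple(sorted((code_a, code_b)))

def to_vector(cooccurrences, all_edges):
    """Align {edge: k} with all_edges; an absent edge counts as 0."""
    return [cooccurrences.get(edge, 0) for edge in all_edges]

def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a network with no edges")
    return 1.0 - dot / (norm_u * norm_v)

# Toy data, invented for illustration only.
polish = {edge_key("corruption", "church"): 3, edge_key("corruption", "media"): 1}
czech = {edge_key("corruption", "media"): 2, edge_key("media", "populism"): 4}

# Both vectors must be indexed by the same list of edges.
all_edges = sorted(set(polish) | set(czech))
polish_vec = to_vector(polish, all_edges)
czech_vec = to_vector(czech, all_edges)

print("Euclidean:", euclidean_distance(polish_vec, czech_vec))
print("Cosine:   ", cosine_distance(polish_vec, czech_vec))
```

Cosine distance ignores the overall size of each forum (a forum with many more posts does not look "far" just because its counts are larger), while Euclidean distance does not; which one fits better depends on the question being asked.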

The visual approach simply draws the network. I find it potentially more generative for an anthropologist, because you can then burrow into which codes are shared across the different fora, and reflect on what this commonality might mean. Here is the POPREBEL codes co-occurrence network for k >= 3. Color coding maps to the language forum, as follows:

Red => Czech
Blue => Polish
Green => International
Orange => Serbian
Grey => nodes that belong to more than one forum

I built this version of the network representing the co-occurrences in different language fora as different edges. So, if code1 and code2 co-occur once in the Polish forum and twice in the international forum, I represent this with one blue edge with k=1 and one green edge with k=2. Graphryder would only draw a single edge, with k = 1+2 = 3. This means that there are no grey edges, though there is a fairly large group of grey nodes at the center. We can see it better by hiding the edges:
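The difference between the two representations can be sketched in a few lines of Python. This is an illustration under assumptions, not the script used to build the graph: edges are stored as hypothetical `(code, code, forum, k)` tuples, and the toy data mirrors the code1/code2 example above.

```python
from collections import defaultdict

# Colour legend from the post; a node seen in several fora becomes grey.
FORUM_COLORS = {
    "Czech": "red",
    "Polish": "blue",
    "International": "green",
    "Serbian": "orange",
}

# Hypothetical toy data: (code, code, forum, k).
per_forum_edges = [
    ("code1", "code2", "Polish", 1),
    ("code1", "code2", "International", 2),
    ("code2", "code3", "Polish", 4),
]

def collapse(edges):
    """Sum k over fora, giving one edge per code pair (the single-edge view)."""
    totals = defaultdict(int)
    for a, b, _forum, k in edges:
        totals[tuple(sorted((a, b)))] += k
    return dict(totals)

def node_colors(edges):
    """Colour a node by its forum, or grey if it appears in more than one."""
    fora = defaultdict(set)
    for a, b, forum, _k in edges:
        fora[a].add(forum)
        fora[b].add(forum)
    return {
        node: FORUM_COLORS[next(iter(seen))] if len(seen) == 1 else "grey"
        for node, seen in fora.items()
    }

collapsed = collapse(per_forum_edges)
colors = node_colors(per_forum_edges)
# Each per-forum edge belongs to exactly one forum, so it is never grey.
edge_colors = [FORUM_COLORS[forum] for _a, _b, forum, _k in per_forum_edges]

print(collapsed)
print(colors)
print(edge_colors)
```

Keeping one edge per forum preserves the information about where a co-occurrence happened; collapsing gives a simpler graph but loses it.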

Another trick is to show the edges, but this time coloring them by interpolating between the colors of their two endpoints. This means that, if both endpoints are red, the edge will also be red; but if the edge connects a red node with a grey one, the edge will start off red and turn grey as it approaches the grey node. Doing this shows a network of highly connected grey nodes at the center.

You can toggle these views on and off by playing with the buttons at the bottom of the node/link diagram window in Tulip.

The grey might be deceiving, because some nodes are mostly in one forum, with only a few links to others. To explore this, I suggest turning off the color interpolation, zooming in if necessary, and turning on the “Highlight node neighborhood” interactor, on the top of the node-link diagram window. Hovering over a node reveals its connections; clicking on it locks that view in place, so you can move your mouse elsewhere. This method reveals, for example, that corruption is incident to edges in the Czech, Polish and international forum; on the other hand, polarization political is incident only to Polish debate edges (at least for k >= 3).

Finally, I decided to make a separate subgraph which consists only of the codes that are shared across two or more fora, and of the edges connecting them. It consists of 232 nodes and 5,645 edges. If we only take its subgraph where k >= 3 we are left with 180 nodes and 636 edges. It looks like this (I made the Polish edges a darker shade of blue so it’s easier to read the labels):
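The construction of the intersection subgraph can be sketched as follows. This is a minimal Python illustration on invented toy data, using hypothetical `(code, code, forum, k)` tuples. It assumes that shared codes are identified on the full graph first, that the k threshold is then applied to each per-forum edge, and that codes left without any edge are dropped; the actual procedure behind the numbers above may differ in these details.

```python
from collections import defaultdict

# Toy data, invented for illustration: (code, code, forum, k).
edges = [
    ("corruption", "media", "Polish", 4),
    ("corruption", "media", "Czech", 1),
    ("corruption", "church", "Polish", 3),
    ("media", "populism", "Czech", 5),
    ("media", "populism", "International", 3),
]

def shared_codes(edges, min_fora=2):
    """Codes incident to edges in at least min_fora different fora."""
    fora = defaultdict(set)
    for a, b, forum, _k in edges:
        fora[a].add(forum)
        fora[b].add(forum)
    return {code for code, seen in fora.items() if len(seen) >= min_fora}

def intersection_subgraph(edges):
    """Keep only the edges whose two endpoints are both shared codes."""
    shared = shared_codes(edges)
    return [e for e in edges if e[0] in shared and e[1] in shared]

def filter_by_k(edges, k_min):
    """Keep only edges with at least k_min co-occurrences."""
    return [e for e in edges if e[3] >= k_min]

def nodes_of(edges):
    return {code for a, b, _forum, _k in edges for code in (a, b)}

intersection = intersection_subgraph(edges)
intersection_k3 = filter_by_k(intersection, 3)

print(len(nodes_of(intersection)), "nodes,", len(intersection), "edges")
print(len(nodes_of(intersection_k3)), "nodes,", len(intersection_k3), "edges")
```

In the toy data, "church" appears only in the Polish forum, so it and its edge are left out of the intersection.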

I encourage you to play with the graphs. You can download the Tulip perspective from here. The graphs of most interest to you are composite (all fora together) and intersection (all fora together, but only the nodes that are shared across two or more fora). For both I have already computed all the subgraphs at different levels of k.

Unfortunately, applying a layout algorithm to one graph messes up the layout of the others, but fear not: this only affects the visualization, not the data. Right now the layout is optimized for subgraph k = 003 of intersection. If you want a pretty layout of another graph, a quick way to do it is the following:

  1. Download the script d1_layout.py (you may have to click on the “Raw” button and then “save page”).

  2. Select the graph you are interested in, in the graph list on the left.

  3. Open the Python IDE clicking on the Python button on the left.

  4. Click on the Load button in the IDE and select the script you just downloaded.

  5. Select the graph you want to prettify from the top bar in the IDE.

  6. Click on the “Play” icon in the IDE (bottom right) to run the layout script on your graph.

You can also just play with Tulip’s layout algorithms. You will find them on the top left of your Tulip window.

Have fun! @amelia @Wolha @Jan @Richard @Jirka_Kocian



Many thanks! I’ll play with this over the holidays.

Merry Christmas!

Thank you very much @alberto for this. It looks quite useful and analytically interesting. However, as my memory is rather auditory (I have quite a hard time learning things only by reading) and my understanding of mathematical and informatics logic is quite tenuous, I would very much appreciate an audio-visual tutorial. Should you have some time for that, let me know.

Nice, healthy and productive 2021 everybody @rebelethno!

@SZdenek I certainly will make time. Let’s schedule an info session. In fact, if @amelia agrees, I would propose an all-around @rebelethno + @nextgenethno session, because this is an interesting example of getting to a tailored visualization of the semantic network.

BTW: I am also not a visual thinker. I approach networks more mathematically. But you and I, @SZdenek, appear to be exceptions here. :slight_smile:

Ideal! If @rebelethno promise to fill out a doodle :wink:


I will!

I made one here, with the 1h30 min timeslot that I know fits in our time zone restrictions. If we need more time, we can try to extend 30 min or so one way or another (let me know @alberto!) – those of us who can stay usually just hang out if we run over or arrange to show up a little early, anyway.

I am in (not just metaphorically, filled in the poll:)


oh hi @rebelethno

I did the Doodle. Mostly available. J

Tuesday 19th works for most of us. :+1:

Tuesday Jan 19th, 3-4:30 CET it is! Please save to your diaries.

@amelia MANY thanks for today! I am really happy to see you “in the working mode,” sharp and so on the ball, as always. Just to make sure: 3:00 pm CET, that is 9:00 am for me. (That is, we are not talking about GMT.)

It was great to see you again too, @Jan :slight_smile: Yes, 3 CET, so 9am eastern.

Do we need a Zoom room?

If so, we can use this: https://ucl.zoom.us/j/4149496930

I see a Google Meet link. @amelia?

Hi, @rebelethno.
I’ve downloaded Tulip 5.5 and I cannot run it.
It keeps telling me that certain files are missing: libintl-8.dll, libgcc_s_seh.dll, libstdc++-6.dll, Qt5Core.dll

I can’t open it either. I get a message saying the developer cannot be verified.
