Writing a paper on network reduction (landed on "Applied Network Science")

Ah! Thanks Bruno, I forgot to mention this …

Great work!

I doubt it. What you can do is work in reverse: start from a desired “amount of reduction” that will make the network readable, and then choose the values of the parameters to obtain that amount. I used that reasoning here.

Your reduced network is much larger than the ones I computed (I am around 200 nodes, 600 edges). But your reduced network is somehow fairly legible anyway. I think this is the added value of the Simmelian backbone method. I seem to recall, from reading Nick’s thesis, that his method led to highly modular networks, and again what you obtain here is very modular. This mitigates the readability problems associated with high network density – and indeed your reduced network, while very dense (dl = 0.16 using Ghoniem’s measure of density), is still readable, at least to me. What do you think of this, @amelia, @jan and @richard?

Then, an analytical strategy available to the qual researcher is to examine each community of codes separately (and they are nice and small, roughly 10-40 codes each). She could even wrap each community into a metanode and look at the resulting network of metanodes – I imagine there would be between 15 and 20 of those. This is a result, congratulations!
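To make the metanode idea concrete, here is a rough sketch of what I have in mind (plain networkx, not our Tulip stack; the community detection algorithm is just a placeholder, any method would do):

```python
# Sketch only: detect code communities and collapse each one into a metanode.
# Assumes the stacked codes network is loaded as a weighted networkx graph G.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def metanode_graph(G, weight="weight"):
    # Partition the codes into communities (placeholder algorithm).
    communities = list(greedy_modularity_communities(G, weight=weight))
    # One metanode per community; two metanodes are linked if any edge
    # runs between their communities.
    M = nx.quotient_graph(G, communities, relabel=True)
    return communities, M

# communities, M = metanode_graph(G)
# print(len(communities), "communities ->", M.number_of_nodes(), "metanodes")
```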

Hmm… we need to agree on a way to compare the reduced networks. I was thinking of Jaccard coefficients on the entire list of nodes, but that works best if the reduced networks are approximately the same size, which they now are not. @bpinaud, you were going to compute these statistics… any idea?

Guy and I are also working on the Jaccard index and other measures.
Here is my understanding of what to do:
Data: for each of the three datasets and for each of the four reduction methods, we have a series of graphs (10 to 12), ranging from the complete stacked graph down to a very reduced, legible graph (around 200 nodes and 600 edges).
Aim: for the most reduced interesting graphs (instead of what I did to generate the IC2S2 reduction document on Google Drive), we can check whether the edges are the same or almost the same between the four methods and for each dataset. We can simply compute the ratio of common edges between each pair of graphs (easy to do with Tulip). Of course, we also have to plot the reduction behavior of nodes/edges for each method and each graph.
My question: do we need to compare the graphs (ratio of common edges) at each step of the reduction process?
In the end, the Jaccard index cannot be used, because the sets to compare must have the same number of elements. We think that the ratio of common edges between two graphs is enough.
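To be explicit about what I mean by the ratio of common edges, here is a toy sketch in plain Python (the real computation is done in Tulip; treating edges as unordered node pairs, and the choice of denominator, are assumptions we still need to agree on):

```python
# Ratio of common edges between two reduced graphs of possibly different sizes.
# Assumes undirected networkx graphs sharing the same node identifiers (the codes).
def common_edge_ratio(G1, G2):
    e1 = {frozenset(e) for e in G1.edges()}  # edges as unordered node pairs
    e2 = {frozenset(e) for e in G2.edges()}
    common = e1 & e2
    # One possible denominator: the smaller edge set, so the ratio can reach 1
    # even when the two reductions have different sizes.
    return len(common) / min(len(e1), len(e2))
```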


Almost. For more rigor, I would like to set the size of the reduced network in ways that are supported by the netviz literature. But I am super ignorant about that. I tried to rummage through Google Scholar, but was completely unsuccessful. So I am stuck with Ghoniem 2004. His largest graph that is still low-density enough to be understood by the people interpreting it in his experiment has 100 nodes and dl = 0.2. Now, dl depends on network size, as Guy famously observed. A network with 200 nodes and 800 edges (Guy’s “four edges per node” rule of thumb) has dl = 0.18. This is how I landed on “no more than 4 edges per node, dl < 0.2”.

But this is pretty weak. Do you guys have any pointers to more recent work on how large and dense networks can be for humans to still be able to make sense of them?
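For what it is worth, this is how I would code up the budget check. The four-edges-per-node part is Guy’s rule of thumb; the density formula is only my reading of Ghoniem 2004 and should be double-checked against the paper:

```python
import math

def within_budget(n_nodes, n_edges, max_edges_per_node=4, max_dl=0.2):
    # Guy's rule of thumb: no more than 4 edges per node.
    edges_per_node = n_edges / n_nodes
    # Assumed reading of Ghoniem's density: dl = sqrt(|E|) / |V|. To be verified.
    dl = math.sqrt(n_edges) / n_nodes
    return edges_per_node <= max_edges_per_node and dl < max_dl

# within_budget(200, 800)  -> checks a 200-node, 800-edge reduction against the budget
```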

In theory, no. Because the reduction process stops at a point that has nothing to do with the ratio of common edges, but only with the visual processability of the reduced network.

Why? You can certainly compute the index (number of elements in (A intersection B)) / (number of elements in (A union B)) for any number of elements in A and B. Of course, if those numbers are different, that index can never be 1. But you could solve this by rescaling for the maximum possible value attainable by the coefficient J.

For example, imagine that set A has 10 elements and set B has 5. The maximum value attainable by J is 0.5, in the case that all 5 elements of B are also in A. In that case, A union B has 10 elements, and A intersection B has 5, so 5/10 = 0.5. Now rescale this to 1, and rescale the actual value of the Jaccard index accordingly. For example, if 3 elements of B were also in A, you would have:

  • A union B has 12 elements
  • A intersection B has 3 elements
  • J(A,B) = 3/12 = 0.25, which rescaled becomes 0.5.
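In code, the rescaling is just this (a minimal sketch of exactly the computation above):

```python
# Rescaled Jaccard index: divide J(A, B) by the maximum value it can take
# given the two set sizes, which is min(|A|, |B|) / max(|A|, |B|).
def rescaled_jaccard(A, B):
    A, B = set(A), set(B)
    j = len(A & B) / len(A | B)
    j_max = min(len(A), len(B)) / max(len(A), len(B))
    return j / j_max

# Example above: |A| = 10, |B| = 5, 3 elements in common
# A = set(range(10)); B = {0, 1, 2, 10, 11}
# rescaled_jaccard(A, B)  -> 0.25 / 0.5 = 0.5
```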

Ok, but what is the denominator here?

Hi all! @Jan, @Richard and I just had our meeting and wanted to ask if you all would be able to give us 2-3 different reduction approaches to the POPREBEL dataset so we could better answer some of your questions based upon the data we know and are familiar with. In the meantime we are getting down to writing!

Not sure I understand the question… the paper is going to contain four approaches. Two – by association depth and association breadth – are already explained in the Overleaf document. An application to the POPREBEL dataset is here, though complicated by the languages issue.

The other two are: by k-core decomposition, applied to POPREBEL here; and Guy’s Simmelian backbone, applied to POPREBEL data here.

What would you need exactly?

@alberto - many thanks. We will study this and get back to you if we have questions (most likely we will have some - we are working to bridge the qual-quant divide, after all :wink:). For now we are creating a separate Google Doc to work on our (@amelia and @Richard) contributions to the opening part of the paper. It is here.


Obviously yes. However, I am pretty sure that we can find a better similarity measure between four sets of different sizes.

Ok, now I am confused. “Stronger” edges means edges with a high association depth d(e). That has no bearing on the degree of nodes. Why do you use average node degree to calibrate k? Also ping @bpinaud.

We used the degree because it is a standard and basic measure (the higher the degree, the higher the chance of the node being involved in more triangles). k relates to connectivity (see Guy’s previous post on the definition of k).


Noted.

Ok, but

Since k = 20, this sentence becomes “we will only look for triangles formed using the 20 strongest edges”, which is clearly not the case!

Ah, OK, I re-read the paper (Simmelian Backbones: Amplifying Hidden Homophily in Facebook Networks), and understood the following. SB is based on a local measure of tie strength.

A pivotal factor in our approach is the following: In correspondence with Burt’s notion of constraint, we do not consider a global, network-level ranking of ties but advocate a local assessment on the level of actor neighborhoods. Let ego be an actor in a network with ordinal tie weights. […] We define ego’s rank-ordered neighborhood as the list of adjacent actors (alters) sorted from strongest to weakest tie between ego and alter.

The parameter k is a threshold. For each node, only its k strongest incident edges are considered. Then,

The degree to which a strong tie is structurally embedded is then defined as the overlap of common neighbors among the top-k entries of [ego’s and alter’s neighborhoods, ranked from strongest to weakest tie]

In this case, if you choose average degree as a proxy for k, what you are saying is that the average somehow represents a “normal” degree, and therefore a normal number of neighbors. I don’t think we can make this claim, though I do not have a better idea.
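In code, my understanding of the local top-k step is roughly this (a networkx-style sketch; using association depth as the tie strength stored in the `weight` attribute is my assumption):

```python
# Rank-ordered neighborhood: sort ego's neighbors by tie strength and keep
# the k strongest. G is assumed to be a weighted networkx graph.
def top_k_neighbors(G, ego, k, weight="weight"):
    ranked = sorted(G[ego], key=lambda alter: G[ego][alter].get(weight, 0), reverse=True)
    return set(ranked[:k])
```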

But, based on the Tulip file computed by @bpinaud, I have a question about the redundancy statistics that you use to reduce according to Simmelian backbone. Nick writes:

We define the Simmelian backbone of a social network by applying a (global) threshold on the derived (local) measure of triadic cohesion. Minimum requirements include, e.g.,

  • a minimum number of common top-k neighbors, or
  • a required prefix Jaccard coefficient.

I assume you went for the absolute number of common top-k neighbors, which allows you to reduce by a parameter expressed in integers, correct?

Yes, correct. k is a max threshold. Then I used the second parameter to create each subgraph, filtering edges from 0 (the graph called simmelian_something) to k-1 (the hierarchy below the graph called simmelian_something). @melancon will confirm.

Calling on @Jan @Richard @amelia @bpinaud @melancon: we have cleared a bit of a milestone with this paper, and I think it is time we all meet again.

This is now done. In fact, I think I have done one better: thanks to the progress made by Bruno and Guy, I was able to finish section 5 and write section 6.1 of the paper. This means that all four techniques are now described and their results quantified (section 5). They are also compared against each other in a qualitative way. Section 6.2 is going to quantify the extent to which pairs of techniques converge to the same results; I will draft it after Bruno has finished the computation.

Section 6.1 also contains a summary discussion in the form of Table 3 and Figure 5. Figure 5 is what you want (see below). The graphics are coarse; we’ll polish them, but I am prioritizing speed so we can have this discussion with you.

I suggest you read sections 5 and 6.1; then we have a discussion and you (Jan, Amelia, Richard) tell us what you see in these reduced networks, and how well they satisfy you. I suggest Wednesday 17th (I will be on the road on Thursday and Friday).

Bruno, Guy, I also finally have some kind of answer to:

Yes. k is a granularity parameter. If you set it to a value k*, that value becomes the maximum possible redundancy that any edge can have. This is because redundancy is the size of the overlap between the k* strongest-tied neighbors of ego and the k* strongest-tied neighbors of alter. At the very maximum, these can be… k*: complete overlap. So, if you remove all edges with redundancy < k* and your network is still too large, you can recompute redundancy using a higher value for k. However, there is a catch: as you reduce more, your giant component dissolves into a number of dense connected components, making your reduction quite meaningless. So, in practice, I think it is not that important how you set your k.
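For the record, here is the computation as I understand it, sketched in plain Python/networkx. It mirrors my reading of the method, not necessarily the exact Tulip implementation:

```python
import networkx as nx

def top_k(G, node, k, weight="weight"):
    # k strongest-tied neighbors of `node` (same idea as the earlier sketch).
    return set(sorted(G[node], key=lambda n: G[node][n].get(weight, 0), reverse=True)[:k])

def simmelian_reduce(G, k, min_redundancy, weight="weight"):
    # Keep only edges whose redundancy (overlap of ego's and alter's top-k
    # neighborhoods) reaches the chosen minimum.
    kept = nx.Graph()
    kept.add_nodes_from(G.nodes(data=True))
    for u, v in G.edges():
        r = len(top_k(G, u, k, weight) & top_k(G, v, k, weight))
        if r >= min_redundancy:
            kept.add_edge(u, v, redundancy=r)
    return kept
```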

Here is POPREBEL reduced:

By association depth > 8

By association breadth > 2

By highest core values > 63 (highest possible for this dataset)

By Simmelian backbone, redundancy > 5
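For reference, the four cuts above expressed as toy filters on a weighted codes graph (a networkx sketch, not our actual pipeline; attribute names like “depth” and “breadth” are placeholders):

```python
import networkx as nx

def filter_by_edge_attr(G, attr, threshold):
    # Keep all nodes, and only the edges whose attribute exceeds the threshold.
    H = nx.Graph()
    H.add_nodes_from(G.nodes(data=True))
    H.add_edges_from((u, v, d) for u, v, d in G.edges(data=True) if d.get(attr, 0) > threshold)
    return H

# by_depth   = filter_by_edge_attr(G, "depth", 8)    # association depth > 8
# by_breadth = filter_by_edge_attr(G, "breadth", 2)  # association breadth > 2
# by_core    = nx.k_core(G, k=63)                    # highest core values
# by_simmel  = simmelian_reduce(G, k=20, min_redundancy=5)  # sketch from my earlier post
```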


Hi @amelia, hope you are well. @Richard and I are going to meet in two hours, at my 14:00 (19:00 in London), today (Wednesday) to talk about the paper. Will you have a moment? If not, how can we get together to move this project forward? Warmly, Jan

@alberto @amelia and All, Richard and I had a very productive meeting today and now we are writing our thoughts down. Alberto: how many words do we have?

I have two main ideas. First, I think that each of the four reduction techniques can be linked to a different school in sociology/anthropology. If we can pull this off, it will be very cool. We are in a dialogue not only with anthropological methods (ethnography) but also with anthropological theory. Second, once we draft our ideas, we should run them by you, so you can tell us if they make sense to you.

Here are the basic intuitions (but we need to study your draft more): (1) Association depth is about revealing the structure of thought or discourse, with its “central” points/nodes. (2) Association breadth is about revealing the social structure behind the collective reproduction of culture/discourse, with its central actors. (3) “By the highest scores” is about identifying the focal points of a given cultural field/discourse. (4) Simmelian backbone is about identifying clusters of discourse or sub-cultures or sub-communities of discourse. (1) takes us to Levi-Strauss with his idea of the universal structure of culture. (2) takes us to Jeremy Boissevain and his “network anthropology.” Observe that here: https://www.bebr.ufl.edu/sites/default/files/Boissevain%20-%201979%20-%20Network%20Analysis%20A%20Reappraisal.pdf, in footnote 1, he (an anthropologist/ethnographer) talks about giving a paper at a conference on “Mathematical Approaches in Social Network Analysis” in 1977! We are not inventing anything, surprise, surprise, but we are pushing it forward, I have little doubt. (3) is perhaps most clearly about the theory of hegemony, but also about all those theories that focus on the centrality of certain symbols (focal symbols) or metaphors (root metaphors). Finally, (4) is about the fact that cultures are as much about unifying as they are about dividing (it seems). I think the Comaroffs are our authors here. Basically, centrifugal and centripetal forces operating simultaneously.

So, at some point we need to talk. It is a lot of fun. BTW, the parts of the paper that we already have are well written and the argument is powerful and clear.


Wooow! That sounds really really interesting, far beyond my expectations.

Again, powerful intuiting work, @Jan and @Richard! I am not sure I understand point 3, but I will at some point.

The paper has a limit of 9,000 words. Overleaf now counts 4,124. We should be a bit conservative, because the LaTeX tools for word counting sometimes do not fully take into account figure captions etc., but there is still space for making your argument.

Eager to do so at the earliest occasion!

Do you refer to the parts written by @melancon, @bpinaud and me on Overleaf or to your own work document? Can I have a link to your G Doc?

Hi Alberto

Here is the link to the Google Doc. Apologies, I now remember that you asked for this link before.
I’ve done a first draft of Jan’s first point (‘Association depth is about revealing the structure of thought or discourse, with its “central” points/nodes.’) I’ll move on to the second point about social structures and actors next.

R


@alberto, @melancon, and @bpinaud - I referred to the part written by you guys. We (@amelia and @Richard) will send you a link to our working draft very soon - we want to get it into better shape before you can sink your teeth into it. Last night I studied, sentence by sentence, your part, and have several comments and questions. I still cannot fully get the more technical part explaining the operation behind the Simmelian backbone reduction, but I get what it is (only now have I absorbed that it is based on Simmel’s concept of the primary group!). So, I am reading this: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6785754. We may want to indicate that while the concept was originally invented in the study of human relations and networks (for example, Granovetter’s weak and strong ties), we apply it to concepts and representations, and the structures of connections that emerge among them. In this case we are looking for semantic equivalents of Simmelian triads. After reading the above article (well, some of it) I am more confident that this method allows us to isolate and visualize “strong” clusters of concepts, equivalents of Simmel’s social triads, i.e. something close to Cooley’s primary groups. But we have to be thoughtful here: inter-human triads are not the same as triads of concepts/representations. There is no “exchange” among them, in the sociological sense, but there is - I think - a strong modification/transformation (for lack of a better term) of meaning. The image of “Father” in the arguably most famous triad (Father-Son-Holy Ghost) is different from the meaning of “Father” in the dyad “Father-Son.” But that “added” meaning results not just from the more “complex” structure (now 3 elements), but from the semantic field/meaning of “Ghost.” Thus, once we identify those “Simmelian” clusters, we will have a lot of fun doing “old fashioned” interpretation of the meaning modifications/transformations due to the heightened strength of the “triadic” inter-concept ties - a feature that we will discover and visualize!
