Draft of Network Analysis Report

brenoust · July 8, 2014, 10:59pm

Edgeryders (ER) Spot The Future (STF) is a community of people and, like any community, it can be examined with the help of Social Network Analysis (SNA).

What network are we looking at?

First of all, the main network on which we are focusing is the conversation network. It consists of the posts and comments that people have published on the ER platform.

How do we create such networks?

The structure of ER is as follows: users publish posts or comments, and comments trace the flow of a conversation. A conversational interaction therefore exists whenever a user writes a comment in answer to a post or to another comment. We can then build a network of users in conversation from their comments. In SNA terms, users are nodes, and there is a link between two nodes when one user has written a comment to the other. Some users have produced more comments than others, and we use that count to set the size of the nodes.
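As a rough illustration of this construction, here is a minimal sketch in plain Python. The record layout (author, parent_author), the user names, and the choice of directed links are assumptions made for the example; the actual ER export and the Tulip scripts may be organized differently.

```python
from collections import Counter

# Hypothetical records: (author, parent_author), where parent_author wrote
# the post or comment being answered. Real ER exports will look different.
comments = [
    ("alice", "bob"),
    ("bob", "alice"),
    ("carol", "alice"),
    ("alice", "alice"),  # replying to oneself gives a self-loop
]


def build_conversation_network(comments):
    """Return (node_sizes, edge_weights) for a directed conversation network."""
    node_sizes = Counter()    # user -> number of comments written
    edge_weights = Counter()  # (author, parent_author) -> number of comments
    for author, parent_author in comments:
        node_sizes[author] += 1
        node_sizes[parent_author] += 0  # make sure the addressee is a node too
        edge_weights[(author, parent_author)] += 1
    return node_sizes, edge_weights


node_sizes, edge_weights = build_conversation_network(comments)
print(dict(node_sizes))
print(dict(edge_weights))
```

Node sizes here count comments written, which is one possible reading of "produced more comments"; counting comments received would be an equally simple variant.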

This is the network we finally obtain:

How about STF?

Among these comments, we know that some concern STF, so we can identify them, and with them the users who are active in STF. If we color the edges in two different ways, we can see how the STF community (in orange) is embedded in the whole ER community (in blue), with rippling conversations that are not in STF but still take place between STF users.
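The coloring rule can be sketched as follows, assuming each comment carries a hypothetical is_stf flag (the real data may mark STF content differently) and that links are treated as undirected for drawing purposes.

```python
# Hypothetical records: (author, parent_author, is_stf).
comments = [
    ("alice", "bob", True),
    ("bob", "alice", False),
    ("erin", "alice", True),
    ("erin", "bob", False),
    ("carol", "dave", False),
]


def color_edges(comments):
    """Color a link orange if any comment on it is STF-related, else blue."""
    colors = {}
    for author, parent_author, is_stf in comments:
        edge = frozenset((author, parent_author))
        if is_stf:
            colors[edge] = "orange"
        else:
            colors.setdefault(edge, "blue")  # never downgrade an orange link
    return colors


colors = color_edges(comments)

# STF users are the endpoints of orange links.
stf_users = set()
for edge, color in colors.items():
    if color == "orange":
        stf_users |= edge

# "Ripples": blue links whose two endpoints are both STF users.
ripples = [set(e) for e, c in colors.items() if c == "blue" and e <= stf_users]
print(stf_users, ripples)
```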

We can clearly see here how the STF community forms its own group but also, almost organically, takes root in the ER community. More elegantly drawn we get this representation:

Let’s focus on the STF community

So here is the focus on the STF community

Interestingly, it is divided into two groups, the one on top only being about arrivals…

Let’s talk figures

This analysis is built on 11119 comments and 2415 posts; the final network draws 501 users with 2797 conversational exchanges. 128 users are involved in STF-related conversations, with 384 interactions (35 of which are actual replies to themselves), producing altogether 2791 comments, among which 1319 have been produced since, and 910 are STF-related content. That means 1319 comments are pre- or post-STF exchanges. Now, in the STF community, 33 participants have exchanged with one and only one other participant (most probably a community manager).
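Figures of this kind (number of users, interactions, self-replies, participants with a single interlocutor) can be derived from the list of links. The sketch below uses made-up links and hypothetical names; it only illustrates the counting, not the actual STF numbers.

```python
from collections import defaultdict

# Hypothetical list of conversational links between users.
links = [
    ("alice", "bob"),
    ("carol", "alice"),
    ("alice", "alice"),  # a reply to oneself
    ("dave", "alice"),
    ("bob", "carol"),
]


def summarize(links):
    """Count users, links, self-replies, and users with exactly one partner."""
    neighbors = defaultdict(set)
    self_links = 0
    for u, v in links:
        if u == v:
            self_links += 1
            neighbors[u]  # register the user without adding a partner
            continue
        neighbors[u].add(v)
        neighbors[v].add(u)
    single = sorted(u for u, ns in neighbors.items() if len(ns) == 1)
    return {
        "users": len(neighbors),
        "links": len(links),
        "self_links": self_links,
        "single_partner_users": single,
    }


summary = summarize(links)
print(summary)
```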

Going deeper: ethno tagging

At STF, we are very lucky to have an expert Ethnographer, @(Inga Popovaite)! Inga’s amazing work enables us to go even deeper into the analysis. Through ethnographic tagging, we can detail the content of conversations. It is a bit as if until now we had been looking only at the infrastructure of the conversations (like highways), and Inga’s work helps us identify which information circulates through these highways, in the most semantically dense way, with the least ambiguous terms possible.

Just to give a glimpse, this ethnographic tagging takes the shape of a tree in which each leaf helps characterize a comment. We have a total of 243 unique tags, across 6 relevant categories (+ the ER category).

Once ethnographically identified, the conversations we can extract form the real core of the STF activity. And here is the shape of this interesting core network:

So what are people talking about?

In this core activity we count 95 users, around 255 conversational links, and 718 of the STF comments, of which 22 are self-addressed (but surely still relevant).

Here is the distribution of the main topics and their occurrence in conversations:

cooperation	77
offline-meeting	47
egypt	30
armenia	30
storytelling	28
georgia	21
stf-approach	20
interest-implementation	20
yerevan	19
tbilisi	18
social-media	17
logistics	17
cairo	17
similar-initiatives	16
govenrmental-institutions	14
maps	13
open-source-software	12
bottom-up	12
fundraising	11
active-participation	11
sharing-experience	11
unmonastery	10
challenges	10

and we can also draw the network of overlapping topics:

In this network, two topics (nodes) are connected when they have been the subject of conversation between at least one pair of users.
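One way to read this rule is sketched below: gather all tags exchanged between each pair of users, then connect every two tags that appear together for at least one pair. The record layout and tag assignments are illustrative assumptions, and the weight (number of user pairs sharing both topics) is an addition made for the example.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical tagged comments: (author, parent_author, set of ethno tags).
tagged_comments = [
    ("alice", "bob", {"cairo"}),
    ("bob", "alice", {"egypt", "cooperation"}),
    ("carol", "dave", {"maps", "cooperation"}),
]


def topic_overlap(tagged_comments):
    """Link two topics when at least one pair of users discussed both."""
    tags_by_pair = defaultdict(set)
    for author, parent_author, tags in tagged_comments:
        # All exchanges between the same two users are pooled together.
        tags_by_pair[frozenset((author, parent_author))] |= tags
    overlap = Counter()  # (topic_a, topic_b) -> number of user pairs
    for tags in tags_by_pair.values():
        for a, b in combinations(sorted(tags), 2):
            overlap[(a, b)] += 1
    return overlap


overlap = topic_overlap(tagged_comments)
print(dict(overlap))
```

In this toy data "cairo" and "egypt" end up connected even though no single comment carries both tags, because the pooling is done per pair of users rather than per comment.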

Future work in preparation is the analysis of topic overlaps category by category, for detailed observation. This data also concerns the conversations before the event in Tbilisi, and new topics may have arisen since. Identifying key events and studying the different networks before and after them are also very promising research leads.

But we want to know more, who are our community’s brilliant minds?

Now that we know which conversations are relevant, we may have a look at the brilliant minds behind them. In particular: Where are they from? How does the mediation work? STF mostly concerns Georgia, Egypt, and Armenia; how does that shape the network? Does it have more impact on younger or on older participants?

To do so, we had to enrich the users’ information, thanks to the participation of the whole community. Here is a very quick analysis of the age distribution.

Localization of participants

In this part, we have enriched the countries (mostly with semi-automatic and manual annotations), because software is very sensitive to the way things are written. In the end we have, for each participant, a country that represents their main location of participation. Now, when two users interact, we can assume that their respective countries interact as well.
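The step from user interactions to country interactions amounts to an aggregation, sketched here with hypothetical users and country assignments (the real annotation table is not reproduced).

```python
from collections import Counter

# Hypothetical annotations: one main country per participant.
user_country = {
    "alice": "Egypt",
    "bob": "Armenia",
    "carol": "Egypt",
    "dave": "Georgia",
}
# Hypothetical user-to-user links.
links = [("alice", "bob"), ("alice", "carol"), ("carol", "bob"), ("dave", "alice")]


def country_network(user_country, links):
    """Aggregate user links into weighted, undirected country links."""
    sizes = Counter(user_country.values())  # country -> number of participants
    weights = Counter()                     # (country, country) -> user links
    for u, v in links:
        pair = tuple(sorted((user_country[u], user_country[v])))
        weights[pair] += 1
    return sizes, weights


sizes, weights = country_network(user_country, links)
print(dict(sizes))
print(dict(weights))
```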

So here is the network of country conversations, with node size representing the number of participants for each country. Unsurprisingly, we see how productive Egypt, Armenia, and Georgia are in STF.

But the beauty of geo-localization is that it can be mapped… on a geo-map! Let’s see the worldwide impact of STF:

or on a flat map, and focused on the East-most part

Now just imagine if we had actual city locations: the representation could be much richer, and we could see, for example, how people interact within a specific country (!)

Ok, but is STF a melting pot, or a salad bowl?

To answer that question we need to see who interacts, and how, within each country and between countries. We can for example color code each user according to their country, and color each link according to the countries of its two end users, which may reveal some local clusters.

So we actually see some local clustering (especially the green one representing Egypt), but not too much either; color coding doesn’t really work with that many different countries. However, we can combine different views: the STF community links within a country (next, in RED) or between countries (next, in BLUE), and the actual figures.

This confirms, at this point, the important role of community managers (mostly the biggest nodes) in collaboration with everybody in the network. That may induce more of the within-country discussions we see in red, which are less central and more transversal. The only thing we can state for sure is that the more productive an STF user is, the more (s)he discusses with people from different countries.
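The red/blue split and the share of within-country links per country can be computed along these lines. Users, countries, and links are invented for the example; only the classification rule comes from the text above.

```python
from collections import Counter

# Hypothetical country annotations and user-to-user links.
user_country = {
    "alice": "Egypt",
    "bob": "Armenia",
    "carol": "Egypt",
    "dave": "Georgia",
    "erin": "Armenia",
}
links = [
    ("alice", "carol"),
    ("alice", "bob"),
    ("bob", "erin"),
    ("dave", "carol"),
    ("dave", "bob"),
]


def classify_links(user_country, links):
    """Red for links within one country, blue for links between countries."""
    return {
        (u, v): "red" if user_country[u] == user_country[v] else "blue"
        for u, v in links
    }


def within_share_by_country(user_country, links):
    """Fraction of each country's links that stay inside the country."""
    within, total = Counter(), Counter()
    for u, v in links:
        cu, cv = user_country[u], user_country[v]
        for country in {cu, cv}:  # a within-country link is counted once
            total[country] += 1
            if cu == cv:
                within[country] += 1
    return {country: within[country] / total[country] for country in total}


link_colors = classify_links(user_country, links)
share = within_share_by_country(user_country, links)
print(link_colors)
print(share)
```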

To get more details, let’s have a look at the figures:

Interestingly, we can notice that Romania has an extremely strong influence in the production of inter-country discussions. The field knowledge of the ER-STF community tells us that @Noemi is the main actor in this production and does an amazing job binding everyone together!

Finally, a quick look at the age distribution, based on the information we have, suggests that most of the STF users who agreed to mention their age group are between 21 and 30. (By the way, “no answer” means the user was not willing to give an answer, and “blank” means the user has not answered.)

The following table states the number of user interactions within each age class, but it does not really help us draw conclusions on this dataset, beyond noting that, for a fairly comparable number of users between 21-25 and 26-30, the 26-30 class seems to interact more among themselves.

no answer	0
51 and over	1
46-50	1
41-45	0
36-40	4
31-35	1
26-30	42
21-25	20
20 and under	3
blank	33

Finally…

The perspectives of such a network analysis are very wide. We have presented different approaches that help give an overview of the community’s activities in the blink of an eye, but we always want to explore this potential more deeply. We have opened new perspectives on link characterization (ethno tagging) and on node characterization (age/country information), and a full knowledge of the community’s behavior will arise through observation of these evolutions over different events in time, and of course at finer granularities!

Some technical extra

So now we can enter the technical details!

All the data have been produced thanks to @Matthias’s efforts on the ER platform (I’m not sure I’m allowed to list here the different views I used to produce the exploitable JSON files). The data has mostly been processed using Tulip (tulip.labri.fr), a research software that enables network processing and visualization in Python. In it I have used some geo views, node-link diagram views, and mostly Python processing. Classical spreadsheet tools such as MS Excel and Google Spreadsheets are also always useful and handy. Further analysis has been developed with the support of the tool DataDetangler, available online at tulipposy.labri.fr:31497 (contributions welcome!).

Many images extracted can be found here (because of size it’s an outside link): https://github.com/renoust/ER-STF/raw/master/screenshots.zip

Here is the Tulip file with the different STF networks (overall data + with countries): stf.initialGraph.tlpx

Here is the tulip file of the ethno tagging hierarchy: EthnoTree.tlpx

Here is a Tulip file that stores all the bipartite associations (including groups etc.): stf.bipartiteGraphs.tlpx

Here is a set of DataDetangler files that store all the multiple associations: datadetangler.zip

A bunch of spreadsheets: STF - Member interactions.xlsx, tags.xlsx

All the code within the tulip files is also stored under an MIT license at: https://github.com/renoust/ER-STF

To know more about network processing and all we can do about it, you can always visit my website (http://www.labri.fr/perso/renoust/)

noemi · July 9, 2014, 10:18am

How to play with the ethno tags?

Benjamin, this is great work and very comprehensive after only 3 months of online conversations… I can only imagine how much richer the data and especially conclusions would be if a longer timeline would allow it.

So while in Tbilisi @Alberto showed me an interactive application of the ethno tags which allows us to see who is talking about what topic and how we discover that some topics are grouped together because they appear concomitantly in the same conversation.

Can you give us instructions as to which links to access or what to install (if anything), to be able to see the interactive view? I can’t open the datadetangler, nor the .tlpx files…

brenoust · July 10, 2014, 12:51pm

Thanks a lot!

Yes, one of the next key steps may be to analyse a longer period of time, and the processing of time is still an open research question that is very interesting to dig into!

The interactive application is DataDetangler, an online application available at http://tulipposy.labri.fr:31497; I should add the hyperlink soon (I’m travelling right now, weak signal). You can load the JSON files contained in datadetangler.zip (see above: unzip the archive, then load the file you want by clicking on « load ») and then interact with them : ) I develop this application for research purposes only, so it is very rough (sorry ^^; ) Contributors are always welcome should we want to make it a full application.

The other software is Tulip, which you can download at tulip.labri.fr. After installing it, you can load any .tlpx file.

Enjoy !

alberto · July 9, 2014, 1:27pm

An attempt at interpretation

Let me see if I understand this right.

Relational structure ("the highways")

The STF is (almost) unique, with a giant component connecting 122 out of 126 participants. This can be interpreted as a sign that there is little or no insularity, and everybody is heard out. Centrality analysis shows that ER moderators (especially @Noemi) played a key role in connecting the network. By the way, who are the 6 people in the smaller component?
The STF exercise is healthily integrated with the broader Edgeryders conversation; the two are not disconnected, and yet the STF network is still clearly visible as a more densely connected community within the larger network. We interpret this as healthy because it guarantees diversity (with edgeryders not from Armenia, Georgia and Egypt engaging in STF) while maintaining focus (with most STF-related interaction happening across participants from the three STF countries, or anyway directly interested in STF).
The above point is confirmed by the analysis of the geocoded network. Participants in Armenia, Egypt and Georgia contribute the most content, but there is a healthy international variety of contributions. People in the three STF countries invest about 40% of their interactions in-country, and the remaining 60% interacting with people in different countries (either other STF countries or non-STF countries). About the interaction across STF countries, Egypt is central: we have 12 Georgia-Egypt unique relationships, and 11 Armenia-Egypt ones. Armenia-Georgia unique relationships are not mentioned in the histogram, so I assume they are fewer than 8. Interesting datapoint to add: a distribution of participants by country.
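The giant-component observation in the first point above can be checked with a standard connected-components pass. This is a generic breadth-first sketch on invented links, not the procedure actually used in Tulip.

```python
from collections import defaultdict, deque

# Hypothetical undirected links between participants.
links = [("a", "b"), ("b", "c"), ("c", "d"), ("e", "f")]


def connected_components(links):
    """Return the connected components as sets, largest first."""
    neighbors = defaultdict(set)
    for u, v in links:
        neighbors[u].add(v)
        neighbors[v].add(u)
    seen = set()
    components = []
    for start in neighbors:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:  # breadth-first traversal from the start node
            node = queue.popleft()
            component.add(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return sorted(components, key=len, reverse=True)


components = connected_components(links)
giant_share = len(components[0]) / sum(len(c) for c in components)
print(components, giant_share)
```

Participants who never appear in a link are not seen by this function; isolated users would have to be added separately if they matter for the count.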

Semantics ("the traffic")

In an ethno-coded conversation network the edges carry semantics. What semantics tells us:

cooperation is the keyword carried by most edges: almost 80 occurrences, which means about one in 10 comments was coded "cooperation" (actually a bit less, since we have to make provision for posts too). It may just be my own bias, but this resonates with Edgeryders lore: "people at the edge are good cooperators, because their most powerful card is each other". Interestingly, Inga's draft report does not mention "cooperation" as representative of any STF country.
I am not sure how to interpret the network of keywords. Do you mean to say that keywords are connected when they have been mentioned by a pair of users across the whole conversation? So, I write a comment tagged "Cairo" and another one tagged "Egypt". If anyone else, across the whole STF conversation, does this, then there is an edge from "Cairo" to "Egypt". Correct? If so, we may want to introduce a more restrictive approach to drawing the network, because over a sufficiently long run or a sufficiently large number of participants all keywords end up connected to each other. I find the Data Detangler network much more informative:

Notice the components: the northwest one, made of issues (like gender-sterotypes) and methods (like offline-meetings, transparency and protest), centered on cooperation; the northeast one, centered on places, probably indicating across-country comparison; and the southeast clique of Georgian projects (with Tbilis-makerspace). I conclude from these vicinities that participants are discussing projects in the context of problem-solution conversations, against a backdrop of international comparisons. This is provisionally validated by casual exploration with Data Detangler:

Selecting the clique of Georgian projects (highlighted in green) allows us to identify the participants talking about them on the left, in red. Next, I can check what else these users are talking about (highlighted in red, on the right). And sure enough, you see issues and methods, and international comparisons.

Works?

brenoust · July 10, 2014, 1:11pm

All in all,

excellent interpretation, that’s exactly what we are looking for when producing network analysis! Let me try to answer your points:

The smaller component is composed of Betta_83, Abby Margolis, Tiago, Noemi D6, chara.oikonomidou and Freelab. The comments through which they are attached are 5167, 5165, 5102, 2300, 4881, 5103, and 5105.

Here is the distribution of users by country:

Egypt	32
Armenia	25
Georgia	24
Germany	6
Italy	5
UK	5
Spain	4
Sweden	4
France	3
Belgium	2
Poland	2
Romania	2
Turkey	2
USA	2
Canada	1
Czech Republic	1
Denmark	1
Estonia	1
Hungary	1
Iraq	1
Ireland	1
Portugal	1

About the network of keywords, it’s only a subtlety: behind it is an actual multiplex network of keywords through comments. This means that two users are linked through multiple ties, each tie being a keyword. We then look at how keywords overlap in creating 0-1 interactions across users. So when two keywords are connected, it means that they have both been mentioned by at least one pair of people across their « direct » conversation, meaning all their exchanges.

The drawback of this situation is not really what you said. Rather, if 2 managers often discuss with one another across all the different conversations, they have a high chance of ending up talking about all the STF topics, and thus, by their conversations alone, they form a clique of topics. That is why the exploration is useful: we can browse any subgroup of users (for example, without the community managers). Another option is to consider a restricted subnetwork from which these « noise-producing » nodes have been removed. A note for the community managers: you’re not making noise in the general sense of the word, you’re just too good and too productive, so it sort of hides the other users behind your exchanges!
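To make the clique effect concrete, here is a small sketch that rebuilds the topic links with and without a set of excluded users. The user names, the tag sets, and the excluded parameter are hypothetical; DataDetangler does this interactively rather than through such a function.

```python
from itertools import combinations

# Hypothetical pooled tags per pair of users (all their exchanges together).
tagged_links = {
    ("manager1", "manager2"): {"cooperation", "maps", "egypt"},
    ("alice", "manager1"): {"maps"},
    ("alice", "bob"): {"cooperation", "egypt"},
}


def topic_edges(tagged_links, excluded=frozenset()):
    """Topic pairs shared by at least one user pair, skipping excluded users."""
    edges = set()
    for (u, v), tags in tagged_links.items():
        if u in excluded or v in excluded:
            continue  # drop every link touching an excluded user
        edges.update(combinations(sorted(tags), 2))
    return edges


all_edges = topic_edges(tagged_links)
filtered_edges = topic_edges(tagged_links, excluded={"manager1", "manager2"})
print(sorted(all_edges))
print(sorted(filtered_edges))
```

With the two managers included, their single pair of users already links all three topics into a clique; once they are excluded, only the topic pair discussed by the remaining users is left.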

By the way, I may have attached the wrong picture, because it is your Data Detangler screenshot that seems to correspond to what I described :) oops…

matthias · July 9, 2014, 8:37pm

Larger or Interactive graphs?

These are seriously cool results, Benjamin. Referring to the presentation form at this point, not even caring for the content results you found, yet.

Only one request left: In the final version of the report that gets delivered to the client (… which it does, I think?), could you include high-detail renderings of the graphs you produced? Like, for example for the “connected topics” network, one allowing to identify individual connections and names for all nodes if one zooms in far enough. Alternatively, maybe my detail-oriented approach to reading these graphs is “wrong” and network analysis diagrams are rather meant to bring up an impression of the general “shape” of a network? :S

brenoust · July 10, 2014, 12:47pm

Thanks a lot !

About the high res pictures, that’s the best I can do unfortunately. The other thing I may be able to do is take different screenshots that need then to be assembled, but I have no experience in that.

Overall, I guess what you want is a tool to « only » explore the network, which is exactly the principle of the tools providing Node Link Diagrams (the network drawings) such as DataDetangler and Tulip.

About the purpose of network visualizations, it’s a very long topic that could be discussed in a thesis, but in the end I would argue that networks are good objects to both show the structure and details at once, always with the support of interactions.

alberto · July 14, 2014, 5:04pm

Where is the file with the analysis by country?

I mean the Tulip file, @Benjamin Renoust!

brenoust · July 16, 2014, 11:21am

Here it is

Right, I forgot to push it!

Here it is: stf.withCountries.tlpx