Masters of Network 4: Networks of Care

rossellab · February 17, 2016, 11:47am

Question about Detangler

I was having a look at Detangler and I don’t understand what the coordinates x and y stand for. I can see scatter plots and bar plots but I don’t know what they tell about the network. Can someone help me out?

melancon · February 17, 2016, 1:17pm

It’s all in the interaction

Hi @RossellaB, good to see you are playing with Detangler.

First thing you need to know is that nodes on the left panel (substrates) are the main focus. Those substrates relate to one another through nodes on the right panel (catalysts). Catalysts are the “reasons” why substrates relate to one another. In the demo example, people get connected because they co-participate in political lodges (you may have recognized names from the so-called Paul Revere night ride from the American revolution). The quest is to try to figure out, for instance, who was in a position to reach all of those guys pretty quickly (in order to organize a mutiny before the British authority could counter them).

The x, y position of nodes is decided in the following way: nodes on the right panel are displayed using a force-directed layout (ask me if you have no idea what that is). There is no absolute meaning in the x or y value; nodes are just positioned so as to have a readable display. Nodes on the left panel are positioned according to how they relate to nodes in the right panel. The layout attempts to mimic the layout on the right: substrates are positioned “around” the catalysts to which they correspond (although catalysts are not embedded in the panel). The reason is to make the selection more natural: when you select substrates at the top in the left panel, you may expect the corresponding catalysts to be located at the top in the right panel.
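
A minimal sketch of the positioning idea described above, assuming the simplest possible rule: each substrate is placed at the centroid of the catalysts it is linked to. This is not Detangler's actual code; the function name, the people, the lodges and the coordinates are all made up for illustration.

```python
# Sketch only: NOT Detangler's implementation. Names and coordinates are invented.

def place_substrates(catalyst_pos, memberships):
    """Put each substrate at the centroid of the catalysts it is linked to."""
    positions = {}
    for substrate, catalysts in memberships.items():
        xs = [catalyst_pos[c][0] for c in catalysts]
        ys = [catalyst_pos[c][1] for c in catalysts]
        positions[substrate] = (sum(xs) / len(xs), sum(ys) / len(ys))
    return positions

# Catalyst positions as some force-directed layout might return them (arbitrary values).
catalyst_pos = {"lodge_a": (0.0, 1.0), "lodge_b": (1.0, 1.0), "lodge_c": (0.5, 0.0)}

# Which catalysts each substrate belongs to (hypothetical people and lodges).
memberships = {
    "person_1": ["lodge_a", "lodge_b"],
    "person_2": ["lodge_c"],
    "person_3": ["lodge_a", "lodge_b", "lodge_c"],
}

substrate_pos = place_substrates(catalyst_pos, memberships)
```

With this rule, a substrate linked only to catalysts near the top of the right panel also lands near the top of the left panel, which is the property that makes lasso selection feel consistent across the two panels.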

The main feature is the easy selection of substrates or catalysts using the lasso.

We’ll be using Detangler with substrates=people and catalysts=topics, for instance.

Enjoy!

rossellab · February 21, 2016, 9:05am

thanks!

Thank you for the explanation, Guy, now it starts to make sense. I know more or less what a force-directed layout is, although I’m not familiar with the maths behind it.

alberto · February 17, 2016, 7:46pm

MoN4 stage-setting conference call!

@melancon, are you free for an hour on Friday, say 15 to 16? I would like to touch base with you on the finishing touches to MoN4.

moe · February 17, 2016, 11:34pm

I’m in

Hi everybody,

I’m sorry for the long silence but it’s been a long and busy period for me.

I just wanted to confirm I’ll be attending MoN4 with @dora (we have accommodation sorted).

We also made some progress with Python, relative to my last updates on ER, but not recently. I’m planning to get back to the code next weekend and I’m confident I’ll be able to give you a better update then.

I can’t wait to meet you in person

Cheers,

s t e

melancon · February 18, 2016, 6:22am

Friday 3pm – ok

3pm - 4pm and more if necessary.

I guess you had a look at the (tentative) agenda, and also saw I wish to give MoN4 a participatory design workshop twist.

Do we open the call to all, or keep it between facilitators (@Hazem?), or us two?

alberto · February 18, 2016, 4:36pm

Open

… so, on Google Hangout, because you can join them with just the link. No Skype.

melancon · February 18, 2016, 2:36pm

How do I extract the STF posts

Is there a tag or something I can grab so I know a post is relevant to STF?

I also need help with the ethno posts, which for now I cannot really exploit.

We’ll talk about all this tomorrow I guess.

moe · February 19, 2016, 12:01am

A bit confused

I finally took some time to read through the most recent program and comments and I am a bit confused…

We started having a plan, with @dora, about what could be done with the Wikipedia data we managed to mine but I see there’s no mention of Wikipedia at all, in this page, so I was wondering whether you’re giving up on that end or it was just left aside for the moment, or… ?

Again, I’ll be catching up with the code stuff this weekend. In the meanwhile, I’d ask: are you planning to have just 2 hours of prototyping for the proof of concept? Ain’t it a bit too shrinked?

I see that properly structuring ideas is the most relevant aspect of the hackathon, but I fear that not having enough time to make them into proper, working pieces of code might risk producing mainly fluff… I hope I won’t sound harsh in saying this, I would just like to hear your take

alberto · February 19, 2016, 2:09pm

But, hackathon

@MoE, to what @melancon writes I would like to add two things.

First, this is a hackathon, and that means we enjoy a lot of freedom. If you have done preparatory work on Wikipedia data, you are more than welcome to lead a track on Wikipedia data! We’ll treat it as we treat the other tracks – in fact, I might drop my quality challenge and join it myself. We also reserve the right to keep hacking into Sunday – I’ll definitely do it if we really get going.

Second, the time limitations will be mitigated by several factors. The first one is good preparation – join the MoN4 call right now to find out more. The second one is the usual trick of all hackathons: we just stay in touch (through GitHub and other channels) and finish our work remotely. The third one is the LOTE5 freedom that I mentioned above. We should be OK.

moe · February 20, 2016, 2:45pm

Sounds Good

Hi @Alberto (I often have problems with mention hints not appearing, and therefore the mentions not being recognised; is it a known issue or is it just me?)

What you say makes sense and is reassuring. I’m trying to re-organize the several proofs of concept which we put in place with @dora, to have a unified simple tool that we might use to query Wikipedia and store responses to a database, for later visualization.

Based on what we managed to fetch via the API, here’s what I was thinking:

we have a list of the medicine pages
for each page, we can query the pageviews, the links to other wikipedia pages and the translations into other languages
for each link to other wikipedia pages, we can tell whether they’re medicine pages or not; if they are, we can record they’re semantically connected
for each page translation we can repeat the above queries and store the relative data
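
To make the third step concrete, here is a minimal sketch of how links could be filtered down to “semantic” connections between medicine pages. It assumes the links have already been fetched and stored as a plain dictionary; the page titles and link lists are invented placeholders, not real query results.

```python
# Sketch only: page titles and link lists are invented placeholders,
# standing in for whatever the API queries actually return.

def semantic_edges(links_by_page, medicine_pages):
    """Keep only links whose source and target are both medicine pages."""
    edges = set()
    for source, targets in links_by_page.items():
        if source not in medicine_pages:
            continue
        for target in targets:
            if target in medicine_pages and target != source:
                # Store each pair once, regardless of link direction.
                edges.add(tuple(sorted((source, target))))
    return edges

medicine_pages = {"Influenza", "Fever", "Vaccine"}
links_by_page = {
    "Influenza": ["Fever", "Vaccine", "Bird"],
    "Fever": ["Influenza", "Thermometer"],
    "Vaccine": ["Influenza"],
}

edges = semantic_edges(links_by_page, medicine_pages)
```

Treating links as undirected is a simplifying choice made here; keeping the direction would only require storing `(source, target)` without sorting.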

With the stored info we could try to analyze:

which pages are connected, assuming their sub-network might represent a certain topic
which pages (in absolute terms and relative to a certain sub-network) have most views
for each topic or sub-network, what is the weight of a specific language in the overall page count
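
A rough sketch of the first and third analyses, assuming a “topic” is simply a connected component of the link network and that view counts are stored per page and per language. The data structure, the page names and all the numbers are invented for the example.

```python
from collections import defaultdict

def connected_components(nodes, edges):
    """Group pages into sub-networks: two pages share a group if a path links them."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        components.append(component)
    return components

def language_share(component, views, language):
    """Fraction of a sub-network's total views that come from one language."""
    total = sum(sum(views[page].values()) for page in component)
    in_language = sum(views[page].get(language, 0) for page in component)
    return in_language / total if total else 0.0

nodes = {"Influenza", "Fever", "Vaccine", "Fracture", "Cast"}
edges = {("Fever", "Influenza"), ("Influenza", "Vaccine"), ("Cast", "Fracture")}
views = {  # views[page][language] = view count (invented numbers)
    "Influenza": {"en": 600, "fr": 200},
    "Fever": {"en": 100, "fr": 100},
    "Vaccine": {"en": 0, "fr": 0},
    "Fracture": {"en": 50},
    "Cast": {"en": 50},
}

components = connected_components(nodes, edges)
flu_topic = next(c for c in components if "Influenza" in c)
fr_share = language_share(flu_topic, views, "fr")
```

Connected components are the crudest possible notion of sub-network; a community detection method could be swapped in without changing the view-count part.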

Things we might learn:

higher page counts of certain topics might represent higher interest and/or practice of autodiagnosis for such topics (I’m aware that what’s actually relevant is how to tell between the two but we can investigate this further)
existence of page translations in certain languages might imply a geographical and/or ethnographical relevance for certain areas and/or ethnic groups
higher page counts for a certain topic, in a certain language, might be relevant too

Thinking of Edgesense and ways to use it (maybe) differently than what it was designed for, we might have nodes representing pages as “semantic knots” (more than “bits of conversation”) and connections representing their semantic affinity. Edgesense could then be used to analyze whether the sub-regions it finds match the ones we found as Wikipedia’s internal links (mentioned above).

We could also visualize page counts for each sub-region and each node, both as a global count (including all languages) and as a “filtered” count, per ethnographic group.

Finally, if we imagine this visualized in three dimensions, we might have:

a planar XY mapping, with all the English medicine pages, where all connections are visualized and semantic sub-regions are highlighted (the distributions of nodes in 2D space wouldn’t necessarily have a geographical meaning)
a Z layering, where at each depth/height we have a “language plane” which shows which of the English pages are translated into a certain language

Connections would exist across layers, giving a “volumetric” representation of medicine semantic networks.

I guess this last bit might sound particularly abstract or confusing, until I manage to sketch a graphic prototype. I hope I’ll be able to do it soon, on paper at least, to try to explain the idea a bit better.

If anything of what I wrote makes sense to any of you, let me know

alberto · February 20, 2016, 4:12pm

These are two hackathons, not one!

I think I understand. You want to do two things.

A multiplex network of pages connected by links. The multiplex part takes advantage of Wikidata: we know that "influenza" in English is the same thing as "grippe" in French through Wikidata. So, we can follow all the links from "influenza" and all the links from "grippe"; these will induce two networks, one English-speaking and the other one French-speaking. The networks might be different, and we can analyze that difference. In practice, you have a multiplex network in which each language is one layer of the multiplex.
A page count exercise. Page counts have a separate collection method, which we discussed and fiddled around with back at 32C3.
If both exercises are successful, you have a multiplex network of medical pages in Wikipedia, each of which is associated to a number in terms of page counts per unit of time. You could then map this information onto the network, even visually:

What I like about the approach is that it is relatively simple to compute correlation coefficients across different versions of (medical) Wikipedia. Correlation between two languages is high if the probability of being connected of two random pages in language A, conditional on those two pages being connected in language B, is close to 1. Notice that you could do this without even looking at page counts! It seems like a lot of work for one day of hacking, but if that’s what you want to do, go for it.
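
A minimal sketch of that conditional probability, assuming each language layer is stored as a set of edges between language-independent concept identifiers (so that "influenza" and "grippe" share one identifier). The identifiers and edges below are invented; the estimate is simply the share of B's edges that also exist in A.

```python
# Sketch only: concept identifiers and edges are invented for illustration.

def normalise(edges):
    """Treat edges as undirected pairs so (a, b) and (b, a) compare equal."""
    return {frozenset(edge) for edge in edges}

def conditional_link_probability(edges_a, edges_b):
    """Estimate P(pair linked in A | pair linked in B) as |A and B| / |B|."""
    a, b = normalise(edges_a), normalise(edges_b)
    if not b:
        return None  # undefined when layer B has no edges
    return len(a & b) / len(b)

edges_en = {("flu", "fever"), ("flu", "vaccine"), ("fever", "cough")}
edges_fr = {("fever", "flu"), ("flu", "vaccine"), ("vaccine", "measles"), ("cough", "cold")}

p_en_given_fr = conditional_link_probability(edges_en, edges_fr)
p_fr_given_en = conditional_link_probability(edges_fr, edges_en)
```

Note that the measure is not symmetric: the two directions generally give different values, so both are worth reporting for each pair of languages.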

moe · February 22, 2016, 11:17pm

“Multiplex Network” sounds sleek!

I had no idea that was called a multiplex network but I googled it and, yes, it looks exactly like what I was thinking.

When I mentioned a 3D visualization, I had in mind exactly this:

where each coloured layer is a language, each dot is a page, each link between pages symbolizes a semantic affinity. @Alberto we’re on the same page, right?

Regarding @Alberto’s and @melancon’s concerns about time, I agree. The whole process might not be trivial nor quick.

On one hand, that’s why I hoped we would have had more than 2 hours of programming; but after what @Alberto replied, I think it is reasonable to think of a simple proof of concept (we choose one question and prove via code how we “could” provide an answer). If that proves to be worth it, I/we can invest some more time the day after, to extend the code and produce something more meaningful.

On the other hand, I was trying to produce as much code and db data as I can, to have a good base to start with (and to share with everybody else, obviously).

So far I have most of the code in place to do the data mining. I had to go through a few iterations, as we have limited db storage (free account on mongolab) and I needed to find a way both to do quick grouped queries and to store the results efficiently enough. I think I’m pretty close: I could store around 25K entries in around 6 MB (of the 200 MB we are allowed) in a few hours. These entries cover 4.5K English pages and all their available translations in any language. This means that within 24h we should be able to populate the db with all the pages, from scratch.
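
As a back-of-the-envelope check on those figures, a small sketch of the arithmetic. It takes the rounded numbers above at face value and assumes 1 MB = 10**6 bytes, so the results are rough estimates only.

```python
# Rough estimate from the rounded figures quoted above (1 MB taken as 10**6 bytes).
entries_stored = 25_000
bytes_used = 6 * 10**6
quota_bytes = 200 * 10**6
english_pages = 4_500

# Average storage cost of one entry.
bytes_per_entry = bytes_used / entries_stored

# Average number of entries (English page plus translations) per English page.
entries_per_english_page = entries_stored / english_pages

# How many entries of that average size would fit in the whole quota.
max_entries = int(quota_bytes / bytes_per_entry)
```

At roughly 240 bytes per entry, the quota would hold on the order of 800K entries, so the pages themselves use only a small part of it and the remainder is what is left for page counts.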

This does not include the page counts, though, which require a separate query and I’m still figuring out if there’s a way to optimize those (i.e. not sending a query per page).

I’ll test this later or tomorrow, but I’m confident I can get decent results in reasonable time.

This partially answers @melancon’s question about what I’m storing. In terms of page counts, both because of time and storage restrictions, I was thinking of storing sample counts for a given period (i.e. 1 month of pagecounts, per day), instead of a tighter sampling.

I thought that, for the sake of demonstration, any timeframe can be used to prove the concept, and we can assume that the real measures will then take place on more accurate/representative data.

Do you think it is acceptable, as an assumption?

Does it still sound too scary? I trust your judgement guys, seriously

melancon · February 23, 2016, 8:46am

Multiplex it is

@MoE @Alberto

Yes, multiplex. I find this a convenient concept, probably more buzzwordy than deep – anything is multiplex if you think about it … it depends on what you are ready to term a layer … We’ll have plenty of time to chew on this.

You got things right. There are several ways to compute similarity between entities described by a “bag of words” (which you can actually see as embedded in a high-dimensional vector space …). The better your index (words associated with entities), the better the similarity measures, from which you usually derive a topology by linking similar enough entities. As for the pagecount, I would expect larger time spans to lead to somewhat uniform pagecounts over all pages, while finer time spans may indicate when/if pages are simultaneously consulted.

The link structure you consider, the similarity measure you compute … It all depends on what question/task you are supporting.
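
One common way to do this, sketched under the assumption that cosine similarity on raw word counts is good enough (other measures and weightings exist). The pages, their words and the 0.5 threshold are invented for the example.

```python
import math
from collections import Counter
from itertools import combinations

def cosine(bag_a, bag_b):
    """Cosine similarity between two bags of words seen as count vectors."""
    dot = sum(bag_a[w] * bag_b[w] for w in bag_a.keys() & bag_b.keys())
    norm_a = math.sqrt(sum(v * v for v in bag_a.values()))
    norm_b = math.sqrt(sum(v * v for v in bag_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def similarity_edges(bags, threshold):
    """Derive a topology by linking every pair that is similar enough."""
    return {
        (a, b)
        for a, b in combinations(sorted(bags), 2)
        if cosine(bags[a], bags[b]) >= threshold
    }

# Invented index: the words associated with each entity.
bags = {
    "page_1": Counter("fever cough fever virus".split()),
    "page_2": Counter("fever virus vaccine".split()),
    "page_3": Counter("bone fracture cast".split()),
}

edges = similarity_edges(bags, threshold=0.5)
```

The threshold directly controls how dense the derived network is, so it is worth trying a few values before reading anything into the resulting structure.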

As I see it, you have done quite a lot of work and will be bringing fantastic material to the workshop. This may well open the door to interesting future collaboration. I mean, people usually get together to finish up what has been done during the workshop, sometimes ending as blog posts, reports, or even scientific publications. In this case, my feeling is we may have things to say to the academic crowd. @Alberto?

alberto · February 20, 2016, 4:18pm

Mentions module

The mentions module started acting up after we upgraded the site to the newest version of Drupal Commons and is currently disabled. See here.

melancon · February 19, 2016, 7:00am

Blame it on me

Hi @MoE,

you are right, we decided in the end not to include anything about Wikipedia/data in the final program, mainly because we feared there wouldn’t be much to investigate. The motto for MoN is to have clear domain questions together with data, and then mine and visualize the data in ways that can help refine the questions, then iterate until you reach some sort of answers.

We had also included the EdgeRyders conversation data from the beginning, in order to also have a chance to look at this type of data – although what we have for the moment is less concerned with care. It is that type of data OpenCare has planned to deal with (people interacting and discussing issues → socio-semantic network).

Pagecounts did not seem to offer a tangible opportunity to look at how people perform self-diagnosis. So we (@Alberto and I) had to decide not to include them in the final MoN program.

–

Now, regarding your comment on the risk of being “shrinked” by time … well, it is real. Previous MoN sessions extended over two full days. This time, we had to cope with lots of constraints both on the OpenCare and LOTE5 side. To help with this, I will make data available later today, hopefully with some code snippets, so we can save time and still do some work.

Hope this helps.

moe · February 20, 2016, 2:58pm

Zero Blame

Seriously, I hope I didn’t give the wrong impression. I can only appreciate the organic way topics naturally adapt, here on ER, depending on who’s active in the discussions. I wasn’t for a while, so it’s legit that the topic might have faded in favour of others

I was asking, simply because I had spent some time on the Wikipedia thing and I thought something interesting could be observed (or we could at least try). I tried giving a hint of my thoughts above, as a reply to @Alberto. I’d be happy to hear what you think.

Peace

s t e

alessandro_contini · February 19, 2016, 1:01pm

my skills

Hello!

I’d like to attend the session and I think my skillset will be a good fit for the viz or interpretation team:

paper prototyping / brainstorming
D3js (HTML, CSS)

Looking forward to joining and hacking

melancon · February 19, 2016, 3:38pm

Paper prototypes, yeah!

Hi @alessandro-contini,

please join, I’ll be more than happy to have you on board. I am quite sure you can contribute great design ideas and/or improvements, and even paper mock-ups!

d3 is so great, would be nice to have some of our stuff put up on the web too.

See you in Brussels next week

Guy

melancon · February 20, 2016, 5:05pm

Let’s do it!

@MoE this is great news. I didn’t dare put that forward for MoN4 as I feared I would not be able to keep my promise.

It seems you have already done part of the work, and about all of the thinking. Then go ahead, I’ll register for your session.

I am quite sure I will learn from your experience. And if I understood correctly, you have already set up a db registering pagecounts on an hourly basis – all great news.

Looking forward to seeing you in Brussels next week.