How to mine for the like?

martin · August 30, 2019, 9:18pm

Hello - I’m looking for some advice.

Besides being with edgeryders, I’m with Ronin, a platform of independent researchers in the US. It has (some) global reach and ~250 members with very different scientific profiles. That diversity is likely its single most valuable resource.
This week we had a series of all-hands meetings at Ronin (using Zoom) to discuss status, wishes and development options.

In my view, we are not making good use of the diversity that the Ronins represent, for example to build peer-to-peer links or to get help.
I was wondering what a network analysis could deliver (in terms of possible relations) if it used as input data (after processing) the researcher profiles that are published on the platform.
I would assume that, for example, natural language processing should be able to convert the profiles into data that can then be networked to show possible relations between different scholars. I would assume that such a methodology exists somewhere, and to me it sounds like something that edgeryders is good at.

To put it differently: if you have a set of texts in which a group of people describe (one by one) who they are, what they are doing and what interests them, and you would like to identify possible (weak) links between them, how would you do the analysis?

Does anybody have advice to offer?

best regards,
Martin

alberto · September 4, 2019, 3:44pm

1) Associate keywords to researchers. If all you have is free-form text, you can do it with topic analysis, though you will get plenty of statistical artifacts and ghosts unless the dataset is very large. If you have keywords, as academics often do, that’s better.
2) Induce a social network where nodes are researchers, and an edge between two researchers means that both write on the same keyword.
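
The two steps above can be sketched in a few lines of Python. This is only a minimal illustration, assuming each researcher already has a set of keywords; the names and keywords below are made-up sample data, not taken from any real profile, and `build_keyword_network` is a hypothetical helper name.

```python
from itertools import combinations

# Hypothetical sample data: researcher -> set of keywords (step 1).
keywords_by_researcher = {
    "ana": {"ecology", "network science", "statistics"},
    "ben": {"statistics", "linguistics"},
    "chloe": {"linguistics", "ecology"},
    "dev": {"medieval history"},
}


def build_keyword_network(keywords_by_researcher):
    """Step 2: return {(a, b): shared_keywords} for each pair of
    researchers that has at least one keyword in common."""
    edges = {}
    for a, b in combinations(sorted(keywords_by_researcher), 2):
        shared = keywords_by_researcher[a] & keywords_by_researcher[b]
        if shared:
            edges[(a, b)] = shared
    return edges


edges = build_keyword_network(keywords_by_researcher)
for (a, b), shared in sorted(edges.items()):
    print(a, "--", b, "via", sorted(shared))
```

The resulting edge list can be exported to any network analysis tool. Researchers who share no keyword with anyone (like "dev" here) stay isolated, which is one way the rigidity of fixed keywords shows up.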

martin · September 4, 2019, 6:26pm

Thank you, @alberto - here is my feedback:

I would be inclined to think that the first step in this approach falls short (for the given situation) because a) of the overhead of generating the keywords (and getting them used) and b) keywords are likely too rigid for the underlying diversity (see profiles).
Alternatively, I would expect that a tool which generates ‘descriptors’ from the profiles could deliver a starting data set (e.g. a word cloud created for each profile). These data could then be fed into a tool for social network analysis. Any comment?
For the social network analysis my daughter Katerina (Uni Galway, working with network analysis tools) suggests: "A way to do this is to use Kumu (https://kumu.io/) together with Google Sheets. Kumu is a platform to create social networks/stakeholder analysis/complex system maps. It could be done bottom-up (people fill out the sheet) or top-down (someone takes the information from the website and fills out the sheet). Links would be created by using Kumu’s clustering function. I haven’t used this function yet, but I think it should be possible". Any comment?
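
To make the ‘descriptors’ idea concrete, here is a rough sketch of one common way to get them: score each word in a profile by TF-IDF (how often it appears in that profile, discounted by how many profiles contain it) and keep the top few. This is an assumption about what such a tool might do, not a description of any specific product; the profile texts, the stopword list and the helper names are all invented for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical profile snippets (not real Ronin profiles).
profiles = {
    "ana": "I study river ecology and model food webs as networks.",
    "ben": "I study the history of medieval trade networks.",
    "chloe": "I study river pollution and water chemistry.",
}

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"i", "and", "as", "the", "of", "a", "in", "to"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def top_descriptors(profiles, n=3):
    """Return the n highest-scoring TF-IDF words for each profile."""
    tokens = {name: tokenize(text) for name, text in profiles.items()}
    # Document frequency: in how many profiles does each word occur?
    doc_freq = Counter(w for words in tokens.values() for w in set(words))
    n_docs = len(profiles)
    result = {}
    for name, words in tokens.items():
        tf = Counter(words)
        scores = {w: c * math.log(n_docs / doc_freq[w]) for w, c in tf.items()}
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        result[name] = ranked[:n]
    return result


descriptors = top_descriptors(profiles)
for name, words in descriptors.items():
    print(name, words)
```

Words that appear in every profile (here "study") get a score of zero and drop out, so the descriptors lean towards what is distinctive about each person. With only a few hundred short profiles the ranking will still be noisy.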

alberto · September 4, 2019, 9:30pm

There is a large literature in network science built around networks of keywords. Most journals request keywords in a publication submission, and researchers tend to think their keywords through (example).

The ‘descriptors’ you mention are what the NLP literature calls a “topic”. A topic is a bag of words with a high probability of co-occurring. A popular model called Latent Dirichlet Allocation can identify topics starting from simple word frequency counts. But, as I said, results tend to be underwhelming, at least for small datasets.

OK, but with Kumu you still have the overhead of creating the keywords. Plus, in this case, that of making the links. I am not particularly impressed… but maybe I misunderstand the tool.

martin · September 5, 2019, 6:45am

Good Morning & thank you very much, @alberto!

To continue:

1) The group of people affiliated with Ronin is very diverse (and less conventional than habitual academia); also, the resources are limited. The latter feature drives the search for (simple) tools in the public domain (any pointer is welcome). The former feature seems more constraining, in particular when searching for (possible) weak links between individuals, which is the interest here.
2) The more evident strong links can be found by scanning the ~250 profiles on the Ronin site (including the keywords given therein). Evidently, one could start from there.
However, I was wondering whether there is a way ‘to mine’ the written descriptions of interests (the bigger part of the profiles) for useful indications. I was expecting that natural language processing would offer an option.
Any idea to prompt my thinking?
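
One simple way to ‘mine’ the free-text descriptions for weak links could look like the sketch below: turn each profile into a bag of word counts, compute the cosine similarity for every pair, and keep the pairs that overlap a little but not a lot. Everything here is an assumption for illustration: the sample texts are invented, the stopword list is minimal, and the `low`/`high` thresholds are arbitrary values that would need tuning on the real profiles.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Hypothetical descriptions of interests (not real profiles).
profiles = {
    "ana": "river ecology and food webs",
    "ben": "medieval trade routes along river valleys",
    "chloe": "river ecology and food webs in alpine lakes",
    "dev": "quantum error correction",
}

STOPWORDS = {"and", "in", "the", "of", "a", "i"}


def vectorize(text):
    """Bag of words: count each non-stopword in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(u, v):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def weak_links(profiles, low=0.1, high=0.5):
    """Pairs whose similarity is in [low, high): some overlap, but not obvious twins."""
    vectors = {name: vectorize(text) for name, text in profiles.items()}
    links = []
    for a, b in combinations(sorted(vectors), 2):
        score = cosine(vectors[a], vectors[b])
        if low <= score < high:
            links.append((a, b, round(score, 2)))
    return links


links = weak_links(profiles)
for a, b, score in links:
    print(a, "--", b, score)
```

In this toy data "ana" and "chloe" are nearly identical and would count as a strong link, so they are left out; "ana"/"ben" and "ben"/"chloe" share only the word "river" and come out as weak links. On real text a single shared word is often a coincidence, so the output is best read as a list of candidates for a human to check.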

best regards,
Martin

P.S. My next stay in Brussels is 2-9 October.