@alberto et al. Here is my shot: Following [King et al. 1994], we evaluate the extent to which each reduction technique: (i) usefully supports inference, understood as an interpretation of the emerging intersubjective picture of the world; (ii) reinforces reproducibility and transparency, which increase the researcher’s ability to assess equivalence between any two implementations; (iii) does not foreclose the possibility of updating via abductive reasoning (algorithms alone do not decide how parameters should be set for optimal readability); (iv) combines harmoniously (also as the network construction technique) with other parts of SSNA.
@jan thanks for this. We are in agreement. I submitted the final version of the abstract, now we wait.
We are delighted to inform you that your submission
“Comparing techniques to reduce networks of ethnographic codes co-occurrence”
has been accepted for an oral presentation (by video) at IC2S2 2021. This decision was based on blind reviews of the abstract that assessed the content and fit with the scope of the conference. Please find the reviews below.
We will get back to you with additional organizational details and recommendations for preparing your video. Video presentations are 12 minutes long and have to be uploaded by July 2. We will release the full program as soon as it becomes available.
This is kind of nice, because quite a bit of the work is done already.
Calling co-authors @Jan, @Richard, @amelia, @melancon, @bpinaud and @brenoust.
Following the acceptance of the extended abstract, we have work to do. We need to prepare a 12-minute video presentation. I am not sure if there are proceedings, and what goes into them: most likely that would be the extended abstract we submitted. I wrote to the conference chair to verify this.
But also, it would probably make sense to write an actual paper. The reviews are quite interesting and encouraging (see below). POPREBEL has another year to go, so we might be able to write, submit, and get a full paper accepted. Writing the paper would also help with the video.
Shall I set up a call to discuss this?
Reviewer 1
SCORE: 1 (weak accept)
This paper presents an interesting way forward in applying network analytic techniques to a corpus of ethnographic material, which is in itself an important and promising avenue of research. In this network, edges are annotations that connect ethnographic stories (snippets) to codes / keywords. This is a two mode network. A one mode projection is a network of codes, connected when they both occur in a snippet or story. The paper investigates techniques to reduce these rather dense networks in order to make them ready for interpretation.
The paper seems to be work in progress and results are not yet presented, which is a pity. Especially the backbone reduction strategy would be of interest.
Coding is only one way of dealing with ethnographic material, and a rather crude one; the authors should qualify this. At the same time, the approach might be useful for a much broader set of qualitative studies.
The main problem, however, seems to be that the thick and dense structure of the projected network is a direct result of moving from a two mode to a one mode network. This is the case, and a problem, in many social networks. Why not move a step back and start working with the original, bipartite / 2 mode data? And, perhaps, do the reduction there instead of on the projection.
Also, it is not obvious that this is a social network. It’s keywords connected to texts, not social interactions.
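As an aside, the two mode to one mode move the reviewer refers to can be sketched in plain Python. This is purely illustrative (toy snippet and code names, not our actual pipeline): each snippet carries a set of codes, and the projection links two codes once per snippet they share.

```python
from itertools import combinations
from collections import Counter

# Two-mode data: each snippet is annotated with a set of codes.
# (Names are made up for illustration.)
annotations = {
    "snippet1": {"codeA", "codeB"},
    "snippet2": {"codeB", "codeC"},
}

# One-mode projection: two codes are linked once per snippet they share.
# The counter value is the edge weight (number of co-occurrences).
cooccurrence = Counter()
for codes in annotations.values():
    for a, b in combinations(sorted(codes), 2):
        cooccurrence[(a, b)] += 1

print(dict(cooccurrence))  # {('codeA', 'codeB'): 1, ('codeB', 'codeC'): 1}
```

Note that codeA and codeC get no edge, because they never appear in the same snippet: density in the projection comes entirely from shared snippets.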
Reviewer 2
SCORE: 2 (accept)
The contribution fits the conference well. It has a proper theoretical background and decent level of applicability.
Reviewer 3
SCORE: 1 (weak accept)
The submission discusses how quality of text-based co-occurrence networks can be improved via 4 reduction techniques: (1) dropping edges that occur only once or few times; (2) dropping edges associated with a low number of informants; (3) dropping edges not belonging to high-k k-cores; (4) dropping edges with low neighborhood homophily. While the reduction techniques are interesting to examine, it would be useful to go into more details on how the parameters for the reduction techniques are decided, e.g. what does “few times” mean for technique 1; for technique 2, what is considered “low”?; for technique 3, what is considered a “high-k”, k of 3?; for technique 4, what is the threshold for homophily?
It would also be interesting to discuss more results in terms of how this task specifically helps with processing ethnographic data, in particular. I could see potential applications to text-based data beyond just ethnographic data. The submission did attempt to motivate the problem well, substantiated with some seminal papers in network analysis in general, and semantic networks in particular.
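To make Reviewer 3’s questions concrete, techniques (1) and (2) are essentially threshold filters over edge attributes. Here is a toy sketch; the attribute names, edge list, and threshold values are illustrative, not settings from the paper (choosing those thresholds is exactly the open question the reviewer raises).

```python
# Toy edge list with the two attributes techniques (1) and (2) filter on.
edges = {
    ("codeA", "codeB"): {"co_occurrences": 5, "informants": 3},
    ("codeA", "codeC"): {"co_occurrences": 1, "informants": 1},
    ("codeB", "codeC"): {"co_occurrences": 2, "informants": 1},
}

def reduce_edges(edges, min_cooc=1, min_informants=1):
    """Keep only edges meeting both thresholds (techniques 1 and 2)."""
    return {e: attrs for e, attrs in edges.items()
            if attrs["co_occurrences"] >= min_cooc
            and attrs["informants"] >= min_informants}

# With min_cooc=2 and min_informants=2, only codeA-codeB survives:
# codeA-codeC occurs once, and codeB-codeC rests on a single informant.
kept = reduce_edges(edges, min_cooc=2, min_informants=2)
print(sorted(kept))  # [('codeA', 'codeB')]
```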
Excellent news!!! Yes, let’s set up a Zoom call very soon to discuss this.
Hello all, I am getting down to homework.
@melancon, @bpinaud, @brenoust: you can find the Tulip file and its README on GitHub. So that’s done.
@jan, @Richard: before I start creating the text structure, I would like confirmation that you are comfortable working on Overleaf, which means LaTeX syntax. I know @amelia is. If that is an obstacle, I would rather fall back on Google Docs or CryptPad.
Mostly for @amelia, here are the meeting notes.
I feel we need to move fast on this, so I just went ahead and wrote the structure, for now on CryptPad. If you, @jan and @Richard, are OK with LaTeX, I will move everything onto Overleaf fairly trivially. If not, we stay here. Moving over will become non-trivial as we work more on it, so a quick decision would be appreciated.
I’ve not used Latex before but it looks quite straightforward.
Hello all,
Ok, got it. Thanks.
The tlpx file contains a large number of self loops and multiple edges. Is that normal?
@amelia told me the self-loops were normal during MoN, if my memory serves, but I do not remember why.
Bruno, it is possible indeed, because the same code may be present in more than one annotation on the same post.
However, we want to drop those for proper analysis. Are there any in the stacked graphs? I thought I had dropped them.
Yup, pretty intuitive. But how do you want us to add our text? I do not see a review function. Should we use a different font color each, with whatever is accepted turned black?
Jan, what you are looking at is CryptPad, not Overleaf, and the syntax is not LaTeX. OK, I guess we stay in the CryptPad then.
No, please, none of that! I have added instructions at the top of the document itself.
Thanks @alberto, data downloaded. I see you already organized the data into three separate graphs (great!). Can you explain in two words what the difference is between the stacked and non-stacked datasets?
@brenoust Do you have a ready-made Tulip implementation of Mutual Information? (You suggested we use this to compare the reduction schemas, which I think is a good idea.)
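For concreteness, one way such a comparison could work: encode each reduction schema as a keep/drop decision per edge over the same edge list, then compute the mutual information between the two decision vectors. A plain-Python sketch (not a Tulip implementation, and not necessarily what @brenoust has in mind):

```python
from math import log2
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length
    discrete sequences, from their empirical joint distribution."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

# 1 = edge kept, 0 = edge dropped, over the same edge list.
# The two toy schemas below agree on 5 of 6 edges.
schema_a = [1, 1, 0, 0, 1, 0]
schema_b = [1, 1, 0, 1, 1, 0]
print(round(mutual_information(schema_a, schema_b), 3))
```

A high value means the two schemas keep and drop largely the same edges; comparing against a frozen dataset (see below in the thread) would keep the numbers reproducible.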
@melancon that is explained in the README file:
Each “ethno-PROJECTNAME” subgraph contains two subgraphs. “unstacked” is a clone of the pre-stacking graph. It contains many parallel edges, because each co-occurrence is instantiated in an edge. “stacked” is built from the former. It contains no parallel edges. If there are several co-occurrences between the same two codes (because they appear in more than one post), they are represented by a single edge. The property “co-occurrences” stores the number of posts in which the co-occurrence appears.
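In code terms, the stacking step the README describes amounts to counting parallel edges per code pair. A minimal plain-Python sketch (illustrative edge list, not our actual Tulip script):

```python
from collections import Counter

# Unstacked graph: one edge per co-occurrence, so parallel edges abound.
# (codeA and codeB co-occur in two different posts.)
unstacked = [("codeA", "codeB"), ("codeA", "codeB"), ("codeB", "codeC")]

# Stacked graph: one edge per code pair; the count plays the role of
# the "co-occurrences" property.
stacked = Counter(tuple(sorted(edge)) for edge in unstacked)
print(dict(stacked))  # {('codeA', 'codeB'): 2, ('codeB', 'codeC'): 1}
```

Sorting each endpoint pair ensures that (codeA, codeB) and (codeB, codeA) stack onto the same edge.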
Thanks, I missed it because the link you included pointed directly to the file; I did not search the whole archive.
@bpinaud I looked at my code, and indeed I do not drop self-loops. Shame on me.
The problem is now solved in the script that builds the unstacked graph from live data. The stacking script will still stack every edge it finds, but now no self-loops will be present.
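The fix amounts to filtering out edges whose endpoints coincide before stacking. A minimal sketch of the idea (illustrative names, not the actual script):

```python
# Raw co-occurrence edges: the same code annotating one post twice
# would otherwise produce a self-loop like ("codeA", "codeA").
raw_edges = [("codeA", "codeB"), ("codeA", "codeA"), ("codeB", "codeC")]

# Keep only edges whose endpoints differ; stacking then never sees
# a self-loop.
unstacked = [(a, b) for a, b in raw_edges if a != b]
print(unstacked)  # [('codeA', 'codeB'), ('codeB', 'codeC')]
```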
I have now re-exported the Tulip perspective (same link as above) if you want to start afresh. Be aware that these are live data, so the graphs might differ slightly, in the sense that if somebody wrote or coded a new post yesterday, this will be reflected in today’s version of the file, but not in yesterday’s.
To ensure comparability of results, we should decide on a date/time and freeze the dataset we use for the IC2S2 paper. Maybe add this file to the repo and name it after the date/time used?
A lot actually. Something like 7K for opencare only.
Ok, got it. Thanks.
Well, GitHub keeps the full version history, so you can always see which version you are using. Anyway, the current version should contain no self-loops, and I do not intend to modify it further!