Graphryder (RyderEx) experiments

hugi · October 20, 2021, 9:03pm

I have now updated RyderEx to only use the intersection operator.

I fixed this, see the same state on our own deployment.

We still need to dig down into this, there might still be some differences.

Good to know for the future, let’s look at possibly implementing this down the line.

hugi · October 21, 2021, 9:39pm

This is now fixed, more or less. RyderEx now sees 5,867 annotations in the NGI corpus. Still a very slight difference, but I think it might have to do with RyderEx not counting two instances of the same code as two annotations.
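To make the suspected discrepancy concrete, here is a minimal sketch of the two ways of counting. The record layout and the idea that duplicates are collapsed per (post, code) pair are assumptions for illustration only; how RyderEx actually counts is not confirmed here.

```python
# Hypothetical annotation records: (annotation_id, post_id, code).
annotations = [
    (1, 101, "distrust"),
    (2, 101, "distrust"),      # same code on the same post, different snippet
    (3, 101, "surveillance"),
    (4, 102, "distrust"),
]


def count_all(records):
    """Count every annotation as its own entity."""
    return len(records)


def count_deduplicated(records):
    """Count each (post, code) pair once, ignoring repeated snippets."""
    return len({(post_id, code) for _, post_id, code in records})


print(count_all(annotations), count_deduplicated(annotations))
```

If the two counts differ by roughly the gap between the Data Explorer query and RyderEx, repeated codes on the same post would be a plausible explanation.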

I have made a few other fixes:

Node diameter has been reduced to make graphs more legible
Number of columns in tables has been reduced by making titles of topics and usernames into links
Annotation list now includes the code of each annotation, and an indicator of whether the code of the annotation is in the scope
Annotation quote now links directly to source post on platform

hugi · October 21, 2021, 11:54pm

I found out that cross-graph filtering is broken, and have asked Paul to help us fix it.

hugi · October 22, 2021, 12:02am

One improvement over the old dashboard is that it becomes a lot easier to understand how the primary data (posts) and secondary data (annotations) are related and how these are represented in the graph.

Screenshot 2021-10-22 at 01.59.36

hugi · October 22, 2021, 12:03am

@matthias - how are we doing with that database connection to the platform?
I have set up the cronjob, so once that connection is made we are ready to use RyderEx on up-to-date data as we go.

alberto · October 22, 2021, 7:24am

Are you sure? Annotations are their own entities. Even when they point to the same code, they have different snippets.

Also:

This link no longer works for me (502 Bad Gateway).

hugi · October 22, 2021, 1:57pm

As I said, all graphs are now available here instead: http://server-2021.edgeryders.eu/

alberto · October 22, 2021, 2:01pm

Just checked POPREBEL.

Data explorer query: 6,658 annotations
RyderEx: 5,771 annotations

Still a big difference.

hugi · October 22, 2021, 2:02pm

Are you explicitly excluding annotations on posts by users from which we do not have consent? I have a feeling this might be the issue.

hugi · October 22, 2021, 2:04pm

Or actually, perhaps it is because of posts in protected categories? That is more likely.
RyderEx only loads public posts.

alberto · October 22, 2021, 2:05pm

No, you are right. The query is here: https://edgeryders.eu/admin/plugins/explorer?id=12.

I have no idea how to add that filter.

alberto · October 22, 2021, 2:06pm

That too.

hugi · October 22, 2021, 2:18pm

I just changed your query to only include posts in public categories, and it now yields 5,781 annotations for POPREBEL. Checking the consent is not obvious as that information does not seem to be accessible through the DataExplorer, but I can imagine that one or two posts in the corpus might be from unconsenting users, given the problems we had with the consent funnel last year.
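The filter amounts to something like the sketch below. The `Annotation` type, the category names and the post-to-category mapping are all hypothetical; the actual Data Explorer query is SQL against the platform database and is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    annotation_id: int
    post_id: int
    code: str


def public_annotations(annotations, post_category, public_categories):
    """Keep only annotations whose post sits in a public category.

    Posts with an unknown category are treated as not public.
    """
    return [
        a for a in annotations
        if post_category.get(a.post_id) in public_categories
    ]


# Hypothetical example data.
post_category = {101: "public-forum", 102: "protected-interviews"}
public_categories = {"public-forum"}
annotations = [
    Annotation(1, 101, "distrust"),
    Annotation(2, 102, "police"),
    Annotation(3, 103, "corruption"),  # post with no known category
]

kept = public_annotations(annotations, post_category, public_categories)
print(len(kept))
```

A consent check could be added as a second condition in the same comprehension, once that information is available from somewhere.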

hugi · October 22, 2021, 3:19pm

We are using RyderEx to fulfill the NGI deliverable D2.6. Interactive Graphryder Dashboard II. I’ve included the following in the report:

Interactions on the Edgeryders.eu platform are annotated in deliverable 2.2 “Coding ontology I” and these annotations are displayed through the dashboard.


Annotations are linked to ethnographic codes, and the dashboard allows exploration of how often these codes co-occur with each other.


Interactions between participants in the conversation can also be displayed in a graph in tandem with the co-occurrence network. Through exploring the interactions and co-occurring concepts a researcher can discover novel themes and trends. She can then read the original content by selecting it in the content list after filtering through the graph.


A researcher may also use the dashboard to investigate if a hunch is supported by the ethnographic data. In this case, she might have read a post on Edgeryders.eu that seemed to indicate that contact tracing apps are associated with distrust and surveillance. Through the co-occurrence graph she finds out that these codes occur together often in the conversation, and she can support her claim that this is a significant theme in the conversation.


alberto · October 22, 2021, 3:32pm

Good work. But of course researchers want access to the material in protected categories, which creates a more complicated issue of data protection and confidentiality. This does not affect the deliverable, but it does affect the extent to which the ethnographers will want to use the software. Maybe, with the deliverable past, we can imagine some kind of workaround.

hugi · October 22, 2021, 3:48pm

One workaround would be to modify my script so that it does include annotations for posts in protected categories, but redacts the content of those posts, the quoted snippet of text, and the author. That way, the annotations still contribute to the co-occurrence graph.

It could look like this:

When the link to the post is clicked, it would still take you to the right URL, which you can then read if you have permission.

As I see it, there is really only one major problem with this. Every post needs to be associated with a user in the RyderEx graph, and if we want to hide the real association we would have to create dummy users to hold ownership of protected posts. Either we create one dummy user that owns all redacted posts, or one dummy user per real user that owns that user's protected posts. In the second case, users with protected posts would be represented by two nodes in the participant graph: one with their real username, and one under a pseudonym.
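A rough sketch of that redaction step, covering both dummy-user options. The `Post` and `Annotation` types, the `[redacted]` marker and the alias naming are all made up for illustration; this is not the actual RyderEx import script.

```python
from dataclasses import dataclass, replace

REDACTED = "[redacted]"


@dataclass(frozen=True)
class Post:
    post_id: int
    author: str
    content: str
    url: str
    protected: bool


@dataclass(frozen=True)
class Annotation:
    post_id: int
    code: str
    quote: str


def dummy_owner(author, per_user, aliases):
    """Return the dummy user that should own a protected post."""
    if not per_user:
        return "redacted-user"          # one dummy user for all protected posts
    if author not in aliases:
        aliases[author] = f"redacted-user-{len(aliases) + 1}"
    return aliases[author]              # one stable dummy user per real user


def redact_post(post, per_user, aliases):
    """Hide author and content of a protected post; the URL is kept."""
    if not post.protected:
        return post
    owner = dummy_owner(post.author, per_user, aliases)
    return replace(post, author=owner, content=REDACTED)


def redact_annotation(annotation, posts_by_id):
    """Hide the quoted snippet but keep the code for the co-occurrence graph."""
    if posts_by_id[annotation.post_id].protected:
        return replace(annotation, quote=REDACTED)
    return annotation
```

The `aliases` dictionary has to stay private to the import script, since it is exactly the mapping from real usernames to pseudonyms.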

alberto · October 22, 2021, 4:20pm

Maybe there would also be the possibility to pseudonymize all users. This is what we do for the public-facing datasets we save in Zenodo.

hugi · October 22, 2021, 5:13pm

True, but that would make the dashboard a lot less useful for community managers and outreach efforts.
I’m going to try the thing I proposed above, I think it will not be too hard to modify my script to do that.

hugi · October 22, 2021, 5:56pm

Isn’t this a little bit of a play for the galleries anyway? Seeing that the only thing you need to do to find out who wrote a post is to enter a string from any post into Google, or the Edgeryders search function itself?

Even if you pseudonymize all users and redact content and titles of protected posts, it will still be very easy to figure out which user has written content in protected categories that has been annotated with “criticism”, “police” and “corruption” by doing the following:

Find the redacted post that has been annotated with “criticism”, “police” and “corruption”.
Find the pseudonymized user who has authored that post.
Find other posts by that user that are not protected.
Search for strings from those posts on Google or Edgeryders search to find the real user account.

alberto · October 24, 2021, 12:38pm

In a sense, yes. But it seems you are thinking about security, whereas we were thinking about consent. Of course Edgeryders can never guarantee full anonymity, we say it right at the front door. The use case goes like this:

I agree to participate in research, write, my posts get coded and exported, etc.
Five years later, I decide to delete my user and content from Edgeryders.
The content disappears from Edgeryders, but can still be found on Zenodo, attributed to @anon12345.

Marco felt it was a reasonable compromise.

On another note, you once told me that the social network is induced like this:

Post A is a reply to post B => link from user a (author of A) to b (author of B)
Post A contains a quote of post C => link from a to c
Post A is just a reply to topic => link from a to the author of the first post in the topic.

What happens if post A is a reply to post B and it contains a quote from post C? Or if someone quotes posts of several different authors?
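For the sake of discussion, here is a sketch of one possible reading of those three rules, in which the reply rule and the quote rule are applied independently, so a post that replies to B and quotes C produces both links. This is an assumption, not a description of what RyderEx does. The `Post` type is hypothetical, the first post of a topic is assumed to be the one with the lowest id, and the topic rule is assumed to apply only when a post has neither a reply target nor quotes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Post:
    post_id: int
    author: str
    topic_id: int
    reply_to: Optional[int] = None                   # id of the post replied to
    quotes: List[int] = field(default_factory=list)  # ids of quoted posts


def induce_edges(posts):
    """Return the set of directed (source_user, target_user) links."""
    by_id = {p.post_id: p for p in posts}

    # Assumption: the lowest post id in a topic is its first post.
    first_in_topic = {}
    for p in sorted(posts, key=lambda p: p.post_id):
        first_in_topic.setdefault(p.topic_id, p)

    edges = set()
    for p in posts:
        targets = []
        if p.reply_to is not None:                   # rule 1: reply
            targets.append(by_id[p.reply_to].author)
        for quoted_id in p.quotes:                   # rule 2: one link per quote
            targets.append(by_id[quoted_id].author)
        opener = first_in_topic[p.topic_id]
        if not targets and opener is not p:          # rule 3: plain topic reply
            targets.append(opener.author)
        edges.update((p.author, target) for target in targets)
    return edges
```

Under this reading, quoting several authors yields one link per quoted author. Self-links (replying to or quoting yourself) are not filtered out in this sketch.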