Graphryder (RyderEx) experiments

hugi · October 22, 2021, 3:48pm

One workaround would be to modify my script so that it does include annotations for posts in protected categories, but that the script redacts the content of those posts, the quoted snippet of text, and the author. That way, the annotations still contribute to the co-occurrence graph.

It could look like this:

When the link to the post is clicked, it would still take you to the right URL, which you can then read if you have permission.

As I see it, there is really only one major problem with this. Every post need to be associated to a user in the RyderEx graph, and if we want to hide the real association we would have to create dummy users to hold ownership of protected posts. Either we create one dummy user that owns all redacted posts or one dummy user user that owns any protected post. That way, users with protected posts would be represented by two nodes in the participant graph - one with their real username, and one under a pseudonym.

alberto · October 22, 2021, 4:20pm

Maybe there would also be the possibility to pseudonymize all users. This is what we do for the public-facinng datasets we save in Zenodo.

hugi · October 22, 2021, 5:13pm

True, but that would make the dashboard a lot less useful for community managers and outreach efforts.
I’m going to try the thing I proposed above, I think it will not be too hard to modify my script to do that.

hugi · October 22, 2021, 5:56pm

Isn’t this a little bit of a play for the galleries anyway? Seeing that the only thing you need to to find out who wrote a post is to enter a string from any post into google, or the Edgeryders search function itself?

Even if you pseudonymize all users and redact content and titles of protected posts, it will still be very easy to figure out which user has written content in protected categories that has been annotated with “criticism”, “police” and “corruption” by doing the following:

Find the redacted post that has been annotated with “criticism”, “police” and “corruption”.
Find the pseudonymized user who has authored that post.
Find other posts by that user that are not protected.
Search for strings from those posts on Google or Edgeryders search to find the real user account.

alberto · October 24, 2021, 12:38pm

In a sense, yes. But it seems you are thinking about security, whereas we were thinking about consent. Of course Edgeryders can never guarantee full anonymity, we say it right at the front door. The use case goes like this:

I agree to participate in research, write, my posts get coded and exported, etc.
Five years later, I decide to delete my user and content from Edgeryders.
The content disappears from Edgeryders, but can still be found on Zenodo, but attributed to @anon12345.

Marco felt it was a reasonable compromise.

On another note, you once told me that the social network is induced like this:

Post A is a reply to post B => link from user a (author of A) to b (author of B)
Post A contains a quote of post C => link from a to c
Post A is just a reply to topic ? link from a to the author of the first post in the topic.

What happens if post A is a reply to post B and it contains a quote from post C? Or if someone quotes posts of several different authors?

hugi · October 24, 2021, 1:05pm

Let’s try it!

Testing to quote this post in another topic.

Also quoting this post in the same topic.

hugi · October 24, 2021, 2:20pm

SELECT post_id, quoted_post_id FROM quoted_posts
WHERE post_id = 100476

SELECT post_id, reply_post_id FROM post_replies
WHERE reply_post_id = 100476

Discourse counts quotes within a thread as replies to the quoted posts. This means that a single post can be a reply to multiple posts at once.

RyderEx combines the quotes and replies into a single relationship between participants to count participant interactions. However, I now realize that the interaction counter on that relationship is wrong since it will count two interactions when somebody makes a quote which is also a reply. In the example above, the TALK_OR_QUOTED counter in 9, but it should in fact be 6.

Furthermore, Discourse does not consider a post A in topic X with first post Y a reply to topic Y. At the moment, neither does RyderEx, though this is pretty easy to change.

hugi · October 24, 2021, 4:55pm

This is now fixed. If a post is both a quote and a reply, it now only counts as a single interaction.

hugi · October 24, 2021, 5:03pm

I have now pushed an update that enables pseudonymization, the inclusion of protected content, and redaction of protected content titles and text on a platform level. This means that BBU, Blivande, and Edgeryders.eu can share the same installation and still have different settings.

Since the POPREBEL corpus includes quite a lot of protected content, I think this is a good compromise for now. I personally think that the pseudonymized usernames are a bit silly given how easy it is to circumvent by just clicking a post, but I suppose it gives a little extra layer of protection by obfuscation.

These are the current settings for the platforms:

Edgeryders dashboard
Usernames: Pseudonymized
Protected content: Included, with text, titles, and annotations quotes redacted
Consent policy: Only includes the content of users who have passed research consent funnel

Blivande dashboard
Usernames: Original
Protected content: Included, with text, titles, and annotations quotes redacted
Consent policy: Assumes consent of all registered users

BBU dashboard
Usernames: Original
Protected content: Omited
Consent policy: Assumes consent of all registered users

hugi · October 24, 2021, 6:27pm

@alberto and @amelia

I noticed that the POPREBEL corpus has quite a few codes prefixed with *.
There is a function in the RyderEx import script that allows defining a list of prefixes, and any code with one of those prefixes is omitted along with all of its annotations. This config and list of prefixes is applied on a platform level. Let me know if you want to start using a prefix to omit annotations from the graph.

Adding or removing a prefix is very easy, and rebuilding the database only takes a couple of minutes.

hugi · October 25, 2021, 10:49am

@alberto, I needed to revert this back to using the union operator.
However, your hack works:

See your corresponding state on the new server.

In fact, this is a lot more useful than the old Graphryder click-on-edge method to bring up posts because we can look at more than two codes at once, like bringing up all posts are coded with “privacy”, “personal data”, “making trade-offs” and “Google”.

alberto · October 25, 2021, 11:17am

Shame. What happened?

HYPERGRAPHS!

alberto · October 25, 2021, 1:06pm

As we finish the NGI and the POPREBEL report we will know more about this, and, if we are lucky, perhaps we can even pick a single measure, so it can make sense to pospone this development to late 2022.

However, it would be nice if we could get the single measure of link strength to be consistent with the methodological discussion we had with Jan and the ethnographers. This would have the extra advantage that we can get all of them to use the software in early 2022, and we would learn a lot from that.

Is there any way you could build edges based on a count of the occurrence of the two codes within a single post? If code1 occurs, in the same post, n times, and code2 occurs m times, then the contribution of that post to the weight of the code1 <=> code2 edge would be m x n. This would make RyderEx uniform with the analysis we are doing now (example).

Another question. How does Ouestware document the code? To be usable by someone who is not us, RX needs a good manual, at least. With Edgesense we relied a lot on inline help, and I wrote all the help text myself, inserting it into the appropriate places in the code repo. That was a simpler software, though.

hugi · October 25, 2021, 1:33pm

It turns out the filters are a little more complicated than I thought, and I would have to do more digging to make sure the change gets applied everywhere. It ended up breaking the graph filtering, so adding users to a scope and seeing the code graph specific to only those users no longer worked.

matthias · October 25, 2021, 11:43pm

Database connection is set up now for all three platforms accessed by RyderEx. I have run graphryder-import-psql.py successfully. Runtime is around 150 s, just a few seconds slower than with a local database connection. And wayyy faster than old Graphryder, very nice

I still have to do some security optimization (read-only user, encrypted connection) tomorrow. But connections are already restricted to a single combination of database, user and IP, so it’s ok for now.

P.S.: RyderEx drawing routines incl. zoom are very smooth and fast for a web-based application. Wondering what’s under the hood for drawing? WebGL?

hugi · October 26, 2021, 12:43am

Great, thanks @matthias!

Yes, I think so. In fact, that’s why we had to sacrifice the curved lines in the graphs - WebGL can’t draw them as efficiently.

alberto · November 30, 2021, 12:51pm

These two go to the same link.

The general entry point for ryderex is now

alberto · March 4, 2022, 9:57am

@hugi, the coding work in POPREBEL is nearing its end. I would like to deliver them a graphryder dashboard, ideally with the “right” measure of link strength. We have,as I recall it, already refinanced that work (you + Ouestware) in internal budget reallocation for POPREBEL (cc @marina). What do you think about implementing it in the next month or so?

hugi · March 4, 2022, 10:21am

Alright, I have an idea how we could do it.

What’s the budget @marina?

marina · March 4, 2022, 10:39am

Here: https://edgeryders.eu/t/an-opportunity-extra-activities-in-poprebel-in-2022/16491/16