Graphryder (RyderEx) experiments

Hi @hugi, now battle testing the software.

Question here:

Towards the end of the video, you select a subset of codes and bring up the content (a list of posts) that contain those codes. Is the list:

  • … the (set theory operator) union of all the posts that contain the codes, i.e. each post was coded with at least one of the codes in the scope?
  • … the (set theory operator) intersection of all the posts that contain the codes, i.e. each posts was coded with all of the codes in the scope?

Based on the numbers, it must be union. I would like to make a case to either make it intersection, or support both union and intersection.

If Ryderex supported the intersection operator, we would have a workaround for the lack of the ability to click on an edge and see the content that generated this edge. Just:

  • select the two codes connected by the edge you are interested in.
  • put them in the scope
  • bring up the content list. Done! :slight_smile:
  • if you want to go to the raw data (the text), click on the URLs (not yet supported as I am testing).

OK, I discovered a workaround: bring up the content list, then order by “codes in scope”.

But here there seems to be a bug, because I have only two codes in my scope, but some posts have more than two codes in scope. Is this double counting? My state URL is here.

1 Like

And one more thing. How does it update? Right now data explorer finds 5,894 annotations in the NGI corpus, while RyderEx sees 4,111.

1 Like

And yet one more. It seems to me that RyderEx counts multiple occurrences of the same code in the same post only once. This is not necessarily a mistake, but we have had a methodological discussion on that with the POPREBEL ethnographers and agreed to count one occurrence of a code in a post each time that code appears in that post. With long posts and an aggressive coding style (like the one in POPREBEL) this drives the number of co-occurrences way up, because it scales with the square of the number of times the code appears in that post: 3 occurrences of code1 plus 3 occurrences of code2 in the same post mean 3 x 3 = 9 co-occurrences of code1 and code2 just in that post. In POPREBEL, RyderEx reads a maximum of 16 co-occurrences in the codes co-occurrence network (and I have my doubt on that number as well, but it looks to be computed as “one and only one co-occurrence edge per post, no matter how many times the two codes occur in the post”). In my Tulip graph, the maximum number is 371.

Yes, indeed. Switching to intersection is easy, but allowing for both is a lot harder. I can switch it over to intersection, I just need to map out where in the code this is controlled.

Noted, I will fix this too.

I will look into this.

I realize I miscommunicated a bit at the retreat - the demo version doesn’t update, because it’s not running on the live Discourse database, but on a static backup. But it is built to update every night once it’s installed in the right place. We need to install it on the main Edgeryders server for it to update, and I am preparing instructions for how to do this and will consult with @matthias.

Indeed, and up to now I wasn’t aware that this had changed - that’s how the old GR worked.

This is indeed the case.

This is a bit of an issue. Here’s why:

In other projects, ethnographers may have worked under the assumption that if an annotation is used once in a post, using it again has no consequence for the topography of the SSNA graph. Because of this assumption, they may very well have been inconsistent in their annotation methodology - sometimes using the same code many times in the same post, while at other times only using each code once per post even when a concept appears multiple times. Personally, when coding BBU, I was not consistent in my methodology in this regard.

Is the Field Methods paper explicit about how multiple co-occurrences within a single post should be counted?

If it will vary between corpora how to count codes in single posts, it is possible to introduce a config setting which toggles this explicitly for some set of corpora tags, but it introduces some additional complexity. I would have to look into the code more closely.

This complicates things from a UX perspective, at least for the slider - it becomes very hard to use with 300+ ticks. No big deal though, we can always use the input box instead.

No, it is a new problem that manifested itself once a very dedicated group of a half dozen ethnographers went to work on a corpus like God intended, which is happening now in POPREBEL. This will be in the next paper (in preparation, to be submitted to Ethnography), and of course in our neglected White Paper.

Coding styles will vary, though we can and will make the consequences of adopting different styles explicit. The right place to do that is the White Paper.

The paper in preparation emphasizes two ways to count co-occurrences:

b, at least in the POPREBEL corpus, is so strongly correlated with the unique number of posts that it is not worth it to keep track of both measures. I will compute the same measures for OpenCare and NGI, and get back to you.

Bottom line is, if we do support two measures of link strength rather than one, d and b are probably the ones to go by.

1 Like

@hugi, I checked: intuition confirmed. This is OpenCare, with 0.94 correlation:

image

And this is NGI, which is more 0.96.

image

So, yes, d and b are the important measures.

1 Like

RyderEx is down right now while I am fixing some stuff, aim to have it back online tomorrow.

1 Like

I have now updated RyderEx to only use the intersection operator.

I fixed this, see the same state on our own deployment.

We still need to dig down into this, there might still be some differences.

Good to know for the future, let’s look at possible implementing this down the line.

1 Like

This is now fixed, more or less. RyderEx now sees 5,867 annotations in the NGI corpus Still a very slight difference, but I think it might have to do with RyderEx not counting two instances of the same code as two annotations.

I have made a few other fixes:

  • Node diameter has been reduced to make graphs more legible
  • Number of columns in tables has been reduced by making titles of topics and usernames into links
  • Annotation list now includes the code of each annotation, and an indicator of it the code of the annotation is in the scope
  • Annotation quote now links directly to source post on platform

I found out that cross-graph filtering is broken, and have asked Paul to help us fix it.

One improvement over the old dashboard is that it becomes a lot easier to understand how the primary data (posts) and secondary data (annotations) are related and how these are represented in the graph.

Screenshot 2021-10-22 at 01.59.36

@matthias - how are we doing with that database connection to the platform?
I have set up the cronjob, so once that connection is made we are ready to use RyderEx on up-to-date data as we go.

Are you sure? Annotations are their own entities. Even when they point to the same code, they have different snippets.

Also:

This link no longer works for me (502 Bad Gateway).

As I said, all graphs are now available here instead: http://server-2021.edgeryders.eu/

1 Like

Just checked POPREBEL.

Data explorer query: 6,658 annotations
RyderEx: 5,771 annotations

Still a big difference.

Are you explicitly excluding annotations on posts by users from which we do not have consent? I have a feeling this might be the issue.

Or actually, perhaps it it because of posts in protected categories? That is more likely.
RyderEx only loads public posts.

No, you are right. The query is here: https://edgeryders.eu/admin/plugins/explorer?id=12.

I have no idea how to add that filter.

That too.