Graphryder (RyderEx) experiments

This thread is for coordinating work on the new experimental Graphryder with collaborators. To avoid confusion with the old codebase, we will call this experimental version “RyderEx” for the time being. In brief, the architecture of RyderEx is as follows:

Import

An import script connects directly to the Postgres database of any number of Discourse platforms running the Edgeryders OpenEthnographer extension. It imports all conversational and ethnographic content from those platforms into a single Neo4j graph, after scrubbing and redacting any protected, private or sensitive information from the data. This script, written by me, is more or less finished and works as expected. Importing new data and rebuilding the entire graph from scratch takes less than two minutes, and is intended to run every night once it has been deployed live.

Database

Data from edgeryders.eu, bbu.world and forum.blivande.com are now live on a staging Neo4j database, which can be reached here: neo4j@bolt://graphryder.edgeryders.eu:7496. There is a password that can be given on request, as this database only contains data that is already public on these platforms.

These are the nodes and primary data relationships available in the data model:

There are more relationships that are not shown here, but those are redundant relationships that are kept to either make the code less complicated or to make queries faster.

GraphQL API

I have started working on a GraphQL API in the same GitHub repo to access the graph from Neo4j. This is still barely working, and most of the resolvers and connectors are missing. This is what I think we should use to send data to the various clients we will build. GraphQL and Apollo are not technologies I am very used to coding with myself, so I will be looking for support on finishing the GraphQL API more efficiently.

Dashboard with Sigma and Graphology

I would like to start experimenting with building a new dashboard with Sigma and Graphology. My preferred way of working would be to contract someone with expertise in these technologies to build the framework and some functional examples based on our wireframes, and we can then continue according to their template.

Since we don’t know yet exactly how the new Graphryder should look and what it should do, we will be working to create a basic and modular framework that can easily be adapted when we come up with new modules.

Wireframes

These wireframes outline features of the experimental RyderEx, which will be the foundational functional prototype and proof of concept for the new Graphryder.

We are going to build a simple dashboard using three components that share a datamodel.

Component 1: Select platform and corpus

Component 2: Show cooccurrence graph for selection

Component 3: Show user interaction graph for selection

Components share the same data, and when one is manipulated, the other components update as well. Functionally, we want to achieve the same thing as the Detangler view in the old Graphryder. However, in the first iteration, we are not concerned with exploring the content by clicking on nodes and relations. I am confident that this will be reasonably trivial to add later.
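As a rough illustration of components sharing one selection and updating each other, here is a minimal sketch using a plain observer pattern. The names `DashboardSelection` and `SelectionStore`, and the fields they hold, are hypothetical and not taken from the RyderEx codebase.

```typescript
// Hypothetical shared state; field names are illustrative assumptions.
interface DashboardSelection {
  platform: string | null;
  corpora: string[];
}

type SelectionListener = (selection: DashboardSelection) => void;

class SelectionStore {
  private selection: DashboardSelection = { platform: null, corpora: [] };
  private listeners: SelectionListener[] = [];

  // Register a component callback; the returned function unsubscribes it.
  subscribe(listener: SelectionListener): () => void {
    this.listeners.push(listener);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }

  // Merge a partial change and notify every subscribed component.
  update(change: Partial<DashboardSelection>): void {
    this.selection = { ...this.selection, ...change };
    for (const listener of this.listeners) listener(this.selection);
  }

  get current(): DashboardSelection {
    return this.selection;
  }
}
```

In this sketch, the platform and corpus selector would call `update`, while the cooccurrence graph and the user interaction graph would each `subscribe` and redraw when notified.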

We should build the RyderEx prototype in such a way that we could add other components that manipulate the same data in some other way. For example, a further component that dynamically shows a list of posts with the selected subset of ethnographic codes. We might also want to look at other graphs for the same subset of posts - for example a graph that shows the different topics and their connected posts. These things are all possible with the old Graphryder through server side rendering, and one main point of inquiry with RyderEx is to understand how we could do the graph manipulation client side with Graphology.

I like it a lot. What do you plan to do about graph manipulation or reduction? (see section 5 of the White Paper)

This is where Graphology comes into play. Basically, we will now do most of the graph manipulation in the browser, instead of on the server.

I just had a long conversation with the guys from OuestWare, and we have decided to work together. Specifically, we are working with Alexis - one of the main contributors to Graphology and the creator of the Sigma library.

The next step is to plan a two-hour session in the next couple of weeks where we make a list of principal features that we want the new Graphryder to have. This will help us develop the prototype more efficiently and shorten the distance between RyderEx and Graphryder 2.

For this meeting, I need @alberto, @amelia, @MariaEuler. We will schedule a time that works for the four of us. It would also be valuable to get input from @matthias, @noemi, @nadia, @marina as well as from more ethnographers from your team @amelia - but to simplify scheduling this will be according to availability at the chosen time.

@alberto, @amelia, @MariaEuler - would you be available from 13:00 to 15:00 on January 19th?

Our 10-2 project team calls are on Tuesdays from 13:30 to 15:00, so that won’t work – but any other day that week or next should be fine (or a different time on Tues).

How about 13:00 on Monday the 18th?
@alberto, @MariaEuler?

fine with me

Cool w me!

13:00 on Monday the 18th?

Works for me as well. I’ll join, though probably not for the whole time.

I would very much like that! I should clarify that there are many more people who I would like to have join the session - I just don’t want to take up people’s time if they don’t want to.

I need @amelia and @alberto as they are the main research leads in ResNet, and I want @MariaEuler as an example of an end-user of Graphryder who was not around when it was first conceived and who uses it for community management rather than research.

This wiki-post will act as a collection of the principles for RyderEx.

This list is a work in progress. I’m just adding a few principles now to better explain what sort of insights we need to bring to OuestWare to get the most out of their effort.

Principles for RyderEx

Data model and interoperability

  • All nodes in the browser graph are 1 to 1 representations of nodes in the Neo4j graph.
  • Edges in the browser graphs can either be 1 to 1 representations of Neo4j relationships (like the IS_IN_TOPIC edge between a post and a topic) or calculated (like the COOCURRS between two tags, calculated for a certain subset of posts).
  • Graph perspectives can share the same data model. If multiple perspectives are displayed at the same time, like a user interaction graph and a code cooccurrence graph, a change in one perspective must allow an update in another perspective.
  • We want to be able to access the data model from other components in the same web app, for example getting a list of currently selected codes in a cooccurrence graph and the subset of posts associated with those cooccurring codes. With the post ids of those posts, we can then load their content from the endpoint and display summaries directly in the app, for example in a panel below the graph.
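To make the last point concrete, here is a small sketch of how another component could ask the shared data model for the posts behind a set of selected codes. The `CodeAnnotation` shape and the `postIdsForCodes` function are hypothetical. The sketch also assumes that "associated" means a post annotated with every selected code, which is only one possible reading.

```typescript
// Hypothetical annotation record: one code applied to one post.
interface CodeAnnotation {
  postId: number;
  codeId: number;
}

// Return the ids of posts annotated with every selected code (assumed semantics).
function postIdsForCodes(annotations: CodeAnnotation[], selectedCodes: number[]): number[] {
  if (selectedCodes.length === 0) return [];

  // Group the codes applied to each post.
  const codesByPost = new Map<number, Set<number>>();
  for (const { postId, codeId } of annotations) {
    const codes = codesByPost.get(postId) ?? new Set<number>();
    codes.add(codeId);
    codesByPost.set(postId, codes);
  }

  // Keep only posts that carry all selected codes.
  const result: number[] = [];
  codesByPost.forEach((codes, postId) => {
    if (selectedCodes.every((code) => codes.has(code))) result.push(postId);
  });
  return result.sort((a, b) => a - b);
}
```

The returned post ids could then be used to load post content from the endpoint and show summaries in a panel below the graph.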

Posts

  • Posts are the most important nodes in our data model. All perspectives look at the data in the context of a filtered subset of posts.
  • The set of post nodes is reduced first by which platform they belong to. Since this is so central, this information is available both as an ON_PLATFORM relation between the post and a platform, and through a “platform” property on each post node. This is true for all node types, except the “globaluser” nodes, which link the user accounts of different platforms.
  • The second step to reduce the set of posts is by the TAGGED_WITH relation to a “corpus” node. The corpus label is a secondary label of some “tag” nodes.
  • Choosing multiple corpora at the same time should be possible.
  • The software should allow for other reductions of the post set to be implemented in the future, either instead of the reduction by corpus, or in combination with it. Other such reductions could include:
    • Reducing by tag. This is just a more general version of reduction by corpus, using all “tag” labeled nodes instead of just the “corpus” labeled nodes.
    • Reducing by forum category. This means choosing only posts in topics in one or multiple categories. Categories have an IN_CATEGORY relation to topics, and topics have an IN_TOPIC relation to posts.
    • Reducing by users. This could mean selecting only posts created by one or multiple users.
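One way to keep post reductions interchangeable and combinable is to treat each as a predicate over posts. The sketch below assumes a flattened, hypothetical `PostNode` shape in the browser. It also assumes that selecting several corpora keeps a post tagged with any of them, which the list above does not specify.

```typescript
// Hypothetical flattened post node; property names are assumptions.
interface PostNode {
  id: number;
  platform: string;   // mirrors the "platform" property on post nodes
  corpora: string[];  // names of corpus nodes reached via TAGGED_WITH
  categoryId: number;
  authorId: number;
}

// A reduction decides whether a post stays in the working set.
type PostReduction = (post: PostNode) => boolean;

const byPlatform = (platform: string): PostReduction =>
  (post) => post.platform === platform;

// Assumption: a post is kept if it belongs to at least one selected corpus.
const byCorpora = (corpora: string[]): PostReduction =>
  (post) => post.corpora.some((corpus) => corpora.indexOf(corpus) !== -1);

const byCategories = (categoryIds: number[]): PostReduction =>
  (post) => categoryIds.indexOf(post.categoryId) !== -1;

const byUsers = (userIds: number[]): PostReduction =>
  (post) => userIds.indexOf(post.authorId) !== -1;

// Apply all reductions in combination; a post must pass every one.
function reducePosts(posts: PostNode[], reductions: PostReduction[]): PostNode[] {
  return posts.filter((post) => reductions.every((keep) => keep(post)));
}
```

With this shape, adding a new kind of reduction later only means writing another function that returns a `PostReduction`.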

Codes

  • A cooccurrence graph should be calculated for a chosen subset of posts, and COOCURRS edges should be generated between code nodes that have been applied through annotations of those posts.
  • Each COOCURRS edge should have a “count” property of how often the codes have cooccurred within the subset of posts.
  • As cooccurrence graphs can be extremely dense, they need to be reduced to be human-readable. Further, there are a number of meaningful reductions that help us interpret the cooccurrence network.
  • A code cooccurrence graph should be reducible by language. Each code has a relation to a codename node, and each codename has a relation to a language node. It should be possible to see only the cooccurrence network for codes that have codenames in one or multiple languages.
  • There are different definitions of cooccurrence counts, and it should be possible in the future to calculate and choose between different cooccurrence graphs for the same set of posts. For example, cooccurrence could be counted by the number of unique authors of the posts that have been coded with both code1 and code2.
  • Furthermore, there might be completely different viable ways to reduce the cooccurrence network, such as “k-core decomposition” or “Simmelian backbone”. While these are not of immediate concern, it would be useful to have an architecture where applying such reductions in the future is not more complicated than writing the right algorithm and including it in the codebase.
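The counting rules above can be sketched in plain TypeScript, without depending on any graph library. The `CodedPost` shape, the `CountMode` names, and the `filterByMinCount` threshold reduction are all hypothetical. The threshold is only a stand-in for whichever reduction is eventually chosen, and is not k-core decomposition or Simmelian backbone.

```typescript
// Hypothetical input: one post with its author and the codes applied to it.
interface CodedPost {
  postId: number;
  authorId: number;
  codeIds: number[];
}

// Calculated edge, corresponding to a COOCURRS edge with a "count" property.
interface CooccurrenceEdge {
  source: number;
  target: number;
  count: number;
}

// "posts": number of posts coded with both codes.
// "authors": number of unique authors of such posts.
type CountMode = "posts" | "authors";

function cooccurrenceEdges(posts: CodedPost[], mode: CountMode = "posts"): CooccurrenceEdge[] {
  // Key "a|b" (with a < b) maps to the set of post or author ids supporting the pair.
  const support = new Map<string, Set<number>>();
  for (const post of posts) {
    const codes = Array.from(new Set(post.codeIds)).sort((a, b) => a - b);
    for (let i = 0; i < codes.length; i++) {
      for (let j = i + 1; j < codes.length; j++) {
        const key = `${codes[i]}|${codes[j]}`;
        const ids = support.get(key) ?? new Set<number>();
        ids.add(mode === "posts" ? post.postId : post.authorId);
        support.set(key, ids);
      }
    }
  }
  const edges: CooccurrenceEdge[] = [];
  support.forEach((ids, key) => {
    const [source, target] = key.split("|").map(Number);
    edges.push({ source, target, count: ids.size });
  });
  return edges;
}

// One simple reduction: drop edges whose count is below a threshold.
function filterByMinCount(edges: CooccurrenceEdge[], minCount: number): CooccurrenceEdge[] {
  return edges.filter((edge) => edge.count >= minCount);
}
```

Because a reduction here is just a function from edges to edges, a different algorithm could be swapped in with the same signature.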

Users

Todo: Enable ethnographer-focused perspectives, through CREATED and USED relations.

User experience

  • It should be possible to drag and drop nodes around the graph to create manual configurations during exploration of the graph.
  • All states must be persistent and reproducible, meaning that a configuration (choice of platform, corpus, filtering, and graph reduction) should be saved in the URL and reachable through a link.
  • To be continued…
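For the persistence principle, here is a minimal sketch of how a configuration could round-trip through a URL query string. The `ViewState` fields and the parameter names `platform`, `corpora` and `minCount` are hypothetical. A real implementation would also need to cover filtering and graph reduction settings once those are defined.

```typescript
// Hypothetical view configuration; the real state will likely hold more.
interface ViewState {
  platform: string;
  corpora: string[];
  minCount: number;
}

// Serialize the state into a query string (without the leading "?").
function encodeState(state: ViewState): string {
  return [
    `platform=${encodeURIComponent(state.platform)}`,
    // Commas inside names are percent-encoded, so "," is a safe separator.
    `corpora=${state.corpora.map((c) => encodeURIComponent(c)).join(",")}`,
    `minCount=${state.minCount}`,
  ].join("&");
}

// Parse a query string back into a state, with defaults for missing values.
function decodeState(query: string): ViewState {
  const values: { [key: string]: string } = {};
  for (const pair of query.replace(/^\?/, "").split("&")) {
    const index = pair.indexOf("=");
    if (index > 0) values[pair.slice(0, index)] = pair.slice(index + 1);
  }
  const rawCorpora = values["corpora"] ?? "";
  return {
    platform: decodeURIComponent(values["platform"] ?? ""),
    corpora: rawCorpora === "" ? [] : rawCorpora.split(",").map((c) => decodeURIComponent(c)),
    minCount: Number(values["minCount"] ?? "1"),
  };
}
```

A link built from `encodeState` could then be shared, and `decodeState` would restore the same selection when the page loads.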

Can’t make it. I am involved in the new batch of Horizon proposals, have a consortium meeting at 14.00.

Do you have any windows open that week?

Ah, I stand corrected: the meeting I am in is on Monday 11th.

That said, the week 18-25 is tricky because 26 is the deadline for the GD Horizon calls. There could be disruption.

Is the end of next week better for you? For example Friday next week - 15th?

Ping @alberto?

No, no, if anything it is more at risk. Let’s be in the 18-25 week, please.

Alright, 13:00 on Monday the 18th it is. Let’s hope we can make it.
It might not need to be 2 hours if we do some asynchronous work in this thread first, following the same format as in my post above.