Graphryder is the technology we rely on to do “Semantic Social Network Analysis” (SSNA). In short, the technique works like this:
- We annotate the content on Discourse with ethnographic tags. This is done with a gem we include in our custom Discourse fork.
- There are two concepts to understand: “annotation” and “code”. An annotation is one instance of applying a code, and it refers to a specific location in a specific post of a specific topic.
- We use a Python service called Graphryder API to build a graph of all users, topics, posts, codes, and annotations in Neo4j, and then to derive interpretations of that graph with Tulip, a graph algorithm library.
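To make the distinction between codes and annotations concrete, here is a minimal Python sketch. The class and field names are assumptions for illustration, not the schema used by the annotation gem or by Graphryder API. It also shows one kind of view such a graph makes possible: a code co-occurrence graph, where two codes are linked when they are applied in the same post. Treat it as an example of SSNA-style processing, not as a description of what graphtulip actually computes.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Code:
    """An ethnographic code, defined once and reused across many annotations."""
    id: int
    name: str


@dataclass(frozen=True)
class Annotation:
    """One use of a code on a quoted span of a specific post in a specific topic."""
    id: int
    code_id: int
    topic_id: int
    post_id: int
    quote: str


def cooccurrence_edges(annotations: list[Annotation]) -> Counter:
    """Weight each pair of codes by the number of posts in which both are applied."""
    codes_per_post: dict[int, set[int]] = {}
    for a in annotations:
        # A set, so the same code used twice in one post counts once
        codes_per_post.setdefault(a.post_id, set()).add(a.code_id)
    edges: Counter = Counter()
    for codes in codes_per_post.values():
        # sorted() gives each unordered pair one canonical key, e.g. (1, 2)
        for pair in combinations(sorted(codes), 2):
            edges[pair] += 1
    return edges
```

Many annotations can point at the same code, which is why the graph keeps them as separate node types: codes carry the meaning, annotations carry the location.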
What we want
SSNA we can depend on
Our current stack is getting old. We keep patching it, but parts of the technology may break. We already have problems with newer versions of Neo4j, while the old versions have other problems and have already reached end of life. Additionally, some libraries and components in the old stack are no longer maintained. We need to be able to rely on our SSNA stack.
Adding SSNA to a project should not incur extra IT costs
Currently, a Graphryder installation can only handle a single dataset. We use the methodology in four projects, so we need four instances of Graphryder API, four Neo4j databases, and four separate deployments of Graphryder Dashboard. This is not sustainable: to offer SSNA projects at a reasonable price, we need to be able to add projects without a new deployment.
Our goal should be that adding standard SSNA to a project comes at zero development cost, apart from the project paying its share into the IT overhead fund, which should cover bug fixes and improvements across projects. Within a project, the research budget should be spent on research.
Currently, each Graphryder API instance has to reload data from Discourse manually in a heavy operation that empties and rebuilds the Neo4j database every time new data is loaded. We would like a live graph view of what is happening in our Discourse. Which users interact a lot, and are there clusters of people who interact with each other? Which ethnographic codes come up across projects?
An up-to-date graph can also help community managers and provide reflective feedback to users.
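Instead of emptying and rebuilding the database, an up-to-date graph could be maintained incrementally. Here is a minimal sketch, assuming a hypothetical schema: the `user`, `topic` and `post` labels, the `AUTHORED` and `IN_TOPIC` relationship types, and the event field names are all illustrative. Each new or edited post becomes one parameterized Cypher `MERGE` statement.

```python
from typing import Any

# Hypothetical schema: labels and relationship types are illustrative only,
# not Graphryder's actual graph model.
UPSERT_POST = (
    "MERGE (u:user {discourse_id: $user_id}) "
    "MERGE (t:topic {discourse_id: $topic_id}) "
    "MERGE (p:post {discourse_id: $post_id}) "
    "SET p.raw = $raw "
    "MERGE (u)-[:AUTHORED]->(p) "
    "MERGE (p)-[:IN_TOPIC]->(t)"
)


def upsert_post(event: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Turn one post-created or post-edited event into a Cypher upsert.

    The event keys (id, user_id, topic_id, raw) are assumed field names.
    """
    params = {
        "user_id": event["user_id"],
        "topic_id": event["topic_id"],
        "post_id": event["id"],
        "raw": event["raw"],
    }
    return UPSERT_POST, params
```

Because `MERGE` only creates nodes and relationships that do not already exist, applying the same event twice does not duplicate anything; the `SET` just refreshes the post text. Both Neo4j and RedisGraph accept `$name` parameters, though a query like this should still be checked against RedisGraph's Cypher coverage.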
Quickly deploy new experimental views
We want to make it easier for the Edgeryders Research Network to experiment with new ways to visualize and analyze the SSNA and social data from the Edgeryders Discourse.
We want components that let us embed graphs from our projects on our public-facing websites, showing interesting views of our research based on up-to-date community data.
Challenges with the current setup
Neo4j is crippleware
In January, I looked into making Graphryder API multi-tenant and concluded that Neo4j is the real culprit.
Neo4j is a powerful graph database, but the open-source Community Edition is barely usable in production. Specifically, you need the Enterprise Edition to have more than one graph per install and more than one user with different permissions.
Having something that we could simply tack on to any Discourse deployment as a plugin would be a lot easier.
Graphryder API is a messy codebase
Graphryder API has many dead ends and is hard to follow, because it was originally meant to do more than it does but was left unfinished. Now that we know what we need it to do, it should be restructured.
Graphryder Dashboard is approaching end-of-life
Graphryder Dashboard is an Angular app that uses Grunt and Bower. This is old technology that is hard to maintain and difficult to work with. It is also an oddity in our front-end ecosystem, where we usually use Vue.js or React.
Eventually, we want to rebuild the dashboard, probably in Vue.js.
Step by step plan
Phase 1: Replace Neo4j with RedisGraph
Both Neo4j and RedisGraph use Cypher. In theory, if a RedisGraph database stores the Discourse data and the annotation data in the same structure as Neo4j does, Graphryder API should be able to read from it directly instead of first having to build the data in Neo4j.
Graphryder API has two major parts:
- importFromDiscourse, which loads the data from Discourse into Neo4j. It does not load all posts, only topics carrying a tag set in the config.
- graphtulip, a library of Python classes that read the Neo4j graph and prepare Tulip graphs based on that data.
In Phase 1, we would:
- Replace importFromDiscourse with a Discourse plugin that builds an up-to-date database from the Discourse data, the codes, and the annotations, and exposes a protected endpoint for Cypher queries against that data.
- Implement a RedisGraph connector in Graphryder API to replace the Neo4j one.
- Adapt the Cypher queries in graphtulip functions as needed so that they only query the subgraph related to a given Discourse topic tag.
- Patch the Cypher queries as needed wherever the Cypher implementations of Neo4j and RedisGraph do not match one-to-one.
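The connector and a tag-focused query could look roughly like this in Python. This is a sketch under assumptions: `GraphConnector`, `RedisGraphConnector`, `cypher_literal` and `TAG_SUBGRAPH` are hypothetical names, and the graph labels are illustrative. RedisGraph is queried with its `GRAPH.QUERY` command and takes parameters as a `CYPHER name=value` prefix; a Neo4j implementation of the same `query` method could wrap `session.run(query, params)` from the official Neo4j Python driver, so graphtulip would not need to know which backend it talks to.

```python
import json
from typing import Any, Dict, Optional, Protocol


class GraphConnector(Protocol):
    """The one method graphtulip would need from any Cypher backend."""

    def query(self, cypher: str, params: Optional[Dict[str, Any]] = None) -> Any: ...


def cypher_literal(value: Any) -> str:
    """Encode a parameter value as a Cypher literal.

    Deliberately minimal (an assumption): handles str, int and bool only.
    """
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        # Assumes JSON-style escaping of quotes and backslashes is accepted
        return json.dumps(value, ensure_ascii=False)
    raise TypeError(f"unsupported parameter type: {type(value).__name__}")


class RedisGraphConnector:
    """Send Cypher to RedisGraph through the GRAPH.QUERY command."""

    def __init__(self, client: Any, graph_key: str) -> None:
        # client: e.g. a redis.Redis instance; anything with execute_command() works
        self.client = client
        self.graph_key = graph_key

    def query(self, cypher: str, params: Optional[Dict[str, Any]] = None) -> Any:
        if params:
            # RedisGraph takes parameters as a "CYPHER name=value ..." prefix
            header = " ".join(f"{k}={cypher_literal(v)}" for k, v in params.items())
            cypher = f"CYPHER {header} {cypher}"
        # Returns the raw GRAPH.QUERY reply; parsing it is left to the caller
        return self.client.execute_command("GRAPH.QUERY", self.graph_key, cypher)


# Hypothetical tag-focused query: labels and relationship types are illustrative.
TAG_SUBGRAPH = (
    "MATCH (:tag {name: $tag})<-[:TAGGED_WITH]-(t:topic)<-[:IN_TOPIC]-(p:post) "
    "RETURN t.discourse_id, p.discourse_id"
)
```

Keeping the backend behind a single `query` method also gives one place to apply the dialect patches where Neo4j and RedisGraph differ.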
At the end of Phase 1, we should still have a functioning Graphryder API that works with Graphryder Dashboard, but the importFromDiscourse module can now be scrapped. We now have all the raw SSNA data available through an endpoint that accepts Cypher queries.
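Calling that endpoint from Graphryder API, or from any other tool, could be as simple as the following sketch. The route `/graphryder/cypher.json` and the JSON body shape are hypothetical, since the plugin does not exist yet. The `Api-Key` and `Api-Username` headers are the standard Discourse API authentication headers, on the assumption that the plugin would reuse Discourse's own API keys to protect the endpoint.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Hypothetical route exposed by the planned Discourse plugin.
CYPHER_ROUTE = "/graphryder/cypher.json"


def build_cypher_request(
    base_url: str,
    api_key: str,
    api_username: str,
    cypher: str,
    params: Optional[Dict[str, Any]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a Cypher query."""
    body = json.dumps({"query": cypher, "params": params or {}}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + CYPHER_ROUTE,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Standard Discourse API key authentication headers
            "Api-Key": api_key,
            "Api-Username": api_username,
        },
    )


def run_cypher(req: urllib.request.Request, timeout: float = 30) -> Any:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```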
Phase 2: Make Graphryder API multi-tenant
At the end of Phase 1, Graphryder API is still single-tenant. Graphtulip prepares “tlp” data files that Graphryder Dashboard downloads. It currently has no way of keeping track of multiple subgraphs, so each “tag-focus” subgraph has to be processed independently for display.
In Phase 2, Graphryder API is refactored to keep track of an arbitrary number of subgraphs and their corresponding tlp files, and to serve the right files to Graphryder Dashboard when passed the corresponding tag.
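One way to structure this is a registry that maps each tag to its tlp file and builds the file on first request. Here is a minimal sketch, assuming a hypothetical `build_tlp(tag, path)` hook that wraps the existing graphtulip code for a single tag; `SubgraphRegistry` and `safe_filename` are illustrative names.

```python
import re
from pathlib import Path
from typing import Callable, Dict


def safe_filename(tag: str) -> str:
    """Map a Discourse tag to a filesystem-safe name (distinct tags may collide)."""
    return re.sub(r"[^A-Za-z0-9_-]", "_", tag)


class SubgraphRegistry:
    """Track one tlp file per tag-focus subgraph and build it on demand."""

    def __init__(self, cache_dir: Path, build_tlp: Callable[[str, Path], None]) -> None:
        self.cache_dir = cache_dir
        # Hypothetical hook: would wrap the graphtulip code for one tag and
        # write the resulting graph to the given path.
        self.build_tlp = build_tlp
        self._files: Dict[str, Path] = {}

    def tlp_for(self, tag: str) -> Path:
        """Return the tlp file for `tag`, building it if it is missing."""
        path = self._files.get(tag)
        if path is None or not path.exists():
            path = self.cache_dir / f"{safe_filename(tag)}.tlp"
            self.build_tlp(tag, path)
            self._files[tag] = path
        return path

    def invalidate(self, tag: str) -> None:
        """Forget the cached file so the next request rebuilds it."""
        self._files.pop(tag, None)
```

A request handler for a route such as `/tlp/<tag>` (hypothetical) would call `registry.tlp_for(tag)` and return the file, and the Discourse plugin could trigger `invalidate(tag)` when new posts or annotations arrive under that tag.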
Phase 3: Rebuild a multi-tenant Graphryder Dashboard with Vue.js
Graphryder Dashboard has served us well, but its core technologies are aging to the point of being very hard to maintain. We would like to rebuild the dashboard in Vue, which would also allow us to abstract the components and display Graphryder SSNA graphs on our other websites.
Furthermore, one Dashboard deployment per API instance should be enough, and it should be possible to set the tag focus client-side.
Doing the work
I have previously talked to @gdpelican about this, and he has been experimenting with RedisGraph.
To get us started, I would like to offer him the work on Phase 1. We could pull budget from a few different places and share it between units, but it is hard to estimate exactly how much we need before he has scoped out the project.
What I can offer right away is up to 10 hours of work scoping out the time needed to complete Phase 1; this only means coming back with a plan and a time estimate. I can also guarantee a budget of at least 80 hours from projects I run, and I will talk to @matthias and @alberto to see if we can pool resources to get Phases 2 and 3 done.
Are you still interested in working on this @gdpelican?