Graphryder 2.0 – Workplan

Yep, I am still interested in this one, and would be happy to start on scoping anytime after this week :slight_smile:

Perfect! Feel free to start anytime, and let’s touch base end of next week?

Alright, I’ve had a poke around here, and here’s my rough guess on a timeline for phase 1:

Action items:

  • Pull back the experimental changes around RedisGraph models within Discourse (~2 hrs)

    • While I’m happy with the experiment to add a REST-like API for querying individual nodes/edges/relationships of the graph, the existing Graphryder queries are more complex than that, and supporting them properly would take a more mature system than we really need to build. In light of that, the initial push should be to pull this bit out in favour of a single endpoint which accepts a string to run as a Cypher query.
  • Sync the Discourse DB so that it writes to RedisGraph as well (~8hrs)

    • Reference the existing ImportFromDiscourse module to determine which Discourse events this needs to occur on to maintain an up-to-date graph in Redis
      • Create / update topic / post
      • Create / update tag
      • Others? Annotations?
  • Write an endpoint to accept Cypher queries to RedisGraph (~4hrs)

    • Ensure this is read-only for the graph; no modification of nodes should be possible
    • Need to think about a security approach for this endpoint; api key provided by the client?
  • Adapt Graphryder API to call the RedisGraph endpoint instead of Neo4j (~15-20hrs)

    • Need to get graphryder running locally
    • Should include documentation changes on how the new API functions
    • This phase is the highest risk one at the moment; although replacing the queries should be fairly straightforward, I haven’t been able to get Graphryder running locally just yet, meaning it may be difficult to verify the fixes are working properly and that end-to-end the graphs are unchanged. Hopefully removing the neo4j dependency will help with this part.

This is a short week in New Zealand (happy birthday to the Queen), so I’ll look to get the first three Discourse items together starting next week, then start on the Graphryder API piece the week after. Does this seem like a sensible start to you @hugi?

I’ve set it up a few times now so maybe I can help. What problems are you running into?

Yes, this sounds reasonable! Thanks.

Yes, we need to get annotations, codes and code names from the “annotator-store”. And even though it’s not used in the importer now, we should also include the relations for who has created each annotation and code.

All of that data is already available in ActiveRecord, so getting it shouldn’t be harder than getting the other data.

However, I’m not sure if these end up in the event log in the same way as the standard Discourse stuff, @matthias?

We should use an approach here that syncs with what already exists for the annotator. How do you think this should be implemented @matthias?

The annotator-store gem and our extensions to it store to the PostgreSQL database via a plain ActiveRecord mapping, and that’s it. We don’t implement any Discourse-specific interfaces etc.; so far it is rather a standalone Rails engine that does its own thing and relies on Discourse for authentication and a shared database.

Access should be granted with two mechanisms:

  1. Discourse API key for the so-called Admin API, provided by the client application. This is the approach we currently use for all Vue.js applications to access the Discourse API: first doing an SSO login via community.edgeryders.eu and then obtaining the user’s admin API key via this custom API endpoint.

  2. If the user is an authenticated Discourse user and member of group “annotator”, as currently done inside Open Ethnographer. This would be used for implementing Graphryder visualizations as part of the current Open Ethnographer interface, which is not planned yet but should be possible. Since it will not be needed in the immediate future, there is no need to implement it right now. But if there is a way to implement the API endpoint so that this comes out as a side effect, then do that.

We have detailed installation instructions. They reside outside of the GitHub repositories though … I hope you found them already …

A minor update here:

^^ this is done and pushed, woo! Now we have a plugin which exposes a single endpoint which will execute read-only Cypher queries against the RedisGraph embedded in the Discourse instance.

^^ the bulk of this is complete, and looks to be finished by end of week.

In our first instance of differences between RedisGraph and Neo4j, it appears RedisGraph doesn’t support unique constraints, which Neo4j does. We should be able to dance around this with a bit of ‘upsert’ style code in the plugin, which I have on the docket for tomorrow.
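To make the ‘upsert’ idea concrete, here is a minimal sketch in Python. The plugin itself is Ruby, so this is an illustration and not its actual code. It assumes each node carries an `id` property that should be unique, and that RedisGraph accepts a SET clause after MERGE; MERGE matches an existing node or creates it, so repeated syncs of the same record should not create duplicates. The helper names `cypher_literal` and `build_upsert` are hypothetical.

```python
import re

# Labels and property names are interpolated into the query, so only plain
# identifiers are allowed (an assumption made here to avoid injection).
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def cypher_literal(value):
    """Render a Python value as a Cypher literal (None, bool, int, float, str)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: True is also an int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return "'" + escaped + "'"
    raise TypeError(f"unsupported property type: {type(value).__name__}")


def build_upsert(label, key, props):
    """Build a MERGE query keyed on `key`, then SET the remaining properties."""
    for name in (label, key, *props):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"not a plain identifier: {name!r}")
    query = f"MERGE (n:{label} {{{key}: {cypher_literal(props[key])}}})"
    updates = ", ".join(
        f"n.{name} = {cypher_literal(value)}"
        for name, value in sorted(props.items())
        if name != key
    )
    return query + (f" SET {updates}" if updates else "")
```

Running the same generated query twice should leave a single node per `id`, which is the behaviour a unique constraint would otherwise guarantee.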

I’ve gotten a (more recent version of) neo4j running locally, which seems to run ok, including the suggested plugins. My trouble with the graphryder API so far has been installing python v3.5 effectively, but will give this another crack tomorrow. (For extra-interested parties, I ran into this and haven’t gotten the suggested commands to work just yet)

Thanks for the update!

I highly recommend going for pyenv to get that working. It’s really the way to go to handle that sort of nonsense with outdated dependencies.

Another update here:

The initial build of this is done; I need to install the annotator-store gem in order to confirm annotations are working as expected, but this looks to be working for Users, Posts, Topics, and Tags, as well as the existing relationships from the existing API. We’ll likely need to iron out some bits around odd post formatting yet, but those should be easily resolved once we expose this to a more robust database of content. There’s an initial importer which can be run by passing an ENV variable on startup, and ongoing updates triggered by any saves to those models.

The basics here are also done; we can send arbitrary queries via the API now. I haven’t secured the endpoint via API token just yet; I’ll do that this week. I also want to find a robust way to ensure that only read-only queries can be executed via the API; no merges / removes etc. The quick-and-dirty way is to enforce that the given query starts with MATCH, but it’s still very possible to bypass that requirement and b0rk the graph if someone wanted to.
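As a rough illustration of why the starts-with-MATCH check is not enough, here is a minimal Python sketch (not the plugin's code, which is Ruby). It compares that check with a more conservative keyword scan. The keyword list is an assumption and a scan like this is not a Cypher parser: it deliberately rejects any query that mentions a write keyword anywhere, even inside a string literal, trading false positives for fewer ways around it.

```python
import re

# Assumed list of clauses that can modify the graph or its indexes.
WRITE_KEYWORDS = ("CREATE", "MERGE", "DELETE", "DETACH", "SET", "REMOVE", "DROP", "CALL")
_WRITE_PATTERN = re.compile(r"\b(?:" + "|".join(WRITE_KEYWORDS) + r")\b", re.IGNORECASE)


def starts_with_match(query):
    """The quick-and-dirty check: only looks at the first clause."""
    return query.lstrip().upper().startswith("MATCH")


def is_read_only(query):
    """Conservative check: reject if any write keyword appears as a whole word."""
    return _WRITE_PATTERN.search(query) is None


# A query that passes the naive check but would wipe the graph.
destructive = "MATCH (n) DETACH DELETE n"
print(starts_with_match(destructive), is_read_only(destructive))
```

The known cost is that a harmless query such as one filtering on the text 'how to create' is also rejected; a real implementation would more likely rely on a proper parser or on a read-only mode in the database itself, if one is available.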

Yep I’m using pyenv, but it seems like it’s missing some openssl-related thing; I’ll continue to try to get that going, perhaps booting it up on another machine if need be.

Another update here:

  • I’ve gone through and ripped out the importers and neo4j references in the Graphryder API, and gotten some initial integration requests going through as expected. Here’s the WIP pull request for it (+155 / -1.5k, not bad!).
  • Next piece is going through each of the queries (there’s something on the order of 30 or so?) and ensuring that they work and return data as expected; initial testing suggested I’ll need to go back and make a more robust deserializer to handle all the various types of cypher queries we’re feeding it.
  • Regarding auth, I’ve put together a little thing that should work for us, including allowing annotator group members to access the endpoint, that’ll go in soon once I’ve tested it a bit more.
  • Stripping away the neo4j dependency and doing a minor update to tulip has allowed me to get the graphryder API running locally on python 3.5.3; still need to confirm that moving tulip 4.10 → 5.2 doesn’t break it (which can happen once some more of the redisgraph queries are going through successfully)

Wow! I will have a look at this soon! Which queries have you tried so far?
Do you think it would be safe for us to install the graph plugin on one of the live Discourse installs to test with production data?

That sounds about right, yes.

Great, that sounds about right @matthias?

A very good side-effect improvement.

@alberto, heads up, very good progress being made here on a real-time SSNA running directly in Discourse without Neo4j.

Let me go in and add some error handling so that if a record import fails it doesn’t crash the instance; once that’s in place I’d be super curious to give the import a go. I’ll ping when it’s ready. :slight_smile:

It’ll also be a good test to ensure that installing the RedisGraph module works properly in production as well.

I am not sure I get everything, but def looking forward to peeking over your shoulder.

Alright, so installing this on an instance at this point would be an interesting endeavor. Here’s the plugin in question:

I’ve put in some error handling so that it has a low risk of crashing anything once it gets set up.
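The general shape of that kind of error handling, as a minimal Python sketch (the plugin is Ruby; `import_records` and `write_record` are hypothetical names used only for illustration): each record is written inside its own try/except, so a single bad record is logged and skipped rather than aborting the whole import.

```python
import logging

logger = logging.getLogger("graph_import")


def import_records(records, write_record):
    """Write each record to the graph; log failures instead of raising."""
    imported, failed = 0, []
    for record in records:
        try:
            write_record(record)
            imported += 1
        except Exception as error:  # one bad record must not stop the rest
            logger.warning("import failed for %r: %s", record, error)
            failed.append(record)
    return imported, failed
```

Returning the failed records alongside the count makes it possible to retry or inspect them after the run.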

Once the plugin is installed, it will start syncing changes to the DB automatically. To do a full import of things already in the database, you can either set the GRAPHRYDER_IMPORT env variable on startup:

GRAPHRYDER_IMPORT=1 rails s

or run the importer manually in console:

> Graphryder::Importer.initialize!

That should output something like this:

If some queries fail, they’ll pop output into the console without killing anything else:

NB that the first time the plugin runs, it will attempt to install the RedisGraph module onto the Discourse redis instance; I haven’t had trouble with this in the past installing on a local instance, but haven’t run it through the launcher rebuild app install process before.

This will expose an endpoint, /graphryder/query, which should be accessible to API keys which either have site admin access, or are members of an annotator group.
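From the client side, a call to this endpoint could look like the following minimal Python sketch. It only builds the request and does not send it; the JSON body with a `query` field and the `Api-Key` header follow the usage described in this thread, while the host name and key in the example are placeholders and `build_query_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_query_request(base_url, api_key, cypher):
    """Build a POST request that carries a Cypher query as a JSON body."""
    body = json.dumps({"query": cypher}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/graphryder/query",
        data=body,
        headers={"Content-Type": "application/json", "Api-Key": api_key},
        method="POST",
    )


# Placeholder host and key; replace with a real instance and API key.
request = build_query_request(
    "http://forum.example", "<api_key>", "MATCH (u:user) RETURN u, count(*) as count"
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; the shape of the response is not assumed here.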

I make no guarantees about the robustness of the graph, or the endpoint returning values from the graph just yet, although simple queries like

MATCH (post:post) RETURN post, count(*) as count

seem to be going through okay. It would be possible at this point to b0rk the RedisGraph db via this endpoint if you wanted to, although nothing a re-import using the above commands couldn’t fix.

Here’s an example curl to try out:

curl -X POST http://<instance>/graphryder/query -d '{"query": "MATCH (u:user) RETURN u, count(*) as count"}' -H "Content-Type: application/json" -H "Api-Key: <api_key>"

and the results I’m currently getting on my test instance:

I propose that @daniel installs this on the Babel Between Us forum for testing. Sounds good @matthias? We could of course test it on ER main too, I’m just thinking that the data on BBU is less sensitive in case there would still be some security bugs to iron out.

Seems good. @matthias agrees that we install it. I will install it today (Thursday) on the BBU installation.

Alright I’ve started syncing this with the Graphryder API, and will track my progress here:

Yeah, I suspect that a lot of these will need some work and thought.

For example, come to think of it, since we are now moving to having multiple SSNA datasets in the same database we will need to modify the model slightly.

Since we are still in the “Phase 1” step we are assuming that we will have one Graphryder API install per SSNA dataset. However, many SSNA datasets can exist on the same Discourse install. Usually, an SSNA dataset is defined by one or more Discourse topic tags (not to be confused with the annotator’s ethnographic tags, which we usually call “codes”). If you look at the importer script of the Graphryder API, it builds the Neo4j database from topics with the tags it is given in the conf file.

In our case, we will instead need to modify all the Cypher queries so that they only get the data related to a certain topic tag. Luckily, this sort of relational query is exactly what Cypher is built for, and I have written a lot of them. Once the plugin is installed on BBU, we can have a go at it.
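As a rough sketch of what scoping a query to one topic tag could look like, the Python helper below builds a Cypher query that only returns posts in topics carrying a given tag. The node labels `tag` and `topic`, the `name` property, and the relationship types `TAGGED_WITH` and `IN_TOPIC` are all assumptions made for illustration; the real names depend on how the plugin models the graph.

```python
def quote_cypher_string(text):
    """Quote a Python string as a single-quoted Cypher string literal."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


def posts_for_tag_query(tag_name):
    """Build a query for posts in topics tagged `tag_name` (hypothetical schema)."""
    tag = quote_cypher_string(tag_name)
    return (
        f"MATCH (tag:tag {{name: {tag}}})<-[:TAGGED_WITH]-(topic:topic)"
        "<-[:IN_TOPIC]-(post:post) "
        "RETURN post"
    )


print(posts_for_tag_query("example-tag"))
```

The same leading pattern could be prepended to each existing Graphryder query, so that one install serves one SSNA dataset by fixing the tag in its configuration.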

In the meantime, are topic tags already tracked in RedisGraph?

I’ve just put in a bit of code to sync TopicTags, so hopefully that will set us up to query by tag, perhaps via a separate parameter to the graphryder/query endpoint

In the meantime I’ll continue pressing through and identifying issues with the other endpoints :+1:

The Graphryder plugin required a newer Discourse version than what we had installed. So we used the opportunity to upgrade the multisite installation and bring it up to date. :slight_smile: We are now on Discourse 2.4.5.

The plugin is not yet installed, as deployments with the Graphryder plugin included still fail (“Redis::CommandError: ERR Error loading the extension. Please check the server logs.”). I’ll look into it later but first have to get some sleep. :sleeping:
