Yep, I am still interested in this one, and would be happy to start on scoping anytime after this week
GitHub - gdpelican/discourse-graphryder
Perfect! Feel free to start anytime, and let’s touch base end of next week?
Alright, I’ve had a poke around here, and here’s my rough guess on a timeline for phase 1:
Action items:
- Pull back the experimental changes around RedisGraph models within Discourse (~2 hrs)
- Sync the Discourse DB so that it writes to RedisGraph as well (~8hrs)
- Write an endpoint to accept Cypher queries to RedisGraph (~4hrs)
- Adapt Graphryder API to call the RedisGraph endpoint instead of neo4j (~15-20hrs)
This is a short week in New Zealand (happy birthday to the Queen), so I’ll look to get the first three Discourse items together starting next week, then start on the Graphryder API piece the week after. Does this seem like a sensible start to you @hugi?
I’ve set it up a few times now so maybe I can help. What problems are you running into?
Yes, this sounds reasonable! Thanks.
Yes, we need to get annotations, codes and code names from the “annotator-store”. And even though it’s not used in the importer now, we should also include the relations for who has created each annotation and code.
All of that data is already available in ActiveRecord, so getting it shouldn’t be harder than getting the other data.
However, I’m not sure if these end up in the event log in the same way as the standard Discourse stuff, @matthias?
We should use an approach here that syncs with what already exists for the annotator. How do you think this should be implemented @matthias?
The annotator-store gem and our extensions to it store to the PostgreSQL database via a plain ActiveRecord mapping, and that’s it. We don’t implement any Discourse specific interfaces etc., it’s rather so far a standalone Rails engine that does its own thing and relies on Discourse for authentication and a shared database.
Access should be granted with two mechanisms:
Discourse API key for the so-called Admin API, provided by the client application. This is the approach we currently use for all Vue.js applications to access the Discourse API, by first doing an SSO login via community.edgeryders.eu and then obtaining the user’s admin API key via this custom API endpoint.
If the user is an authenticated Discourse user and member of group “annotator”, as currently done inside Open Ethnographer. This would be used for the case of implementing Graphryder visualizations as part of the current Open Ethnographer interface. Which is not planned yet, but should be possible. Since it will not be needed in the immediate future, no need to implement it right now. But if there is a way to implement the API endpoint so that this comes out as a side effect, then do that.
I haven’t been able to get Graphryder running locally just yet,
We have detailed installation instructions. They reside outside of the GitHub repositories though … I hope you found them already …
A minor update here:
Pull back the experimental changes around RedisGraph models within Discourse (~2 hrs)
^^ this is done and pushed, woo! Now we have a plugin which exposes a single endpoint which will execute read-only Cypher queries against the RedisGraph embedded in the Discourse instance.
Sync the Discourse DB so that it writes to RedisGraph as well (~8hrs)
^^ the bulk of this is complete, and looks to be finished by end of week.
In our first instance of differences between RedisGraph and Neo4j, it appears RedisGraph doesn’t support unique constraints, which Neo4j does. We should be able to dance around this with a bit of ‘upsert’ style code in the plugin, which I have on the docket for tomorrow.
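One way to picture the ‘upsert’ workaround is a query builder that emits MERGE instead of CREATE, so that re-syncing a record matches the existing node rather than creating a duplicate. The sketch below is only an illustration in Python (the plugin itself is Ruby): the function names are hypothetical, the literal encoding is simplified, and it assumes MERGE followed by SET is supported by the RedisGraph version in use.

```python
import json
import re

# Hypothetical helper names; not taken from the plugin.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def cypher_literal(value):
    """Render a Python value as a Cypher literal (simplified encoding)."""
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    # JSON string escaping is used here as an approximation of Cypher's.
    return json.dumps(str(value))


def upsert_node_query(label, key, props):
    """Build a MERGE query that matches on `key` and sets the other props."""
    if key not in props:
        raise ValueError(f"props must include the key property {key!r}")
    for name in (label, key, *props):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")

    # Matching on the key alone stands in for the missing unique constraint.
    query = f"MERGE (n:{label} {{{key}: {cypher_literal(props[key])}}})"
    assignments = ", ".join(
        f"n.{name} = {cypher_literal(value)}"
        for name, value in sorted(props.items())
        if name != key
    )
    if assignments:
        query += f" SET {assignments}"
    return query
```

Running the same upsert twice should leave one node per key value, which is the behaviour a unique constraint would otherwise guarantee.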
- Need to get graphryder running locally
I’ve gotten a (more recent version of) neo4j running locally, which seems to run ok, including the suggested plugins. My trouble with the graphryder API so far has been installing python v3.5 effectively, but will give this another crack tomorrow. (For extra-interested parties, I ran into this and haven’t gotten the suggested commands to work just yet)
Thanks for the update!
My trouble with the graphryder API so far has been installing python v3.5 effectively, but will give this another crack tomorrow.
I highly recommend going for pyenv to get that working. It’s really the way to go to handle that sort of nonsense with outdated dependencies.
Another update here:
Sync the Discourse DB so that it writes to RedisGraph as well (~8hrs)
The initial build of this is done; I need to install the annotator-store gem in order to confirm annotations are working as expected, but this looks to be working for Users, Posts, Topics, and Tags, as well as the existing relationships from the existing API. We’ll likely need to iron out some bits around odd post formatting yet, but those should be easily resolved once we expose this to a more robust database of content. There’s an initial importer which can be run by passing an ENV variable on startup, and ongoing updates triggered by any saves to those models.
Write an endpoint to accept Cypher queries to RedisGraph (~4hrs)
The basics here are also done; we can send arbitrary queries via the API now. I haven’t secured the endpoint via API token just yet; I’ll do that this week. I also want to find a robust way to ensure that only read-only queries can be executed via the API; no merges / removes etc. The quick-and-dirty way is to enforce that the given query starts with MATCH, but it’s still very possible to bypass that requirement and b0rk the graph if someone wanted to.
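To make the bypass concrete, here is a small Python sketch (not the plugin's code; all names are hypothetical) comparing the starts-with-MATCH check against a slightly stricter keyword scan. The keyword list is an assumption and the scan is still only a heuristic: it can reject harmless queries that mention these words in a string or property name.

```python
import re

# Assumed list of Cypher clauses that can modify the graph.
WRITE_KEYWORDS = ("CREATE", "MERGE", "DELETE", "DETACH", "SET", "REMOVE", "DROP")
_WRITE_RE = re.compile(r"\b(" + "|".join(WRITE_KEYWORDS) + r")\b", re.IGNORECASE)


def starts_with_match(query):
    """The quick-and-dirty check: only look at the first clause."""
    return query.lstrip().upper().startswith("MATCH")


def looks_read_only(query):
    """Stricter heuristic: starts with MATCH and has no write keyword anywhere."""
    return starts_with_match(query) and _WRITE_RE.search(query) is None


# A destructive query that the prefix check alone would let through.
destructive = "MATCH (n) DELETE n"
print(starts_with_match(destructive))  # True
print(looks_read_only(destructive))    # False
```

Where the installed RedisGraph version provides it, running queries through the GRAPH.RO_QUERY command is likely a sturdier option than any string inspection, since the server itself then refuses write operations.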
I highly recommend going for pyenv
Yep, I’m using pyenv, but it seems like it’s missing some openssl-related thing; I’ll continue to try to get that going, perhaps booting it up on another machine if need be.
Another update here:
Regarding auth, I’ve put together a little thing that should work for us, including allowing annotator group members to access the endpoint; that’ll go in soon once I’ve tested it a bit more.
I’ve gone through and ripped out the importers and neo4j references in the Graphryder API, and gotten some initial integration requests going through as expected. Here’s the WIP pull request for it (+155 / -1.5k, not bad!).
Wow! I will have a look at this soon! Which queries have you tried so far?
Do you think it would be safe for us to install the graph plugin on one of the live Discourse installs to test with production data?
Next piece is going through each of the queries (there’s something on the order of 30 or so?) and ensuring that they work and return data as expected
That sounds about right, yes.
Regarding auth, I’ve put together a little thing that should work for us, including allowing annotator group members to access the endpoint, that’ll go in soon once I’ve tested it a bit more.
Great, that sounds about right @matthias?
Stripping away the neo4j dependency and doing a minor update to tulip has allowed me to get the graphryder API running locally on python 3.5.3; still need to confirm that moving tulip 4.10 → 5.2 doesn’t break it
A very good side-effect improvement.
@alberto, heads up, very good progress being made here on a real-time SSNA running directly in Discourse without Neo4j.
Do you think it would be safe for us to install the graph plugin on one of the live Discourse installs to test with production data?
gdpelican:
Let me go in and add some error handling so that if a record import fails it doesn’t crash the instance; once that’s the case I’d be super curious to give the import a go; I’ll ping once that’s the case.
It’ll also be a good test to ensure that installing the RedisGraph module works properly in production as well.
I am not sure I get everything, but def looking forward to peeking over your shoulder.
Alright, so installing this on an instance at this point would be an interesting endeavor. Here’s the plugin in question:
Contribute to gdpelican/discourse-graphryder development by creating an account on GitHub.
I’ve put in some error handling so that it has a low risk of crashing anything once it gets set up.
Once the plugin is installed, it will automatically start syncing DB changes to the graph. In order to do a full import of things already in the database, you can either set the GRAPHRYDER_IMPORT env variable on startup:
GRAPHRYDER_IMPORT=1 rails s
or run the importer manually in console:
> Graphryder::Importer.initialize!
That should output something like this:
If some queries fail, they’ll pop output into the console without killing anything else:
NB that the first time the plugin runs, it will attempt to install the RedisGraph module onto the Discourse redis instance; I haven’t had trouble with this in the past installing on a local instance, but haven’t run it through the launcher rebuild app install process before.
This will expose an endpoint, /graphryder/query, which should be accessible to API keys which either have site admin access, or are members of an annotator group.
I make no guarantees about the robustness of the graph, or the endpoint returning values from the graph just yet, although simple queries like
MATCH (post:post) RETURN post, count(*) as count
seem to be going through okay. It would be possible at this point to b0rk the RedisGraph db via this endpoint if you wanted to, although nothing a re-import using the above commands couldn’t fix.
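For anyone testing from the Graphryder API side, the same call can be sketched in Python with only the standard library. The path, the Api-Key header and the JSON body shape mirror the curl example that follows; the function names, the timeout and the assumption that the response body is JSON are mine and not confirmed by the plugin.

```python
import json
import urllib.request


def build_query_request(instance_url, api_key, cypher):
    """Build (but do not send) a POST request to the /graphryder/query endpoint."""
    body = json.dumps({"query": cypher}).encode("utf-8")
    return urllib.request.Request(
        instance_url.rstrip("/") + "/graphryder/query",
        data=body,
        headers={"Content-Type": "application/json", "Api-Key": api_key},
        method="POST",
    )


def run_query(instance_url, api_key, cypher, timeout=10):
    """Send the query and decode the response, assuming it is JSON."""
    request = build_query_request(instance_url, api_key, cypher)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it easy to inspect exactly what will go over the wire before pointing it at a live instance.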
Here’s an example curl to try out:
curl -X POST http://<instance>/graphryder/query -d '{"query": "MATCH (u:user) RETURN u, count(*) as count"}' -H "Content-Type: application/json" -H "Api-Key: <api_key>"
and the results I’m currently getting on my test instance:
I propose that @daniel installs this on the Babel Between Us forum for testing. Sounds good @matthias? We could of course test it on ER main too, I’m just thinking that the data on BBU is less sensitive in case there would still be some security bugs to iron out.
Seems good. @matthias agrees that we install it. I will install it today (Thursday) on the BBU installation.
Alright I’ve started syncing this with the Graphryder API, and will track my progress here:
Graphryder API Queries (Sheet1)
Url, Status, Notes
Annotation
/annotations, ?
/annotations/, ?
/annotation/hydrate/, ?
/annotations/posts, ?
/annotations/post/, ?
/annotations/comments, ?
/annotations/comment/, ?
/annotations/author/, ?
Yeah, I suspect that a lot of these will need some work and thought.
For example, come to think of it, since we are now moving to having multiple SSNA datasets in the same database we will need to modify the model slightly.
Since we are still in the “Phase 1” step we are assuming that we will have one Graphryder API install per SSNA dataset. However, many SSNA datasets can exist on the same Discourse install. Usually, an SSNA dataset is defined by one or more Discourse topic tags (not to be confused with the annotator’s ethnographic tags, which we usually call “codes”). If you look at the importer script of the Graphryder API, it builds the Neo4j database from topics with the tags it is given in the conf file.
In our case, we will instead need to modify all the Cypher queries so that they only get the data related to a certain topic tag. Luckily, this sort of relational query is exactly what Cypher is built for, and I have written a lot of them. Once the plugin is installed on BBU, we can have a go at it.
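As a rough illustration of what tag-scoped queries could look like, the Python sketch below builds a Cypher query that only returns posts from topics carrying a given tag. The `post` label appears in the plugin's sample queries, but the `tag` and `topic` labels, the `name` property and both relationship types (`TAGGED`, `IN_TOPIC`) are hypothetical placeholders, since the actual graph schema has not been settled in this thread.

```python
import json


def posts_for_tag_query(tag_name):
    """Build a Cypher query for posts in topics tagged with `tag_name`.

    Labels, the property name and relationship types are assumed for illustration.
    """
    # JSON string escaping approximates Cypher string-literal escaping.
    tag_literal = json.dumps(tag_name)
    return (
        f"MATCH (tag:tag {{name: {tag_literal}}})<-[:TAGGED]-(topic:topic)"
        "<-[:IN_TOPIC]-(post:post) "
        "RETURN post"
    )


print(posts_for_tag_query("ethno-bbu"))
```

The same leading MATCH pattern could be prepended to each existing Graphryder query, so that one Discourse install can serve several SSNA datasets.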
In the meantime, are topic tags already tracked in RedisGraph?
I’ve just put in a bit of code to sync TopicTags, so hopefully that will set us up to query by tag, perhaps via a separate parameter to the graphryder/query endpoint.
In the meantime I’ll continue pressing through and identifying issues with the other endpoints
The Graphryder plugin required a newer Discourse version than what we had installed. So we used the opportunity to upgrade the multisite installation and bring it up to date. We are now on Discourse 2.4.5.
The plugin is not yet installed, as deployments with the Graphryder plugin included still fail ("Redis::CommandError: ERR Error loading the extension. Please check the server logs."). I’ll look into it later but first have to get some sleep.