Rethinking the structure of data for SSNA in network form

alberto · September 5, 2019, 1:45pm

I think you are talking about Tulip. There is a Tulip server somewhere. Is that correct, @melancon ?

alberto · September 5, 2019, 1:47pm

I was not aware of that. Why not build the JSON directly from the API? Why go through the Neo at all?

hugi · September 5, 2019, 3:41pm

Tulip is a part of the API. When you click the buttons to regenerate the graphs in the dashboards, the API runs Tulip libraries to generate JSON files describing the graphs.

I was a bit sloppy in my addition. I’ll try to explain it more clearly, including some information you already know for completeness.

Graphryder architecture

Graphryder consists of a dashboard, a backend API and a database. Let’s call them GR-client, GR-API and GR-DB.

GR-client does all its work in the browser on your machine. It’s a static JavaScript app that runs on your end, and loads data from the GR-API which runs on the same server as GR-DB. It’s built completely in JavaScript, CSS, and HTML. It uses a whole range of libraries, which are all front-loaded when you load the client in your browser. GR-client never talks to GR-DB directly.

GR-API is built in Python and does the heavy lifting to generate data for GR-client. In addition to Python, it has quite a lot of Cypher queries, sent to GR-DB using the Python neo4j interface library. It also contains Tulip libraries, used to generate the graphs. GR-API is also responsible for building GR-DB with data from Discourse and OE in the first place. In fact, anyone can trigger a rebuild of GR-DB with the latest data from Discourse and OE by simply calling the ‘hardUpdateFromEdgeRydersDiscourse’ API route from their browser.
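
To make the pattern concrete, here is a minimal sketch of what a GR-API style function could look like: it sends a Cypher query to the database and reshapes the rows into JSON for the client. This is not taken from the Graphryder code. The node label `post`, the property names, and the `run_cypher` callable are all assumptions for illustration; `fake_run_cypher` is a stub standing in for the real Neo4j driver call so the sketch runs without a database.

```python
import json

# Hypothetical Cypher query: the label and property names are assumptions,
# not the actual GR-DB schema.
POSTS_QUERY = "MATCH (p:post) RETURN p.post_id AS id, p.title AS title LIMIT $limit"


def get_posts(run_cypher, limit=10):
    """Run the query through `run_cypher` and return a JSON string.

    `run_cypher(query, params)` stands in for whatever Neo4j driver call the
    real API makes; it must return an iterable of dict-like rows.
    """
    rows = run_cypher(POSTS_QUERY, {"limit": limit})
    posts = [{"id": row["id"], "title": row["title"]} for row in rows]
    return json.dumps({"posts": posts})


def fake_run_cypher(query, params):
    """Stub standing in for GR-DB so the sketch runs without a database."""
    data = [{"id": 1, "title": "Hello"}, {"id": 2, "title": "World"}]
    return data[: params["limit"]]


if __name__ == "__main__":
    print(get_posts(fake_run_cypher, limit=1))
```

Passing the query runner in as an argument keeps the reshaping logic independent of the driver, which also makes it easy to try out without a live Neo4j instance.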

GR-DB is a Neo4j database with data loaded into it from Discourse and OE. Here is a query result that demonstrates the data structure. I have removed some of the results from the graph for clarity.

Graphryder data

When GR-client is launched it starts by downloading a lot of data, including every single post, every user and every tag in the corpus, as well as some other data. These JSON files are generated by processing data returned by querying GR-DB on demand every time GR-client is loaded in a browser. They are not cached on the GR-API server but freshly built every time.

In contrast to the posts, users, and tags, the JSON data containing the graphs is not front-loaded. These are instead built on-demand when the user loads a graph in the client. These are built with Tulip python libraries included in GR-API, using the available algorithms. See here for an example of how this data looks.

There are also Tulip files stored on GR-API, used to generate the JSON graphs. These are not generated on demand, but pre-built. They can be re-built with the buttons in the client settings dashboard. Clicking these buttons has no effect on GR-DB; it only triggers a rebuild of the JSON graph files using the data in GR-DB at the time.
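
For readers who have not seen graph data as JSON, here is a rough sketch of serializing a small graph into a nodes/edges document. The layout (a `nodes` list and an `edges` list with `id`, `source` and `target` keys) is an assumption for illustration and may differ from what GR-API actually emits. Plain Python structures stand in for the Tulip graph, and `graph_to_json` is a hypothetical helper name.

```python
import json


def graph_to_json(nodes, edges):
    """Serialize a graph into a nodes/edges JSON document.

    `nodes` maps a node id to a label; `edges` is a list of (source, target)
    pairs. The output layout is an assumption, not the real GR-API format.
    """
    payload = {
        "nodes": [
            {"id": str(node_id), "label": label}
            for node_id, label in sorted(nodes.items())
        ],
        "edges": [
            {"id": f"e{i}", "source": str(src), "target": str(dst)}
            for i, (src, dst) in enumerate(edges)
        ],
    }
    return json.dumps(payload)


if __name__ == "__main__":
    print(graph_to_json({1: "alberto", 2: "hugi"}, [(1, 2)]))
```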

hugi · September 5, 2019, 3:42pm

Not sure what you mean here. Why?

alberto · September 5, 2019, 3:56pm

Because that way there is no GR-API anymore, “just” some canned Cypher queries. Neo produces nice visualizations in the console, as you showed me. If there is a way to pass them onto a web page running sigma.js (seems likely), then you can build graph visualization, explorable via browser, with one fewer layer of code, one fewer thing that can break. But that kind of interaction (two linked views on the database, lasso here, something shows up there) is probably not something that comes out of the box in Neo (which is, after all, a DB, not some kind of fancy app).

alberto · September 5, 2019, 3:58pm

Much gratitude for this. Very clear. I propose to copy-paste it into here: 📗 Graphryder 1.0 Manual (Legacy)

hugi · September 5, 2019, 6:04pm

Ah. GR-API is what builds GR-DB in the first place. You could just run those functions as a script of course, but then you could just as well keep the API and call the script that way, as is the case now.

It’s not recommended to have a front end client access a database directly. It can cause all kinds of problems. You would need some sort of lightweight middle layer API anyway.
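
A minimal sketch of what such a lightweight middle layer could look like, assuming the client only ever sends the name of a canned query and never Cypher text. The query names, labels and properties in `CANNED_QUERIES` are hypothetical, and `run_cypher` again stands in for the real driver call.

```python
# Hypothetical canned queries: names, labels and properties are assumptions.
CANNED_QUERIES = {
    "users": "MATCH (u:user) RETURN u.user_id AS id, u.name AS name",
    "tags": "MATCH (t:tag) RETURN t.tag_id AS id, t.label AS label",
}


class UnknownQuery(Exception):
    """Raised when the client asks for a query that is not on the whitelist."""


def handle_request(query_name, run_cypher):
    """Look up a canned query by name and run it.

    The client never supplies Cypher itself, so it cannot run arbitrary
    statements against the database.
    """
    try:
        query = CANNED_QUERIES[query_name]
    except KeyError:
        raise UnknownQuery(query_name) from None
    return list(run_cypher(query))
```

Anything not in the whitelist is rejected before it reaches the database, which is the main thing a direct client-to-database connection would not give you.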

A truly radical approach would be to teach the very few people who actually do SSNA how to write Cypher queries. That way they could experiment a lot more than is possible through Graphryder. It’s not very hard, could be done in a couple of webinars.

alberto · September 5, 2019, 6:05pm

Ok, game over.

Let’s try it!

melancon · September 11, 2019, 9:19am

Nope, there is no Tulip server running anywhere, not really. You can build a (simple) server that answers queries and runs Tulip computations behind the scenes. Depending on how sophisticated your software needs to be, you may skip this; we used Tulip as a shortcut to avoid reimplementing layouts, metrics, etc.

melancon · September 11, 2019, 9:22am

Well, you can go in different ways. Remember we have these linked views that are all derived from the same primary data as you like to call it. Neo4J stores that primary data, and secondary data contains only a subset (although you need to include more than the data that is displayed to link things).

melancon · September 11, 2019, 9:24am

I understand Alberto suggests using the standard Neo4J viewer which does not support lasso-ing. (Although I would not allow public access on the Neo4J database … well, it’s up to you guys …)

melancon · September 11, 2019, 9:30am

Hey! Major bug guys – this is Guy, but I realize that the last three posts (four including this one) are posted under Amelia’s identity! Hmm … good luck fixing this – it was great living in Amelia’s skin for a moment

alberto · September 11, 2019, 10:45am

No, not public access. The idea was to have some bottled queries. You press a button, a script launches the query, exports the result in GraphML or whatever, then the export is passed on to something else: maybe just desktop Tulip, or maybe a JavaScript dashboard. It was probably a bad idea anyway.
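
For what it is worth, the export step could be quite small. Here is a sketch, assuming the bottled query returns one row per edge with `source` and `target` fields (that row shape and the function name `rows_to_graphml` are assumptions). It uses only the Python standard library and produces a bare-bones GraphML document with no attributes on nodes or edges.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def rows_to_graphml(edge_rows):
    """Build a GraphML string from rows shaped like {'source': ..., 'target': ...}.

    Node ids are collected from the edge endpoints, so isolated nodes are
    not represented in this sketch.
    """
    root = ET.Element("graphml", xmlns=GRAPHML_NS)
    graph = ET.SubElement(root, "graph", id="G", edgedefault="undirected")
    node_ids = sorted(
        {str(row[key]) for row in edge_rows for key in ("source", "target")}
    )
    for node_id in node_ids:
        ET.SubElement(graph, "node", id=node_id)
    for i, row in enumerate(edge_rows):
        ET.SubElement(
            graph,
            "edge",
            id=f"e{i}",
            source=str(row["source"]),
            target=str(row["target"]),
        )
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(rows_to_graphml([{"source": "a", "target": "b"}]))
```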

amelia · September 11, 2019, 12:45pm

Ah! I see the authorship of my replies has been changed, but hey, I am still connected as Amelia when I get back to the website. Have a look at the top right corner of the screenshot:

hugi · September 11, 2019, 1:02pm

@matthias
This is very odd?

amelia · September 11, 2019, 1:04pm

Yep, it is. Again, this is me as Amelia replying to myself:

hugi · September 11, 2019, 1:11pm

@melancon - Occam’s razor to the rescue. Aren’t you just logged in as Amelia? I remember her borrowing your laptop at the skunkworks to show OpenEthnographer. Log out and log in as you?

matthias · September 11, 2019, 1:14pm

Ha, good memory. I just logged @amelia out from all devices (there’s an admin function for that). That should solve it. Only if @melancon is still able to post as Amelia after this do we really have a problem …

melancon · September 11, 2019, 1:18pm

I just noticed my browser screen notified me I had logged out.

So, just to test, I clicked on Log In which brought me back into Amelia’s personality …
Again, the screenshot showing Amelia’s thumbnail picture on the top right.

matthias · September 11, 2019, 1:21pm

Hahaha ok this is crazy.

But I just remembered that I logged @amelia out the wrong way … we modified the Discourse account system so that our logins are via communities.edgeryders.eu. I just logged out Amelia there … now that should solve it