Making Graphryder API multi-tenant

Right. So what I mean is this: while we can do a lot with Tulip alone, we will always need some other database to fetch data that relates to a selected entity in the graph. While we could often get that straight from Discourse, it is often specific to the loaded dataset – like listing the posts and comments of that user in this particular SSN.

I’m not completely sure, though, why the Tulip graphs are built by first building the Neo4j graph and then building Tulip graphs on command, but I suspect it is something like this: each selection of a subset of nodes requires a new Tulip graph to be built on demand. I suspect that the graph that describes the social interactions of subset B of SSN A cannot be directly inferred from the Tulip graph of A, right? And since Discourse does not contain any primary data on whether two users have interacted, the social graph needs to be constructed from scratch for any given subset of the SSN. It is probably prohibitively resource-intensive to do this by calling the Discourse API or even the relational database every time you want to calculate a new social graph. By keeping the SSN in a graph database you have already done most of the heavy lifting, as there are good graph database functions to answer the question of which users have interacted with each other.

I need the help of @jason_vallet for a complete and correct answer. Tulip is used as a computing engine (which it is) and not as a storage system (which it is not, i.e., there is no index in Tulip).

The reason is simple. You are right that Tulip is quite robust and versatile as a graph and subgraph container, but it does not come with a query language. Since you have already identified the graphs and subgraphs of interest, you may indeed compute them and store them in a hierarchy to be fetched on demand. But then you can no longer run dynamic queries on your data.

Also, bear in mind that some of the graphs you consider are not subgraphs of the original all-data graph; they are obtained through a projection acting on the original data (the tag-tag graph is a typical example).

So Neo4j (or any other graph database) comes in useful if you plan to let users form queries from keywords or otherwise, which we thought we would investigate. Things turned out differently.

But don’t we still need a database to, for example, do this:

Even though we don’t have a prompt for users to write their own queries, this looks like it still needs a database layer outside of Discourse?

Also this:

Would this be viable without a queryable graph database?

We (I am in this with @jason_vallet) are willing to consider the open source option if you have an economically sustainable solution for it. For now, we don’t, so we follow a path going through the standard paid service approach.

Right. This makes complete sense. I use GR as a network scientist, so I like to keep track of whole topologies. @amelia or others with more of an anthro-ethno background might do it differently.

You do not need to re-query Discourse, no. You can do it with functions. If I were doing this in Python-with-Tulip, I would write a function that takes the list of codes as an argument and returns the subgraph of interactions between the users whose posts have been coded with any (or all) of those codes. Those interactions are already there, because all interactions are.
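To make that concrete, here is a minimal sketch of such a function. It deliberately uses plain Python dictionaries instead of the Tulip API, and every name and all sample data in it (`POSTS`, `INTERACTIONS`, `interaction_subgraph`) are made up for illustration; a real version would read the same information from the loaded graph.

```python
# Hypothetical in-memory data standing in for what the loaded graph holds.
# Each post records its author and the codes attached to it.
POSTS = {
    "p1": {"author": "anna", "codes": {"care", "money"}},
    "p2": {"author": "ben", "codes": {"care"}},
    "p3": {"author": "carl", "codes": {"housing"}},
}
# All user-to-user interactions, computed once for the whole dataset.
INTERACTIONS = [("ben", "anna"), ("carl", "anna"), ("carl", "ben")]


def interaction_subgraph(codes, match_all=False):
    """Return (users, edges) restricted to users whose posts carry the codes.

    With match_all=False a user qualifies if any of the codes appears on
    their posts; with match_all=True all of the codes must appear.
    """
    wanted = set(codes)

    # Collect, per user, every code used on any of their posts.
    codes_by_user = {}
    for post in POSTS.values():
        codes_by_user.setdefault(post["author"], set()).update(post["codes"])

    if match_all:
        users = {u for u, c in codes_by_user.items() if wanted <= c}
    else:
        users = {u for u, c in codes_by_user.items() if wanted & c}

    # Keep only the interactions whose two endpoints both qualify.
    edges = [(a, b) for a, b in INTERACTIONS if a in users and b in users]
    return users, edges
```

Calling `interaction_subgraph(["care"])` on this sample data keeps anna and ben and the single interaction between them. No call to Discourse is involved, because the full interaction list is already in memory.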

But I have no idea how to serve such a subgraph (that, after calling this function, resides in memory) to sigma :frowning: .
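One possible route, offered only as a sketch: serialize the in-memory subgraph to JSON and let the web front end load it. The layout below assumes the `nodes`/`edges` structure that the sigma.js JSON parser reads (`id`, `label`, `x`, `y`, `size` on nodes; `id`, `source`, `target` on edges). The function name `to_sigma_json` and the `positions` argument are hypothetical, and how the dashboard would actually fetch the result is left open.

```python
import json


def to_sigma_json(users, edges, positions):
    """Serialize a user-user subgraph to a sigma-style JSON string.

    users:     iterable of user names
    edges:     list of (source, target) pairs
    positions: dict mapping each user to an (x, y) pair, for example
               taken from a layout computed beforehand (an assumption here)
    """
    nodes = [
        {
            "id": user,
            "label": user,
            "x": positions[user][0],
            "y": positions[user][1],
            "size": 1,
        }
        for user in sorted(users)
    ]
    links = [
        {"id": f"e{i}", "source": source, "target": target}
        for i, (source, target) in enumerate(edges)
    ]
    return json.dumps({"nodes": nodes, "edges": links})
```

The string returned here could be sent as the body of an HTTP response by whatever web layer sits in front of the computing engine.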

Happy 2020, Guy and Bruno!

To sum up, it’s not too clear what direction we want to go in. We can’t have everything at the same time for a very moderate budget :wink: Here’s a realistic option for what could happen within the runtime of our current projects:

Focusing on a graph query interface for Discourse

We don’t necessarily want a graph database, rather Cypher or similar as a query language interface to the contents of the Discourse database. That may indeed involve a graph database used as a cache for performance reasons (like RedisGraph, discovered by Hugi) or not (like with AgensGraph, a Cypher interface to PostgreSQL data).

This generic graph query API for Discourse would be a nice feature by itself, on which others could build various other stuff. But, since that would take a good amount of developer time to build to quality, we’d have to make do with a simpler interface for the next two years:

  • Graphryder Dashboard would need a major overhaul due to software rot (example). In principle it could talk Cypher to Discourse the way it talks Cypher to Neo4J now, but the interface will be different, requiring work.

  • Something based on the Tulip desktop application maybe? By just extending it for our purposes? In that case, a web application would follow later. The webkit-in-making might already add components for a few aspects of drawing live / interactive graphs, but researchers would use the desktop application.

  • Other ideas …?

I’m not saying that we need a graph query language interface to Discourse. For our current use cases, we could just follow what Alberto proposed, and overengineering is not a good idea in general. I mean, companies can die from investing in “nice tech” that never gets used to its potential … happened to one of my earlier startup ideas for sure. However, if people here think that this is a nice long-term investment to make, it is somewhere near the upper limit of what could be done with the budget we have.

Whatever the case, I don’t want to invest any precious lifetime or brain capacity into software that we plan to abandon later. If that’s the only option, I’d rather tell our client that we’re happy with the dashboard we have, will not implement “Graphryder Dashboard v2”, and will not claim money for that. So either we come up with a way to salvage parts of the Graphryder software economically, or to re-implement its functionality in a long-term maintainable manner economically … or if not, then the zero-action plan will be our plan …

Your comment and the re-editing of my answer probably criss-crossed one another. One reason to go towards graph databases is, as you mention, to do the “heavy lifting” early enough in the process. One observation is key here: Discourse does not contain any information on the interactions between users, or even on the interactions between tags. This is information you need to derive from Discourse content. Now, once you have stored Discourse in graph form, the user-user graph or the tag-tag graph is obtained, in graph terms, by projecting two-mode graphs onto one-mode graphs. This pretty standard operation is time-consuming, so having everything as graphs greatly improves performance.
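For readers unfamiliar with the term, here is a small sketch of such a projection in plain Python. It takes a two-mode graph given as "post to set of tags" and returns the weighted tag-tag graph, where the weight counts the posts on which two tags co-occur. The function name `project_one_mode` and the sample data are illustrative only and say nothing about how Graphryder actually implements this.

```python
from collections import Counter
from itertools import combinations


def project_one_mode(memberships):
    """Project a two-mode graph onto one mode.

    memberships maps each node of the first mode (e.g. a post) to the set
    of second-mode nodes it is linked to (e.g. its tags). The result maps
    each pair of second-mode nodes to the number of first-mode nodes they
    share.
    """
    weights = Counter()
    for members in memberships.values():
        # Sorting makes (a, b) and (b, a) count as the same undirected edge.
        for a, b in combinations(sorted(members), 2):
            weights[(a, b)] += 1
    return weights


post_tags = {
    "p1": {"care", "money"},
    "p2": {"care", "money", "housing"},
    "p3": {"housing"},
}
tag_graph = project_one_mode(post_tags)
```

The inner loop visits every pair of tags on every post, which hints at why this is costly on a large corpus and why it pays to compute it once and keep the result.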

This part I would prefer not to do. I am very unfamiliar with the dashboard code, and didn’t have on my agenda right now to become familiar enough with it to implement this. It’s using Angular, which I don’t particularly like working with and don’t want to learn as it is quite outdated. If it were up to me, I would simply write a nice and simple splash page on graphryder.edgeryders.eu with links to the different deployed dashboards, rather than bothering with making the dashboard handle different sets of data.

I believe we faced a similar situation in the project we are running (which Alberto mentioned earlier in the thread). The app (which as of today is only a minimum viable product, accessible at demo.intuinet.fr) provides access to different graph databases, seen and played with as distinct databases by the user, while all these graphs are stored in a single Neo4j database (we use Neo4j because we already had experience with it and had no time to consider another technology). It is true we decided to do so because of Neo4j’s pricing policy, but at the same time it saves us a bit of trouble. Of course, you need to add properties so you can sort out elements as being part of this or that graph.
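As a rough illustration of the "add properties" idea, and not a description of the actual implementation mentioned above: if every node carried a property naming the graph it belongs to, each query could be restricted to one tenant with a parameter. The property name `dataset`, the `user` label and the function `tenant_query` are all assumptions made for this sketch.

```python
def tenant_query(dataset):
    """Return a Cypher query and its parameters, limited to one dataset.

    Assumes (hypothetically) that every node has a `dataset` property
    identifying the graph it belongs to.
    """
    query = (
        "MATCH (a:user)-[r]->(b:user) "
        "WHERE a.dataset = $dataset AND b.dataset = $dataset "
        "RETURN a, r, b"
    )
    # Passing the value as a parameter, rather than pasting it into the
    # query text, keeps the query identical across tenants.
    return query, {"dataset": dataset}


query, params = tenant_query("poprebel")
```

The pair can then be handed to whichever Neo4j client is in use. Older Neo4j versions wrote parameters as `{dataset}` instead of `$dataset`, so the syntax may need adjusting.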

I can ask our experts to share with you the details on this implementation strategy. Let me know.

I have no idea about that whatsoever. Generally speaking, I like Taleb’s “read the classics” maxim, and I believe it applies to software too. It works like this: only spend time reading stuff that is at least 300 years old, because it has been selected by evolution so that it is unlikely to disappear. If you know your Seneca or your Plato, you can still pass for a learned person in 2100, whereas investing in, say, Liu Cixin is unlikely to yield the same results.

I know next to nothing about software, but it seems to me that lower level stuff takes longer to go out of fashion. So, a long-term investment makes more sense if it is on lower levels.

That’s not how we make progress! You yourself had to buy two trucks in order to get it right the second time. Maybe we will indeed throw away the code, but what we gain at each step is that we learn how to move forward.

Version 2 should, in my mind, do only one more thing that version 1 does not do: allow toggling the visualization between “children codes” (in only one language) and “parent codes”.

I think this would be a good way to prototype. We could make scripts that allow researchers to use Tulip reasonably fast, and then be quite generous in making new ones when they ask “can I see this?” Over time, researchers would sniff out the visualizations that help them the most, and that would inform the next iteration of development.

I do not think this is necessarily bad. If we had analytics on Graphryder, I bet you would see that many features are almost never invoked.

In the end, I used another approach and installed multiple Neo4J databases on our one and only Edgeryders server. So we have now, in total:

This setup is now considered “Graphryder Interactive Dashboard v1” (a deliverable, see the plan). There are still things to tidy up internally to finalize this tooling. After that, we’ll have everything Graphryder under a single domain graphryder.edgeryders.eu, using one codebase for Graphryder API and Dashboard on the file system together with multiple configuration files.

Ideally I’ll also add a little command line tool that automates the setup of a new Graphryder website and Neo4J database for a new dataset:

graphryder-cli create --tag ethno-earthos --name earthos

For the short term incl. new projects, that will be good to go as setting up Graphryder is then just one command on the server. We can then focus on a medium to longer term solution …
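Since the tool does not exist yet, the following is only a sketch of what its command line could look like, using Python's standard `argparse`. The subcommand and the two flags are taken from the example above; the meaning given to `--tag` (the Discourse tag selecting the dataset) and the steps listed in `plan_create` are guesses, not a specification.

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical `graphryder-cli` tool."""
    parser = argparse.ArgumentParser(prog="graphryder-cli")
    commands = parser.add_subparsers(dest="command", required=True)

    create = commands.add_parser(
        "create", help="set up Graphryder for a new dataset"
    )
    # Assumed meaning: the tag that selects the dataset's content.
    create.add_argument("--tag", required=True)
    # Assumed meaning: short name used for the site and its database.
    create.add_argument("--name", required=True)
    return parser


def plan_create(args):
    """List the setup steps (all hypothetical) without performing them."""
    return [
        f"create a Neo4J database for '{args.name}'",
        f"write a configuration file for '{args.name}' using tag '{args.tag}'",
        f"import content tagged '{args.tag}'",
    ]


args = build_parser().parse_args(
    ["create", "--tag", "ethno-earthos", "--name", "earthos"]
)
steps = plan_create(args)
```

Keeping the plan separate from its execution would make a dry-run mode easy to add later, should that be wanted.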

(Also @alberto and @amelia: your Graphryder installation for POPREBEL is ready, see above. The details view is still broken, but will be fixed in the next few days …)

Woooow!!! The command line tool looks great!

Looks awesome, thanks @matthias!

How does it look story-wise to you so far, @alberto? Legible/meaningful at a zoomed-out level?

Yes. But you go first: I am on holiday this week. Let’s also put it in the POPREBEL workspace cat.

Sure! But I don’t think it makes sense for me to interpret first, since I did all of the coding (and a prelim writeup on what we have so far). I’m more interested in a perspective outside to begin with, from just looking at the graph. Then I’ll weigh in. Happy vacation!

Done. The bug already reported turns out to be a major drag on analysis, as do two more:

  • the Detangler view does not work.
  • the hard update link does not work.

I reported all three bugs on GitHub.

cc @matthias

I’ve found something else that makes this an even more appealing solution. RedisInsight is a Cypher-capable visualisation solution for Redis. @melancon and @bpinaud, have you looked into RedisGraph?

Just a heads up that @gdpelican has been looking into this, and we are making some progress. It’s currently slow and steady as we both have other things on our plate, but as the early experiments seem to be panning out, I’m optimistic that we’ll be able to dig into this in the summer.

Hello! Here’s an update with my fiddlings in this direction:
https://www.loom.com/share/cf91c429c5e54ce98baa0feaba2d6752

As mentioned in the video, I’ll have some time to commit to this after next week, so happy to get any more clarity on deliverables / timing etc from here.