📗 Graphryder 2.0 Manual

Content

1. Overview

2. Installation

3. Usage


1. Overview

Graphryder 2.0 builds and displays network graphs that describe conversations on Discourse platforms annotated with OpenEthnographer. This manual documents how to set up the Graphryder stack from scratch and how to use it.

You can access our current Graphryder 2.0 installation at http://graphryder.edgeryders.eu/ and, in case that site is down temporarily, also at http://graphryder.edgeryders.eu:9091/ as a backup option.

Note that Graphryder 2.0, as described in this manual, is a reimplementation of Graphryder 1.0, which we now consider legacy software. The Graphryder 1.0 Manual is still available.

Software architecture

  • Graphryder import script exports data from a Discourse & OpenEthnographer PostgreSQL database and builds a Neo4j graph database from that data.
  • Graphryder Neo4j database is the data layer of Graphryder.
  • Graphryder GraphQL API and dashboard are deployed together in a Docker container and are how the end user interacts with the network graphs.

2. Installation

So far, Graphryder has only been tested on Ubuntu 20.04. Its prerequisites include the Neo4J database, Python, and an import script to fill the Neo4J database with content from Discourse forums (or other sources).

2.1. Installing Neo4J and APOC

First we need to install Neo4J 4.3+. See the official installation instructions.

After the basic installation, enable and start the Neo4J system service:

$ sudo systemctl enable neo4j
$ sudo systemctl start neo4j
$ sudo systemctl status neo4j

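Now run cypher-shell and set a password for the Neo4J database. On first login with the default credentials (user neo4j, password neo4j), cypher-shell asks you to choose a new password: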

$ cypher-shell

You must also install the appropriate version of APOC for your Neo4J instance. See the official APOC instructions.

After installing APOC, you need to add these two lines to the APOC config file /etc/neo4j/apoc.conf:

apoc.import.file.enabled=true
apoc.import.file.use_neo4j_config=false

Now restart the Neo4J service:

$ sudo systemctl restart neo4j.service
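
If Neo4J still refuses file imports after the restart, it can help to confirm that both settings really ended up in the config file. The following is a minimal sketch, assuming the file uses plain key=value lines; the helper names parse_conf and missing_settings are made up for this example and are not part of Neo4J or APOC.

```python
from pathlib import Path

# The two settings this manual asks you to add to /etc/neo4j/apoc.conf.
REQUIRED = {
    "apoc.import.file.enabled": "true",
    "apoc.import.file.use_neo4j_config": "false",
}


def parse_conf(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def missing_settings(text, required=REQUIRED):
    """Return the required settings that are absent or set to another value."""
    found = parse_conf(text)
    return {key: value for key, value in required.items() if found.get(key) != value}


if __name__ == "__main__":
    conf = Path("/etc/neo4j/apoc.conf")
    if conf.exists():
        print(missing_settings(conf.read_text()) or "APOC import settings look fine")
    else:
        print(f"{conf} not found")
```

An empty result means both lines are present with the expected values.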

2.2. Installing Python and Python libs

Graphryder 2.0 needs Python 3.8.6+. If you also need other versions of Python on your system, consider managing them with pyenv.

Two Python libraries are also required:

$ pip install "neo4j~=5.0" psycopg2-binary

The specifier neo4j~=5.0 is equivalent to neo4j>=5.0,<6.0. Version 5.0 or newer is needed because Session.execute_write was added to the Neo4j Python driver in that release; with an older driver, the import script fails with AttributeError: 'Session' object has no attribute 'execute_write'.
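
To make the meaning of the ~= operator concrete, here is a simplified sketch of the "compatible release" rule from PEP 440. It only handles plain numeric versions such as 5.12.0 and is not how pip implements the check; the function names are chosen for this example.

```python
def parse_version(text):
    """Turn '5.12.0' into (5, 12, 0). Only plain numeric versions are handled."""
    return tuple(int(part) for part in text.split("."))


def compatible_release(candidate, base):
    """Simplified check for `candidate ~= base`.

    Per PEP 440, `~=X.Y` means `>= X.Y` and `== X.*`: the last component
    of the base version is dropped to form the prefix that must match.
    """
    cand, spec = parse_version(candidate), parse_version(base)
    if len(spec) < 2:
        raise ValueError("~= needs at least two version components")
    # Pad short candidates with zeros so that '5' compares like '5.0'.
    cand = cand + (0,) * (len(spec) - len(cand))
    prefix = spec[:-1]
    return cand >= spec and cand[:len(prefix)] == prefix


for version in ["4.4.11", "5.0", "5.12.0", "6.0.0"]:
    print(version, compatible_release(version, "5.0"))
```

Under this rule, 5.0 and 5.12.0 satisfy neo4j~=5.0, while 4.4.11 and 6.0.0 do not.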

2.3. Installing the Graphryder import script

First the basic installation:

$ git clone https://github.com/edgeryders/graphryder-import-script.git
$ cd graphryder-import-script
$ cp config.example.json config.json

Now edit config.json with the correct authentication information for your Neo4J database and the Discourse PostgreSQL databases you will be importing from.

You can import from multiple Discourse instances at once by adding multiple database configurations to the databases array. NB: Due to a bug, one more change is needed for every database configuration beyond the first. Open src/graphryder_import.py and edit the default value of the databases array so that it contains as many empty dicts as you have database configurations. For three database configurations, the relevant line should look like this:

config = {'databases': [{},{},{}]}
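
The number of placeholders has to match the number of entries in config.json. A small sketch of how that default value could be derived instead of counted by hand is shown here; it is not part of the import script, and the "name" key in the sample configuration is a made-up field used only to tell the entries apart.

```python
import json


def default_databases(config_text):
    """Return one empty dict per entry in the `databases` array of config.json."""
    config = json.loads(config_text)
    # A list comprehension creates independent dicts; `[{}] * n` would
    # repeat one shared dict n times.
    return [{} for _ in config["databases"]]


# Hypothetical config.json content with three database configurations.
example = '{"databases": [{"name": "a"}, {"name": "b"}, {"name": "c"}]}'
print({"databases": default_databases(example)})  # {'databases': [{}, {}, {}]}
```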

2.4. Importing conversations

You should now be ready to build your Neo4J database for Graphryder, by letting graphryder-import-script import conversation content. It will do the following:

  1. Get all users, groups, topics, posts, categories, tags, annotator codes and annotations from Discourse.
  2. Redact everything that is not public.
  3. Dump this cleaned data into batched JSON files that are optimized for fast loading into Neo4j.
  4. Import the data into Neo4j and create the graph relationships.
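
Steps 2 and 3 above can be illustrated with a short sketch. This is not the code of graphryder-import-script: the record layout, the "public" flag, the batch size and the file naming are all assumptions made for the example.

```python
import json
import tempfile
from pathlib import Path


def redact(records):
    """Keep only records flagged as public (hypothetical `public` field)."""
    return [record for record in records if record.get("public")]


def chunk(records, batch_size):
    """Split a list into consecutive batches of at most batch_size items."""
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]


def write_batches(records, out_dir, prefix, batch_size=1000):
    """Write each batch to its own numbered JSON file and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for number, batch in enumerate(chunk(records, batch_size)):
        path = out_dir / f"{prefix}_{number:04d}.json"
        path.write_text(json.dumps(batch))
        paths.append(path)
    return paths


# Five sample posts, of which those with an even id are public.
posts = [{"id": i, "public": i % 2 == 0} for i in range(5)]
with tempfile.TemporaryDirectory() as tmp:
    names = [path.name for path in write_batches(redact(posts), tmp, "posts", batch_size=2)]
print(names)
```

Writing fixed-size batches keeps each file small enough to be loaded in a single transaction, which is one common reason for batching before a bulk import.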

Run the import script like this:

$ cd /home/webmaster/graphryder-import-script/src
$ python graphryder_import.py

On the main server, a cron job runs the following script every night:

#!/bin/bash
. /home/webmaster/.bashrc
export PATH=/home/webmaster/.pyenv/shims:~/.pyenv/bin:"$PATH"
pyenv local 3.9.1
cd /home/webmaster/graphryder-import-script/src
python graphryder_import.py

On the main server, this script is available under /home/webmaster/scripts/ryderex-reload-data.sh and the crontab entry can be viewed and edited using crontab -e as user webmaster.

2.5. Installing Graphryder

To install Graphryder itself (that is, its API and dashboard), follow the instructions in the Graphryder repository.

2.6. Configuring port and domain

By default, Graphryder is served on port 80 of all network interfaces of your server, so it acts as the server's default web server. If that is what you want, make sure that no other web server is enabled on that port. Otherwise the Graphryder service will fail to start after a server reboot, with the error message “listen tcp4 0.0.0.0:80: bind: address already in use”. In this setup you cannot serve any additional website from the same server, because nothing else can listen on port 80.
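
Before starting Graphryder on a port, you can check whether something is already listening there. The following sketch simply tries to open a TCP connection; port_in_use is a helper written for this example, and it only detects services that accept connections on the given host address.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("port 80 in use:", port_in_use(80))
```

If this reports that port 80 is in use while Graphryder is stopped, another web server is occupying the port and the bind error quoted above is to be expected.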

To serve other websites from the same server as well, move Graphryder to a different port on the host system and run a regular web server on port 80 that reverse proxies to the Graphryder port and also serves your other websites. A reverse proxy such as Nginx is needed because the docker-proxy process, which maps ports from inside Docker containers to the host machine, operates on OSI Layer 4 and knows nothing about domains, which belong to OSI Layer 7.
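
The difference between the two layers can be sketched in a few lines. A Layer 4 port mapper only sees IP addresses and ports, whereas a reverse proxy reads the HTTP request and can route by its Host header. The code below is an illustration of that idea, not how Nginx works internally; the second entry in UPSTREAMS is a made-up example site, and IPv6 literals in the Host header are not handled.

```python
def host_header(raw_request: bytes) -> str:
    """Extract the Host header (without port) from a raw HTTP/1.1 request."""
    head = raw_request.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    for line in head.split("\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            return value.strip().split(":")[0].lower()
    return ""


# Domain to (address, port) mapping; the second entry is hypothetical.
UPSTREAMS = {
    "graphryder.edgeryders.eu": ("127.0.0.1", 9091),
    "www.example.com": ("127.0.0.1", 8080),
}


def pick_upstream(raw_request, upstreams=UPSTREAMS, default=None):
    """Choose a backend by domain name, which requires parsing Layer 7 data."""
    return upstreams.get(host_header(raw_request), default)


request = b"GET / HTTP/1.1\r\nHost: graphryder.edgeryders.eu\r\n\r\n"
print(pick_upstream(request))  # ('127.0.0.1', 9091)
```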

Proceed as follows:

  1. Edit /opt/Graphryder/docker/.env and set the HOST_HTTP_PORT variable to the new port (say, 9091 as in our case).

  2. Restart Graphryder: sudo systemctl restart Graphryder.service

  3. Make sure that Graphryder is available under the new port by visiting any hostname that resolves to your server at the new port: http://example.com:9091/

  4. Configure a reverse proxy. If using Nginx, a vhost configuration like this will do the job:

    server {
        listen 80;
        listen [::]:80;
    
        server_name graphryder.edgeryders.eu;
    
        location / {
            # Accessing Graphryder via localhost interface for better performance.
            # It is served on all interfaces though.
            proxy_pass http://localhost:9091/;
    
            # Optional information passed to the proxied application. May be
            # used in future versions of Graphryder 2.0, but not now.
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    
            # WebSocket support at only the URL used for the socket.
            # See: http://nginx.org/en/docs/http/websocket.html
            # See: https://stackoverflow.com/a/74724776
            #   Note, nested prefix-type locations must mention the full path.
            # proxy_set_header is inherited from the parent context only if the
            # current level defines no proxy_set_header of its own, so the parent's
            # headers are repeated here. See https://stackoverflow.com/a/32126596
            location /sockjs-node {
                proxy_pass http://localhost:9091/sockjs-node;
                proxy_http_version 1.1;
                proxy_set_header X-Real-IP $remote_addr;
                proxy_set_header Host $host;
                proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
                proxy_set_header Upgrade $http_upgrade;
                proxy_set_header Connection "upgrade";
                proxy_read_timeout 86400;
            }
        }
    }
    

    Afterwards, reload Nginx with sudo systemctl reload nginx.service.

    Technical Details

    In the above Nginx reverse proxy config, any path can be used instead of location /sockjs-node, as long as it is proxied to the same path on the application side. The application does not care at which URL it serves its single websocket per client. However, do not choose / or any other path at which http or https content is served. The websocket itself would still work, because the ws:// protocol can serve different content at an otherwise identical URI since the protocol part differs. But with /, the “Connection upgrade” directive would also apply to all your http URIs and be interpreted as “upgrade to https”, leading to failed requests where this is not possible. It would also make everything slower and cause the initial layout animation of Graphryder not to run, probably due to a race condition.
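
    For reference, a WebSocket opening handshake is an ordinary HTTP request that carries two specific headers (RFC 6455), which is why the upgrade headers should only be set for the socket path. The following sketch shows how such a request can be recognised; is_websocket_upgrade is a helper written for this example and is not part of Graphryder or Nginx.

```python
def is_websocket_upgrade(headers):
    """Return True if the headers describe a WebSocket opening handshake."""
    lowered = {name.lower(): value for name, value in headers.items()}
    # The Connection header may list several comma-separated tokens.
    tokens = [token.strip().lower() for token in lowered.get("connection", "").split(",")]
    return "upgrade" in tokens and lowered.get("upgrade", "").lower() == "websocket"


socket_request = {"Host": "graphryder.edgeryders.eu", "Upgrade": "websocket", "Connection": "Upgrade"}
page_request = {"Host": "graphryder.edgeryders.eu", "Connection": "keep-alive"}
print(is_websocket_upgrade(socket_request), is_websocket_upgrade(page_request))  # True False
```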

3. Usage

You can access our current Graphryder 2.0 installation at http://graphryder.edgeryders.eu/ and, in case that site is down temporarily, also at http://graphryder.edgeryders.eu:9091/ as a backup option.
