Grand visions: How do we tie things together?

@hugi thanks for getting inspired by the ideas about Linked Data.

First, Thom van Kalkeren over at https://gitter.im/linkeddata/chat recommended that I look at a framework that combines ideas from GraphQL and RDF: Cayley (https://github.com/cayleygraph/cayley), an open-source graph database. It uses GraphQL syntax to expose an RDF store: https://github.com/cayleygraph/cayley/blob/master/docs/GraphQL.md

## How I understand GraphQL’s role for integration

  • I looked at the code in realities/…/api/src/index.js and got an idea of how GraphQL is used to fetch data.
  • The identification of things in GraphQL is always relative to the API you are using: if you want to know “what is the name of Dream #123”, you query something like `{ dream(id: "123") { name } }`.
  • There is no global ID scheme; I understand this is by design (see the GraphQL documentation on object identification).
  • Data from different sources cannot be combined by design. Rather, you query one source after another. To get the GPS position of Dream #123, you would query another service with something like `{ gpspositions(id: "123", type: "Dream") { lat lon } }` (see the sketch below).
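
A minimal sketch of that two-step flow, assuming a hypothetical positions service with a gpspositions field next to the real dreams endpoint; the two queries are sent separately:

```graphql
# Query 1, sent to the dreams endpoint:
{
  dream(id: "123") {
    name
  }
}
```

```graphql
# Query 2, sent to the (hypothetical) positions endpoint.
# The id alone means nothing there, so the type is passed along too:
{
  gpspositions(id: "123", type: "Dream") {
    lat
    lon
  }
}
```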

## How I understand Linked Data for integration

In Linked Open Data, it would work differently:

  • The ID of the dream would be its URI: http://dreams.theborderland.se/dreams/349
  • The core data the dreams platform knows about this dream would be expressed as RDF, for example as a “CreativeWork” from schema.org. Example in JSON-LD:

```json
{
  "@id": "http://dreams.theborderland.se/dreams/349",
  "@context": "http://schema.org",
  "@type": "CreativeWork",
  "author": "Salon Leobard",
  "image": "http://s3-eu-central-1.amazonaws.com/images.dreams.borderland/images/attachments/000/001/342/large/5.jpg?1525040826",
  "name": "Hammock Reactor"
}
```
  • The dreams platform can publish this data in different ways. You would choose the way that is easiest to implement and, later, to parse. I list some of the ways I think are useful:
  • a) Use the ontola rdf-serializers gem for Rails, which lets you expose your Ruby objects as RDF (n-triples, JSON-LD, Turtle): https://github.com/ontola/rdf-serializers . It builds on top of active_model_serializers (https://github.com/rails-api/active_model_serializers) and was recommended to me today by Thom van Kalkeren over at https://gitter.im/linkeddata/chat
  • b) Embed the JSON-LD inside the HTML of the dream’s page, typically in a `<script type="application/ld+json">` element.
  • c) Return the JSON as a separate document/URI (example: http://dreams.theborderland.se/dreams/349.json) and refer to this URI from the dream’s page using a link in the HEAD, such as `<link rel="meta" type="application/ld+json" href="/dreams/349.json"/>`.
  • d) Provide a data dump of all dreams at http://dreams.theborderland.se/alldata.json. This URI could point to a service that dumps all data as one big RDF stream (file).
  • In theory, there are tons more ways to serve RDF, but these four are the ones I think are most realistic in our scenario. a) may be the easiest to implement. b) and c) are the typical ways people publish linked data so that search engines (Google, …) can find it. d) is a typical hack that goes a long way.
  • The realities platform could access the linked data in two ways: on demand or indexed.
  • On demand: Realities just knows that it can HTTP GET http://dreams.theborderland.se/dreams/349.json and that the response is linked data. Whenever needed, it fetches the JSON and renders the parts it needs.
  • Indexed: Realities fetches http://dreams.theborderland.se/alldata.json regularly (for example daily; there won’t be many changes to dreams) and stores it in a SPARQL database (see the sketch after this list).
  • In the “indexed” case, the globally unique IDs come into play: a SPARQL database can store tons of RDF data in one big graph without any configuration. You just dump one file after another into the database, and since all files reference globally unique IDs (URIs), the graph automatically builds itself and fills up with data.
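
A minimal sketch of that indexed flow in Ruby, assuming the rdf, json-ld and sparql-client gems and a local SPARQL store at a made-up endpoint URL:

```ruby
require 'rdf'
require 'json/ld'        # registers the JSON-LD reader for RDF.rb
require 'sparql/client'

# Hypothetical update endpoint of a local SPARQL store (e.g. Fuseki).
store = SPARQL::Client.new("http://localhost:3030/borderland/update")

# Fetch the full dump and parse it into an in-memory graph.
graph = RDF::Graph.load("http://dreams.theborderland.se/alldata.json",
                        format: :jsonld)

# Dump the triples into the store. Because every subject is a globally
# unique URI, repeated loads from different sources simply merge.
store.insert_data(graph)
```

Run daily from a cron job, this would be the whole “crawler”.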

## Adding more applications (with Linked Data)

The interesting part starts when more applications come into play; this is where Linked Data shows its strength:

  • Let’s go further: there is a “Google Maps application”, which may manage the GPS position of the dream. In JSON-LD that would be:

```json
{
  "@id": "http://dreams.theborderland.se/dreams/349",
  "@context": { "@vocab": "http://www.w3.org/2003/01/geo/wgs84_pos#" },
  "lat": 55.6154473,
  "long": 12.1574238
}
```

Again, this data would be available, together with the positions of all the other things you can have on the playa, in a file served from somewhere, let’s say http://myplacementapp.com/borderland/placement.json
  • In the “indexed” case, you would know that all placement data comes from there, so you would retrieve this file as well and store it in your SPARQL store.
  • Now you could query your SPARQL store with queries such as `SELECT ?p ?o WHERE { <http://dreams.theborderland.se/dreams/349> ?p ?o . }`. The result would contain the data from both the dreams platform and the map, combined (see the sketch after this list). The pattern in this query stands for “?subject ?predicate ?object”. It is comparable to “key value” in JSON; the differences are that in linked data, keys and values are always attached to a subject they talk about, identified by a URI, and that the values are called “objects” because they can themselves be subjects elsewhere, or simple data values.
  • In Ruby, you could for example use https://github.com/ruby-rdf/sparql-client to read and write data in a SPARQL store.
  • The power of Linked Data is that you use globally unique IDs to identify resources across systems: you can easily combine two systems if they “link to each other’s data” in the first place. We saw this in the example of the “Google Maps application”, which already referenced the dream by its URI; it was clear that it places things that come from another website. The “Google Maps application” could be used to place anything, not only dreams. You would go into the application and it would ask you: “What do you want to place? Please enter the URI here.” After entering a URI such as http://dreams.theborderland.se/dreams/349, the placement app would use the methods above (a, b, c) to find out what this is and then say: “Ah, OK, I got some JSON-LD; you want to place a CreativeWork named ‘Hammock Reactor’.”
  • The same goes for the data. Each app can use its own data “schema”, and again the IDs of the properties are global. So while the dreams platform would use schema.org to state name, author and image, the placement/maps app would use the WGS84 Geo vocabulary to express latitude and longitude. The RDF store keeps the data internally with the full URIs as keys (i.e. http://schema.org/author, http://www.w3.org/2003/01/geo/wgs84_pos#lat). So even if both apps define a “name”, as long as they use different schemas, both names would be stored and both would be queryable.
  • This design allows you to connect various systems before you even know which systems they are going to be, and to connect various data formats before you know exactly what they will look like.
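
A minimal sketch of that query with sparql-client, assuming the same made-up local endpoint as in the ingest sketch above:

```ruby
require 'sparql/client'

# Hypothetical query endpoint of the local SPARQL store.
client = SPARQL::Client.new("http://localhost:3030/borderland/sparql")

dream = RDF::URI("http://dreams.theborderland.se/dreams/349")

# The "?p ?o" pattern returns every predicate/object pair attached to
# the dream, no matter which app originally published the triple.
client.select(:p, :o).where([dream, :p, :o]).each_solution do |solution|
  puts "#{solution[:p]} -> #{solution[:o]}"
end
```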

## … if you consider RDF storage to index data

Now, if you want to build a large index, you will need an RDF database. The SPARQL databases are interesting thingies:

  • I can recommend Apache Jena with its Fuseki server (the successor of the older Joseki): it is simple and it works.
  • I can recommend Eclipse RDF4J (formerly OpenRDF Sesame): it is a bit more powerful and also works. There is an interesting high-performance backend, KiWi (http://marmotta.apache.org/kiwi/).
  • I also find that Cayley looks pretty cool; it works on top of SQL/NoSQL databases, which would be a good idea for backup/data safety.
  • Virtuoso is another store: complex and powerful, but a beast to set up and to tame.
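
For reference, the localhost:3030 endpoints in the Ruby sketches above assume something like a local Fuseki instance; starting one with an in-memory dataset looks roughly like this (check the Fuseki documentation for the current invocation):

```sh
fuseki-server --mem /borderland
```

Fuseki would then serve the query endpoint at http://localhost:3030/borderland/sparql and the update endpoint at http://localhost:3030/borderland/update.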

In all cases: use SPARQL to query.
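
To make the “combined” claim concrete, here is the kind of query such a store can answer once both dumps are loaded. The property URIs are the real schema.org and WGS84 ones from the examples above; only the store itself is assumed:

```sparql
PREFIX schema: <http://schema.org/>
PREFIX geo:    <http://www.w3.org/2003/01/geo/wgs84_pos#>

# The name comes from the dreams platform dump, the coordinates from
# the placement app dump: one query over the combined graph.
SELECT ?name ?lat ?long WHERE {
  <http://dreams.theborderland.se/dreams/349> schema:name ?name ;
                                              geo:lat     ?lat ;
                                              geo:long    ?long .
}
```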

## Comparison

I think the key difference is that RDF is built for heterogeneous systems, where multiple apps publish and consume data and new apps can “join” anytime later. With GraphQL, you would rather connect two systems for a specific purpose.

That makes the RDF APIs slower to realize in the first place, as you have to decide which method you will use to publish the data, perhaps also taking into account which RDF client libraries are available. Compare that to GraphQL, where you do not have to make so many decisions: there is just one single way to publish the data, the GraphQL endpoint.

Another difference is combining data:

  • With GraphQL, you cannot just dump the JSON from two different services into the same database and combine it: the keys would clash, and probably the data too. GraphQL is made to be consumed and presented by the GUI.
  • With RDF, you can load multiple data sources and then query them as one (see the example below).
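
To illustrate the difference, here is what two statements about the dream look like as RDF triples (N-Triples syntax). The second property is a made-up one from the hypothetical placement app; because the keys are full URIs, a “name” from one schema never clashes with a “name” from another:

```
<http://dreams.theborderland.se/dreams/349> <http://schema.org/name> "Hammock Reactor" .
<http://dreams.theborderland.se/dreams/349> <http://myplacementapp.com/schema#name> "Hammock Reactor (placed)" .
```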

## What I think a wise developer for #realities could consider

If we look at the use cases I pointed out in the video, I would make the following recommendations, based on technical aspects:

  • For clear & easy APIs, use GraphQL. There are mapping frameworks you can plug on top of Ruby to expose your data model as GraphQL. You are good at consuming GraphQL. You will be quick, and you will get results that work for you. You have already made the decision to use React/GraphQL.
  • For data from one app included into another app: GraphQL.
  • To change data from another app: GraphQL.
  • For apps that talk about data from OTHER apps (such as “this mapping app is needed to place things on a map which are actually created/represented within other apps”), referencing entities from other apps using URIs would be a start.
  • If a big data dump is to be exposed by one app and consumed by another, exchanging the data between apps as linked data could be considered.
  • For a big unifying read-only database, where data from many apps is indexed to find out “all we know about this dream”, a SPARQL store and linked data could be interesting. The open question is how to “crawl” the data and get it into the store; option d) from above would make that easy, and it is the way many linked data apps work today. Google crawls individual pages and sucks the RDF out of them, and it also tries to get dumps. Another question is when to crawl (regularly or, if we know when data changes, on demand). I could ask some people about that, if needed.

## Social / Human aspects

  • You know GraphQL already. Developer support is better for GraphQL, as it is a “trendy” framework, so you will find tons of info. To quickly get useful, working stuff done within hours, GraphQL is probably the better choice for you.
  • Linked Data has been around since 1998, and it is heavily used by the search engines (Google) for marking up data all over the web. But many of the tutorials you find online are outdated (it has 20 years of history), and finding the “2018” material is trickier. You will find a lot more sources about RDF, but it is harder to decide what is useful and what is not.

(sorry for the late reply, I hope it’s still in time)