Nice to see more triple store implementations. What’s the industry uptake of data stores like this? It always struck me as much more academic and not used as much in industry. I saw a SaaS triple store a while ago, but it seemed to disappear or never really take off.
Nubank recently raised $400M at a $10B valuation and they depend on Datomic [1] heavily for their core systems [2].
The RDF database marketplace is very established [3], and the likes of MarkLogic, DB2, and Oracle have clearly encountered profitable reasons to add RDF support. I believe RDF has good traction in knowledge-intensive industry domains such as clinical research and life sciences.
Disclosure: I work on Crux [4] which adds bitemporal versioning and eviction to a document->triplestore model running on top of Kafka.
+1 I got a lot of mileage out of the triple model when working with social media data. You just don’t know what data patterns you will find when you start looking, and need to support generic queries.
Happy Datomic user here – simple, flexible, powerful. I've recently heard good things about https://www.stardog.com/ which is a real triple store (Datomic adds a time dimension)
Are there any differences between a KV-store in the form of "Bob:Knows" — "John" and a triple store in the form of "Bob" — "Knows" — "John"? Redis, for example, can query the first one easily by scanning.
Bonus question: What are some real-life use cases for triple stores?
A slightly more plausible example is “Who knows John?”. I think about turning to triple stores when I’m still exploring an application domain and don’t know what data access patterns will look like yet. Something like a hexastore that maintains full indexes for all query orders seems like a reasonable compromise for read-heavy applications in the prototype stage.
It is not unusual to implement a triple store with multiple indexes, so you could build k-v stores with
s-p -> o
p-o -> s
s-o -> p
and then you have indexes which are good for those triple patterns.
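For illustration, here's a toy sketch of that multi-index idea in Python (hypothetical class and method names, not any real store's API) — each index makes one family of triple patterns a cheap lookup:

```python
from collections import defaultdict

class TripleStore:
    """Toy triple store keeping three covering indexes (a simplified hexastore)."""

    def __init__(self):
        self.spo = defaultdict(lambda: defaultdict(set))  # s -> p -> {o}
        self.pos = defaultdict(lambda: defaultdict(set))  # p -> o -> {s}
        self.osp = defaultdict(lambda: defaultdict(set))  # o -> s -> {p}

    def add(self, s, p, o):
        # Every triple goes into all three indexes.
        self.spo[s][p].add(o)
        self.pos[p][o].add(s)
        self.osp[o][s].add(p)

    def objects(self, s, p):
        """Answers 'What does Bob know?' via the s-p -> o index."""
        return self.spo[s][p]

    def subjects(self, p, o):
        """Answers 'Who knows John?' via the p-o -> s index."""
        return self.pos[p][o]

store = TripleStore()
store.add("Bob", "Knows", "John")
store.add("Alice", "Knows", "John")
print(store.subjects("Knows", "John"))  # the set {'Bob', 'Alice'} (order may vary)
```

The write amplification (three inserts per triple) is the price for making every query shape an index lookup, which is why this fits read-heavy workloads.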
Let's see.
The core table in the salesforce.com system consists of triples, but salesforce.com will materialize whatever indexes and views are necessary to make things fast based on automatic run-time profiling. Their patent on this should run out just about now, so this feature may turn up in real-life triple stores where it would make a big difference in practicality.
The NSA has been shopping around for a triple store which could ingest around 1 trillion triples per day.
The BBC made a nice web site for the world cup which used forward chaining inference in a triple store to determine the consequences of each goal, so the tables would all adjust whenever anything happened.
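The forward-chaining part is easy to sketch: apply rules to the fact set until nothing new can be derived. This is a minimal toy version (the rule and team names are made up, not the BBC's actual ruleset):

```python
def forward_chain(facts, rules):
    """Repeatedly apply rules until a fixpoint: no rule derives a new triple."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            for new in rule(facts):
                if new not in facts:
                    facts.add(new)
                    changed = True
    return facts

def beat_implies_advance(facts):
    # Hypothetical rule: if X Beat Y, then X advances.
    for (s, p, o) in list(facts):
        if p == "Beat":
            yield (s, "HasStatus", "Advances")

facts = {("Germany", "Beat", "Brazil")}
derived = forward_chain(facts, [beat_implies_advance])
print(("Germany", "HasStatus", "Advances") in derived)  # True
```

Real stores do this with rule languages and smarter incremental algorithms, but the "tables adjust whenever anything happens" effect is exactly this fixpoint loop re-run on new facts.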
How you store your triples affects performance, but, conceptually, is only an implementation detail.
But then, why stop at a KV-store? A set with entries “Bob:Knows:John” will work just as well, if you ignore performance.
But then, why stop at a set? A string “Bob:Knows:John;Bob:Loves:John;John:Is:vegetarian” works just as well (conceptually!).
IMO, a major real-life use case is as a means to produce PhD’s :-). The concept is enticing and easily grasped, but there are zillions of papers still to write on query planning, automatic storage optimization, discovering heuristics, etc. It’s just like the early days of SQL: you don’t have to read through decades of papers to get to the front of the field.
Unless I’m missing something, the in-memory backend here appears to actually use the set solution: all of the triple fields are concatenated together and used as a dictionary key. Queries iterate through the dictionary entries until a sufficient number of results have been located.
On the other hand, I’m not really familiar with Go, so I may be reading it wrong.
>Are there any differences between a KV-store in the form of "Bob:Knows" — "John" and a triple store in the form of "Bob" — "Knows" — "John"? Redis, for example, can query the first one easily by scanning.
A triple store can more quickly answer queries about triples. The reason to use triples is that it is what you naturally get when you try to store structured relational data where the schema changes quickly.
Looks like it is written in Go too. I can see yours being much simpler to get up and running initially, though. Akutan doesn't look as simple, since it's built on Docker and runs as a daemon.
Triple stores support a disciplined set of primitive types that come from XML Schema, so you have "xsd:integer", "xsd:dateTime", "xsd:decimal", really the critical things that are missing in JSON. That is, there is a kind of fact where the object is a literal.
Triple stores also support facts where the object is an identifier for another object. That could be a URI which names it, or it could be an internal "blank node" identifier.
Other kinds of "graph database" have different semantics, for instance they might not have support for literals, or have a different set of literal data types, or they might let you attach facts to the edges (hypergraph, property graph, ...)
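A rough way to picture the two kinds of objects (this is a sketch with made-up example IRIs, not any store's actual data model):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Literal:
    """A typed value: the 'leaf' kind of object."""
    value: str
    datatype: str          # e.g. "xsd:integer", "xsd:dateTime", "xsd:decimal"

@dataclass(frozen=True)
class Node:
    """An identifier for another resource: a URI or a blank-node id like '_:b0'."""
    iri: str

triples = [
    (Node("http://example.org/Bob"), "ex:age",   Literal("42", "xsd:integer")),
    (Node("http://example.org/Bob"), "ex:knows", Node("http://example.org/John")),
]

for s, p, o in triples:
    kind = "literal" if isinstance(o, Literal) else "node"
    print(p, "->", kind)   # ex:age -> literal, then ex:knows -> node
```

Property graphs and hypergraphs relax this in different directions (facts on edges, other literal types), which is why "graph database" alone doesn't pin down the semantics.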