A couple of quick points:

a. If you are primarily dealing with categorical data (strings as opposed to numbers), graphs are pretty good for storage, retrieval and visualization. Think genes, diseases etc., where you also need a lot of graph algorithms: eigenvalues, shortest paths and so on. The biggest difference in querying is that in SQL you say "what" you want, while in SPARQL / Gremlin you say "how" you want it, i.e. which relationships to traverse.

b. Graph as a representation format shines, but as a storage mechanism I have not found it to be optimal. Many go for a graph as a layer on top of an RDBMS.

c. RDF is better than property graph databases in terms of standardization. In a property graph you have to make a somewhat arbitrary decision about what should be a vertex vs. what should be a property, and in things like Neo4j that choice gets fixed once you make it. Virtuoso comes pretty close, since it implements RDF on an RDBMS (my limited understanding).

d. It is good for representing knowledge / metadata (at least RDF is), but again I would stay away from using it to represent the data itself.

e. Your choice of graph algorithms typically ends up being whatever comes prepackaged (say with Gremlin etc.), or you export to an intermediary and run the algorithms there (NetworkX / igraph; igraph is awesome), or you write your own (typically not trivial).

f. Many people brought up the schema; I actually think this is an advantage of RDF. My typical workflow is to start with RDF and do my basic work there until I have a good understanding of what the queries are, and therefore what the optimal schema is, and then migrate to an RDBMS as needed. Trying to do large-scale RDF work on laptop-class infrastructure is not optimal.
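
The RDF-first workflow in point f (explore in a schema-less triple form, then migrate once the queries are known) can be sketched in plain Python. This is a minimal sketch under stated assumptions: a list of tuples stands in for a real RDF store such as rdflib or Virtuoso, `match()` and `migrate_genes()` are hypothetical helpers (the former a rough stand-in for a SPARQL triple pattern), and the gene/disease triples and table names are illustrative only.

```python
import sqlite3

# Illustrative sample data (not from the original comment): schema-less
# (subject, predicate, object) triples, the way they would sit in an RDF store.
TRIPLES = [
    ("BRCA1", "type", "Gene"),
    ("BRCA1", "chromosome", "17"),
    ("BRCA1", "associated_with", "BreastCancer"),
    ("TP53", "type", "Gene"),
    ("TP53", "chromosome", "17"),
    ("TP53", "associated_with", "LiFraumeniSyndrome"),
]


def match(triples, s=None, p=None, o=None):
    """Hypothetical helper: a triple pattern where None acts as a wildcard,
    roughly like an unbound variable in a SPARQL pattern."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def migrate_genes(triples):
    """Hypothetical migration step: once the queries are understood, pivot
    the gene triples into a fixed relational schema (in-memory SQLite)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE gene (symbol TEXT PRIMARY KEY, chromosome TEXT)")
    conn.execute("CREATE TABLE gene_disease (symbol TEXT, disease TEXT)")
    for gene, _, _ in match(triples, p="type", o="Gene"):
        chrom = match(triples, s=gene, p="chromosome")
        conn.execute(
            "INSERT INTO gene VALUES (?, ?)",
            (gene, chrom[0][2] if chrom else None),
        )
        for _, _, disease in match(triples, s=gene, p="associated_with"):
            conn.execute("INSERT INTO gene_disease VALUES (?, ?)", (gene, disease))
    conn.commit()
    return conn


if __name__ == "__main__":
    # Exploration phase: ask schema-less questions directly against the triples.
    print([s for s, _, _ in match(TRIPLES, p="chromosome", o="17")])
    # Migration phase: the same facts, now queryable with plain SQL joins.
    conn = migrate_genes(TRIPLES)
    for row in conn.execute(
        "SELECT g.symbol, g.chromosome, d.disease FROM gene g "
        "JOIN gene_disease d ON g.symbol = d.symbol ORDER BY g.symbol"
    ):
        print(row)
```

The triple form lets you add new predicates without touching a schema; the relational form only gets designed once you know which joins you actually run.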

I would try some combination of an RDBMS for storage with an in-memory graph library like igraph at runtime. YMMV.
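
A minimal sketch of that combination, assuming a hypothetical `edge(src, dst)` table in SQLite as the system of record: the edge list is pulled out at runtime into an in-memory adjacency map and a shortest-path query runs there. A plain breadth-first search stands in for igraph so the sketch has no dependencies; in practice you would hand the same edge list to igraph or NetworkX. The table, the helper names and the sample edges are all hypothetical.

```python
import sqlite3
from collections import defaultdict, deque


def build_demo_db():
    """Hypothetical storage layer: an in-memory SQLite DB holding an edge list."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edge (src TEXT, dst TEXT)")
    conn.executemany(
        "INSERT INTO edge VALUES (?, ?)",
        [  # illustrative gene / disease / drug edges
            ("BRCA1", "BreastCancer"),
            ("BreastCancer", "Tamoxifen"),
            ("TP53", "LiFraumeniSyndrome"),
            ("LiFraumeniSyndrome", "BreastCancer"),
        ],
    )
    conn.commit()
    return conn


def load_graph(conn):
    """Pull the edge list out of the RDBMS into an undirected adjacency map."""
    adj = defaultdict(set)
    for src, dst in conn.execute("SELECT src, dst FROM edge"):
        adj[src].add(dst)
        adj[dst].add(src)
    return adj


def shortest_path(adj, start, goal):
    """Breadth-first search; returns one shortest path as a node list, or None."""
    if start == goal:
        return [start]
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in sorted(adj[node]):  # sorted only to make output deterministic
            if nxt not in parent:
                parent[nxt] = node
                if nxt == goal:
                    # Walk parent pointers back to the start, then reverse.
                    path = [goal]
                    while parent[path[-1]] is not None:
                        path.append(parent[path[-1]])
                    return path[::-1]
                queue.append(nxt)
    return None


if __name__ == "__main__":
    graph = load_graph(build_demo_db())
    print(shortest_path(graph, "TP53", "BRCA1"))
```

The split keeps storage, updates and ad-hoc SQL in the RDBMS, while traversal-heavy algorithms run on a structure rebuilt from it whenever needed.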
