Five years ago I was absolutely frustrated with the state of Graph databases and...

BlooIt · on Feb 28, 2025

Check out https://github.com/Pometry/Raphtory. It's written in Rust, embedded (the binaries are about 20 MB), and you can use the Python APIs as a drop-in replacement for NetworkX. Disclaimer: I am one of the people behind it.

pjd7 · on Feb 28, 2025

This looks super interesting.

Just starting to review it, but my front-of-mind questions: 1) How do I handle persistence? Looks like some code is missing. 2) Do you support multi-tenancy (B2B SaaS graph backend for handling relations scoped to a tenant)?

Thanks

BlooIt · on Feb 28, 2025

Good questions.

1) You can persist a graph to disk. By default this uses protobuf (`save_to_file`); however, we’re migrating to Parquet in the next release for better performance, because we noticed that loading a 100M-edge graph from scratch (CSV, Pandas, or raw Parquet) is actually faster (~1M rows/sec) than loading from persisted proto, which isn’t ideal. There’s also a private version that uses custom memory buffers for on-disk storage, handling updates and compaction automatically.

2) You can run a Raphtory instance either as a GraphQL server or as an embedded library. For the server, multiple users can query the persisted graphs, which are stored in a simple folder structure with namespaces (for different graphs). For now, access control needs to be managed externally; however, it's on our roadmap!

gkorland · on Feb 28, 2025

Your graph DB frustrations mirror what many experienced with Neo4j. If you refresh your project, consider including FalkorDB (formerly RedisGraph) - it uses sparse adjacency matrices and GraphBLAS for much better performance while supporting Cypher.

Would be interesting to see updated benchmarks comparing these newer options against PostgreSQL extensions.

threeseed · on Feb 28, 2025

You ran Neo4J with 512MB even though it has always recommended 2GB at a minimum.

And MemGraph is nice, but it's memory-only, whereas Neo4J is designed for super large graphs that live on the filesystem. Not really that comparable.

henryfjordan · on Feb 27, 2025

I had almost exactly the opposite experience, although my dataset was pretty small.

We wanted to store a graph in postgres and ended up writing some recursive queries to pull subgraphs then had NetworkX layered over it to do some more complex graph operations. We ended up doing that for a short while but then switched to Neo4j because of how comparatively easy it was to write queries (although the Python support for Neo4j was severely lacking). Never really stressed it out on dataset size though.

I did manage to crash Redis' graph plugin pretty quickly when I was testing that.

SahAssar · on Feb 28, 2025

Not sure what you consider "quite small" and I don't know how NetworkX works, but postgresql recursive queries have worked well for me for small graphs.

Could you share what the data structure and scale was?

henryfjordan · on Feb 28, 2025

We basically had a single table that we wanted to be able to nest on itself arbitrarily. Think categories and subcategories, maybe 100k nodes/rows.

Postgres worked fine, but Cypher is so much more expressive and handles stuff like loop detection for you; Neo4j was much easier to work with. Performance wasn't ever really an issue with either.
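
A minimal sketch of the kind of recursive query described here, pulling a subtree out of a single self-nesting table. It runs on SQLite (bundled with Python) rather than Postgres so that it is self-contained; the `category` table, its columns, and the `subtree` helper are hypothetical names chosen for illustration, not the schema from this comment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical self-referential table: each row points at its parent row.
conn.execute(
    "CREATE TABLE category (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO category VALUES (?, ?, ?)",
    [
        (1, None, "root"),
        (2, 1, "books"),
        (3, 2, "fiction"),
        (4, 2, "nonfiction"),
        (5, 1, "music"),
    ],
)


def subtree(conn, root_id):
    """Return (id, name, depth) for root_id and all of its descendants."""
    return conn.execute(
        """
        WITH RECURSIVE sub(id, name, depth) AS (
            SELECT id, name, 0 FROM category WHERE id = ?
            UNION ALL
            SELECT c.id, c.name, sub.depth + 1
            FROM category AS c
            JOIN sub ON c.parent_id = sub.id
        )
        SELECT id, name, depth FROM sub ORDER BY depth, id
        """,
        (root_id,),
    ).fetchall()


books = subtree(conn, 2)
```

This version assumes the data is a tree. If a row could end up as its own ancestor, the recursion would not terminate without extra loop handling.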

zozbot234 · on Feb 28, 2025

Note that more recent versions of Postgres have added support for the CYCLE keyword, for easier loop detection.
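
For context, a sketch of the manual loop detection that a CYCLE clause is meant to replace: the recursive query carries the path walked so far and refuses to revisit a node already on it. This runs on SQLite only because it ships with Python; the `edge` table and the `reachable` helper are hypothetical, and the string-based path check is an illustrative stand-in, not Postgres syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edge (src INTEGER, dst INTEGER)")
# 1 -> 2 -> 3 -> 1 forms a loop; 3 -> 4 leads out of it.
conn.executemany("INSERT INTO edge VALUES (?, ?)", [(1, 2), (2, 3), (3, 1), (3, 4)])


def reachable(conn, start):
    """Return every node reachable from start, terminating even on cyclic data."""
    rows = conn.execute(
        """
        WITH RECURSIVE walk(node, path) AS (
            SELECT ?, ',' || ? || ','
            UNION ALL
            SELECT e.dst, walk.path || e.dst || ','
            FROM edge AS e
            JOIN walk ON e.src = walk.node
            -- Manual loop detection: skip any node already on this path.
            WHERE instr(walk.path, ',' || e.dst || ',') = 0
        )
        SELECT DISTINCT node FROM walk ORDER BY node
        """,
        (start, start),
    ).fetchall()
    return [row[0] for row in rows]


from_one = reachable(conn, 1)
```

Without the `WHERE instr(...) = 0` guard, the walk would keep going around the 1 -> 2 -> 3 -> 1 loop indefinitely.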

AlphaSite · on Feb 28, 2025

We have something similar in Postgres but IMO disconnectedness also plays a really big part in this whole calculation. We actually ended up just changing the transitive closure for fast operations (and simpler code).

philjohn · on Feb 28, 2025

A good 10 years ago or so I was running a solution that used RDF Quad Stores - and the best one at the time (after trialling 4Store, Marklogic and some others I can't remember) was OpenLink Virtuoso - how they managed to fit a performant distributed Quad store into what started life as an SQL engine was impressive.

I've left that world now, but if you're in the market for a graph store again, it might be something to look at.

wslh · on Feb 28, 2025

Have you tried NetworkDisk[1] to manipulate NetworkX graphs on disk?

[1] https://networkdisk.inria.fr/

nemo44x · on Feb 28, 2025

Neo4j is pretty bad and very dated. No idea what they’re doing. MemGraph is a much better tool.

Really graph is a feature and not a product.

threeseed · on Feb 28, 2025

Neo4J is mature not dated which is why it's so popular.

And couldn't disagree more that graph is a feature. You really want something optimised for it (query language / storage approach) as the data structure is so different in every way from a relational or document store.

thesz · on Feb 28, 2025

> ...as the data structure is so different in every way from a relational or document store.

No, it is not.

[1] https://en.wikipedia.org/wiki/Worst-case_optimal_join_algori...

Graph processing can create a substantial amount of intermediate data if it is done in the typical join implementation fashion (nested loops or hash join). So it may appear that graph processing needs a tailored approach.

But what can help graph algorithms can help SQL query execution as well and vice versa, see the link above.

For example, TPC-DS contains queries that (indirectly) join the same tables multiple times (query 4, for example). This is, basically, a kind of centrality metric computation for a graph represented by the tables.
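
A toy sketch of the intermediate-data point, using the directed triangle query (a->b, b->c, c->a) on a hub-shaped graph. The first function joins the edge table with itself and only then filters, so it materializes every two-step path. The second intersects neighbour sets per edge, iterating the smaller set, which is in the spirit of the worst-case optimal approach but is not a full implementation of any algorithm from the linked article. The graph shape and all function names are assumptions made for illustration.

```python
from collections import defaultdict


def build_edges(n):
    """n sources -> hub -> n targets, plus one edge that closes a single triangle."""
    edges = {(f"s{i}", "h") for i in range(n)}
    edges |= {("h", f"t{j}") for j in range(n)}
    edges.add(("t0", "s0"))
    return edges


def triangles_binary_join(edges):
    """Pairwise plan: build all 2-paths first, then filter on the closing edge."""
    out = defaultdict(set)
    for a, b in edges:
        out[a].add(b)
    two_paths = [(a, b, c) for a, b in edges for c in out[b]]  # intermediate result
    result = {(a, b, c) for a, b, c in two_paths if (c, a) in edges}
    return result, len(two_paths)


def triangles_intersect(edges):
    """Per edge (a, b), intersect out-neighbours of b with in-neighbours of a."""
    out, inn = defaultdict(set), defaultdict(set)
    for a, b in edges:
        out[a].add(b)
        inn[b].add(a)
    result, examined = set(), 0
    for a, b in edges:
        # Iterate the smaller set and probe the larger one.
        small, large = sorted((out[b], inn[a]), key=len)
        for c in small:
            examined += 1
            if c in large:
                result.add((a, b, c))
    return result, examined


edges = build_edges(100)
joined, intermediate_rows = triangles_binary_join(edges)
intersected, candidates_examined = triangles_intersect(edges)
```

Both plans return the same triangles, but on this graph the pairwise plan materializes on the order of n² two-paths through the hub, while the intersection plan examines only a handful of candidates.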

tucnak · on Feb 28, 2025

How is it different? Isn't a graph basically two sets of tuples: edges and nodes? I played with Cayley (Google) for a little while, & that was my impression.

bubblyworld · on Feb 28, 2025

I think it's less a matter of "can you represent graphs in a relational DB" (of course you can), and more about what kind of queries the DB is optimised for. Graph databases are intended for complex recursive queries on relatively unstructured data. You could certainly do that in SQL if you wanted to, but you'll pay for it performance-wise.

Graph query languages also make those kinds of queries much easier to express in the first place.

tucnak · on Feb 28, 2025

So the underlying storage is conventional, it's still tuples of some kind, and it's only a matter of how indexes are laid out? Otherwise, I'm struggling to see how it could "optimise" for certain access patterns. How would a typical graph database index be different from a btree access method in Postgres?

bubblyworld · on Feb 28, 2025

I don't know much about the internal details of postgres. But there is a ton of detail underlying "it's just tuples of some kind" and there are lots of ways to implement indices, no? Is it so difficult to imagine that different implementations have different performance properties?

There's also the query planner layer to think about.

tucnak · on Feb 28, 2025

[flagged]

bubblyworld · on Feb 28, 2025

No need for the snark. If you want specific details of how postgres differs from graph databases I have nothing for you. I just find your position that btrees are optimised for every query structure... obviously false on general grounds? For example, one way to make recursive queries faster is to store relations as direct pointers of some kind, rather than doing index scans for every level of join.
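
To make the "direct pointers versus an index lookup per level" distinction concrete, here is a small in-memory sketch. `EdgeIndex` keeps edges as a sorted list and does a binary search on every hop, loosely standing in for an ordered index; `Node` objects hold direct references to their neighbours. All names are hypothetical, and this says nothing about how Postgres or any particular graph database is actually implemented, nor about their relative speed.

```python
from bisect import bisect_left


class EdgeIndex:
    """Edges kept as a sorted list of (src, dst); each hop is a binary search."""

    def __init__(self, edges):
        self.rows = sorted(edges)

    def neighbours(self, node):
        i = bisect_left(self.rows, (node,))  # first row whose src >= node
        found = []
        while i < len(self.rows) and self.rows[i][0] == node:
            found.append(self.rows[i][1])
            i += 1
        return found


class Node:
    """A node that stores direct references to its neighbouring Node objects."""

    __slots__ = ("key", "out")

    def __init__(self, key):
        self.key = key
        self.out = []


def build_nodes(edges):
    nodes = {}
    for src, dst in edges:
        for key in (src, dst):
            if key not in nodes:
                nodes[key] = Node(key)
    for src, dst in sorted(edges):
        nodes[src].out.append(nodes[dst])
    return nodes


def k_hop_index(index, start, k):
    """Keys within k hops of start, searching the index once per visited node."""
    frontier, seen = {start}, {start}
    for _ in range(k):
        frontier = {n for f in frontier for n in index.neighbours(f)} - seen
        seen |= frontier
    return seen


def k_hop_pointers(start_node, k):
    """Same traversal, but each hop just follows stored references."""
    frontier, seen = [start_node], {start_node.key}
    for _ in range(k):
        next_frontier = []
        for node in frontier:
            for neighbour in node.out:
                if neighbour.key not in seen:
                    seen.add(neighbour.key)
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return seen


edges = [(1, 2), (2, 3), (3, 4), (1, 5)]
via_index = k_hop_index(EdgeIndex(edges), 1, 2)
via_pointers = k_hop_pointers(build_nodes(edges)[1], 2)
```

Both traversals return the same node set; the difference is only in what each hop costs, a logarithmic search in the first case and a reference dereference in the second.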

Perhaps we're talking past each other about the word "optimised".

tucnak · on March 3, 2025

> I just find your position that btrees are optimised for every query structure

But that is not my position! Postgres has many index access methods: hash, btree, brin, gin, gist, and there are extensions for rum, bloom, skipscans, geospatial indexes such as sp-gist, & vector indexes like ivf/hnsw (see pgvector). I mean, as far as graph databases are concerned, besides pgRouting, there's also Apache AGE which is a graph-"optimised" Postgres.

You should learn more about Postgres and databases in general. See comment above. https://news.ycombinator.com/item?id=43203833 which is closely related to the argument I am actually making.

bubblyworld · on March 5, 2025

Fair, and I apologise for misrepresenting it. I should definitely learn more about databases in general!