What are modern practices for scaling Rails apps when the database is the bottleneck? I have a GIS-heavy app built with Rails that leans on PostGIS a lot for the heavy computation. Simply getting bigger and bigger database instances doesn't scale very well, and moving the GIS computation into Ruby is not an option either (RGeo is good for some things, but can't replace PostGIS for most of my needs).
Postgres 10 introduced native table partitioning, and it became much easier to use in Postgres 11. This lets you divide large tables into smaller ones that can be queried more efficiently.
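A rough sketch of what that looks like from a Rails migration, using declarative range partitioning (the table and column names here are made up for illustration):

```ruby
# Hypothetical migration: partition a large observations table by month.
# Assumes the PostGIS extension is already installed in this database.
class PartitionObservations < ActiveRecord::Migration[6.0]
  def up
    execute <<~SQL
      CREATE TABLE observations (
        id          bigserial,
        geom        geometry(Point, 4326),
        recorded_at timestamptz NOT NULL
      ) PARTITION BY RANGE (recorded_at);

      -- One partition per month; queries that filter on recorded_at
      -- only scan the relevant partitions (partition pruning).
      CREATE TABLE observations_2020_01 PARTITION OF observations
        FOR VALUES FROM ('2020-01-01') TO ('2020-02-01');

      -- Spatial indexes are created per partition, so each one stays small.
      CREATE INDEX ON observations_2020_01 USING GIST (geom);
    SQL
  end

  def down
    execute "DROP TABLE observations;"
  end
end
```

Active Record queries against `observations` don't need to change at all; Postgres routes them to the right partitions.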
Rails 6 introduced multiple databases natively in Active Record. You can more easily have read-only databases or segment off high-write workloads.
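The basic shape is something like this (assuming your database.yml defines a `primary` writer and a `primary_replica` entry with `replica: true`):

```ruby
# app/models/application_record.rb
class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true

  # Reads can go to the replica, writes always go to the primary.
  connects_to database: { writing: :primary, reading: :primary_replica }
end

# Pin a block of known read-only queries to the replica explicitly.
# Parcel is a hypothetical model here.
ActiveRecord::Base.connected_to(role: :reading) do
  Parcel.where(county: "example").load
end
```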
Foreign tables let you bring data from multiple databases into one so it "looks" like everything is in a single DB. This got better/faster, particularly for joins, in Postgres 12.
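For example, with postgres_fdw you can expose a table living on another Postgres node and join against it locally. A sketch, again from a migration (server, credential, and table names are all hypothetical):

```ruby
# Hypothetical migration: make shard1's parcels table queryable locally.
class AddForeignParcels < ActiveRecord::Migration[6.0]
  def up
    execute <<~SQL
      CREATE EXTENSION IF NOT EXISTS postgres_fdw;

      CREATE SERVER gis_shard FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'shard1.internal', dbname 'gis');

      CREATE USER MAPPING FOR CURRENT_USER SERVER gis_shard
        OPTIONS (user 'app', password 'secret');

      -- Pulls in the table definition; queries and joins against
      -- public.parcels are now pushed to the remote server where possible.
      IMPORT FOREIGN SCHEMA public LIMIT TO (parcels)
        FROM SERVER gis_shard INTO public;
    SQL
  end
end
```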
Also consider things like materialized views or even plain views which can help bring together just the right data you need ahead of time.
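For the spatial case specifically, a materialized view can precompute something like point-in-polygon assignments so the app reads a plain indexed table instead of running `ST_Contains` per request. A sketch with made-up table names:

```ruby
# Hypothetical migration: cache which region each point falls in.
class CreatePointRegions < ActiveRecord::Migration[6.0]
  def up
    execute <<~SQL
      CREATE MATERIALIZED VIEW point_regions AS
        SELECT p.id AS point_id, r.id AS region_id
        FROM points p
        JOIN regions r ON ST_Contains(r.geom, p.geom);

      -- A unique index is required for REFRESH ... CONCURRENTLY.
      CREATE UNIQUE INDEX ON point_regions (point_id, region_id);
    SQL
  end

  def down
    execute "DROP MATERIALIZED VIEW point_regions;"
  end
end

# Refresh on a schedule (e.g. from a background job) without blocking reads:
# ActiveRecord::Base.connection.execute(
#   "REFRESH MATERIALIZED VIEW CONCURRENTLY point_regions"
# )
```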
Thanks! Partitioned and foreign tables look quite interesting -- I'm at the point where I need to scale the database compute load (with write access) across many nodes, but coordinating those nodes has seemed challenging with rails.
And yeah I can probably lean on views a bit more to save some search time for point/line-in-polygon operations.
Yes -- it's a good talk, but it focuses mostly on tricks you can do (spatial indexing, geometry subdividing) on the PostGIS side to make lookups/calculations faster, but discussion of actually scaling to multiple databases is a bit light. There is mention of Citus (which Microsoft has purchased and rebranded as Hyperscale), but it has a minimum cost that is more than I can afford right now.
I didn't know if there were other architectures to consider for rails -- like having a few read replicas for web traffic, and then a single large writable master for sidekiq/background jobs, for instance.
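E.g., if I'm reading the Rails 6 docs right, that split would mostly be the built-in automatic role switching: GET/HEAD requests read from replicas while Sidekiq (which uses the `writing` role) stays on the primary. Something like:

```ruby
# config/environments/production.rb (delay value is illustrative)
# Send reads to the replica, except for 2 seconds after a write,
# so users see their own changes despite replication lag.
config.active_record.database_selector = { delay: 2.seconds }
config.active_record.database_resolver =
  ActiveRecord::Middleware::DatabaseSelector::Resolver
config.active_record.database_resolver_context =
  ActiveRecord::Middleware::DatabaseSelector::Resolver::Session
```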
If you're at the limits of "scaling up" in PostGIS you could look into "scaling out" by migrating to a NoSQL solution like Cassandra or HBase + GeoMesa[1]. However, this would be a pretty significant change and introduces a lot of new operational burden, so I wouldn't do it unless you think you can't meet your goals with PostGIS.
I think a good first step would be bringing in a PostGIS consultant who can give you some expert advice on tuning for your current workload.
I appreciate it! PostGIS I think is still quite good, it's just computationally intense for some operations (which I suspect GeoMesa might run into as well -- I think a lot of GIS libraries use the GDAL and PROJ libraries under the hood).