What are modern practices for scaling Rails apps when the database is the bottleneck? I have a GIS-heavy app built with Rails that leans on PostGIS a lot for the heavy computation. Simply getting bigger and bigger database instances doesn't scale very well, and moving the GIS computation into Ruby is not an option either (RGeo is good for some things, but can't replace PostGIS for most of my needs).
Postgres 10 introduced native table partitioning, and it became much easier to use in Postgres 11. This lets you divide large tables into smaller ones that can be queried more efficiently.
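A rough sketch of what that looks like from a Rails migration, using declarative range partitioning (the table and column names here are made up for illustration):

```ruby
# Hypothetical migration: partition a large observations table by month.
# Assumes the PostGIS extension is already installed in this database.
class PartitionObservations < ActiveRecord::Migration[6.0]
  def up
    execute <<~SQL
      CREATE TABLE observations (
        id          bigserial,
        geom        geometry(Point, 4326),
        recorded_at timestamptz NOT NULL
      ) PARTITION BY RANGE (recorded_at);

      -- One partition per month; queries that filter on recorded_at
      -- only scan the relevant partitions (partition pruning).
      CREATE TABLE observations_2020_01 PARTITION OF observations
        FOR VALUES FROM ('2020-01-01') TO ('2020-02-01');

      -- Spatial indexes are created per partition, so each one stays small.
      CREATE INDEX ON observations_2020_01 USING GIST (geom);
    SQL
  end

  def down
    execute "DROP TABLE observations;"
  end
end
```

Active Record queries against `observations` don't need to change at all; Postgres routes them to the right partitions.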
Rails 6 introduced multiple databases natively in Active Record. You can more easily have read-only databases or segment off high-write workloads.
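The basic shape is something like this (assuming your database.yml defines a `primary` writer and a `primary_replica` entry with `replica: true`):

```ruby
# app/models/application_record.rb
class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true

  # Reads can go to the replica, writes always go to the primary.
  connects_to database: { writing: :primary, reading: :primary_replica }
end

# Pin a block of known read-only queries to the replica explicitly.
# Parcel is a hypothetical model here.
ActiveRecord::Base.connected_to(role: :reading) do
  Parcel.where(county: "example").load
end
```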
Foreign tables let you bring data from multiple databases into one so it "looks" like everything is in a single DB. This got better/faster, particularly for joins, in Postgres 12.
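For example, with postgres_fdw you can expose a table living on another Postgres node and join against it locally. A sketch, again from a migration (server, credential, and table names are all hypothetical):

```ruby
# Hypothetical migration: make shard1's parcels table queryable locally.
class AddForeignParcels < ActiveRecord::Migration[6.0]
  def up
    execute <<~SQL
      CREATE EXTENSION IF NOT EXISTS postgres_fdw;

      CREATE SERVER gis_shard FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'shard1.internal', dbname 'gis');

      CREATE USER MAPPING FOR CURRENT_USER SERVER gis_shard
        OPTIONS (user 'app', password 'secret');

      -- Pulls in the table definition; queries and joins against
      -- public.parcels are now pushed to the remote server where possible.
      IMPORT FOREIGN SCHEMA public LIMIT TO (parcels)
        FROM SERVER gis_shard INTO public;
    SQL
  end
end
```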
Also consider things like materialized views or even plain views which can help bring together just the right data you need ahead of time.
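For the spatial case specifically, a materialized view can precompute something like point-in-polygon assignments so the app reads a plain indexed table instead of running `ST_Contains` per request. A sketch with made-up table names:

```ruby
# Hypothetical migration: cache which region each point falls in.
class CreatePointRegions < ActiveRecord::Migration[6.0]
  def up
    execute <<~SQL
      CREATE MATERIALIZED VIEW point_regions AS
        SELECT p.id AS point_id, r.id AS region_id
        FROM points p
        JOIN regions r ON ST_Contains(r.geom, p.geom);

      -- A unique index is required for REFRESH ... CONCURRENTLY.
      CREATE UNIQUE INDEX ON point_regions (point_id, region_id);
    SQL
  end

  def down
    execute "DROP MATERIALIZED VIEW point_regions;"
  end
end

# Refresh on a schedule (e.g. from a background job) without blocking reads:
# ActiveRecord::Base.connection.execute(
#   "REFRESH MATERIALIZED VIEW CONCURRENTLY point_regions"
# )
```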
Thanks! Partitioned and foreign tables look quite interesting -- I'm at the point where I need to scale the database compute load (with write access) across many nodes, but coordinating those nodes has seemed challenging with rails.
And yeah I can probably lean on views a bit more to save some search time for point/line-in-polygon operations.
Yes -- it's a good talk, but it focuses mostly on tricks you can do (spatial indexing, geometry subdividing) on the PostGIS side to make lookups/calculations faster, but discussion of actually scaling to multiple databases is a bit light. There is mention of Citus (which Microsoft has purchased and rebranded as Hyperscale), but it has a minimum cost that is more than I can afford right now.
I didn't know if there were other architectures to consider for rails -- like having a few read replicas for web traffic, and then a single large writable master for sidekiq/background jobs, for instance.
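E.g., if I'm reading the Rails 6 docs right, that split would mostly be the built-in automatic role switching: GET/HEAD requests read from replicas while Sidekiq (which uses the `writing` role) stays on the primary. Something like:

```ruby
# config/environments/production.rb (delay value is illustrative)
# Send reads to the replica, except for 2 seconds after a write,
# so users see their own changes despite replication lag.
config.active_record.database_selector = { delay: 2.seconds }
config.active_record.database_resolver =
  ActiveRecord::Middleware::DatabaseSelector::Resolver
config.active_record.database_resolver_context =
  ActiveRecord::Middleware::DatabaseSelector::Resolver::Session
```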
If you're at the limits of "scaling up" in PostGIS you could look into "scaling out" by migrating to a NoSQL solution like Cassandra or HBase + GeoMesa[1]. However, this would be a pretty significant change and introduces a lot of new operational burden, so I wouldn't do it unless you think you can't meet your goals with PostGIS.
I think a good first step would be bringing in a PostGIS consultant who can give you some expert advice on tuning for your current workload.
I appreciate it! PostGIS I think is still quite good, it's just computationally intense for some operations (which I suspect GeoMesa might run into as well -- I think a lot of GIS libraries use the GDAL and PROJ libraries under the hood).