Inference costs at least seem like the easiest thing to bring down, and there's plenty of demand to drive innovation. There's a lot less uncertainty here than with architectural/capability scaling. To your point, tomorrow's commodity hardware will eventually meet today's demands (though we'll probably have even more inference demand by then).
As long as you don't deviate too much from ANSI, I think the 'light SQL DSL' approach has a lot of pros when you control the UX. (So UIs, in particular, are fantastic for this approach - which is what they seem to be targeting with queries and dashboards.) It's more of a product experience; tables are a terrible product surface to manage.
Agreed with the ecosystem cons getting much heavier as you move outside the product surface area.
Personally I think that's worse. SQL - which is almost ubiquitous - already suffers from a fragmentation problem because of the complex and dated standardization setup. When I learn a new DBMS the two questions I ask at the very start are: 1. what common but non-standard features are supported? 2. what new anchor-features (often cool but also often intended to lock me to the vendor) am I going to pick up?
First I need to learn a new (even easy & familiar) language, second I need to be aware of what's proprietary & locks me to the vendor platform. I'd suspect they see the second as a benefit they get IF they can convince people to accept the first.
I actually 100% agree with you for a new DBMS and share your frustration with vendor-specific features and lock-in. At that level, it's often actively counterproductive for insurgent DBs - ecosystem tooling needs more work to interface with your shiny new DB, etc - and that's why we always see anyone who starts with a non-standard SQL converge on offering ANSI SQL eventually.
An application that exposes a curated dataset through a SQL-like interface - so the dashboard/analytic query case described here - is where I think this approach has value. You actually don't want to expose raw tables, INFORMATION_SCHEMA, etc. - you're offering a dedicated query language on top of a higher-level data product, and you might as well take the best of SQL and leave the bits you don't need. (You're not offering a database as a service; you're offering data as a service.)
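To make the "data as a service" point concrete, here's a minimal sketch of what I mean by a curated surface: the query layer only resolves names against a catalog the product owner defines, so raw tables and system catalogs like INFORMATION_SCHEMA simply don't exist from the caller's perspective. All names here (`CURATED_VIEWS`, `daily_sales`) are made up for illustration.

```python
# A curated catalog: the only datasets callers can reference. Anything the
# product owner hasn't published - raw tables, system catalogs - is invisible.
CURATED_VIEWS = {
    "daily_sales": "SELECT day, SUM(amount) AS total FROM raw.orders GROUP BY day",
}

def resolve(table_name: str) -> str:
    """Map a caller-visible name to its curated definition; anything else,
    including INFORMATION_SCHEMA, is rejected rather than passed through."""
    try:
        return CURATED_VIEWS[table_name]
    except KeyError:
        raise ValueError(f"unknown dataset: {table_name}") from None

print(resolve("daily_sales"))
```

The point isn't the lookup itself, it's that name resolution becomes a product decision instead of a database permission.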
This is where I think we need better tooling around tiered validation - there's probably quite a bit you can run locally if we had the right separation; splitting the cheap validation from the expensive has compounding benefits for LLMs.
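A rough sketch of what I mean by tiered validation, assuming a simple split into "cheap" checks that run locally and "expensive" ones that might hit a backend. The check names and tier structure are illustrative, not a real API:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Check:
    name: str
    cost: str          # "cheap" runs locally; "expensive" may hit a backend
    run: Callable[[str], bool]

def validate(query: str, checks: list[Check]) -> list[str]:
    """Return the names of failed checks, skipping the expensive tier
    entirely if any cheap check already failed."""
    failures = []
    for cost in ("cheap", "expensive"):
        tier = [c for c in checks if c.cost == cost]
        for c in tier:
            if not c.run(query):
                failures.append(c.name)
        if failures:
            break  # don't pay for the expensive tier on a known-bad query
    return failures

checks = [
    Check("balanced_parens", "cheap", lambda q: q.count("(") == q.count(")")),
    Check("has_select", "cheap", lambda q: q.lstrip().upper().startswith("SELECT")),
    Check("executes", "expensive", lambda q: True),  # stand-in for a dry run
]

print(validate("SELECT (a FROM t", checks))  # cheap tier fails; dry run never happens
```

The compounding benefit for LLM loops: a failed cheap tier gives the model feedback in milliseconds instead of burning a round-trip to the warehouse.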
If you found two disjoint sections that seemed positive on their own, did you try looping both separately in the same model? Wondering how localized the structures are.
For all the places it's bad at, AI has been fantastic for making targeted data experiences a lot more accessible to build (see MotherDuck and dives, etc), as long as you can keep the actual data access grounded. Years of tableau/looker have atrophied my creativity a bit, trying to get back to having more fun.
Nice! I’ve been working on https://treeseek.ca which is a different use case from most of the other open data tree sites I’ve seen — I want to be instantly geolocated and shown the nearest trees to me. I do a lot of walking and am often mesmerized by a particular tree, and I wanted something to help me identify them as quickly as possible, with more confidence and speed than e.g. iNaturalist (which I do also use).
This is an app that’s been bouncing around in my head for over a decade, but I finally got it working well enough for my own purposes about a year and a half ago.
Oh that's great! I was finding fun tree collections and wanted to go see them - unfortunately not in SF so not likely - but your app has some nice data around me that I can check out! Are you primarily using OSM data?
I was thinking of a google maps kind of "here you are, here's your walking path of interesting trees" potentially, or something else that could tie the overview to the street experience - on the backlog!
So the tree data itself mainly comes from municipal open data, just like yours does. Street Trees datasets are pretty common across cities. I just added SF yesterday after replying here :)
Otherwise the map tiles are coming from OpenFreeMap [1] which are indeed based on OSM.
Next steps I'm interested in are including economic + ecological benefits of the trees, highlighting potential pests / invasive species, maybe some other basic info about the species sourced from Wikipedia.
I like how you've got different icons for different types of trees; I've been thinking about how to encode DBH data as well but haven't settled on anything yet.
Yeah I have a 'species' info table that's built by curating Wikipedia and a few other sources and passing them through a structured LLM pipeline: ecological benefit, blooming season, native regions, etc. This is very much a 'rough cut' at the moment; I want to put more quality gates and evals in it. If you're interested in collaborating, all the raw parquet datasets I have are in a public GCS bucket - happy to have them pulled in anywhere else!
DBH I'm doing for the "size" right now, though I'd love to figure out how to get canopy shape/size as well, and height where possible (and then maybe proxy height at a species level from DBH, since that's more common).
A belated - nice to see this. I think an intermediate IR is the right way to gate access rather than raw SQL (full disclosure: also hacking on open-source solutions in that space). How are you balancing expressiveness vs. control? Some of the more impressive text-to-SQL demos rely on agents being able to do fairly complicated calculations - which it seems like the IR could support - but I'm still seeing the IR example as containing some of the footguns they also hit: join type (inner vs. outer), GROUP BY, etc. How are you balancing having enough safe moves to be useful while not introducing unsafe combinations of moves?
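One shape I've been playing with for the join-type footgun, as a toy sketch: joins in the IR are only allowed along relationships the schema owner pre-declares, so the join kind and keys are curator decisions, not moves the agent can make. Every name here is hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Relationship:
    left: str
    right: str
    on: str
    kind: str  # "inner" or "left" - chosen by the curator, not the agent

# The curator's declared join graph; anything absent is simply not a move.
ALLOWED = {
    ("orders", "customers"): Relationship("orders", "customers", "customer_id", "left"),
}

def compile_join(left: str, right: str) -> str:
    """Lower an IR join node to SQL, refusing undeclared table pairs."""
    rel = ALLOWED.get((left, right))
    if rel is None:
        raise ValueError(f"no declared relationship between {left} and {right}")
    return f"{rel.left} {rel.kind.upper()} JOIN {rel.right} USING ({rel.on})"

print(compile_join("orders", "customers"))
# orders LEFT JOIN customers USING (customer_id)
```

The trade-off is exactly the expressiveness question above: the agent keeps the useful move (combine these two datasets) while the unsafe degrees of freedom (which key, which join kind) are fixed ahead of time.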
There's a lot of useful autonomous things that don't require unrestricted outbound communication, but agreed that the "safe" claw configuration probably falls quite a bit short of the popular perception of a full AI assistant at this point.