I've got replicas working with the DML proxy now. This essentially means I can have a cluster of primaries and spin up replicas on demand, and nodes talking to localhost will never notice: their mutations work pretty transparently even from read-only replicas. The PoC works now, but the snapshot restore is still extremely inefficient IMO.
In my experience, you can only use Gemini structured outputs for the most trivial of schemas. No integer literals, no discriminated unions, and many more paper cuts. So at least for me, it was completely unusable for what I do at work.
Fixed some of the things, indeed. But don't let this blog post fool you. When using discriminated unions, Pydantic exports a JSON schema with `oneOf` while google-genai expects `anyOf`, so that does not work out of the box. It also still does not support basic stuff like having this in your Pydantic model: `foo: Literal[1, 3, 5]`
It's still far from what I'd expect should be supported.
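One way around the `oneOf`/`anyOf` mismatch described above is to rewrite the exported schema before handing it to google-genai. This is just a minimal sketch of that workaround, not an official API; `to_anyof` is a made-up helper name:

```python
# Sketch of a possible workaround: recursively rename "oneOf" to
# "anyOf" in a Pydantic-exported JSON schema before passing it on.
def to_anyof(schema):
    if isinstance(schema, dict):
        return {
            ("anyOf" if key == "oneOf" else key): to_anyof(value)
            for key, value in schema.items()
        }
    if isinstance(schema, list):
        return [to_anyof(item) for item in schema]
    return schema

# Shape of what Pydantic emits for a discriminated union (simplified).
exported = {
    "oneOf": [
        {"type": "object", "properties": {"kind": {"const": "cat"}}},
        {"type": "object", "properties": {"kind": {"const": "dog"}}},
    ]
}
print(to_anyof(exported))
```

Whether the rewritten schema is then accepted still depends on the rest of the schema avoiding the other unsupported features mentioned above.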
Can you elaborate on what you mean? OAI structured outputs means JSON schema, doesn't it? So are you just saying they both support JSON schema, but Anthropic has a limitation?
OpenAI, in addition to JSON schema, supports "context-free grammars"[0], i.e. regex and Lark. Anthropic has also supported JSON schema for a few weeks now, but they don't support specifying the length of a JSON array, so you still have to worry about the model producing invalid output.
One thing that pisses me off is the widespread misunderstanding that you can just fall back to function calling (Anthropic's function calling accepts a JSON schema for arguments) and that it's the same as structured outputs. It is not. They just dump the JSON schema into the context without doing actual structured outputs. Vercel's AI SDK does that, and it pisses me off, because doing that only confuses the model; prefilling works much better.
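"Prefilling" here means seeding the assistant turn with the opening of the JSON so the model continues from it, rather than pasting the schema into the prompt. A hedged sketch of building such a request payload; the model name is a placeholder, and only the message shape follows the documented prefill pattern (the actual send step would need an API client and key):

```python
# Sketch: constrain output by *prefilling* the assistant turn with "{",
# instead of dumping the JSON schema into the prompt. This only builds
# the payload; no request is sent.
def build_prefilled_request(user_prompt: str) -> dict:
    return {
        "model": "claude-sonnet-example",  # placeholder model name
        "max_tokens": 1024,
        "messages": [
            {"role": "user", "content": user_prompt},
            # Prefill: the model continues from this partial JSON,
            # so the reply starts mid-object instead of with prose.
            {"role": "assistant", "content": "{"},
        ],
    }

req = build_prefilled_request("Extract the invoice fields as JSON.")
print(req["messages"][-1])
```

The point is that the constraint lives in the conversation state, not in extra schema text the model has to interpret.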
A lot of skepticism in the comments. Let me remind everyone that doing N loops over a local disk with in-memory cached pages is absolutely different from doing round trips over a typical VPS network. Having said that, there is no silver bullet for dumb code! So let's not conflate that with the argument the author is trying to make.
This is kind of what I've been working on: building tenancy on top of SQLite CDC to make a simple replayable SQLite for Marmot (https://github.com/maxpert/marmot). I personally think we have some synergy here; I'll drop by your Discord.
I have been asked multiple times why I chose SQLite and not Turso. I've always responded that I don't trust an open-source project once it's backed by a VC firm. I moved away from Redis to Valkey for the same reason, and we have all seen the Redis train wreck in slow motion. I hope Turso never ends up in that state, but the chances are pretty high. At this point "compatible with SQLite" has become a marketing term IMO; we all know how easy it is for either side to break compatibility.
Ok: `io_uring` (like NVMe, but for I/O commands from application to kernel), and DBSP (a high-grade framework for differential incremental view maintenance, differential as in based on delta streams/diffs rather than full updates; it can keep materialized views synchronously up to date at a cost proportional to just the diff, for most typical views anyway; certain queries can of course blow up at an intermediate stage and collapse again right after).
At least those two, notably; I'm not sure about the practical relevance of MVCC `BEGIN CONCURRENT`, though. I'm just already familiar enough with the other two big ones to chime in without having to dive into what Turso does about them...
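The DBSP idea above can be illustrated with a toy example of incremental view maintenance: instead of rescanning the base table, a materialized per-key count is patched with each delta. This is my own minimal sketch, not Turso or DBSP code:

```python
# Toy incremental view maintenance: a materialized COUNT(*) GROUP BY key
# kept in sync by applying weighted deltas (+1 insert, -1 delete)
# instead of recomputing from the base rows.
from collections import Counter

view = Counter()  # materialized view: key -> row count

def apply_delta(delta):
    """delta is a list of (key, +1 | -1) changes."""
    for key, weight in delta:
        view[key] += weight
        if view[key] == 0:
            del view[key]  # drop empty groups, like a real view would

apply_delta([("eu", +1), ("us", +1), ("us", +1)])
apply_delta([("us", -1)])  # one US row deleted
print(dict(view))          # cost was O(|delta|), not O(|table|)
```

Real DBSP generalizes this to arbitrary relational operators (joins, nested aggregates) with the same "cost proportional to the diff" property for well-behaved queries.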
> Ok, `io_uring` (like NVMe but for IO commands from application to kernel)
Are there benchmarks comparing turso with io_uring to sqlite (with other config the same)?
io_uring has the potential to be faster, but it's not guaranteed. It might be the same, it might be slower, depending on how you use it. People bragging about the technology instead of the results of using the technology is a bit of a red flag.
SQLite has such a stellar reputation, for so many excellent reasons.
I still find it absolutely freakish & abominable that people are so incredibly touchy & reflexively mean & vile to Turso. I've seen a couple of Turso-centric YouTube videos recently, and there are dozens and dozens of upvotes for what just seem like the most petulant, vacuous, reflexively bitter comments, dominating the threads. SQLite deserves its honor, and is amazing! Yes! But there's such a wild concentration of negativity about an SQLite-compatible open-source Rust rewrite. None of it is technical. It's all just this extreme conservatism, this reflexive no, I don't trust it, FUD FUD FUD FUD.
I'm just so embarrassed having such low, antagonistic peers dominating the conversation all the time. With zero moderation, zero "maybe it's ok", just dialed 100% to no no no no. For fuck's sake, man. Everywhere I go it's not hackers, it's not possibility-seekers, it's a radical alliance of people using fear, uncertainty and doubt to cling to some past, refusing even the possibility of different. It's so regular, so consistent, so tiresome and so useless.
What if this is better? What if you are wrong? What if there is some possibility of better? It just feels like all the airtime is sucked up by these negative creeps, always, everywhere, all around, with these absurd, vast, pervading pessimisms that admit no maybes, that see no tradeoffs, that are just convinced always of the worst. And it's just so popular! It's the plurality! How anti-hackerly a spirit is anti-possibility! The world deserves better than these endless laggards.
I'm obviously reacting strongly here. But I just want some goddamned room left for maybe. The negative creeps never allow that: no no no no no, fear, uncertainty & doubt endless & abundant, no possibility, just bad. I cannot stand the negative energy; I'm so sad the hackers have to put up with such absolutist, shitty drains sucking all the energy from the room, everywhere, always. SQLite somehow has such a strong anti-possibility anti-energy magnet around something so, so good: what a shame, it deserves better, & iteration attempts deserve at least some excitement. Progress is possible, can be neat, and judging way too early & reflexively with empty comments is to be condemned, imho.
I definitely feel this. So many "I made an alternative to X that fixes these issues, or is better in these ways" posts are met with "Well, X is fine for me, and I don't need those things, so why change?" These posts are obviously meant for adventurers: people looking to improve on the status quo, who have some experimental budget left, etc.
Reading the repo, I'm not sure what it offers. It's still CGO for Go (edit: it's not, it's purego; but can that be used for SQLite too?), and Rust already has `rusqlite`. It's beta, so it doesn't have stability, and 99% of why I and many other people choose SQLite is stability.
But they bluntly say you should use it instead of SQLite: "The next evolution of SQLite" (trademark ok?). This not only implies that SQLite has some significant design issues that merit a new version, but it also implies that they, not the SQLite author, are the ones who are capable of doing this. My guess is this is what's rubbing so many people the wrong way.
It's not being sold on its merits, and I think if they're going to make that sort of statement, it's fair to set the bar somewhat high. If it's an AI-oriented database, sell it that way, not as an SQLite replacement.
I don't think uv had a negative reaction, because it had a really compelling case.
The way I see it there are a few goals for Turso as opposed to SQLite...
One is to be more open to contribution, which is of arguable value for a pretty "complete" project.
Another is to better support a client-server and distribution model for resilience, beyond in-process-only options, which is harder. This is while staying file-compatible with SQLite for the database itself.
Another aspect is multi-threaded support (multiple readers in particular), which is part of the impetus for rewriting in Rust rather than forking, and may well bring a dramatic performance improvement.
Cloudflare and Turso as companies are both using SQLite's interfaces and structure as a core piece of their distributed database offerings... There are definitely different characteristics for use/scale if you're going that route. I've also found CockroachDB interesting, along with the now-deprecated RethinkDB's approach. That doesn't even get into the more prominent distributed cloud DB options out there.
In the end they're all just different approaches to solving similar issues.
If you think this discussion is antagonistic, you should see how antagonistic "entrepreneurs" and VCs become when they are in charge of open source projects. Risk aversion is good.
In this case, the familiar "rewrite it in Rust" MO has a special angle: the Turso feature list is such a terrifying collection of high-risk, low-performance, inferior, unlikely-to-be-compatible, unproven and unnecessary departures from SQLite that a malicious embrace-and-extend business plan is a reasonable theory, and reckless naivety is the best possible case.
Pretty good vector processing built in. Time-series capabilities. A nice change-data-capture table that I've used & loved. Rust, which is easy as hell to embed. The underlying libsqlite is very useful too. The CLI has far better ergonomics than sqlite3's & good formatting. Async & concurrent writes. Backwards compatibility. Just so ragingly badass. Tries. Isn't narrow & conservative. Amazing test suite.
The discussion didn't seem to be about merits. It just seemed to be a bunch of pissy, empty whining & loser statements that it wasn't even worth beginning to regard it at all, for dumb, petulant reasons x, y and z. Fuck that. Fine, I'm happy to sing some praises. But IMO there is a war against imagination & this loserly attitude is the omnipresent, all-pervading, no-value, woeful forefront. This pox is everywhere: just no regard, no consideration at all, just out-of-hand disregard, ridiculous, inconsiderate fear-uncertainty-and-doubt anti-reason, thought-terminating no's.
Murderers of hacker spirit. Sure, come ask for better! Yes!! Please!!! Inquire & challenge. Push for actual meat (both ways). I saw none; I tried to give you some here. These empty vessels have just vapors of fear, boogeymen to conjure & scare with. No actual content or assessment. So weird to rally so hard against open source, just because it doesn't also hail from 2.5 decades ago. We need more than reflexivism. Or we are shite non-hacker people of a low culture.
I complain about negativity because this is rotten & a stink. It's everywhere, & so rarely is it of substance, so rarely does it speak to anything. I've tried to add some weight here, and most of what I've said feels basic, but this gets bold: I think anti-possibility weighs heavier & has a bigger mantle to bear in its naysaying than speaking for. We should attune ourselves to consideration. The hacker spirit should favor the idea of possibility over rejection & the discarding of potential.
[To be clear, I think sqlite is the hands-down winner on this front, no contest. Does the Turso test suite qualify it for use in safety-critical applications? I don't think so.]
To your other points: look, if it works for you, I'm not here to tell you you can't use it. However, these features sound more trendy than useful. To me they sound like negatives. A bunch of extra features not related to being a relational database suggests they aren't concentrating on the core product. I don't know enough about their model for async & concurrent writes to really evaluate the cost/benefit, but both of those features sound potentially really scary and of questionable value.
At the end of the day, it's just not a compelling pitch. It seems like trading reliability and stability for a bunch of meaningless bling.
Best of luck to them, but at this point yeah, sqlite sounds like a much better option to me.
It's just so wild to me that people are so married to anti-features like this. That anti-interest so possesses the modern spirit, enraptures people so.
"I don't know what it is, but I'm not interested and it's probably scary" is not, imo, befitting the cultures I personally want to see. There are times and places for extreme conservatism, but generally I am far more here for progress, for trying, for aspiring to better, and I thought that was so clearly what the hacker spirit was about.
Progress would be a respectful experiment to hack an implementation of vector indexing, or some other actually useful feature, into the actual SQLite, preferably as an extension.
That would be a valid experiment and, if it goes well, a contribution; whereas hoping that someone bases anything important on Turso looks like grabbing captive users.
I care that sqlite is being tested against it, because I care that sqlite is well tested. I'm not super concerned that part of the test suite is closed source, as I don't need to use it directly.
Yes, I do look through test suites. You can learn a lot from them.
Without seeing it, you have no idea how good it is at all. I'm not knocking the SQLite guys... but it's just a factual statement: it's unknown to most.
https://github.com/tursodatabase/turso/pull/4814 "WAL auto truncation: increase epoch to prevent stale pages reuse", there's a new test with a comment "It is slightly fragile and can be removed if it will be unclear how to maintain it"
https://github.com/tursodatabase/turso/pull/4802/ "fix/translate: revert change that allowed index cursor with stale position to be read", fixes a data-corrupting bug, there's a regression test, good (although the original bug sounds like it should've been caught by a suite like the one SQLite has)
That's just a couple days worth of PRs.
This style of development does not inspire confidence. They develop features, sure. But I want my database to be rock-solid and completely covered by tests, not just move fast and break things. It's not FUD to just look at how they approach PRs.
How can we make sure that fundamental pieces of open source software that power the Internet can have funding, and that the people who write them can have comfortable lives working on the piece of software they love that so many people use?
I think you've described a real problem. But people turn to VC because there are few other ways to make funding happen.
Which SQLite Go library do you use? My biggest pain with using SQLite in Go is often the libraries and the reliance on CGO, which is what puts me off using Turso.
Edit: looking at the go.mod file, I noticed github.com/mattn/go-sqlite3, which I think is a C wrapper library, so I'm assuming you rely on CGO for compiling.
Nearly every time I write something in JavaScript, the first line is `const $ = (selector) => document.querySelector(selector)`. I do not have jQuery nostalgia as much as many others here, but that particular shorthand is very useful.
For extra flavor, `const $$ = (selector) => document.querySelectorAll(selector)` on top.
- DDL gets really tricky in these cases; that's why you see Corrosion has this weird file-based system.
- cr-sqlite isn't maintained anymore, but I did some benchmarks and, if I remember correctly, it was as much as 4x-8x slower depending on the type of your data & load. Storage bloats by 2x-3x, and tombstones accumulate pretty fast as well.
I mean each mutation on every column looks something like:
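Roughly like this; field names are from memory of cr-sqlite's `crsql_changes` table, so treat them as approximate:

```python
# Approximate shape of one per-column CRDT change entry, as in
# cr-sqlite's crsql_changes (field names from memory; may not match
# the real schema exactly). One entry like this exists per *changed
# column*, which is where the 2x-3x storage bloat comes from.
change = {
    "table": "users",
    "pk": "42",             # primary key of the mutated row
    "cid": "email",         # the one column this entry covers
    "val": "a@example.com",
    "col_version": 7,       # per-column Lamport-style counter
    "db_version": 1031,     # per-database version, used as a sync cursor
    "site_id": "node-a",    # origin node id, tiebreaker for LWW
}
# Updating 5 columns of one row emits 5 entries like this, each indexed.
print(len(change))
```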
Very helpful hearing about your own similar experiments with CRDTs. As a followup I'd be interested in more direct comparison between Marmot and Corrosion in terms of features/performance, since they both serve a similar use case and Corrosion seems to have worked through some of the CRDT issues you mentioned.
Ok, it's a very long discussion, but I'll try to keep it brief here (more than happy to chat on the Marmot Discord if you want to go deeper). Honestly, I've not done a head-to-head comparison, but if you're asking for a guesstimated comparison:
- Marmot can give you easier DDL and better replication guarantees.
- You can control the guarantees around transactions. So if you're doing a quorum-based transaction, you are guaranteed that the quorum has written that set of rows before returning success. This takes care of those conflicting ID-based rows getting overwritten, which people would usually ignore. And you should be able to do transactions with proper BEGIN and COMMIT statements.
- Disk write amplification is way lower than what you would see with CRDTs. This should usually mean better write throughput on commodity hardware. As I mentioned, in my local benchmarks I'm getting close to 6K insert ops. This was with a cluster of three nodes, so you can effectively multiply by three, which is roughly 18K operations per second. I did not set up a full cluster to actually benchmark this; that requires investing more money and time, and I'd honestly be frugal here since I'm spending all my $$$ on my AI bill.
- For reads, as you can see, you read directly from the SQLite database, so you are only bottlenecked by your disk speed. There are no fancy merges happening at the CRDT level in the middle: it's written once and you're ready to read.
- The hardest part I faced, in my opinion, was auto-increment IDs. It's a sad reality, but it turns out 99% of small to mid-size companies use auto-increment IDs. In all CRDTs, in case of conflict, LWW (based on one ID or another) happens, and I can guarantee you that at some point, without coordination, if nodes are just emitting those regular incrementing IDs, THEY WILL OVERWRITE each other. That was the exact problem in the first version of Marmot.
- SQLite is a single-writer database. cr-sqlite writes its delta CRDT rows into a table as well, so under high write load you are putting a lot of pressure on the WAL. How do I know? I did this in Marmot v0.x, and even v2 started by writing change logs into a SQLite database too. Turns out that at high throughput, even writing and dumping change logs that I'm going to discard anyway is a bad idea. I eventually moved to PebbleDB, with a mimalloc-based unmanaged memory allocator for serialization/deserialization (yes, even that caused slowdowns due to GC). It doesn't stop there: each CRDT entry is one row per changed column of the table, plus an index for faster lookup, so that will bog it down further on many, many rows. For context, I have tested Marmot on gigs of data, not megs.
I do have a couple of ideas on how I can really exploit the CRDT stuff, but I don't think I need it right now. I think most of this can be taken care of if I can build an MVCC layer on top.
> In all CRDTs, in case of conflict, the LWW (based on one ID or another) happens
In my syncing voice-notes application I've begun using UUIDv7 for primary keys, and it's working out very well. The database is SQLite. INSERTs are a tad slow, but it has not been a problem in practice. Perhaps I've not deployed and tested it enough, but I really feel this was a safe choice.
UUIDv7 does leak create time information, but it INSERTs faster than UUIDv4 because all the INSERTs happen at (or near) the end of the tree on sync.
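The time-prefix property can be sketched in a few lines. This is my own minimal UUIDv7-style generator, not an RFC-complete implementation (it omits the version/variant bits): a 48-bit Unix-millisecond timestamp up front means keys sort by creation time, so new INSERTs land near the end of the B-tree.

```python
# Minimal UUIDv7-style key generator (own sketch, not RFC-complete):
# 48-bit Unix-ms timestamp followed by 80 random bits, as 32 hex chars.
import os
import time

def uuid7_hex() -> str:
    ts_ms = time.time_ns() // 1_000_000
    # 6 timestamp bytes + 10 random bytes, hex-encoded
    return ts_ms.to_bytes(6, "big").hex() + os.urandom(10).hex()

a = uuid7_hex()
time.sleep(0.002)
b = uuid7_hex()
print(a < b)  # lexicographic order follows creation order
```

For real use, a proper library (or the RFC 9562 layout) is preferable, since the version bits matter for interoperability; the sketch only shows why such keys are B-tree-friendly compared to UUIDv4.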