Hacker News | gepheum's comments

The languages that Skir supports officially: C++, Dart, Go, Java, Kotlin, Python, Rust, Swift, Typescript

Links:
- Website: https://skir.build/
- Skir Go doc: https://skir.build/docs/go
- Skir Rust doc: https://skir.build/docs/rust
- Skir Swift doc: https://skir.build/docs/swift
- Github: https://github.com/gepheum/skir
- Blog post: https://medium.com/@gepheum/i-spent-15-years-with-protobuf-t...
- Discord: https://discord.gg/mruvDuybJ


flatbuffers and capnproto are in the game of trying to make serialization to a binary format as efficient as possible. Their goal is to beat benchmarks: how long it takes to convert an object to bytes and vice versa. It's cool, but I personally think that for most use cases (not all), serialization efficiency shouldn't be the primary goal: serialization time is often negligible compared to the time it takes to send data over the wire, and it's less important than other features (e.g. quality of the generated API) that some of these techs might neglect. An example to illustrate this: with Proto3, Google decided that when encoding a `string` field in C++, it would not perform UTF-8 validation. This leads to better benchmark metrics. It has also been a horrible mistake that led to many bugs and cost a lot of engineering hours, since for example the same protobuf C++ API fails at deserialization when it encounters an invalid UTF-8 string.
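A minimal Python sketch of that failure mode (illustrative only, not the actual protobuf C++ API): a writer that skips UTF-8 validation can emit bytes that its own reader later rejects.

```python
# Sketch: write path copies bytes through unchecked, read path validates.
# Both function names are hypothetical, not any real library's API.

def encode_string_field(raw: bytes) -> bytes:
    # Proto3-C++-style behavior: no UTF-8 validation on encode.
    return raw

def decode_string_field(wire: bytes) -> str:
    # The reader *does* insist on valid UTF-8, so decoding raises.
    return wire.decode("utf-8")

wire = encode_string_field(b"\xff\xfehello")  # not valid UTF-8
try:
    decode_string_field(wire)
except UnicodeDecodeError:
    print("round-trip failed: writer accepted what reader rejects")
```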

As for MessagePack and jsonbinpack, these seem to be layers on top of JSON that make it more compact. They still use field names for field identity, which I think is problematic for long-term data persistence since it prevents renaming fields. I think the Protobuf/Thrift approach of using meaningless field numbers in the serialized form is better.
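A toy Python sketch of the difference (the dict-based "wire forms" and schema tables here are illustrative, not any real format):

```python
# Name-keyed wire form (MessagePack-style): the field name IS the identity.
record = {"user_name": "ada"}
# Number-keyed wire form (Protobuf-style): a meaningless tag is the identity.
by_number = {1: "ada"}

# After renaming user_name -> username in the schema:
NUMBER_SCHEMA = {1: ("username", str)}  # field 1 keeps its meaning

# Name-keyed data written before the rename no longer matches the schema:
assert "username" not in record

# Number-keyed data still decodes: field 1 simply gets its new name.
decoded = {NUMBER_SCHEMA[k][0]: v for k, v in by_number.items()}
assert decoded == {"username": "ada"}
```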


> flatbuffers and capnproto are in the game of trying to make serialization to binary format as efficient as possible.

Little-understood fact about Cap'n Proto: Serialization is not the game at all. The RPC system is the whole game, the serialization was just done as a sort of stunt. Indeed, unless you are mmap()ing huge files, the serialization speed doesn't really matter. Though I would say the implementation of Cap'n Proto is quite a bit simpler than Protobuf due to the serialization format just being simpler, and that in itself is a nice benefit.

The recently-released Cap'n Web jettisons the whole serialization side and focuses just on the RPC system: https://blog.cloudflare.com/capnweb-javascript-rpc-library/

(I'm the author of Cap'n Proto and Cap'n Web.)


I stand corrected.


Thanks for the comment! Agree with you about the horrible Protobuf-to-Python binding; it was a big frustration and definitely contributed to me wanting to build Skir.

1. You can create an enum with just "wrapper" fields; that's exactly like a oneof.

2. Totally fair. I'm planning to work on this later this year, probably Q3 (the priority is adding support for 4 more languages, and then I'll get to it).

3. There is introspection in the 6 targeted languages, and I think I did it a bit better than protobuf because it generally has better type safety. Example in C++: https://github.com/gepheum/skir-cc-example/blob/main/string_... Typescript: https://skir.build/docs/typescript#reflection I realize I haven't documented it in Python (although it is available, with generally the same API as Typescript); will fix that. However, you're right that there is no support yet for annotations. Still trying to gauge whether that's needed.

4. Assuming you mean the Go language: working on that now, hoping to have C#, Go, Rust and Swift in the next 2-3 months.


> So there is introspection in the 6 targeted languages, and I think I did it a bit better than protobuf because it generally has better type safety.

A bit different kind of introspection. In Protobuf I can write a code generator that loads the compiled PB descriptions and then generates whatever it needs.

For example, I'm using it to generate SQL-serializing wrappers for my Protobuf types for Go.

Oh yeah, also having a standardized pretty-printer would be great.


Hey, thanks a lot for the comment! I share your frustration with protobuf: although I think it's great, it carries a few design flaws which are hard to fix at this point, and they create pain points that are not going away.

I completely agree with you about timing, wish I had done this 10 years ago :)

"Here is an idea to contemplate as a side gig with your favorite Ai assistant: A tool to convert proto to Skir. Or at least as much as possible. As someone who had to maintain larger and complex proto files, a lot of proto specific pain points are addressed." < I tried asking Claude: "Migrate this project from protobuf to Skir, see https://skir.build/" and it works pretty well. I created http://skir.build/llms.txt which helps with this. The pain point is data migration though, and as much as I want Skir to succeed: I cannot recommend people migrating from protobuf to Skir if they have some persisted data to migrate, the effort is probably not worth it.


Hey, Skir does have numerical tagging, see https://skir.build/docs/language-reference#structs


This seems new and retrofitted.

The implicit versioning is a brittle design for backwards compatibility.

People/LLMs will keep adding fields out of order, and whatever has been serialised (both in client/server interaction and stored in DBs) will be broken.
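To make that concern concrete, here is a toy Python sketch (the positional encoding is illustrative, not Skir's actual wire format) of what goes wrong when a field is inserted in the middle of a positionally-encoded struct:

```python
# With a positional ("dense") encoding, field identity is the slot index,
# so inserting a field in the middle silently re-maps old data.

OLD_FIELDS = ["id", "name"]
NEW_FIELDS = ["id", "email", "name"]   # "email" added in the middle

stored = [7, "ada"]                    # serialized under the old schema

# Decoding old data with the new field order mis-assigns values:
decoded = dict(zip(NEW_FIELDS, stored))
print(decoded)  # {'id': 7, 'email': 'ada'} -- "ada" is now an email!
```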


I looked at Prisma; I very much prefer the Protobuf/Thrift model of using numbers to identify fields, which enables 2 important things: fields can be renamed without breaking backward compatibility, and the wire format stays compact.

I think the Protobuf language (which Skir is heavily influenced by) has some flaws in its core design, e.g. the enum/oneof mess, the fact that it allows sparse field numbers, which makes the "dense JSON" format (a core feature of Skir) harder to get, and the fact that it does not let users optionally attach a stable identifier to a message to make compatibility checks work.

I get your point about "why build another language", but that point taken too far means we would all be programming in Haskell.


Thanks for the comment. I am very familiar with Buf+Protobuf; I think it's a great system overall, but it has many limitations which I think can be overcome by redesigning the language from scratch instead of building on top of the .proto syntax. In the Skir vs Protobuf part of the blog post [https://medium.com/@gepheum/i-spent-15-years-with-protobuf-t...], only 2 of the 10 points pertain to "syntax" (and they're a bit more than syntax). Since you mention compatibility checks: Buf's compatibility check prevents message renaming, which is a huge limitation. With Skir, that's not the case. You also get the compatibility checks verified in the IDE.


Sorry, you are wasting your time arguing, I am pretty sure this "user" is an LLM.

edit: Apparently not!


I could say the same thing about you. Come up with some evidence and we can talk about it.


They are using an LLM. I've seen accounts that are very obviously LLM bots, but have a human in the loop to reply when you accuse them. Then, of course, they go back to posting obvious LLM text.


That is correct, and that is a good catch. The idea, though, is that you typically remove a field only after having made sure that no code reads the removed field anymore and that all binaries have been deployed.


How does this work if, for example, you persist the data in a database?


Let's imagine you have this:

```
struct User {
  id: int64;
  email: string?;
  name: string;
}
```

You store some users in a database: [10,"[email protected]","john"], [11,null,"jane"]

You remove the email field later:

```
struct User {
  id: int64;
  name: string;
  removed;
}
```

Presumably, you remove a field only after you have migrated all code that uses it and you have deployed all binaries.

In your DB, you still have [10,"[email protected]","john"], [11,null,"jane"], which you are able to deserialize fine (the email field is ignored). New values that you serialize are stored as [12,0,"jack"]. If you happen to have old binaries which still use the old email field and which are still running (which you shouldn't, but let's imagine you accidentally didn't deploy all your binaries before you removed the field), these old binaries will indeed decode the email field for new values (Jack) as an empty string instead of null.
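The same scenario as a runnable Python sketch (the array form is illustrative and the helper is hypothetical, not Skir's API): slot 1 used to hold `email: string?`.

```python
# Rows serialized under the old schema, and one written after the removal:
old_rows = [[10, "[email protected]", "john"], [11, None, "jane"]]
new_row = [12, 0, "jack"]   # new writer fills the removed slot with 0

def old_binary_read_email(row):
    # An old (not-yet-redeployed) binary still reads slot 1 as string?:
    v = row[1]
    return "" if v == 0 else v   # 0 coerces to empty string, not null

assert old_binary_read_email(old_rows[1]) is None  # Jane's null survives
assert old_binary_read_email(new_row) == ""        # Jack's "null" is now ""
```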


+1

Copying from blog post [https://medium.com/@gepheum/i-spent-15-years-with-protobuf-t...]:

""" Should you switch from Protobuf?

Protobuf is battle-tested and excellent. If your team already runs on Protobuf and has large amounts of persisted protobuf data in databases or on disk, a full migration is often a major effort: you have to migrate both application code and stored data safely. In many cases, that cost is not worth it.

For new projects, though, the choice is open. That is where Skir can offer a meaningful long-term advantage on developer experience, schema evolution guardrails, and day-to-day ergonomics. """


Thanks for the feedback.

0. Yes, I looked at Avro, Ion. I like Protobuf much better because I think using field numbers for field identity, meaning being able to rename fields freely, is a must.

1. Yes. Skir also supports that with the binary format (you can serialize and deserialize a Skir schema to JSON, which then allows you to convert from binary format to readable JSON). It just requires building many layers of extra tooling, which can be painful. For example, if you store your data in some SQL engine X, you won't be able to quickly visualize it with a simple SELECT statement; you need to build the tooling that lets you visualize the data. Now dense JSON is obviously not ideal for this use case, because you don't see the field names, but for quick debugging I find it's "good enough".

3. I agree there are definitely cases where it can be painful, but I think the cases where it actually is helpful are more numerous. One thing worth noting is that you can "opt-out" of this feature by using `ClassName.partial(...)` instead of `ClassName()` at construction time. See for example `User.partial(...)` here: https://skir.build/docs/python#frozen-structs I mostly added this feature for unit tests, where you want to easily create some objects with only some fields set and not be bothered if new fields are added to the schema.

4. Good question. I guess you mean "forward compatibility": you add a new variant to the enum, not all binaries are deployed at the same time, and some old binary encounters the new variant it doesn't know about? I do what Protobuf does: I default to the UNKNOWN variant. More on this:
- https://skir.build/docs/schema-evolution#adding-variants-to-...
- https://skir.build/docs/schema-evolution#default-behavior-dr...
- https://skir.build/docs/protobuf#implicit-unknown-variant
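A toy Python sketch of that fallback (the variant names and helper are hypothetical, not Skir's generated API):

```python
# Variants the old binary was compiled with:
KNOWN_VARIANTS = {"ACTIVE", "DISABLED"}

def decode_status(wire_value: str) -> str:
    # Forward compatibility: a variant added by newer writers
    # ("SUSPENDED") maps to UNKNOWN instead of failing to decode.
    return wire_value if wire_value in KNOWN_VARIANTS else "UNKNOWN"

print(decode_status("ACTIVE"))     # ACTIVE
print(decode_status("SUSPENDED"))  # UNKNOWN
```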


> meaning being able to rename fields freely, is a must.

Avro supports field renames, though.

3. On second thought, I believe you'd only have to deploy when you choose: the next build will force you to provide values (or opt into the default), so forcing inspection of construction sites seems good.

