
I was impressed by how fast the Rust folks adopted this! Kudos to David Tolnay and others.

Somewhat notable is that `char8_t` is banned with very reasonable motivation that applies to most codebases:

> Use char and unprefixed character literals. Non-UTF-8 encodings are rare enough in Chromium that the value of distinguishing them at the type level is low, and char8_t* is not interconvertible with char* (what ~all Chromium, STL, and platform-specific APIs use), so using u8 prefixes would obligate us to insert casts everywhere. If you want to declare at a type level that a block of data is string-like and not an arbitrary binary blob, prefer std::string[_view] over char*.
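
To see the cast friction the guideline is talking about, here is a minimal C++20 sketch (log_message is a hypothetical stand-in for the char*-based STL/platform APIs it mentions):

    #include <cstdio>
    #include <string>

    // Hypothetical char*-based API, standing in for the STL and platform
    // interfaces the guideline refers to.
    void log_message(const char* msg) { std::puts(msg); }

    int main() {
        log_message("a plain literal converts directly");
        // log_message(u8"a u8 literal does not");  // error in C++20:
        //   const char8_t* is not convertible to const char*
        log_message(reinterpret_cast<const char*>(u8"so every call needs a cast"));
        std::string s = "string-like data, per the guideline";
        log_message(s.c_str());
    }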


`char8_t` is probably one of the more baffling blunders of the standards committee.

There is no guarantee `char` is 8 bits, nor that it represents text or any particular encoding.

If your codebase has those guarantees, go ahead and use it.


> There is no guarantee `char` is 8 bits, nor that it represents text or any particular encoding.

True, but sizeof(char) is defined to be 1. In section 7.6.2.5:

"The result of sizeof applied to any of the narrow character types is 1"

In fact, char and associated types are the only types in the standard where the size is not implementation-defined.

So the only way that a C++ implementation can conform to the standard and have a char type that is not 8 bits is if the size of a byte is not 8 bits. There are historical systems that meet that constraint but no modern systems that I am aware of.

[1] https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/n49...


That would be any CPU with word-addressing only. Which, granted, is very exotic today, but they do still exist: https://www.analog.com/en/products/adsp1802.html

Don't some modern DSPs still have 32 bits as the minimum addressable unit? Or is it a thing of the past?

If you're on such a system, and you write code that uses char, then perhaps you deserve whatever mess that causes you.

char8_t also isn't guaranteed to be 8 bits, because sizeof(char) == 1 and sizeof(char8_t) is also 1. On a platform where char is 16 bits, char8_t will be 16 bits as well

The C++ standard explicitly says that it has the same size, signedness, and alignment as unsigned char, but it's a distinct type. So it's pretty useless, and badly named


Wouldn't it rather be the case that char8_t just wouldn't exist on that platform? At least that's the case with the uintN_t types; they are just not available everywhere. If you want something that is always available you need to use uint_leastN_t or uint_fastN_t.


It is pretty consistent: it is part of the C standard and a feature meant to make string handling better, so it would be crazy if it wasn't a complete clusterfuck.

There's no guarantee char8_t is 8 bits either; it's only guaranteed to be at least 8 bits.

> There's no guarantee char8_t is 8 bits either; it's only guaranteed to be at least 8 bits.

Have you read the standard? It says: "The result of sizeof applied to any of the narrow character types is 1." Here, "narrow character types" includes char and char8_t. So technically they aren't guaranteed to be 8 bits, but they are guaranteed to be one byte.
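
As a small illustration of what that guarantee does and does not pin down (a sketch; the last assert documents the extra assumption most code silently makes):

    #include <climits>  // CHAR_BIT

    // Guaranteed by the standard: narrow character types are exactly one byte.
    static_assert(sizeof(char) == 1);
    static_assert(sizeof(unsigned char) == 1);
    static_assert(sizeof(char8_t) == 1);  // C++20

    // Not guaranteed: a byte being 8 bits. True on mainstream platforms, and
    // asserting it makes the assumption explicit.
    static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");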


Yes, but the byte is not guaranteed to be 8 bits, because on many ancient computers it wasn't.

The poster you replied to has read the standard correctly.


What platforms have char8_t as more than 8 bits?

Well, platforms with CHAR_BIT != 8. In C and C++, char (and therefore a byte) is at least 8 bits, not exactly 8 bits. POSIX does force CHAR_BIT == 8. I think the only place you'll find anything else is embedded, and even there only some DSP- or ASIC-like devices. So in practice most code will break on those platforms and they are very rare, but they are still technically supported by the C and C++ standards. Similarly to how C still supported non-two's-complement architectures until C23.

That's where the standard should come in and say something like "starting with C++26 char is always 1 byte and signed. std::string is always UTF-8" Done, fixed unicode in C++.

But instead we get this mess. I guess it's because there's too much Microsoft influence in the standard, and they are the only ones who don't have UTF-8 everywhere in Windows yet.


char is always 1 byte. What it's not always is 1 octet.

You're right. What I meant was that it should always be 8 bits, too.

std::string is not UTF-8 and can't be made UTF-8. It's encoding agnostic, its API is in terms of bytes not codepoints.

Of course it can be made UTF-8. Just add a codepoints_size() method and other helpers.

But it isn't really needed anyway: I'm using it for UTF-8 (with helper functions for the 1% cases where I need codepoints) and it works fine. But starting with C++20 it's starting to get annoying because I have to reinterpret_cast to the useless u8 versions.


First, because of existing constraints like mutability through direct buffer access, a hypothetical codepoints_size() would require recomputation each time, which would be prohibitively expensive, in particular because std::string is virtually unbounded.

Second, there is also no way to be able to guarantee that a string encodes valid UTF-8, it could just be whatever.

You can still just use std::string to store valid encoded UTF-8, you just have to be a little bit careful. And functions like codepoints_size() are pretty fringe -- unless you're doing specialized Unicode transformations, it's more typical to just treat strings as opaque byte slices in a typical C++ application.
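
For the fringe cases, a helper like the one mentioned above is short anyway. A minimal sketch, assuming the std::string already holds valid UTF-8 (codepoints are counted by skipping continuation bytes of the form 10xxxxxx):

    #include <cstddef>
    #include <string_view>

    // Count Unicode codepoints in valid UTF-8 by skipping continuation bytes.
    constexpr std::size_t codepoint_count(std::string_view utf8) {
        std::size_t n = 0;
        for (unsigned char c : utf8)
            if ((c & 0xC0) != 0x80) ++n;
        return n;
    }

    static_assert(codepoint_count("abc") == 3);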


Perfect is the enemy of good. Or do you think the current mess is better?

How many non-8-bit-char platforms are there with char8_t support, and how many do we expect in the future?

TI C2000 is one example

Thank you. I assume you're correct, though for some reason I can't find references claiming C++20 being supported with some cursory searches.

Mostly DSPs

Is there a single esoteric DSP in active use that supports C++20? This is the umpteenth time I've seen DSPs brought up in casual conversations about C/C++ standards, so I did a little digging:

Texas Instruments' compiler seems to be celebrating C++14 support: https://www.ti.com/tool/C6000-CGT

CrossCore Embedded Studio apparently supports C++11 if you pass in a switch requesting it, though this FAQ answer suggests the underlying standard library is still C++03: https://ez.analog.com/dsp/software-and-development-tools/cce...

Everything I've found CodeWarrior related suggests that it is C++03-only: https://community.nxp.com/pwmxy87654/attachments/pwmxy87654/...

Aside from that, from what I can tell, those esoteric architectures are being phased out in favor of running DSP workloads on Cortex-M, which is just ARM.

I'd love it if someone who was more familiar with DSP workloads would chime in, but it really does seem that trying to be the language for all possible and potential architectures might not be the right play for C++ in 202x.

Besides, it's not like those old standards or compilers are going anywhere.


Cadence DSPs have a C++17-compatible compiler and will have C++20 soon; new CEVA cores too (both are Clang-based). TI C7x is still C++14 (C6000 is an ancient core, yet still got C++14 support, as you mentioned). AFAIR the Cadence ASIP generator will give you a C++17 toolchain, and C++20 is on the roadmap, but I'm not 100% sure.

But for those devices you use a limited subset of language features, and you would be better off not linking the C++ stdlib or even the C stdlib at all (so junior developers don't have room for doing stupid things ;))


Green Hills Software's compiler supports more recent versions of C++ (it uses the EDG frontend) and targets some DSPs.

Back when I worked in the embedded space, chips like ZSP were around that used 16-bit bytes. I am twenty years out of date on that space though.


How common is it to use Green Hills compilers for those DSP targets? I was under the impression that their bread was buttered by more-familiar-looking embedded targets, and more recently ARM Cortex.

Dunno! My last project there was to add support for one of the TI DSPs, but as I said, that's decades past now.

Anyway, I think there are two takeaways:

1. There probably do exist non-8-bit-byte architectures targeted by compilers that provide support for at-least-somewhat-recent C++ versions

2. Such cases are certainly rare

Where that leaves things, in terms of what the C++ standard should specify, I don't know. IIRC JF Bastien or one of the other Apple folks who drove things like "two's complement is the only integer representation C++ supports" tried to push for "bytes are 8 bits" and got shot down?


> but it really does seem that trying to be the language for all possible and potential architectures might not be the right play for C++ in 202x.

Portability was always a selling point of C++. I'd personally advise those who find it uncomfortable to choose a different PL, perhaps Rust.


> Portability was always a selling point of C++.

Judging by the lack of modern C++ in these crufty embedded compilers, maybe modern C++ is throwing too much good effort after bad. C++03 isn't going away, and it's not like these compilers always stuck to the standard anyway in terms of runtime type information, exceptions, and full template support.

Besides, I would argue that the selling point of C++ wasn't portability per se, but the fact that it was largely compatible with existing C codebases. It was embrace, extend, extinguish in language form.


> Judging by the lack of modern C++ in these crufty embedded compilers,

Being conservative with features and deliberately not implementing them are two different things. Some embedded compilers go through certification to be allowed for use in producing mission-critical software. Chasing features is prohibitively expensive, for no obvious benefit. I'd bet that in the 2030s most embedded compilers will support C++14 or even 17. Good enough for me.


> Being conservative with features and deliberately not implementing them are two different things.

There is no version of the C++ standard that lacks features like exceptions, RTTI, and fully functional templates.

If the compiler isn't implementing all of a particular standard then it's not standard C++. If an implementation has no interest in standard C++, why give those implementations a seat at the table in the first place? Those implementations can continue on with their C++ fork without mandating requirements to anyone else.


> If the compiler isn't implementing all of a particular standard then it's not standard C++.

C++ has historically been driven by practicalities and violated standards on a regular basis when deemed useful.

> Those implementations can continue on with their C++ fork without mandating requirements to anyone else.

Then they will diverge too much, as happened with countless other languages, like Lisp.


Non-8-bit-char DSPs would have char8_t support? Definitely not something I expected, links would be cool.

Why not? Except it is the same as `unsigned char` and can be larger than 8 bits.

ISO/IEC 9899:2024 section 7.30

> char8_t which is an unsigned integer type used for 8-bit characters and is the same type as unsigned char;


The exact size types are never present on platforms that don't support them.

> Why not?

Because "it supports Unicode" is not an expected use case for a non-8-bit DSP?

Do you have a link to a single one that does support it?


char on Linux ARM is unsigned, which makes for fun surprises when you've only ever dealt with x86 and assumed char to be signed everywhere.
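
A typical way it bites (sketch): on x86 Linux, where plain char is signed, the branch below takes the "negative" path; on ARM Linux, where it's unsigned, it doesn't.

    #include <cstdio>

    int main() {
        char c = static_cast<char>(0xE9);  // e.g. the first byte of UTF-8 'é'
        // x86 Linux (signed char): c is -23, prints "negative".
        // ARM Linux (unsigned char): c is 233, prints "non-negative".
        if (c < 0)
            std::puts("negative");
        else
            std::puts("non-negative");
        // Indexing a 256-entry table with a plain char has the same trap.
    }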

This bit us in Chromium. We at least discussed forcing the compiler to use unsigned char on all platforms; I don't recall if that actually happened.

I recall that google3 switched to -funsigned-char for x86-64 a long time ago.

A cursory Chromium code search does not find anything outside third_party/ forcing either signed or unsigned char.

I suspect if I dug into the archives, I'd find a discussion on cxx@ with some comments about how doing this would result in some esoteric risk. If I was still on the Chrome team I'd go looking and see if it made sense to reraise the issue now; I know we had at least one stable branch security bug this caused.


Related: in C at least (C++ standards are tl;dr), type names like `int32_t` are not required to exist. Most uses, in portable code, should be `int_least32_t`, which is required.
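
Concretely (sketch, in C++ spelling of the same types): int_least32_t must exist on every conforming implementation, while int32_t is only defined where an exact-width, padding-free 32-bit type exists.

    #include <cstdint>

    // Always available: at least 32 bits, possibly wider on exotic targets.
    std::int_least32_t counter = 0;

    // Optional per the standard (though present everywhere in practice):
    // INT32_MAX is defined if and only if int32_t is.
    #ifdef INT32_MAX
    std::int32_t exact = 0;
    #endif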

Isn't the real reason to use char8_t over char that char8_t* is subject to the same strict aliasing rules as all other non-char primitive types? (I.e., the compiler doesn't have to worry that a char8_t* could point to any random piece of memory like it would for a char*.)

At least in Chromium that wouldn't help us, because we disable strict aliasing (and have to, as there are at least a few core places where we violate it and porting to an alternative looks challenging; some of our core string-handling APIs that presume that wchar_t* and char16_t* are actually interconvertible on Windows, for example, would have to begin memcpying, which rules out certain API shapes and adds a perf cost to the rest).

The main effect of this is that some of the conversions between char and char8_t are inefficient.
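
For context, the aliasing-clean alternative alluded to here looks roughly like the sketch below (generic code, not Chromium's; it assumes a 16-bit wchar_t as on Windows). The cost is an allocation plus a memcpy per conversion, which is exactly what rules out the cheap pointer-reinterpreting API shapes.

    #include <cstring>
    #include <string>
    #include <string_view>

    // Copy the bytes instead of reinterpreting the pointer, so no aliasing
    // assumptions are needed.
    std::u16string to_char16(std::wstring_view w) {
        static_assert(sizeof(wchar_t) == sizeof(char16_t),
                      "sketch assumes a 16-bit wchar_t (e.g. Windows)");
        std::u16string out(w.size(), u'\0');
        std::memcpy(out.data(), w.data(), w.size() * sizeof(char16_t));
        return out;
    }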

> using u8 prefixes would obligate us to insert casts everywhere.

Unfortunately, casting a char8_t* to char* (and then accessing the data through the char* pointer) is undefined behavior.


Yes, reading the actual data would still be UB. Hopefully will be fixed in C++29: https://github.com/cplusplus/papers/issues/592

The shortest double-to-string algorithm is basically Schubfach or, rather, its variation Tejú Jaguá with digit output from Dragonbox. Schubfach is a beautiful algorithm: I implemented and wrote about it in https://vitaut.net/posts/2025/smallest-dtoa/. However, in terms of performance you can do much better nowadays. For example, https://github.com/vitaut/zmij does 1 instead of 2-3 costly 128x64-bit multiplications in the common case and has much more efficient digit output.

I have been using Walter Bright's libc code from Zortech-C for microcontrollers, where I care about code size more than anything else:

https://github.com/nklabs/libnklabs/blob/main/src/nkprintf_f... https://github.com/nklabs/libnklabs/blob/main/src/nkstrtod.c https://github.com/nklabs/libnklabs/blob/main/src/nkdectab.c

nkprintf_fp.c+nkdectab.c: 2494 bytes

schubfach.cc: 10K bytes.. the code is small, but there is a giant table of numbers. Also this is just dtoa, not a full printf formatter.

OTOH, the old code is not round-trip accurate.

Russ Cox should make a C version of his code..


Schubfach, Ryū, Dragonbox, etc. support round-tripping and shortest width, which (it sounds like) is not important for you. The idea of round-tripping is that if you convert a double to a string and then parse that, you get the exact same value. Shortest width means correctly rounding and generating the shortest possible text. I tried to implement a version that does _just_ round-tripping but is not shortest-width; it is around 290 lines for both parsing and toString [1]

[1] https://github.com/thomasmueller/bau-lang/blob/main/src/test...
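
As a concrete illustration of the round-trip property (a sketch using the standard library rather than any of the algorithms above; std::to_chars without a precision argument is specified to produce the shortest string that parses back to the same value):

    #include <cassert>
    #include <charconv>
    #include <string>

    std::string shortest(double d) {
        char buf[32];  // enough for the shortest form of any double
        auto res = std::to_chars(buf, buf + sizeof(buf), d);
        return std::string(buf, res.ptr);
    }

    int main() {
        double x = 0.1;
        std::string s = shortest(x);  // "0.1", not "0.1000000000000000055..."
        double back = 0;
        std::from_chars(s.data(), s.data() + s.size(), back);
        assert(back == x);  // round-trips to exactly the same bits
    }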


> Russ Cox should make a C version of his code.

https://github.com/rsc/fpfmt/blob/main/bench/uscalec/ftoa.c


Note that it has the same table of powers of 10: https://github.com/rsc/fpfmt/blob/main/bench/uscalec/pow10.h

It is possible to compress the table using the technique from Dragonbox (https://github.com/fmtlib/fmt/blob/8b8fccdad40decf68687ec038...) at the cost of some perf. It's on my TODO list for zmij.

I implemented Tejú Jaguá in Rust (https://github.com/andrepd/teju-jagua-rs), based off the original C implementation. Comparing to zmij, I do wonder how much of the speedup is in the core part of the algorithm (f*2^e -> f*10^e) vs in the printing part of the problem (f*10^e -> decimal string)! Benchmarks on my crate show a comparable amount of time spent on each of those parts.

I don't have exact numbers but from measuring perf changes per commit it seemed that most improvements came from "printing" (e.g. switching to BCD and SIMD, branchless exponent output) and microoptimizations rather than algorithmic improvements.

What about reasonably fast but smallest code, for running on a microcontroller? Anything significantly better in terms of compiled size (including lookup tables)?

If you compress the table (see my earlier comment) and use plain Schubfach then you can get really small binary size and decent perf. IIRC Dragonbox with the compressed table was ~30% slower which is a reasonable price to pay and still faster than most algorithms including Ryu.

When 30% is only ~3-6 ns it definitely seems worthwhile.

Note that ~3-6ns is on modern desktop CPUs where extra few kB matter less. On microcontrollers it will be larger in absolute terms but I would expect the relative difference to also be moderate.

It's easier to write faster code in a language with compile-time facilities such as C++ or Rust than in C. For example, doing this sort of platform-specific optimization in C is a nightmare https://github.com/vitaut/zmij/blob/91f07497a3f6e2fb3a9f999a... (likely impossible without an external pass to generate multiple lookup tables).

Other examples are CTRE (https://github.com/hanickadot/compile-time-regular-expressio...) and format string compilation (https://fmt.dev/12.0/api/#compile-api). The closest C counterpart is re2c which also requires external tooling.
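
For instance, format string compilation in {fmt} moves parsing of the format string to compile time (sketch, using the documented compile-time API linked above):

    #include <fmt/compile.h>
    #include <cstdio>
    #include <string>

    int main() {
        // The format string is checked and parsed at compile time; only the
        // argument formatting itself happens at run time.
        std::string s = fmt::format(FMT_COMPILE("pi ~ {:.2f}"), 3.14159);
        std::puts(s.c_str());  // prints "pi ~ 3.14"
    }
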

It depends on the input distribution, specifically exponents. It is also possible to compress the table at the cost of additional computation using the method from Dragonbox.


I am pretty sure Dragonbox is smaller than Ryu in terms of code size because it can compress the tables.


Thank you! The simplicity is mostly thanks to Schubfach although I did simplify it a bit more. Unfortunately the paper makes it appear somewhat complex because of all the talk about generic bases and Java workarounds.


I've just started a Julia port and I think it will be even cleaner than the C version (mostly because Julia gives you first-class (U)Int128 and count-leading-zeros, and also better compile-time programming that lets you skip writing the first table out explicitly).


Cool, please share once it is complete.

C++ also provides countl_zero: https://en.cppreference.com/w/cpp/numeric/countl_zero.html. We currently use our own for maximum portability.

I considered computing the table at compile time (you can do it in C++ using constexpr) but decided against it to avoid adding compile-time overhead, however small. The table never changes so I'd rather not make users pay for recomputing it every time.
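
For the curious, the rejected approach would look something like this toy sketch (a constexpr-built table of exact 64-bit powers of 10, not the actual power-of-10 significand table a real dtoa needs):

    #include <array>
    #include <cstddef>
    #include <cstdint>

    // Build a lookup table at compile time. Every translation unit that uses
    // this pays the (small) constexpr evaluation cost, which is the overhead
    // being avoided by shipping a precomputed table instead.
    constexpr std::array<std::uint64_t, 20> make_pow10() {
        std::array<std::uint64_t, 20> t{};
        t[0] = 1;
        for (std::size_t i = 1; i < t.size(); ++i) t[i] = t[i - 1] * 10;
        return t;
    }

    constexpr auto pow10 = make_pow10();
    static_assert(pow10[19] == 10'000'000'000'000'000'000ULL);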


Oh wow, I would love to see that if you can share it :)


Once I finish it, I'll be PRing to the Julia repo (to replace the current Ryu version), and I'll drop a link here.


I started a section to list implementations in other languages: https://github.com/vitaut/zmij?tab=readme-ov-file#other-lang.... Once yours is complete feel free to submit a PR to add it there.


Not adding it until complete, but https://github.com/JuliaLang/julia/pull/60439 is the draft.


Quick question, if you are still around :).

I have been doing some tests. Is it correct to assume that it converts 1.0 to "0.000000000000001e+15"?

Is there a test suite it is passing?


It converts 1.0 to "1.e-01" which reminds me to remove the trailing decimal point =). dtoa-benchmark tests that the algorithm produces valid results on its dataset.


So if I use:

    #include "zmij.h"
    #include <stdio.h>

    int main() {
        char buf[zmij::buffer_size];
        zmij::dtoa(1.0, buf);
        puts(buf);
    }
I get `g++ zmij.cc test.c -o test && ./test` => `0.000000000000001e+15`


My bad, you are right. The small integer optimization should be switched to a different output method (or disabled since it doesn't provide much value). Thanks for catching this!


Should be fixed now.


"1.e-01" is for 0.1, not 1.0.


I assume they meant 1.e+00

That is what Schubfach does


Yeah, that's what I meant.


This is possible and the trailing zeros are indeed removed (with the exponent adjusted accordingly) in the write function. The post mentions removing trailing zeros without going into details but it's a pretty interesting topic and was recently changed to use lzcnt/bsr instead of a lookup table.


I've added a C implementation in https://github.com/vitaut/zmij/blob/main/zmij.c in case you are interested.


Nice, but it's too late: I needed a different API for future use in my custom sprintf, so I made mulle-dtostr (https://github.com/mulle-core/mulle-dtostr). On my machine (AMD) it benchmarked quite a bit faster in a quick try, but I was just checking that it didn't regress too badly and didn't look at it closer.


Please note that there is some error in your port:

Error: roundtrip fail 4.9406564584124654e-324 -> '5.e-309' -> 4.9999999999999995e-309

Error: roundtrip fail 6.6302941479442929e-310 -> '6.6302941479443e-309' -> 6.6302941479442979e-309

Error: roundtrip fail -1.9153028533493997e-310 -> '-1.9153028533494e-309' -> -1.9153028533493997e-309

Error: roundtrip fail -2.5783653320086361e-312 -> '-2.57836533201e-309' -> -2.5783653320099997e-309


Good question. I am not familiar with string-to-double algorithms, but maybe it's an easier problem? double-to-string is relatively complex; people even do PhDs in this area. There is also some inherent asymmetry: formatting is more common than parsing.


In implementing Rust's serde_json library, I have dealt with both string-to-double and double-to-string. Of the two, I found string-to-double was more complex.

Unlike formatting, correct parsing involves high precision arithmetic.

Example: the IEEE 754 double closest to the exact value "0.1" is 7205759403792794*2^-56, which has an exact value of A (see below). The next higher IEEE 754 double has an exact value of C (see below). Exactly halfway between these values is B=(A+C)/2.

  A=0.1000000000000000055511151231257827021181583404541015625
  B=0.100000000000000012490009027033011079765856266021728515625
  C=0.10000000000000001942890293094023945741355419158935546875
So for correctness the algorithm needs the ability to distinguish the following extremely close values, because the first is closer to A (must parse to A) whereas the second is closer to C:

  0.1000000000000000124900090270330110797658562660217285156249
  0.1000000000000000124900090270330110797658562660217285156251
The problem of "string-to-double for the special case of strings produced by a good double-to-string algorithm" might be relatively easy compared to double-to-string, but correct string-to-double for arbitrarily big inputs is harder.
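
That example can be checked directly (a sketch; it assumes strtod is correctly rounded, which mainstream C libraries are):

    #include <cassert>
    #include <cstdlib>

    int main() {
        // Just below and just above the halfway point B from the comment above.
        double lo = std::strtod(
            "0.1000000000000000124900090270330110797658562660217285156249", nullptr);
        double hi = std::strtod(
            "0.1000000000000000124900090270330110797658562660217285156251", nullptr);
        // A correct parser must return two adjacent doubles even though the
        // inputs differ only in the final digit.
        assert(lo == 0.1);  // rounds down to A, the double nearest 0.1
        assert(hi > lo);    // rounds up to C, the next representable double
    }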


I guess one aspect of it is that in really high performance fields where you're taking in lots of stringy real inputs (FIX messages coming from trading venues for example, containing prices and quantities) you would simply parse directly to a fixed point decimal format, and only accept fixed (not scientific) notation inputs. Except for trailing or leading zeros there is no normalisation to be done.

Parsing a decimal ASCII string to a decimal value already optimizes well, because you can scale each digit by its power of 10 in parallel and just add up the result.
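
The well-known SWAR form of that idea, as a sketch (assumes a little-endian target, exactly 8 valid ASCII digits, and no validation):

    #include <cstdint>
    #include <cstring>

    // Parse exactly eight ASCII digits with three multiplications: adjacent
    // digits are combined into 2-digit values, then 4-digit, then the final
    // 8-digit result.
    std::uint32_t parse_eight_digits(const char* chars) {
        std::uint64_t val;
        std::memcpy(&val, chars, 8);
        val = (val & 0x0F0F0F0F0F0F0F0F) * 2561 >> 8;
        val = (val & 0x00FF00FF00FF00FF) * 6553601 >> 16;
        return std::uint32_t((val & 0x0000FFFF0000FFFF) * 42949672960001 >> 32);
    }
    // parse_eight_digits("12345678") == 12345678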


For those wishing to read up on this subject, an excellent starting point is this comprehensive post by one of the main contributors of the fast algorithm currently used in core:

https://old.reddit.com/r/rust/comments/omelz4/making_rust_fl...


> Unlike formatting, correct parsing involves high precision arithmetic.

Formatting also requires high precision arithmetic unless you disallow user-specified precision. That's why {fmt} still has an implementation of Dragon4 as a fallback for such silly cases.


> formatting is more common than parsing.

Is it, though? It's genuinely hard for me to tell.

There's both serialization and deserialization of data sets with, e.g., JSON including floating point numbers, implying formatting and parsing, respectively.

Source code (including unit tests etc.) with hard-coded floating point values is compiled, linted, automatically formatted again and again, implying lots of parsing.

Code I usually work with ingests a lot of floating point numbers, but whatever is calculated is seldom displayed as formatted strings and more often gets plotted on graphs.


For serialization and deserialization, when the goal is to produce strings that will be read again by a computer, I consider the use of decimal numbers as a serious mistake.

The conversion to string should produce a hexadecimal floating-point number (e.g. with the "a" or "A" printf conversion specifier of recent C library implementations), not a decimal number, so that both serialization and deserialization are trivial and they cannot introduce any errors.

Even if a human inspects the strings produced in this way, comparing numbers to see which is greater or less and examining the order of magnitude can be done as easily as with decimal numbers. Nobody will want to do exact arithmetic computations mentally with such numbers.
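
Concretely, with printf's "%a" and strtod (both standard since C99; a sketch):

    #include <cassert>
    #include <cstdio>
    #include <cstdlib>

    int main() {
        double x = 0.1;
        char buf[64];
        std::snprintf(buf, sizeof buf, "%a", x);  // e.g. "0x1.999999999999ap-4"
        // The "p-4" binary exponent still conveys the order of magnitude, and
        // parsing back is exact by construction: no shortest-digit machinery needed.
        double y = std::strtod(buf, nullptr);
        assert(y == x);
        std::puts(buf);
    }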


Think about things like logging and all the uses of printf which are not parsed back. But I agree that parsing is extremely common, just not the same level.

