Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Pascal did/does this, but eventually someone wants a string longer than the size portion can handle. Or wants the number of characters not the number of bytes.
 help



I wasn't a programmer in these days, so I don't know if there's some other major concern that would kill this, but I sometimes wonder about whether we could have / should have used variable-length integers. That is, something like, 0-127 byte strings get their length prefixed, 128 - 16383 get two bytes of prefix, and the probably-rare 16384 - 2097151 strings would end up with three, though proportionally by that point it's hardly anything. Or you could use the UTF-8 mechanism for packing the bytes, though that costs more and probably doesn't get anything we'd care about in the 1980s or 1990s.

It's a bit of extra code, yes. Not necessarily all that much, but some. On average it is only slightly more expensive than null termination, and considered as a proportion of the size of the strings themselves it's hardly anything. It's probably better than the strings getting hard-limited to 0-255, though, which was quite frequently a user-visible quirk.


You could start the encoding with two bytes, so that if the most significant bit of the first byte is 0, the length is that byte plus another. That gives you 32KiB strings with just a byte more. Short strings might suffer, but I think the overhead is reasonable.

The next level (110x xxxx) would give you 8MiB strings, which are going to be fine for most things.


32-bit int isn't too much overhead. Just 3 additional bytes. I bet it's almost always better than c style strings. In the vast majority of situations the price isn't that bad, considering you make strings much more secure and potentially faster in string manipulations.

32-bit is so little overhead that we don't blink at adding 64 to our strings nowadays, because of the benefits we get from alignment.

But remember the first Macintosh shipped with 128KB of RAM, 131,072 bytes. Three more bytes per string hurts a lot more there...

... although, that said, even in that era given the number of errors that null-terminated strings caused, even completely ignoring security, I do still wonder if at least defaulting to 2 bytes of length and doing something special for strings over 64K still wouldn't have been the right tradeoff, even in the case of short strings. Today we mostly focus on security, but null-terminated strings also caused a lot of just plain-old bugs. But so did 1-byte length strings... it's way too easy to run out of 256 characters even on those old systems.


Dude, every sane language out there does this. Just generally with 4byte prefix. Null-terminated stuff has always been backwards compat stuff.

Pascal strings - historically and why people even remember this being an issue - were up to 255 chars in size, if not you had to use different string type.

You might still want raw pointers for all sorts of low level stuff, but you almost never want to have null-terminated strings for anything but back-compat, one of the worst things ever, even on memory constrained systems.


And then anyone that isn't stuck in 1976 will use open arrays.



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: