Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

What's wrong with:

    *p += sprintf(*p, "hello");
    *p += sprintf(*p, "world");


Well, that should be `snprintf()` to start with, but even with that, there are issues. The return type of `snprintf()` is `int`, so it can return a negative value if there was some error, so you have to check for that case. That out of the way, a positive return value is (and I'm quoting from the man page on my system) "[i]f the output was truncated due to this limit then the return value is the number of characters which would have been written to the final string if enough space had been available." So to safely use `snprintf()` the code would look something like:

    int size = snprintf(NULL,0,"some format string blah blah ...");
    if (size < 0) error();
    if (size == INT_MAX)
      error(); // because we need one more byte to store the NUL byte
    size++;
    char *p = malloc(size);
    if (p == NULL)
      error();
    int newsize = snprintf(p,size,"some format string blah blabh ... ");
    if (newsize < 0) error();
    if (newsize > size)
    {
      // ... um ... we still got truncated?
    }
Yes, using NULL with `snprintf()` if the size is 0 is allowed by C99 (I just checked the spec).

One thing I've noticed about the C standard library is that is seems adverse to functions allocating memory (outside of `malloc()`, `calloc()` and `realloc()`). I wonder if this has something to do with embedded systems?


Not just embedded systems, also OSes. C's standard library should generally work without the existence of a heap. After all, you have to create the heap using C before you can allocate from it.


malloc is a required part of ISO C, though.


Functions like malloc are only required for hosted implementations. Many operating systems are built using freestanding implementations.

Further, on many platforms, one should avoid using malloc() unless portability is more important than performance or safety. Some operating systems support useful features like the ability to allocate objects with different expected lifetimes in different heaps, so as to help avoid fragmentation, or arrange to have allocations a program can survive without fail while there is still enough memory to handle critical allocations. Any library that insists upon using "malloc()" will be less than ideal for use with any such operating system.


Also, the return type being int means that there's a limit to the length of your string…


Perhaps you meant snprintf. But snprintf can fail on allocation failure, fail if the buffer size is > INT_MAX, and in general isn't very light weight--last time I checked glibc, snprintf was a thin wrapper around the printf machinery and is not for the faint of heart--e.g. initializing a proxy FILE object, lots of malloc interspersed with attempts to avoid malloc by using alloca.

It can also fail on bad format specifiers--not directly irrelevant here except that it forces snprintf to have a signed return value, and mixing signed (the return value) and unsigned (the size limit parameter) types is usually bad hygiene, especially in interfaces intended to obviate buffer overflows.


That could lead to buffer overflow.����


When I wrote that, I had in mind the observation about continued recalculation of buffer len. My suggestion has no such thing. It looks so good that I imagine this was probably how it was intended to be used. With that in mind, isn't it the user's job to know the size of the buffers he's using? Doesn't expecting that the function know about buffer size go against the single responsibility principle?

I'm new to C, in case you couldn't tell.


The problem in practice is that you do not write “hello” and “world” to the destination buffer. You write data that is computed more or less directly from user inputs. Often a malicious user.

So the user only needs to find a way to make the data longer than the developer expected. This may be very simple: the developer may have written a screensaver to accept 20 characters for a password, because who has a longer password than this? Everyone knows that only the first 8 characters matter anyway. (This may have been literally true a long time ago, I think, although it's terrible design. Anyway only 8 characters of hash were stored, so in a sense characters after the first 8 did not buy you as much security as the first 8, even if it was not literally true.)

And this is how there were screensavers that, when you input ~500 characters into the password field, would simply crash and leave the applications they were hiding visible and ready for user input. This is an actual security bug that has happened in actual Unix screensavers. The screensavers were written in C.

And long story short, we have been having the exact same problem approximately once a week for the last 25 years. Many people agree that it is urgent to finally fix this, especially as the consequences are getting worse and worse as computers are more connected.

One solution that some favor is functions that make it easier not to overflow buffers because you tell them the size of the buffer instead of trying to guess in advance how much is enough for all possible data that may be written in the buffer. This is the thing being discussed in this thread. The function sprintf is not a contender in this discussion. The function snprintf could be, if used wisely, but it is a bit unwieldy and the OP's proposal has a specific advantage: you compute the end pointer only once, because this is the invariant.


An analogous seprintf() would probably be a good thing to add too, where the buffer end is passed in instead of a buffer length. I would still have it return a pointer to the end of what was copied. Anyone can calculate the length if they need to, by subtracting the original pointer from the returned pointer.

    char *seprintf(char *str, char *end, const char *format, ...);


I think sprintf and gets can be perfectly secure interfaces. The standard just needs to specify them in a way that causes overflows to raise signals. This is probably more for POSIX and UNIX, since I think it requires the concept of memory mappings. For example:

Start by specifying that memcpy goes by increasing address. This can be done by specifying that no pages to be written by memcpy can be written to until after all pages with lower addresses have been accessed by memcpy. (it is OK to read forwards and then write backwards; the first access must not skip pages)

Next, specify sprintf and gets in terms of memcpy. The output is written as if by memcpy.

The user may then place a PROT_NONE page of memory after the buffer. Since the pages are being accessed by address order, the PROT_NONE page will safely stop the buffer overflow. The user can have a signal handler deal with the problem. It can exit or map in more memory. If we require sprintf and gets to be async-signal-safe, then the signal handler can also siglongjmp out of the problem.


Surely you don’t expect every stack buffer to have a hard page placed after it to protect from overflows?


> With that in mind, isn't it the user's job to know the size of the buffers he's using?

Yes. The user knows the size of his buffer, and then passes that knowledge on to the string constructing functions so that they do not overflow the buffer.

> Doesn't expecting that the function know about buffer size go against the single responsibility principle?

What's single responsibility again? "Execute this one assembly instruction"?

What you want from standard library functions is, usually, "construct a string into this buffer (whose size is N)."


It looks like you'd be dereferencing the pointer p, but you'd also need to make sure that what p points to has enough memory.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: