How the Boehm Garbage Collector Works (discontinuously.com)
120 points by int3 on Feb 10, 2012 | hide | past | favorite | 37 comments


Does anyone have data on how much it leaks in practice? Also, could maliciously crafted input present a DoS vector for Boehm-using applications?


I once worked on a project that used the Boehm collector for a long-running server process.

About once a year, we'd lose a week or two to tracking down a memory issue. While the root cause was never a GC bug, the GC would make it dramatically more difficult to figure out what the real problem was.

The closest we had to an actual GC bug was a leak involving a data structure containing a list of IP addresses that was incorrectly interpreted as containing pointers. It took forever to figure out what had happened, since the issue only appeared if you worked with a specific set of IP addresses--and even after we had a replication case, the GC made it extremely difficult to determine why memory was growing rapidly. The fix was simple once we identified the precise problem--flag that memory as "atomic" (i.e., guaranteed not to contain pointers)--but finding the precise data structure that was holding memory live was a nightmare.

We eventually ran into a slow memory growth issue that we couldn't figure out. After much debugging, one developer became fed up with the situation and tore the GC out. It took him about two weeks to produce a version of the codebase that leaked less than the GC-enabled one, and another couple weeks to eliminate virtually all memory leaks. We continued to find minor leaks for another several months, none of which were difficult to correct. (We used a debugging malloc library that flagged leaked memory on exit, so we always knew exactly where the leak had originated.)

Not only did the GC-less version leak less, it used about 40% less memory.

I would strongly recommend against using the Boehm collector for any long-running processes. The tradeoffs may be more acceptable in short-term processes where slow leakage over time is unimportant.


My story is similar to nield's; I worked on a project that used the Boehm-Demers-Weiser collector in a 32-bit process.

Once or twice a year, we'd have an unbounded memory growth to trace. Most of the time, we were able to solve it by providing layouts for structures that mixed pointers and pointer-like data. The debug facility it provides for printing a random backtrace to a root was helpful in those cases.

However, more than once we had a leak that it was beyond our ability to trace, despite a significant amount of diagnostic work. We found some workarounds that were specific to our app, which helped.

In the end, the costs to us were high enough that we rewrote our codebase in C# (our software was Windows-only). It was pretty much a line-for-line conversion, but it worked and we no longer had any memory leaks (except for one, in .NET timers, that took only minutes to trace using readily-available diagnostic tools).

I do believe a 64-bit process using the Boehm collector would be much less likely to suffer these issues, but this was before x64 was prevalent in corporate server rooms.

On your second question: crafted input can indeed result in significant memory retention if you aren't careful to exclude user input from the GC's scanned memory and you have large, cyclic structures to collect. On a 32-bit machine, this is possible with almost any text encoding.


Correct me if I'm wrong, but the GC isn't the only thing that frees memory, is it? You can still free it manually (and are supposed to), so those leaks would have happened without a GC too, and it can only help things.

I'm getting the impression that posts here imply you're better off without a GC. I'm not sure if that's the intention, but it strikes me as wrong.


Your parent post spoke about going from one GC to another and having the problems disappear. Hence, it only implies that changing GC can have a useful effect.

Your message does betray some confusion about garbage collection. The basic principle of GC is that you do not deallocate RAM manually (and, under most GCs, cannot), and programs written under it expect deallocation to happen automatically. You are not 'supposed to' free manually, as that would be wasted effort, and some standard configurations of the Boehm collector actually ignore all calls to free().

There is a very useful description of garbage collection at http://blogs.msdn.com/b/oldnewthing/archive/2010/08/09/10047... which includes and expands on the useful mental model that "garbage collection is simulating a computer with an infinite amount of memory".


This will vary between 32-bit and 64-bit environments. My understanding is that the vastly larger address space in 64-bit environments greatly mitigates the chance that data will be mistaken for pointers.

As far as crafting input to match allocated addresses, that is an interesting idea. One could, at the very least, create data that lies within the heap's address range. The issue is, you'd need a lot of data to blacklist a significant amount of memory, and if that's the case why not just DoS with a ton of data?

In practice, a fragmented heap is probably a much bigger issue than actual leaks. As the article points out, a conservative GC cannot perform compaction.


I think the key thing that prevents the DoS is that you have to have a 64-bit value that points to the beginning of the allocated block of memory in order for that memory to be kept around -- so, if you malloc 16 bytes and are returned address 100, then a value that contains 101 will not confuse the GC. This has implications if you do some wonky things like only keeping pointers to internal structs and using pointer math to access their containers, but you probably shouldn't do that anyway =D


Boehm GC has configurable parameters in a header file; you can tell it that interior pointers should be supported.

Interior pointers can be a perfectly reasonable thing to have. For example, many, if not most, implementations of multiple inheritance rely on different values for the 'this' pointer depending on the type of 'this', where that type is somewhere up the inheritance graph. For a class C : public A, public B {}, the physical value of A * = new C and B * = new C will usually be different. The way this is often implemented is by having the addresses of the various vtables for different ancestors stored in the object data, and the conversion from the descendant class to one of the ancestor class returns an interior pointer to one of these vtable locations inside the object data.


You are right that I should have added by default.

>The way this is often implemented is by having the addresses of the various vtables for different ancestors stored in the object data, and the conversion from the descendant class to one of the ancestor class returns an interior pointer to one of these vtable locations inside the object data.

This suggests that a pointer value referencing the beginning of the allocation block would still exist, stored in the vtable, so erroneous collection would still be avoided.

Edit: but you are right that there are some valid reasons to have an interior pointer, which is why the GC has the friendly macros.


No; the vtable normally contains a read-only list of member function pointers, shared across all objects of that type. Objects need to be kept alive solely through the interior pointer to the (compiler-implemented) field (itself a pointer to a vtable) inside the object.

C++ terms:

    class C : public A, public B {};
Typical class layout, in C terms:

    struct A
    {
        // exists assuming virtual methods on A
        // the data this vtable points to is read-only but C++ ctor semantics requires
        // updating vtable pointer during inherited constructor calls
        A_vtable *vtable; 
        // A data
    };
    struct B
    {
        B_vtable *vtable;
    };
    struct C
    {
        C_vtable *vtable; // incorporates A's vtable
        // struct A a; data members of A but without vtable
        struct B b;
    };
A pointer to an instance of C, but typed as B * , will point at the structure contained in C; it will be an interior pointer. There's no reason to expect that there will be another live pointer of type C * pointing to the object; a pointer of type C * needs to be sufficient. Typecasting back down the hierarchy to C * again (with dynamic_cast) will need to adjust the pointer to the start of the object; this normally requires RTTI, and may be done by e.g. storing the offset of the vtable inside the object in the vtable itself (perhaps at a negative offset).

The above is all implementation dependent, of course, but what I've described above maps almost exactly to how Delphi implements COM interfaces (where A would be the base class and B would be a pure virtual class, or interface - no data members).


Ah, thank you for taking the time to point that out to me. Very interesting and informative!


On most binary builds of the GC I have seen in the wild, interior pointers are enabled (and at least since GC 6.0, this is the default in the GC's Makefile).


Storing binary data in memory and not explaining what it is to the Boehm GC with one of the macros they supply is the easiest way to fall down. Since "fairly arbitrary" binary data can be encoded as UTF-8 strings (many 32-bit binary codes cannot be a part of a UTF-8 string, but most can) you need to mark strings as non-pointer data too, which people are not always good about doing. If you do things correctly (that is, mark your blobs of arbitrary data properly) then it's reasonably good but of course most people don't and it runs counter to the idea that Boehm is a drop-in replacement.

This isn't Boehm's fault, specifically; any conservative GC would have similar trouble. You can't get completely away from cooperating with the GC and have good GC both.


I would certainly hope that nobody is blaming Boehm. He's provided a valuable tool that has some practical limitations, which are probably mostly due to the nature of conservative garbage collection.

However, it's worth noting that you can't easily mark/annotate all data structures, especially when large third-party libraries are involved, and it's not guaranteed to solve the problem. Implying otherwise might dangerously mislead people.

As an example, look at Mono, which gives the GC very precise information [1]. Even so, at least some projects suffer ever-increasing memory usage when run under it [2], which the Mono team are fixing by writing their own garbage collector.

[1] http://www.mono-project.com/Mono:Runtime#Garbage_Collection [2] http://www.mono-project.com/FAQ:_ASP.NET#Memory_Usage


I used it in a Scheme interpreter with no noticeable problem as my main personal hacking platform for years. OTOH, I didn't use it for long-lived servers, and Racket eventually migrated from it to a precise collector and IIRC the release notes mentioned occasional such problems as a factor. If they said how often or how severe, I missed it.


I have found that it is more likely to hold on to some pointer longer than it should than to leak memory outright. For instance, I've never known it to start eating up memory.


Holding onto a pointer that should be freed is leaking memory.


That is overly conservative. A perfect garbage collector that always runs in reasonable time is provably impossible. What sambeau said is that it often holds on to [an allocation] for too long, which is true to varying extent of all garbage collectors.

To leak memory is to fail to deallocate it at all.


"Leak" may be the right word for some configurations with conservative GC. A "leak-free" GC would presumably guarantee eventual collection of all dead objects, which is impossible with a conservative GC in theory, and in practice it's not especially difficult to get into a configuration with the Boehm GC where the lifetime of a dead object is infinite (usually because it doesn't move and some binary data that the GC thinks might be a pointer doesn't go away, because it's texture data or something).


And of course there are many valid programs where leaking memory is totally fine (and more performant): shell tools, cgis etc.

When I write any of these, the only time I free anything is when I've had to malloc something huge.


Interesting dichotomy: In a managed system, a "leak" is when a pointer points to a memory allocation too long. In an unmanaged system, a "leak" is when you have no pointers to a memory allocation. :)


One of the biggest problems with this style of "scan all memory" GC is that you have to stop all threads (flush all pointers to memory) to perform a collection. Pointers held in registers in other active threads are inaccessible to the GC thread.

Failing to stop the world (or ensure all pointers are in memory rather than registers) can result in live objects being collected. This happened to Ruby at one point: http://timetobleed.com/the-broken-promises-of-mrireeyarv/

Edit: This boils down to breaking the compiler's model of memory as specified by the C standard; all bets are off.


Um, no. First, it doesn't "scan all memory". It scans the roots and fans out. You can do read/write barriers and such to avoid blocking GC. It doesn't matter what values registers hold; what matters is what is on the stack and what is reachable from said stack. If you have control of malloc, you can be far more clever than what you are describing. Ruby had a crummy implementation of GC, but you'll notice, for example, that most JVMs do "scan all memory" without forcing all active threads to be blocked at once.



You really shouldn't be using a XOR linked list anyway. It has a lot of disadvantages, none of which are offset by its only advantage: a lower memory footprint per list item.


The point is that this garbage collector is unsound. It may collect memory that is still in use. That's bad.


To be clearer: safety is no longer compositional in this system. To safely use the Boehm garbage collector, it's not sufficient that my own code doesn't use "disguised" pointers: I have to verify that my libraries don't either. That's impractical and contrary to good software engineering at best, impossible at worst.


No, you have to verify that your libraries don't keep disguised pointers to objects you allocated via the GC, that you don't yourself keep pointers to. It's quite unusual in C to give a reference to an object to another piece of code then drop all the pointers to it that you hold, because the usual discipline is that the module that allocates an object frees it.


Not if you build the GC with --enable-redirect-malloc to use with existing libraries/code bases. Which should be fine because, after all, this is a "conservative" garbage collector. Right?


If RAM is precious, you shouldn't use GC anyway. It typically has 2x memory use overhead to reduce GC performance cost.


I don't see how it would be 2x: a minimal GC would just require an additional pointer per allocation (in the worst case) for the GC roots.

malloc() has overhead too, typically immediately before the pointer it returns there's data about the allocation, such as the size of the alloc, magic for detecting bad free()'s, etc (though I guess a simple slab allocator would just need a single bit for each allocation, plus a pointer per slab).


I've often wondered how a garbage collector can scan the entire memory in a reasonable time. The article doesn't explain that particular operation. With a quick look at the source code I couldn't figure it out either.

Can anyone enlighten me? Is it only the stack that is checked? That seems possible in a reasonable time.


The GC doesn't have to scan the entire address space. It starts scanning with GC roots, memory regions known to contain pointers such as stack, heap, and global variables (registered with the GC).


Boehm controls malloc; it knows where the heap is. In general, yes, the more memory you have allocated, the slower BoehmGC will be.

It checks both stack and heap.


Neither the article nor the paper it mentions covers cycles in references. Does it handle them in any way?


Garbage collectors have no problem with reference cycles.

A GC works by finding all objects that are reachable (referenced by some other object or a root), then discarding all other objects. So a cyclic data structure is completely discarded as soon as there are no live references to any of its objects.

As usual, Wikipedia has lots more information: http://en.wikipedia.org/wiki/Garbage_collection_(computer_sc...

GC roots are generally things like: static/global variables, active stack frames (function parameters and local variable), CPU registers.


BDW GC is a tracing collector. It doesn't need any special handling of cycles--either a cycle is reachable from the root set, in which case it is marked and considered alive, or it is not, and is never even seen by the collector before being reclaimed.



