> That shared_ptr<bar> instance is held either on the stack (with address FP + offset or SP + offset) or inside another object (typically 'this' + offset.) To call foo(const shared_ptr<bar> &), the compiler adds the base pointer and offset together, then passes the result of that addition - without dereferencing it.
You're overthinking it. Think in cache lines. No matter what the instructions say, with all their fancy addressing modes, foo has to load two cache lines: one holding the shared_ptr and another holding the pointee data. If we instead passed bar* in a register, we'd need to grab only one cache line: the pointee's.
Sure. Maybe the caller already has a fully formed shared_ptr around somewhere but not in cache. Maybe foo often doesn't access the pointee. But how often does this situation arise?
> No matter what the instructions say, with all their fancy addressing modes, foo has to load two cache lines: one holding the shared_ptr and another holding the pointee data. If we instead passed bar* in a register, we'd need to grab only one cache line: the pointee's.
The cache doesn't make a difference here. To clarify: we start with a shared_ptr<bar> instance. It must get dereferenced to be used. It must either be dereferenced by the caller (the const bar & contract) or by the callee (the const shared_ptr<bar> & contract).
If the caller dereferences it, it might turn out to be superfluous if the callee wasn't actually going to use it. In this case const shared_ptr<bar> & is more efficient.
However, if the caller happened to have already dereferenced it prior to the call, one dereferencing would be avoided. In this case const bar & is more efficient.
> Sure. Maybe the caller already has a fully formed shared_ptr around somewhere but not in cache.
This is where our misunderstanding is. The caller starts out by only having a shared_ptr. Someone (caller or callee) has to dig the bar * out.
Sure. I've just only rarely encountered the situation you're describing. When it comes up, passing a shared_ptr by const reference is fine --- just so long as it's documented and the code explains "Look, I know this looks like a newbie error, but in context, it's actually an optimization. Here's why..." lest someone "optimize" away the deferred load win.
You're overthinking it. Think in cache lines. No matter what the instructions say, with all their fancy addressing modes, foo has to load two cache lines: one holding the shared_ptr and another holding the pointee data. If we instead passed bar* in a register, we'd need to grab only one cache line: the pointee's.
Sure. Maybe the caller already has a fully formed shared_ptr around somewhere but not in cache. Maybe foo often doesn't access the pointee. But how often does this situation arise?