> That shared_ptr<bar> instance is held either on the stack (with address FP + o...

worstspotgain · on July 9, 2024

> No matter what the instructions say, with all their fancy addressing modes, foo has to load two cache lines: one holding the shared_ptr and another holding the pointee data. If we instead passed bar* in a register, we'd need to grab only one cache line: the pointee's.

The cache doesn't make a difference here. To clarify: we start with a shared_ptr<bar> instance. It must get dereferenced to be used. It must either be dereferenced by the caller (the const bar & contract) or by the callee (the const shared_ptr<bar> & contract).

If the caller dereferences it, it might turn out to be superfluous if the callee wasn't actually going to use it. In this case const shared_ptr<bar> & is more efficient.

However, if the caller happened to have already dereferenced it prior to the call, one dereferencing would be avoided. In this case const bar & is more efficient.

> Sure. Maybe the caller already has a fully formed shared_ptr around somewhere but not in cache.

This is where our misunderstanding is. The caller starts out by only having a shared_ptr. Someone (caller or callee) has to dig the bar * out.

quotemstr · on July 9, 2024

Sure. I've just only rarely encountered the situation you're describing. When it comes up, passing a shared_ptr by const reference is fine --- just so long as it's documented and the code explains "Look, I know this looks like a newbie error, but in context, it's actually an optimization. Here's why..." lest someone "optimize" away the deferred load win.