I completely agree with you.
For work I'm writing a performance-critical application, and profiling with gprof/valgrind has been very useful, not only for determining where time is spent but also where it is _not_ spent. Some pieces of code can be implemented in a fast and stupid way without impacting overall performance.
That said, inlining is precisely an area where I find profiling to be difficult. It's hard to see where time is spent inside a function if many of the calls it makes are inlined. The obvious workaround is to profile with fewer optimizations enabled, but I'm not certain that every part of my code gets equally slower or faster depending on which optimizations I turn on or off in the compiler. Thus, reducing optimizations might give me the wrong impression about which functions take up the most runtime.
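One middle ground that might help, assuming GCC or Clang: leave the optimization level alone and mark only the helper you are curious about as non-inlinable, so it still shows up as its own entry in the profile. This is just a sketch; `PROFILE_NOINLINE` and `sum_squares` are made-up names for illustration, and forcing a call can itself change the timing a little.

```c
/* Assumption: GCC or Clang, which support the noinline attribute.
   On other compilers the macro expands to nothing. */
#if defined(__GNUC__)
#define PROFILE_NOINLINE __attribute__((noinline))
#else
#define PROFILE_NOINLINE
#endif

/* Hypothetical hot helper. With the attribute it stays a real call,
   so the profiler can attribute time to it even at -O2. */
PROFILE_NOINLINE static unsigned long sum_squares(unsigned long n)
{
    unsigned long total = 0;
    for (unsigned long i = 1; i <= n; ++i)
        total += i * i;
    return total;
}
```

Built with the usual optimization flags plus `-pg` for gprof (or run under valgrind's callgrind tool), the rest of the code is still optimized as before, so the relative picture should be less distorted than with optimizations turned off globally.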