I use timers, saving the results in memory, dumping them occasionally, and then aggregating. Each timer has a category name (such as "sql query", or one of the many higher-level tasks in the app) plus an optional argument (the actual SQL query, or the item that had work performed on it). Each timer records only min/max/avg time. In the aggregate view you can then see that, e.g., "foo'ing" took 20% longer than the average Foo time this week after a new software release, and investigate.
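A minimal sketch of such a timer, assuming a context-manager interface and an in-memory dict keyed by (category, argument) — the class name and API here are illustrative, not from the original:

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class TimerStats:
    """Aggregate min/avg/max elapsed time per (category, argument) key."""

    def __init__(self):
        # (category, arg) -> [count, total, min, max]
        self._stats = defaultdict(lambda: [0, 0.0, float("inf"), 0.0])

    @contextmanager
    def timed(self, category, arg=None):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            s = self._stats[(category, arg)]
            s[0] += 1
            s[1] += elapsed
            s[2] = min(s[2], elapsed)
            s[3] = max(s[3], elapsed)

    def dump(self):
        """Return {key: (min, avg, max)} and reset the in-memory stats."""
        out = {k: (v[2], v[1] / v[0], v[3]) for k, v in self._stats.items()}
        self._stats.clear()
        return out

# Usage: time a (hypothetical) SQL query
stats = TimerStats()
with stats.timed("sql query", "SELECT * FROM users"):
    time.sleep(0.01)  # stand-in for real work
print(stats.dump())
```

Dumping and clearing periodically keeps memory bounded; the aggregate view is then built from the dumped snapshots.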
For profiling, I've occasionally swapped out a key function with odd performance characteristics for one that runs under a profiler. You create a new Profiler instance and patch your weird function so that calls go through profiler.runcall(func) instead. After you've exercised it a few times, you swap the old version back in and dump the stats.
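With Python's standard cProfile module this looks roughly like the following; slow_func here is a made-up stand-in for the function under investigation:

```python
import cProfile
import io
import pstats

def slow_func(n):
    # Stand-in for the "weird" function you would normally patch over.
    return sum(i * i for i in range(n))

profiler = cProfile.Profile()

# At the patched call site, invoke the function through the profiler.
# Stats accumulate across repeated runcall() invocations.
result = profiler.runcall(slow_func, 100_000)
result = profiler.runcall(slow_func, 100_000)

# After a few runs, swap the original function back and dump the stats.
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
print(buf.getvalue())
```

runcall passes its extra arguments through to the function and returns its result, so the patched version is a drop-in replacement for callers.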
If you have multiple proxying layers, multiple processes, etc., a useful option is to make, e.g., Apache record the request time in the main log file or a separate one; this is the %D directive in a CustomLog format (request duration in microseconds). You can then easily find out that, e.g., /foo/bar requests take this long on average, with 95% of them below some number, and perhaps even raise real-time warnings when those numbers start to increase.