> PGO compiles only 0.4% of our application for speed, and our response times are about 20 usecs slower with it on
This is a complex issue, but in that case consider abandoning PGO and just compiling for speed. PGO doesn't help in every case.
> RE xperf (and WinDbg): Stop bundling this shit in "Toolkits".
How else would you bundle it? It's not simple to ship something as part of the base OS image. And I haven't heard about these installers breaking the VS installer - that sounds like a bad bug.
> __assume is so useless. How often does someone write a branch that does nothing every time?
It's commonly used as the retail version of a debug ASSERT macro. But yes, like I said earlier - I wish we would do more with static annotations, but I've gotten pushback.
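To make the ASSERT pattern concrete, here is a minimal sketch of how such a macro is often wired up. The names `OPT_ASSUME`, `MY_ASSERT`, and `scale_for_kind` are hypothetical, and the non-MSVC fallback using `__builtin_unreachable` is my own addition so the snippet also builds with GCC/Clang.

```cpp
#include <cassert>

// Hypothetical helper: tells the optimizer that `cond` always holds.
#if defined(_MSC_VER)
#  define OPT_ASSUME(cond) __assume(cond)
#elif defined(__GNUC__) || defined(__clang__)
#  define OPT_ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)
#else
#  define OPT_ASSUME(cond) ((void)0)
#endif

// Debug builds check the condition at run time; retail (NDEBUG) builds
// turn the same condition into an optimizer hint instead.
#ifdef NDEBUG
#  define MY_ASSERT(cond) OPT_ASSUME(cond)
#else
#  define MY_ASSERT(cond) assert(cond)
#endif

// Hypothetical example function: with the hint in a retail build, the
// compiler is free to drop the range check in front of the switch.
int scale_for_kind(int kind) {
    MY_ASSERT(kind >= 0 && kind < 3);
    switch (kind) {
        case 0: return 1;
        case 1: return 10;
        case 2: return 100;
    }
    return 0; // not reached when the assumption holds
}
```

Note that in a retail build a violated assumption is undefined behavior rather than a diagnosed failure, so this only makes sense for conditions that really are invariants.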
> PogoAutoSweep crashes threaded programs if you don't suspend every other thread but it's still quasi documented. The PogoSafeMode build flag/environment appears to be ignored.
I've never seen PogoAutoSweep crash - do you have a repro? PogoSafeMode doesn't affect PogoAutoSweep, only probe generation.
> The filename postfix that PogoAutoSweep adds breaks the VS2012 PGO menu options.
Haven't heard of this either, but stay tuned. I don't like the PGO menu options as they currently stand.
> There's nothing one can do to limit the VS2012 profiler to specific threads.
I can forward that request to the profiler team.
> The interface for instrumenting specific functions is terrible, use a plain text file or decl_spec FFS.
Are you talking about PGI or an instrumented profiler?
> If there are #defines or other ways to detect an instrumented build, they're terribly documented.
There isn't an easy way, and having different code in the PGI build versus the PGU build would be problematic.
> PGO instrumentation/optimization is woefully obtuse. What did it pick for speed? Why did it pick it? What branches did it fold/unfold?
Stay tuned.
> How does the pgc weighting actually work?
The obvious way: the counts are multiplied by the provided factor before being merged into the PGD.
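As a rough model of that arithmetic (not the actual PGD/PGC format or merge code), the weighting can be sketched as below. `ProbeCounts`, `merge_weighted`, and the probe names are all made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for profile data: probe name -> execution count.
using ProbeCounts = std::map<std::string, std::uint64_t>;

// Scale every count from one run by `weight`, then add it to the
// accumulated counts. A probe missing from `pgd` starts at zero.
void merge_weighted(ProbeCounts& pgd, const ProbeCounts& pgc,
                    std::uint64_t weight) {
    assert(weight >= 1); // assumption of this sketch: weights are positive
    for (const auto& entry : pgc) {
        pgd[entry.first] += entry.second * weight;
    }
}
```

So a run merged with a weight of 3 counts the same as merging that run three times at the default weight.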
> Can I artificially create my own pgc?
Not realistically.
> Not related to our main response loop, but we can see in our logging threads that the LFH malloc appears to often call RtlAnsiStringToUnicodestring.
No idea (CRT owns malloc, Windows owns LFH).
> Speaking of which, changing the malloc implementation is still horrible even after the VS2010 msvcrt changes. In linux...
I'm not an expert, but my understanding was that malloc and friends were weak symbols, and if you just linked in an obj that defined malloc, it would be selected as the "real" malloc without triggering an ODR violation.
> Why is there SemaphoreSlim in C# but not C++? Why is there no Benaphore primitive that can also be used in WaitForMultipleObjects?
I'm not sure, Windows owns this.
> Serious issues in Microsoft Developer Connect are often ignored, closed as behaves as expected, or dismissed off hand
I've heard complaints about MSConnect before as well. All I can say is that it is the correct place to file bugs, and the issues there do show up directly in our bug list (someone goes through Connect issues, filters/combines them, and files bugs).
> There is still no valgrind/cachegrind equivalent that provides the same level of detail
That is correct. Sorry.
> Our statically linked application takes 20 minutes link and the link is not parallel. C++ compiles are likewise brutally slow
Link.exe performance is at the top of our minds right now; you're not the only one to bring it up. VS 2013 will have some perf improvements across the front end (to help with C++ compiles being brutally slow), but there is always more to do.
> And no, I'm not going to turn on precompiled headers, MSVC builds incorrect binaries about 5% of the time as it is.
Never heard that before - codegen bugs are always deadly serious and treated with high priority. If you have a repro, please share it.