
Former HFT dev here. Know the fundamentals: the sources of performance issues are things that eat or waste CPU cycles, and things that reach too far down the memory hierarchy. Usually the latter. E.g. L2 cache to RAM is an order of magnitude slower; RAM to disk is 4+ orders of magnitude slower.
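
To make the memory-hierarchy point concrete, here's a rough sketch (TypeScript; the sizes and stride are made up for illustration): both loops do the same arithmetic on the same buffer, but the strided walk defeats the cache and prefetcher and typically runs several times slower. Exact ratios depend on your hardware and runtime.

    // Rough sketch: sequential vs. strided access over the same buffer.
    // The work per element is identical; the difference is how often we miss cache.
    const N = 1 << 24;                  // ~16M doubles, ~128 MB
    const data = new Float64Array(N);

    function sumSequential(a: Float64Array): number {
      let s = 0;
      for (let i = 0; i < a.length; i++) s += a[i];
      return s;
    }

    function sumStrided(a: Float64Array, stride: number): number {
      let s = 0;
      for (let start = 0; start < stride; start++) {
        for (let i = start; i < a.length; i += stride) s += a[i];
      }
      return s;
    }

    let t = performance.now();
    sumSequential(data);
    console.log("sequential:", performance.now() - t, "ms");

    t = performance.now();
    sumStrided(data, 4096);             // jumps ~32 KB between consecutive reads
    console.log("strided:   ", performance.now() - t, "ms");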

Things that eat CPU: iterations, string operations. Things that waste CPU: lock contention in multi-threaded environments, wait states.

You can usually build a lot of the understanding from first principles starting there. Back in the day we had to, because there wasn't much readily available literature on the subject. Actual techniques will depend on, and evolve with, your choice of platform and version.

E.g. 20 years ago, we used to create object pools in C++ at load time to avoid Unix heap locks at runtime. This may no longer be necessary. 15(ish?) years ago, JNI was used when the JVM wasn't fast enough for certain stuff. This is no longer necessary. 10 years ago, immutable JS objects were thought to be faster because the JS runtimes at the time were slower to mutate existing objects than to create new ones. This too, may no longer be true (I haven't checked recently). Until very recently, re-rendering with virtual DOM diffing was considered more performant than direct, incremental DOM manipulation. This too, may no longer be true.
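
For the object-pool idea, here's a minimal sketch of the pattern in TypeScript rather than the original C++ (the Pool and Order names are made up for illustration): preallocate everything at load time, then acquire/release in the hot path instead of allocating.

    // Minimal object-pool sketch: allocate everything up front, then hand out
    // and take back instances at runtime instead of hitting the allocator
    // (or, in a GC'd language, instead of creating garbage).
    class Pool<T> {
      private free: T[] = [];

      constructor(create: () => T, size: number) {
        for (let i = 0; i < size; i++) this.free.push(create());
      }

      acquire(): T {
        const obj = this.free.pop();
        if (obj === undefined) throw new Error("pool exhausted");
        return obj;
      }

      release(obj: T): void {
        this.free.push(obj);
      }
    }

    // Usage: preallocate at load time, reuse in the hot path.
    interface Order { price: number; qty: number; }
    const orders = new Pool<Order>(() => ({ price: 0, qty: 0 }), 10_000);

    const o = orders.acquire();
    o.price = 101.5;
    o.qty = 200;
    // ... process ...
    orders.release(o);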



> Until very recently, re-rendering with virtual DOM diffing was considered more performant than direct, incremental DOM manipulation. This too, may no longer be true.

It actually wasn't strictly true even when React came out, but it was true enough of the code that most JS developers actually wrote to drive a change in the dominant JS framework.

DOM manipulation even in 2013 used a dirty-bit system. Calling element.appendChild would be a few pointer swaps and take a couple of ns. However, if you then called any of a number of APIs that forced a layout, the browser would recompute layout for the whole page at a cost of ~20ms on mobile devices of the day. These included such common methods and properties as getComputedStyle(), .offsetWidth, .offsetHeight, and many others - there was a list of about two dozen. Most JS apps of the day might trigger dozens to hundreds of these forced re-layouts per frame, but the frame budget is only 16.667ms, which is why mobile web apps of 2013 had slow animations and sluggish responsiveness.
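
To make the trap concrete, here's a sketch of the two patterns being described (placeholder names; the general shape, not code from any particular app):

    // Layout thrashing: each offsetWidth read forces the browser to flush
    // pending style/layout work before it can answer.
    function resizeAllSlow(items: HTMLElement[]) {
      for (const el of items) {
        const w = el.parentElement!.offsetWidth; // forced layout, every iteration
        el.style.width = w / 2 + "px";           // invalidates layout again
      }
    }

    // Same work, but all reads happen before all writes, so layout is
    // recomputed at most once.
    function resizeAllFast(items: HTMLElement[]) {
      const widths = items.map(el => el.parentElement!.offsetWidth); // read phase
      items.forEach((el, i) => {
        el.style.width = widths[i] / 2 + "px";                       // write phase
      });
    }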

React didn't need a full virtual DOM layer. It just needed to ensure that all modifications to the DOM happened at once, and that no user code ran in between DOM manipulations within a given frame. And sure enough, there are frameworks that actually do this with a much lighter virtual DOM abstraction (see: Preact) and get equal or better performance than React.
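
One lightweight way to get the "all DOM writes happen together" property without a full virtual DOM is a per-frame write queue. This is just a sketch of the general idea, not how React or Preact are actually implemented:

    // Queue DOM writes and flush them together once per frame, so no user code
    // runs between the mutations and no read forces an intermediate layout.
    const pendingWrites: Array<() => void> = [];
    let flushScheduled = false;

    function scheduleWrite(write: () => void) {
      pendingWrites.push(write);
      if (!flushScheduled) {
        flushScheduled = true;
        requestAnimationFrame(() => {
          flushScheduled = false;
          const writes = pendingWrites.splice(0);
          for (const w of writes) w();   // all mutations applied back to back
        });
      }
    }

    // Usage: callers describe mutations; the scheduler applies them in one batch.
    scheduleWrite(() => { document.title = "42 unread"; });
    scheduleWrite(() => { document.body.classList.add("loaded"); });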

The lesson for performance tuning is to understand what's going on; don't just take benchmarks at face value. If a call is expensive, sometimes it's only conditionally expensive based on other stuff you're doing, and there's a happy path that's much faster. Learn to leverage the happy paths and minimize the need to do expensive work over and over again, even if the expensive work happens in a layer you don't have access to.


Addendum: never forget Amdahl's Law. And never forget Knuth's full quote:

“Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.” - Donald E. Knuth, Structured Programming With Go To Statements
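
A quick worked example of Amdahl's Law with made-up numbers: if the code you're tempted to optimize is only 3% of the runtime, even an infinite speedup there caps the overall win at about 1.03x.

    // Amdahl's Law: overall speedup when a fraction p of the runtime
    // is sped up by a factor s.
    function amdahl(p: number, s: number): number {
      return 1 / ((1 - p) + p / s);
    }

    console.log(amdahl(0.03, 10)); // ~1.028x: 10x faster, but only on 3% of the program
    console.log(amdahl(0.97, 2));  // ~1.94x:  2x faster on 97% of the program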


The problem is, instead of internalizing the whole quote, which is valid in my opinion, many have only internalized the "root of all evil" part. This roughly translates to "my feature is done - performance is someone else's problem".


Nowadays, the other 97% is slow too.


This quote has done more damage to software usability than any other idea of the last 60 years. It has led us to a state where software is unbearably slow because nothing was deemed worth optimizing. Just like it's easy to go broke with thousands of minor expenses, it's easy to create sluggish software that has no obvious bottleneck because the whole damn thing is slow.


This ideology is exactly why we have bloated abominations like Slack's Electron app.


Do you guys still buy the beefiest Intel Xeons in order to fit your main application + OS entirely within the L3 cache? There was a CppCon talk from an HFT dev about this 10 years ago.


Do you happen to know what the CppCon talk is called?


It was this one: https://www.youtube.com/watch?v=NH1Tta7purM

Not 10 years, merely 6. I was mistaken and thought it was posted in 2015.


Can you suggest/recommend some books to learn these things in depth?


Systems Performance: Enterprise and the Cloud, 2nd Edition (2020)

https://www.brendangregg.com/systems-performance-2nd-edition...



