kolbusa's comments

Also try playing with the can and the trash bin.


Not sure why the article does not reference the following paper which is a must read for anyone working with floating point: https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.h... (original: https://www.itu.dk/~sestoft/bachelor/IEEE754_article.pdf).


I also recently saw this paper on the difficulty of solving the quadratic equation with floating point numbers:

https://cnrs.hal.science/hal-04116310/document

And also Gerald Sussman saying:

> The only thing that scares me in programming is floating point.

https://youtu.be/Tdwr9tweTDE?t=1145
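
To make the quadratic-equation point concrete, here is a small sketch (mine, not from the paper) of the classic failure mode: the textbook formula loses the small root to cancellation when b^2 >> 4ac, while computing the well-conditioned root first and recovering the other via Vieta's formula stays accurate:

  #include <math.h>
  #include <stdio.h>

  /* Illustration only: catastrophic cancellation in the textbook formula. */
  int main(void)
  {
      double a = 1.0, b = 1e8, c = 1.0;  /* roots are ~ -1e8 and ~ -1e-8 */
      double d = sqrt(b * b - 4.0 * a * c);

      /* naive: -b + d cancels almost completely for the small root */
      double x_naive = (-b + d) / (2.0 * a);

      /* stable: compute the large root, then use x1 * x2 = c / a */
      double x_big = (-b - d) / (2.0 * a);
      double x_stable = c / (a * x_big);

      printf("naive:  %.17g\n", x_naive);   /* off by roughly 25% here */
      printf("stable: %.17g\n", x_stable);  /* close to -1e-8 */
      return 0;
  }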


I am visiting Boston and I cannot stop wondering why SF is not like that. The city feels so much more livable almost everywhere I went. I'm sure there are shady parts, but every time I need to go to SF for some reason I get really depressed.


Respectfully, you’re almost certainly going to the wrong parts. My source is that I grew up inside Boston and now reside inside SF.

Boston is amazing, and I love it. But SF is too. For similar reasons. SF is a city of neighborhoods. If you’re going to downtown, or any of the business centers, you’re not getting the good parts. The enjoyable nice parts of SF are all residential. Because of the hills, each residential neighborhood (a valley) has its own unique commercial street full of shops and restaurants, surrounded by beautiful old townhomes, and as you go up the hills you get vistas and nice homes. The city quality is inversely correlated with office space.

Boston has similar historic driving forces - instead of hills, it used to be a city of (now infilled) peninsulas. You get wonderful old homes in Boston, and lots of streets full of shops. Instead of tech money (which Boston also has) it was overrun first by the education industry, which anchors many neighborhoods today.


could you expand on what you mean by "the education industry"?

I was picturing an army of teachers, but I don't normally think of teachers as folks who earn enough to be compared to tech money :)


I've never been to Boston, but Wikipedia tells me they have several universities - Harvard and MIT, which I've heard of, and also Boston University, Boston College, University of Massachusetts Boston, Bentley University, Brandeis University, Tufts University, Northeastern University, Wentworth Institute of Technology and a load of others.

In a city with a population of 600k that's going to be a decent part of the local economy.


Yes, Boston is considered the educational capital of the planet.

Boston itself is about 700,000 people, but if you extend things to a 20 mile radius from Boston (say from DTX), in that area there is a transient student population of 400,000 people that are only there to attend higher education and ultimately call elsewhere home. Within 20 miles of Boston are several dozen (nearly 60?) universities, making education one of its six or seven tent-pole industries.


To be pedantic, MIT and most of Harvard are in Cambridge--across the river from the city of Boston. But, yes, the Boston area has a very university-influenced vibe, much of it urban, with some exceptions like BC and Wellesley.


We do not need you or Wikipedia to tell us that Boston University, Boston College, and University of Massachusetts Boston are in Boston.


There are something like 30 universities in the Boston metro, including some extremely prestigious and wealthy ones. Universities like Harvard and MIT have sprawling research-industrial complexes beyond teaching students, and many thousands of employees, many of whom are highly paid professionals.

All that to say nothing of the students. The population of Boston itself is ~600k, while the metro region has ~4M people and roughly 300k students reside in the metro. These are obviously not all local students, but students from all over the world who have come to Boston for education.

I didn’t mean directly that the schools had money, but that the neighborhoods and civic fabric were built around the universities. But many do have a lot of money. Students tend not to travel far, so you get lots of self-contained neighborhoods around the schools. Similar to SF, where the hills limit how far you’d walk.


Copying is usually not necessary. Oftentimes you can swap the data and/or shape arguments and get a transposed result out. While it is true that Fortran BLAS only supports col-major, CBLAS supports both row- and col-major. Internally, all the libraries I have worked on use col-major, but that is just a convention.
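
For example, a minimal sketch (not from any particular library) of the usual trick: a row-major GEMM done with a column-major CBLAS call by reinterpreting the buffers as their transposes and swapping the A/B arguments:

  #include <cblas.h>

  /* Sketch: C = A * B with all three matrices row-major, using a
   * column-major dgemm and no copies. A row-major M x K matrix is
   * bit-identical to a column-major K x M matrix (its transpose),
   * so compute C^T = B^T * A^T by swapping the operands and sizes. */
  void gemm_rowmajor_via_colmajor(int M, int N, int K,
                                  const double *A, /* M x K, row-major */
                                  const double *B, /* K x N, row-major */
                                  double *C)       /* M x N, row-major */
  {
      cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
                  N, M, K,
                  1.0, B, N,  /* B seen as col-major N x K, ldb = N */
                       A, K,  /* A seen as col-major K x M, lda = K */
                  0.0, C, N); /* C seen as col-major N x M, ldc = N */
  }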


Those who do have the same qualifications are still paid much, much less. Anecdotally, when Intel still had offices in Russia, our salaries including stock awards were 3x lower (or worse) than those of the US personnel. Not because of a difference in qualifications but because of the labor market (the salaries were very good considering other opportunities in my home city).


Nitpick... This paragraph is somewhat confusing. I think it is worded incorrectly:

> Let's simplify the problem and implicitly transpose the matrix multiplication. Both A and B (our inputs) will have K (our reduction dimension) as the leading dimension. This doesn't really matter much in practice, but it simplifies our code a lot.

The code is

  C[n * 16 + m] += A[k * 16 + m] * B[k * 16 + n];
Which means that actually *m* is the leading dimension of A with stride 16, and for B it is *n* with stride 16.
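
Spelled out as the full loop nest (my reading of the snippet, assuming 16x16 tiles), the strides are:

  for (int n = 0; n < 16; ++n)
      for (int k = 0; k < 16; ++k)
          for (int m = 0; m < 16; ++m)
              /* A[k*16 + m]: m is unit-stride, moving in k jumps by 16,
               * i.e. the leading dimension of A is the m dimension.
               * B[k*16 + n]: likewise, n is unit-stride and k jumps by 16. */
              C[n * 16 + m] += A[k * 16 + m] * B[k * 16 + n];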


In my experience, Eigen's threadpool is decent. But OpenMP (edit: I mean Intel's implementation, donated to LLVM) is often faster, especially if threads are allowed to be affinitized to HW processors. Another promise of OpenMP that is not made by various threadpools is cooperative execution; in threadpools, tasks are usually independent.

However, if any part of an app uses affinitized threads, the whole app needs to be using the same thread pool, as otherwise perf will go down. In this regard, OpenMP is less composable.
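
As a concrete (if trivial) sketch of the affinity point, standard OpenMP lets you bind threads to cores through environment variables, without touching the code:

  #include <omp.h>
  #include <stdio.h>

  /* Run with e.g.  OMP_PLACES=cores OMP_PROC_BIND=close ./a.out
   * to pin each OpenMP thread to its own core; the LLVM/Intel runtime
   * also honors KMP_AFFINITY for finer control. */
  int main(void)
  {
      enum { N = 1 << 20 };
      static double x[N];

  #pragma omp parallel for schedule(static)
      for (int i = 0; i < N; ++i)
          x[i] = 2.0 * i;

      printf("max threads: %d, x[1] = %g\n", omp_get_max_threads(), x[1]);
      return 0;
  }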


They support eSIM.


It will be very relevant in any field that does performance testing. And that includes compilers.


I don't think you need a statistics class to do performance testing.


You change an optimizer pass. Looks good on a microbenchmark. But then you try it out on some samples of real code. Turns out, it probably makes some real code a little bit faster. No difference on other real code, but the data are noisy, so it's hard to tell. And on one piece of code, it seems to actually cause a regression - but again, the data from multiple runs are noisy.

Should you enable the optimizer change by default, or not? Or do you still need to collect more data? How much more data? What data - more runs or more different code samples? How confident do you want to be, and how confident can you be?

These are questions you will face in your real day to day work, and a few statistics courses will be incredibly helpful to you in answering them.
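
If it helps, here is a rough sketch (with made-up numbers) of the kind of test involved: a Welch t-statistic on two sets of benchmark timings, which is about the simplest way to ask "is this difference bigger than the noise?":

  #include <math.h>
  #include <stdio.h>

  /* Sketch with fabricated timings. The 2.0 threshold is a rough stand-in
   * for the ~95% critical value; a real analysis would use the t
   * distribution with Welch-Satterthwaite degrees of freedom. */
  static void mean_var(const double *x, int n, double *mean, double *var)
  {
      double s = 0.0, ss = 0.0;
      for (int i = 0; i < n; ++i) s += x[i];
      *mean = s / n;
      for (int i = 0; i < n; ++i) ss += (x[i] - *mean) * (x[i] - *mean);
      *var = ss / (n - 1);
  }

  int main(void)
  {
      double before[] = {1.02, 0.99, 1.05, 1.01, 1.03, 0.98};  /* seconds */
      double after[]  = {0.97, 1.00, 0.96, 0.99, 0.95, 0.98};
      int n = 6;

      double m1, v1, m2, v2;
      mean_var(before, n, &m1, &v1);
      mean_var(after,  n, &m2, &v2);

      double t = (m1 - m2) / sqrt(v1 / n + v2 / n);
      printf("t = %.2f: %s\n", t,
             fabs(t) > 2.0 ? "difference looks real" : "could be noise");
      return 0;
  }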


I knew someone would say something like that, but I've never seen that sort of thing done personally and I really doubt lack of statistical knowledge is even close to the biggest obstacle to writing faster software. For micro-optimizations so subtle you need fancy techniques to even tell they work, non-quantitative factors (code impact, will it enable other optimizations, etc.) are more likely to be decisive. Techniques to reduce noise are either non-statistical (warming up caches) or unsophisticated (average many trials, best of three).


My guess would be that this is because of thread migration.

(After reading the TFA: that's what Agner says right there in the 4th paragraph.)

(After re-reading the comment: I guess that the OS changes would need to be extensive with little to no benefit: running AVX2 on all cores will likely be faster than running 2 P cores with AVX512. The only thing that is really affected is the code that could use AVX512_FP16, but I doubt there's a lot of it outside of Intel.)


> I guess that the OS changes would need to be extensive

I don't think that is true. In the simplest case, you could modify the #UD handler to notice when the fault is caused by an AVX512 instruction running on an E-core, and then simply pin the process to the P-cores, migrate it, and continue. All existing scheduler functionality.
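
A hand-wavy sketch of that idea (pseudocode, not real kernel code; every helper name here is hypothetical):

  /* Pseudocode sketch of the trap-and-migrate approach; on_e_core,
   * is_avx512_insn, pin_task_to_pcores, resched_current, and
   * deliver_sigill are all made up. */
  void ud_fault_handler(struct pt_regs *regs)
  {
      if (on_e_core() && is_avx512_insn(regs->ip)) {
          pin_task_to_pcores(current);  /* restrict affinity to P-cores */
          resched_current();            /* migrate to a P-core */
          return;                       /* retry the faulting instruction */
      }
      deliver_sigill(current, regs);    /* otherwise behave as before */
  }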

> The only thing that is really affected is the code that could use AVX512_FP16, but I doubt there's a lot of it outside of Intel.

AVX512 is a lot more than just extending the vector width, and that extended functionality can be very useful for quickly emulating other CPUs' vector instruction sets.
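
One small example of the "more than width" point: per-lane merge-masking, which AVX2 lacks and which maps naturally onto predicated instructions from other ISAs:

  #include <immintrin.h>

  /* Requires AVX-512F. Lanes where the mask bit is 0 keep the value from
   * src (merge-masking); with AVX2 you would need an extra blend. */
  __m512 masked_add(__m512 src, __mmask16 pred, __m512 a, __m512 b)
  {
      return _mm512_mask_add_ps(src, pred, a, b);
  }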

