I learned logic design in a class where we wired up 74LS TTL, a couple of years before they switched to programmable logic, so my knowledge of this sort of thing comes from looking over the shoulders of folks who actually do it. It seems really cool, though - in particular, I love the idea that you can shoehorn all sorts of temporal constraints into a type system.

I fear that progress in this field might be handicapped by the fact that the folks who know a lot of type theory have little idea of how hardware works, and rarely care, and most of the folks who know how hardware works don't know a lot about types beyond possible bad experiences with VHDL. Luckily there's a non-zero set of people in the overlap, though.


I had a computer architecture prof (a reasonably accomplished one, too) who thought that all CS units should be binary, e.g. Gigabit Ethernet should be quoted as 0.93 Gibit/s, not 1000 Mbit/s.

I disagreed strongly - I think X-per-second should be decimal, to correspond to Hertz. But for quantity, binary seems better. (modern CS papers tend to use MiB, GiB etc. as abbreviations for the binary units)

Fun fact - for a long time consumer SSDs had roughly 7.37% over-provisioning, because that's what you get when you put X GiB (binary) of raw flash into a box and advertise it as X GB (decimal) of usable storage. (Probably a bit less, since a few blocks of that raw flash would be DOA.) With TLC, QLC, and SLC-mode caching in modern drives the numbers aren't as simple anymore, though.
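
If you want to see where the 7.37% comes from, it's just the GiB/GB ratio - a trivial sketch, nothing vendor-specific:

  fn main() {
      // Ratio between a binary gigabyte (2^30 bytes) and a decimal one (10^9 bytes).
      let ratio = (1u64 << 30) as f64 / 1e9;                                // 1.073741824
      println!("implied over-provisioning: {:.2}%", (ratio - 1.0) * 100.0); // 7.37%
  }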


It makes it inconvenient to do things like estimate how long it will take to transfer a 10 GiB file, both because of the difference between G and Gi, and because one is in bytes and the other is in bits.

There are probably cases where corresponding to Hz is useful, but for most users I think 119 MiB/s is more useful than 1 Gbit/s.
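
For concreteness, the conversion and the 10 GiB example from above look like this (back-of-the-envelope only - it ignores protocol overhead):

  fn main() {
      let bytes_per_sec = 1e9 / 8.0;                          // 1 Gbit/s, decimal -> 125 MB/s
      let mib_per_sec = bytes_per_sec / (1u64 << 20) as f64;
      println!("1 Gbit/s ≈ {:.1} MiB/s", mib_per_sec);        // ~119.2 MiB/s
      let file_bytes = (10u64 << 30) as f64;                  // a 10 GiB file
      println!("10 GiB at 1 Gbit/s ≈ {:.0} s", file_bytes / bytes_per_sec); // ~86 s
  }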


There's a good reason that gigabit Ethernet is 1000 Mbit/s, and that's because it was defined in decimal from the start. We had 1 Mbit/s, then 10 Mbit/s, then 100 Mbit/s, then 1000 Mbit/s, and now 10 Gbit/s.

Interestingly, from 10 Gbit/s we now also have binary divisions, so 5 Gbit/s and 2.5 Gbit/s.

Even at slower speeds, these were traditionally always decimal based - we call it 50bps, 100bps, 150bps, 300bps, 1200bps, 2400bps, 9600bps, 19200bps, and then we had the odd one out - 56k (actually 57600bps), where the k means 1024 (approximately) - the first and last common speed to use a base-2 kilo. Once you get into Mbps it's back to decimal.


To add further confusion, 57600 was actually a serial port speed, from the computer to the modem, which was higher than the maximum physical line (modem) speed. Many people ran higher serial port speeds to take advantage of compression (115200 was common.)

56000 BPS was the bitrate you could get out of a DS0 channel, which is the digital version of a normal phone line. A DS0 is actually 64000 BPS, but 1 bit out of 8 is "robbed" for overhead/signalling. An analog phone line got sampled to 56000 BPS, but lines were very noisy, which was fine for voice, but not for data.

7 bits per sample * 8000 samples per second = 56000, not 57600. That was the theoretical maximum bandwidth! The FCC also capped modems at 53K or something, so you couldn't even get 56000, not even on a good day.


This has nothing to do with 1024; it has to do with 1200 and its multiples, and with the 14k and 28k modems, where everyone just cut off the last few hundred bits per second because you never reached that speed anyway.

> that's because it was defined in decimal from the start

I mean, that's not quite it. By that logic, had memory been defined in decimal from the start (happenstance), we'd have 4000 byte pages.

Now Ethernet is interesting ... the data rates are defined in decimal, but almost everything else about it is counted in octets, starting with the preamble. But the payload is up to an annoying 1500 (decimal) octets. The _minimum_ frame length is defined for CSMA/CD to work, but the max could have been anything.


This is the bit (sic) that drives me nuts.

RAM had binary sizing for perfectly practical reasons. Nothing else did (until SSDs inherited RAM's architecture).

We apply it to all the wrong things mostly because the first home computers had nothing but RAM, so binary sizing was the only explanation that was ever needed. And 50 years later we're sticking to that story.


RAM having binary sizing is a perfectly good reason for hard drives having binary-sized sectors (more efficient swap, memory maps, etc.), which in turn justifies sizing whole hard disks in binary.

Literally every number in a computer is base-2, not just RAM addressing. Everything is ultimately bits, pins, and wires. The physical and logical interface between your oddly sized disk and your computer? Also base-2.

Not everything is made from wires and transistors. And that's why these things are usually not measured in powers of 2:

- magnetic media

- optical media

- radio waves

- time

There are good reasons for having power-of-2 sectors (they need to get loaded into RAM), but there's really no compelling reason to have a power-of-2 number of sectors. If you can fit 397 sectors, only putting in 256 is wasteful.


Since everything ultimately ends up inside a base-2 computer, across a base-2 bus, it still makes sense to measure these media the same way even if they aren't subject to the same considerations.

The choice would be effectively arbitrary - the number of actual bits or bytes is the same regardless of the multiplier that you use. But since it's for a computer, it makes sense to use units that are comparable (e.g. between RAM and HD).


Buses and networking fit best with base 10 bits (not bytes) per second for reasons that are hopefully obvious. But I agree with you that everything else naturally lends itself to base 2.

Even the disk sectors are in base 2. It's only the marketing that's in base 10.

Nope. The first home computers like the C64 had RAM and sectors on disk, which in the case of the C64 meant 256 bytes. And there it is again - binary sizing, just a smaller power of two than 1024.

Only later did some marketing assholes decide they could sell their hard drives better by lying about the size, and weasel out of the legal issues by redefining the units.


later, like 1956? The world's first commercial HDD was 5,000,000 characters.

NAND flash has overprovisioning even on a per-die basis, e.g. Micron's 256Gbit first-generation 3D NAND had 548 blocks per plane instead of 512, and the pages were 16384+2208 bytes. That left space both for defects and ECC while still being able to provide at least the nominal capacity (in power of two units) with good yield, but meant the true number of memory cells was more than 20% higher than implied by the nominal capacity.

The decimal-vs-binary discrepancy is used more as slack space to cope with the inconvenience of having to erase whole 16MB blocks at a time while allowing the host to send write commands as small as 512 bytes. Given the limited number of program/erase cycles that any flash memory cell can withstand, and the enormous performance penalty that would result from doing 16MB read-modify-write cycles for any smaller host writes, you need way more spare area than just a small multiple of the erase block size. A small portion of the spare area is also necessary to store the logical to physical address mappings, typically on the order of 1GB per 1TB when tracking allocations at 4kB granularity.
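
If you want to sanity-check those numbers, the arithmetic is straightforward (sketch only - the 4-bytes-per-entry figure for the mapping table is my assumption, not something from a datasheet):

  fn main() {
      // Micron 256Gbit first-gen 3D NAND, per the figures above:
      // 548 physical blocks per plane vs. 512 nominal, 16384+2208-byte pages.
      let block_ratio = 548.0 / 512.0;
      let page_ratio = (16384.0 + 2208.0) / 16384.0;
      println!("raw cells vs. nominal capacity: {:.1}% extra",
               (block_ratio * page_ratio - 1.0) * 100.0);                // ~21.5%
      // Logical-to-physical map for 1 TiB tracked at 4 KiB granularity,
      // assuming 4 bytes per map entry.
      let entries = (1u64 << 40) / 4096;
      println!("L2P table: {} MiB per TiB", entries * 4 / (1 << 20));    // 1024 MiB
  }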


An even bigger problem is that networks are measured in bits while RAM and storage are in bytes. I'm sure this leads to plenty of confusion when people see a 120 meg download on their 1 gig network.

(The old excuse was that networks are serial but they haven't been serial for decades.)


Wire speeds and bitrates and baud and all that stuff are vastly confusing when you start looking into it - because it's hard to even define what a "bit on the wire" is when everything has to be encoded in such a way that it can be decoded. (Specialized protocols can go FASTER than normal ones over the same wire and the same mechanism if they can guarantee certain things - like never having four zero bits in a row.)

I can see a precision argument for binary represented frequencies. A systems programmer would value this. A musician would not.

Musicians use numbering systems that are actually far more confused than anything discussed here. How many notes in an OCTave? "Do re mi fa so la ti do" is eight, but that last do is part of the next octave, so an OCTave is 7 notes. (If we count transitions it's the same thing: starting at the first do as zero, re is 1, ... and again we get 7.)

The same and even more confusion is engendered when talking about "fifths", etc.


The 7-note scale you suggest (do re mi fa so la ti do) is composed of different intervals (2 2 1 2 2 2 1) in the 12-fold equal-tempered scale. There are infinitely many ways of exploring an octave in music, but unfortunately listener demand for such exploration is near infinitesimal.

don't you mean 11-fold? ... oh wait, they aren't even consistent

They sum to 12

Actually they multiply - 12th root of 2, to the 12th.

12th root of 2, to the 12th = 2 :D The collection of 7 intervals I provided, 2 2 1 2 2 2 1, which are a differential representation of "(do) re mi fa so la ti do", sum to 12. Those intervals are linear within the log2 scale you identified as having a 12th-root-of-2 basis; in other words, they are the major diatonic (7-note) scale, which is a subset of the 12-tone equal-tempered scale. The laws of logarithms can help explain why these intervals are additive, whereas the semitone basis (12th root of 2) is multiplicative.
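
If anyone wants the two views side by side, it's a one-liner each way (nothing deep, just the arithmetic):

  fn main() {
      let major_scale = [2, 2, 1, 2, 2, 2, 1];        // (do) re mi fa so la ti do, in semitones
      let semitones: i32 = major_scale.iter().sum();  // additive view: 12
      let ratio = 2f64.powf(semitones as f64 / 12.0); // multiplicative view: 2^(12/12)
      println!("{} semitones, frequency ratio {:.3}", semitones, ratio); // 12, 2.000
  }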

You can blame the Romans for that, as they practiced inclusive counting. Their market days, occurring once every 8 days, were called nundinae, because the next market day was the ninth day from the previous one. (And by the same logic, Jesus rose from the dead on the third day.)

Musicians often use equal temperament, so they have their own numerical crimes to answer for.

Touché - appropriate to describe near-compulsory equal temperament (à la MIDI) as a crime.

In the 70s the oil companies were furious that Venezuela (if my understanding is correct) revoked their leases and forced them to abandon their equipment investments.

That's basically what the administration was trying to do here, under a legal system which (unlike Venezuela in the 70s) is very keen on protecting corporate investment. It seems like a classic "takings" case.


The Venezuelan oil leases you are talking about were from the 1990s, not the 1970s.

for Venezuelan oil leases to be comparable to wind farms you'd have to have the Venezuelan govt say "we are taking the leases away because we don't want any more offshore oil production", rather than "we are taking these leases away because you are rich and we want to pump the oil ourselves"

The cancelled Venezuelan oil leases were a taking, but that word is less useful in the case of wind farms. I would imagine firms with wind farm contracts would be made whole (i.e. get back lost investment, but not get back potential profit), but it's not a case of the wind farms being given to somebody else or those areas being put to some other use.

If you are "environmental" you might think it's a great loss not to pursue the wind approach, or that it's a great idea to shut down offshore drilling, but that's political, not a matter of property ownership/taking.


> for Venezuelan oil leases to be comparable to wind farms you'd have to have the Venezuelan govt say "we are taking the leases away because we don't want any more offshore oil production"

That isn't a crazy interpretation of what actually happened. According to wiki [0], the industry basically collapsed to 50% of its former production after the nationalisation era, and the overall trend since then has been downwards. If a major political contingent in the US sets itself against wind energy, it could easily play out similarly. That'd be in line with other battles in the War on Energy that played out with nuclear and fossil fuels.

[0] https://en.wikipedia.org/wiki/History_of_the_Venezuelan_oil_...


A helpful visual (Wiki has a picture of an outdated version of this graph): https://ourworldindata.org/grapher/oil-production-by-country...

>> I would imagine firms with wind farm contracts would be made whole

Wait..what? Made whole by whom? Has this happened before?

(I'm genuinely curious, I've not seen this brought up before...)


Were the Venezuelans wrong? Should a country just legally accept its economic conquerors?

That's pretty much the law of the strongest. Mess with American colonialism and you end up like Cuba or Venezuela.

It's better to have your natural resources stolen than to have your whole country wrecked by embargoes, secret NSA plots, etc.


It's not like they built it.

If the concern is the control module of the wind turbine - that's not a nationalization and confiscation program. It might look similar in the near term to participants, but that's simply because they are functioning as instruments of the control module supplier (extending the inference), which isn't a legitimate owner of the wind farms or the US electrical grid anyway, and is quite unlike the fossil fuel companies in Venezuela of the 1970s.

Cambridge MA was rezoned in the mid-20th century to suburban standards, in a city where land in a mid-range neighborhood now costs $350-$400 per square foot. Besides putting in floor area ratio requirements that required most of the existing housing to be grandfathered, they added a requirement of one parking spot per unit.

If it's a traditional 1-car driveway that's about $70K worth of land, although in the end it's zero-sum because it takes away an on-street spot. Parking garages for larger developments probably cost as much or more per parking space - they use less land, but they're expensive to build.

It's insane, and they're trying to fix it, and approving special permits left and right to omit the spots.


So basically they took everyone's property rights for 50 years and now give them back at the government's pleasure.

Now, I know nobody who isn't rich enough to play the game owns land in Cambridge, but something tells me that the guy who had a Trump 2024 sign on his garage, or the guy who owns a business that's at odds with the city - something tells me their permits require the most in $1-10k engineering expenses, lawyers' fees, etc., etc., to get approved.


Despite whatever the NRA says, governments have a near-monopoly on violence. They've got all the good weapons - Google "Neal Brennan Has a Plan to Test the 2nd Amendment" for a humorous take on this.

That leaves non-violence, which is perhaps a misnomer - there's often plenty of violence, but it's used by the government, not its opponents. When non-violence works, it's typically because those working for the government start refusing to kill their fellow countrymen - they defect, in non-violence scholar-speak.

There's an authoritarian playbook for countering this - you recruit your forces from ethnic minorities, often rural, who already hate the people who are protesting. Thus you see ICE recruits from the Deep South and National Guard troops from Texas being sent into Northern cities.


Tesla finance seems legendary in this regard. A friend here in MA got hauled down to city hall because their auto excise taxes were 3 years overdue - they're the responsibility of the owner of a leased car, in this case Tesla finance. According to the person there, the town (50K people or so) had a bunch of Tesla owners in the same boat.


If your review was based on features shipped, and your bosses let you send PRs with no tests, would you? And before you say "no" - would you still do that if your company used stack ranking, and you were worried about being at the bottom of the stack?

Developers may understand that "XYZ is better", but if management provides enough incentives for "not XYZ", they're going to get "not XYZ".


That actually wasn't why I didn't write tests a lot of the time.

What stopped me was that after a year of writing tests, I was moved to a higher priority project, and the person who followed me didn't write tests.

So when I came back, many of the tests were broken. I had to fix all those in order to get new ones to not be a bother.

Repeat again, but this time I came back and the unit testing suite had fundamentally altered its nature. None of the tests worked and they all needed to be rewritten for a new paradigm.

I gave up on tests for that system at that point. It simply wasn't worthwhile. Management didn't care at all, despite how many times I told them how much more reliable it made that system, and it was the only system that survived the first giant penetration test with no problems.

That doesn't mean I quit testing. I still wrote tests whenever I thought it would help me with what I was currently working on. And that was quite often. But I absolutely didn't worry about old tests, and I didn't worry about making sure others could use my tests. They were never going to try.

The final straw, less than a year before I was laid off, was when they decided my "storybook" tests weren't worth keeping in the repo and deleted them. That made me realize exactly how much they valued unit tests.

That isn't to say they had no tests. There was a suite of tests written by the boss that we were required to run. They were all run against live or dev servers with a browser-control framework, and they were shaky for years. But they were required, so they were actually kept working. Nobody wrote new tests for it until something failed and caused a problem, though.

tl;dr - There are a lot of reasons that people choose not to write tests, and not just for job security.


Well, this wasn't really aimed at individual devs, but at team/company standards.

I've worked on several teams and it was always the norm that all PRs come with tests. There was never a dedicated QA person (sometimes there would be an eng responsible for the test infra, but you would write your own tests).

I would never accept a PR without tests unless it was totally trivial (e.g. someone mentioned fixing a typo).


Breaking prod repeatedly probably impacts your stack ranking too.


Depends on how easily the failure is connected back to you personally. If you introduce a flaw this year and it breaks the system in two years, it won't fall back on you but on the poor sap who triggered your bug.


So can "heroically" saving prod ... anti-patterns.


A broken environment engenders broken behavior, and this explains why it is bizarre - not that it isn't bizarre.


No, they got it published in ACM SIGSOFT Software Engineering Notes.

That's one of the things that publication is for.

The paper is a well-supported (if not well-proofread) position paper, synthesizing the author's thoughts and others' prior work but not reporting any new experimental results or artifacts. The author isn't an academic, but someone at Amazon who has written nearly 20 articles like this, many reporting on the intersection of academic theory and the real world, all published in Software Engineering Notes.

As an academic (in systems, not software engineering) who spent 15 years in industry before grad school, I think this perspective is valuable. In addition academics don't get much credit for this sort of article, so there are a lot fewer of them than there ought to be.


Since no one else seems to have pointed this out - the OP seems to have misunderstood the output of the 'time' command.

  $ time ./wc-avx2 < bible-100.txt
  82113300
  
  real    0m0.395s
  user    0m0.196s
  sys     0m0.117s
"System" time is the amount of CPU time spent in the kernel on behalf of your process, or at least a fairly good guess at that. (e.g. it can be hard to account for time spent in interrupt handlers) With an old hard drive you would probably still see about 117ms of system time for ext4, disk interrupts, etc. but real time would have been much longer.

  $ time ./optimized < bible-100.txt > /dev/null
  
  real    0m1.525s
  user    0m1.477s
  sys     0m0.048s
Here we're bottlenecked on CPU time - 1.477s + 0.048s = 1.525s. The CPU is busy for every millisecond of real time, either in user space or in the kernel.

In the optimized case:

  real    0m0.395s
  user    0m0.196s
  sys     0m0.117s
0.196 + 0.117 = 0.313, so we used 313ms of CPU time but the entire command took 395ms, with the CPU idle for 82ms.

In other words: yes, the author managed to beat the speed of the disk subsystem. With two caveats:

1. not by much - similar attention to tweaking of I/O parameters might improve I/O performance quite a bit.

2. the I/O path is CPU-bound. Those 117ms (38% of all CPU cycles) are all spent in the disk I/O and file system kernel code; if both the disk and your user code were infinitely fast, the command would still take 117ms. (but those I/O tweaks might reduce that number)

Note that the slow code numbers are with a warm cache, showing 48ms of system time - in this case only the ext4 code has to run in the kernel, as data is already cached in memory. In the cold cache case it has to run the disk driver code, as well, for a total of 117ms.
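
If you want to see where that user/sys split comes from inside a program rather than from 'time', getrusage(2) reports the same two numbers. Here's a minimal sketch, assuming the libc crate; the workload is just filler - a compute loop for user time and a pile of small writes to /dev/null for system time:

  use std::io::Write;
  // Return (user, system) CPU seconds consumed so far by this process,
  // straight from getrusage(2).
  fn cpu_times() -> (f64, f64) {
      let mut ru: libc::rusage = unsafe { std::mem::zeroed() };
      unsafe { libc::getrusage(libc::RUSAGE_SELF, &mut ru) };
      let secs = |tv: libc::timeval| tv.tv_sec as f64 + tv.tv_usec as f64 / 1e6;
      (secs(ru.ru_utime), secs(ru.ru_stime))
  }
  fn main() {
      // Burn some user time with a pointless compute loop.
      let mut acc = 0u64;
      for i in 0..100_000_000u64 {
          acc = acc.wrapping_mul(31).wrapping_add(i);
      }
      // Burn some system time: each write_all() is a separate write(2) syscall.
      let mut devnull = std::fs::OpenOptions::new().write(true).open("/dev/null").unwrap();
      for _ in 0..50_000 {
          devnull.write_all(b"x").unwrap();
      }
      let (user, sys) = cpu_times();
      println!("user {:.3}s  sys {:.3}s  (checksum {})", user, sys, acc);
  }

The point is just that user and sys are both real CPU time; anything left over in "real" is the CPU sitting idle waiting on the device.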


> was in unsafe code, and related to interop with C

1) "interop with C" is part of the fundamental requirements specification for any code running in the Linux kernel. If Rust can't handle that safely (not Rust "safe", but safely), it isn't appropriate for the job.

2) I believe the problem was related to the fact that Rust can't implement a doubly-linked list in safe code. This is a fundamental limitation, and again is an issue when the fundamental requirement for the task is to interface to data structures implemented as doubly-linked lists.

No matter how good a language is, if it doesn't have support for floating point types, it's not a good language for implementing math libraries. For most applications, the inability to safely express doubly-linked lists and difficulty in interfacing with C aren't fundamental problems - just don't use doubly-linked lists or interface with C code. (well, you still have to call system libraries, but these are slow-moving APIs that can be wrapped by Rust experts) For this particular example, however, C interop and doubly-linked lists are fundamental parts of the problem to be solved by the code.


> If Rust can't handle that safely (not Rust "safe", but safely), it isn't appropriate for the job.

Rust is no less safe at C interop than using C directly.


As long as you keep C pointers as pointers. The mutable aliasing rules can bite you though.


(Not the user you were replying to)

If Rust is no less safe than C in such a regard, then what benefit is Rust providing that C could not? I am genuinely curious because OS development is not my forte. I assume the justification to implement Rust must be contingent on more than Rust just being 'newer = better', right?


It's not less safe in C interop. It is significantly safer at everything else.


For 1) The necessary safety guarantees downgrade to the level available in C all the time.


The issue is unrelated to expressing linked lists, it's related to race conditions in the kernel, which is one of the hardest areas to get right.

This could have happened with no linked lists whatsoever. Kernel locks are notoriously difficult, even for Linus and other extremely experienced kernel devs.


> This is a fundamental limitation

Not really. Yeah, you need to reach into unsafe to make a doubly linked list that passes the borrow checker.

Guess what: you need an unsafe implementation to print to the console. That doesn't mean printing is unsafe in Rust.

That's the whole point of safe abstraction.


I love rust, but C does make it a lot easier to make certain kinds of container types. E.g., intrusive lists are trivial in C but very awkward in rust. Even if you use unsafe, rust's noalias requirement can make a lot of code much harder to implement correctly. I've concluded for myself (after writing a lot of code and a lot of soul searching) that the best way to implement certain data structures is quite different in rust from how you would do the same thing in C. I don't think this is a bad thing - they're different languages. Of course the best way to solve a problem in languages X and Y is different.

And safe abstractions mean this stuff usually only matters if you’re implementing new, complex collection types. Like an ECS, b-tree, or Fenwick tree. Most code can just use the standard collection types. (Vec, HashMap, etc). And then you don’t have to think about any of this.


> I love rust, but C does make it a lot easier to make certain kinds of container types.

Ok. But making it easier or harder isn't the same as making it impossible.

To quote GP:

> 2) I believe the problem was related to the fact that Rust can't implement a doubly-linked list in safe code.

Rust can implement a doubly linked list in safe code. It can. It wraps the unsafe parts in a safe abstraction.
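
For concreteness, here's a minimal sketch of the usual safe-but-slow way to do it - Rc/RefCell with a Weak back-pointer so the two links don't form a reference cycle. Not how you'd write it in the kernel, just an existence proof:

  use std::cell::RefCell;
  use std::rc::{Rc, Weak};
  // A doubly-linked node in entirely safe Rust: `next` owns the following
  // node, `prev` is a Weak back-pointer so the links don't form an Rc cycle.
  struct Node<T> {
      value: T,
      next: Option<Rc<RefCell<Node<T>>>>,
      prev: Option<Weak<RefCell<Node<T>>>>,
  }
  fn main() {
      let first = Rc::new(RefCell::new(Node { value: 1, next: None, prev: None }));
      let second = Rc::new(RefCell::new(Node { value: 2, next: None, prev: None }));
      // Link the nodes in both directions.
      first.borrow_mut().next = Some(Rc::clone(&second));
      second.borrow_mut().prev = Some(Rc::downgrade(&first));
      // Walk backwards from `second` through the Weak pointer.
      let back = second.borrow().prev.as_ref().unwrap().upgrade().unwrap();
      println!("{} <-> {}", back.borrow().value, second.borrow().value); // 1 <-> 2
  }

Whether the runtime cost of all that reference counting and dynamic borrow checking is acceptable is a separate question, which is exactly the objection raised below.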


> Rust can implement a doubly linked list in safe code. It can. It wraps the unsafe parts in a safe abstraction.

As far as I know, only with Rc / RefCell. But that has a significant performance cost.

Am I wrong? I'd love to see an example / benchmarks.

