
clausecker · 2025-12-26T00:24:55 1766708695

You can do it like this, assuming A is the mask of newlines and B is the mask of non-spaces.

1. Compute M1 = ~A & ~B, which is the mask of all spaces that are not newlines.

2. Compute M2 = M1 + (A << 1) + 1, which is the first non-space or newline after each newline and then additional bits behind each such newline.

3. Compute M3 = M2 & ~M1, which removes the junk bits, leaving only the first match in each section.

Here is what it looks like:

    10010000 = A
    01100110 = B
    00001001 = M1 = ~A & ~B
    00101010 = M2 = M1 + (A << 1) + 1
    00100010 = M3 = M2 & ~M1

Note that this code treats newlines as non-spaces, meaning if a line comprises only spaces, the terminating NL character is returned. You can have it treat newlines as spaces (meaning a line of all spaces is not a match) by computing M4 = M3 & ~A.
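The three steps and the optional M4 variant can be sketched in C roughly as follows. This is only an illustration, not code from the thread: the function names and the choice of 64-bit masks (bit i describes byte i of a 64-byte block, bit 0 being the first byte) are assumptions, and the block is assumed to start at the beginning of a line.

```c
#include <stdint.h>

/* a: mask of newlines, b: mask of non-spaces (hypothetical 64-bit layout,
 * bit 0 = first byte of the block).
 * Returns one set bit per line: the first non-space, or the terminating
 * newline if the line consists only of spaces. */
uint64_t first_nonspace_mask(uint64_t a, uint64_t b)
{
    uint64_t m1 = ~a & ~b;            /* spaces that are not newlines */
    uint64_t m2 = m1 + (a << 1) + 1;  /* carries ripple through each run of leading spaces */
    return m2 & ~m1;                  /* drop the junk bits that landed on spaces */
}

/* Variant that treats newlines as spaces (M4 = M3 & ~A), so a line of
 * all spaces produces no match. */
uint64_t first_nonspace_mask_skip_blank(uint64_t a, uint64_t b)
{
    return first_nonspace_mask(a, b) & ~a;
}
```

With the example above, A = 0x90 and B = 0x66 give 0x22, which is the M3 row of the table. Bit positions beyond the real input that are zero in both masks are simply treated as spaces.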

zokrezyl · 2025-12-27T09:50:21 1766829021

Thanks for the suggestion. Indeed, that would be the first approach. That is how I started. This is however not considering the state (am I inside a 'statement' or not)

statement meaning string from first non-space till next EOL or EOF.

Problem starts when you need to cover the "corner cases". Without the corner cases the algo is not algo.

zokrezyl · 2025-12-27T10:02:41 1766829761

Do you mind trying out your solution? The code is in https://github.com/zokrezyl/yaal-cpp-poc Thanks a lot!

Obviously, if your solution gets close to the memory bandwidth limit, we will proudly mention it!

clausecker · 2025-12-28T18:24:51 1766946291

So I've thought about it and I don't really feel like spending more time to convince you that this works. If you have questions I am happy to answer them, but please write your own code.

zokrezyl · 2025-12-29T17:45:24 1767030324

It's fine and thank you! I am playing around with the idea; in theory all is good. The only thing is that things like "first non ..." often involve branching, which hurts the CPU's branch prediction. Therefore I kindly invited you to show it in code.

clausecker · 2025-12-30T01:48:22 1767059302

You can find the first set bit in an integer with a machine instruction; it's completely branch-free. GCC has __builtin_ctz() for this. You'll either need to iterate over all set bits (so one branch per set bit) or use a compression instruction (requiring AVX-512) to turn the bit set into a set of integers.

That said, as you seem to actually want to do something with the results, you'll take a branch per match anyway, so I don't see the problem.
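A minimal sketch of the "iterate over all set bits" option, assuming GCC or Clang and a 64-bit match mask. The function name and the output array are hypothetical; `__builtin_ctzll` is the 64-bit sibling of `__builtin_ctz`. The AVX-512 compress route is not shown here.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes the index of every set bit in mask to out (lowest first)
 * and returns how many were written. One loop iteration per match. */
size_t mask_to_positions(uint64_t mask, unsigned out[64])
{
    size_t n = 0;
    while (mask != 0) {
        out[n++] = (unsigned)__builtin_ctzll(mask); /* index of lowest set bit */
        mask &= mask - 1;                           /* clear that bit */
    }
    return n;
}
```

The only branch is the loop condition, which is taken once per match. `__builtin_ctzll` is undefined for a zero argument, which is why the loop tests the mask before calling it.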

clausecker · 2025-12-28T18:23:28 1766946208

This does track the state. If you want to track it across multiple vectors of input, you'll need to carry it over manually.
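One possible way to carry the state between blocks, sketched in C under the same assumptions as before (64-bit masks, bit 0 = first byte). This is an interpretation of "carry it over manually", not code from the thread: the constant +1 becomes a carry-in that is 1 for the first block, and the carry-out is set when the block ends while still looking for the first non-space of a line. The function name is hypothetical.

```c
#include <stdint.h>

/* a: newline mask, b: non-space mask for the current block.
 * *carry must be 1 before the first block of the input; it is updated
 * to 1 if the next block begins at, or inside the leading spaces of,
 * a new line, and to 0 otherwise. */
uint64_t first_nonspace_block(uint64_t a, uint64_t b, uint64_t *carry)
{
    uint64_t m1 = ~a & ~b;
    uint64_t s1 = m1 + (a << 1);
    uint64_t s2 = s1 + *carry;
    /* Unsigned wraparound means a run of leading spaces reached the block end. */
    uint64_t overflow = (uint64_t)(s1 < m1) | (uint64_t)(s2 < s1);
    /* A newline in the last position also starts a line in the next block. */
    *carry = overflow | (a >> 63);
    return s2 & ~m1;
}
```

Call it once per block in input order, passing the same `carry` variable each time.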

clausecker · 2025-12-26T00:01:53 1766707313

We do this with FreeBSD ports, but users don't have to clone the ports tree unless they want to modify ports or compile them with custom options.

WhyNotHugo · 2025-12-26T11:23:22 1766748202

Yeah, using it as an actual source repository sounds fine to me. I think the issue comes when users are expected to clone it just for installation.

clausecker · 2025-12-08T13:50:44 1765201844

And the same reason NVRAM was dead on arrival. No affordable dev systems meant that only enterprise software supported it.

clausecker · 2025-12-03T12:40:11 1764765611

That's more "load store architecture" than RISC. And by that measure, S/360 could be considered a RISC.

clausecker · 2025-12-03T12:38:58 1764765538

You may enjoy the RISC deprogrammer: https://blog.erratasec.com/2022/10/the-risc-deprogrammer.htm...

clausecker · 2025-11-28T18:16:16 1764353776

FreeBSD

clausecker · 2025-11-21T19:35:29 1763753729

ARM already has most stuff required for this on board. Two proprietary extensions are used by Rosetta: one emulates the parity (rarely used) and half-carry (obsolete) flags, which can also be emulated conventionally. The other implements TSO memory ordering, which can either be ignored or implemented with explicit barriers; some other chips apparently have a similar setting.

The other stuff is all present in ARMv8.5 I think.

clausecker · 2025-11-10T22:16:37 1762812997

It's very likely that there is some serious autovectorisation going on behind the scenes.

clausecker · 2025-11-06T11:46:54 1762429614

Best one was when gedit had the option to syntax highlight for a language named “Los.”

flobosg · 2025-11-06T13:50:04 1762437004

Not a bad name, to be honest!

clausecker · 2025-10-19T22:40:03 1760913603

Had the same feeling browsing through the Haskell package collection. Felt like an amalgamation of PhD theses, none of which were maintained after the author got his degree. Every single one a work of art, but most engineered so badly that you would only use them begrudgingly.

soupy-soup · 2025-10-20T01:56:15 1760925375

My impression of Rust crates is that most are developed because a standardized solution to the problem didn't exist or didn't meet the author's needs, so they built their own. Many are well designed, but were never used by enough people to become truly usable or robust before they were abandoned.

It seems like outside certain problem domains, there isn't any effort to pool resources to keep projects alive. The few I did find were forks of forks where each subsequent maintainer stopped responding to proposed changes.

undeveloper · 2025-10-21T04:55:50 1761022550

haskell