You can do it like this, assuming A is the mask of newlines and B is the mask of non-spaces.
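1. Compute M1 = ~A & ~B, which is the mask of all spaces that are not newlines.
2. Compute M2 = M1 + (A << 1) + 1. The A << 1 term adds a bit at the byte just after each newline (the + 1 does the same for the start of the input). If that byte is a space, the carry ripples through the run of leading spaces in M1; either way, the bit ends up on the first non-space or newline of the line. The remaining bits of M1 (spaces further along each line) are left behind as junk.
3. Compute M3 = M2 & ~M1, which removes the junk bits, leaving only the first match in each line.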
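The three steps can be sketched in C as follows. This is a minimal sketch under a few assumptions not stated above: bit i stands for byte i (so "after" means a higher bit index, which is why A is shifted left), a chunk is at most 64 bytes, and only ' ' counts as a space. build_masks is a hypothetical scalar helper; a SIMD version would build A and B with vector compares instead.

```c
#include <stdint.h>

/* Hypothetical helper: bit i of A is set for '\n', bit i of B for any
   byte other than ' ' (so newlines count as non-spaces). Requires n <= 64.
   Bits past n end up in M1 as "spaces" and never survive step 3. */
static void build_masks(const char *s, int n, uint64_t *A, uint64_t *B) {
    *A = 0;
    *B = 0;
    for (int i = 0; i < n; i++) {
        if (s[i] == '\n') *A |= (uint64_t)1 << i;  /* newlines */
        if (s[i] != ' ')  *B |= (uint64_t)1 << i;  /* non-spaces, incl. '\n' */
    }
}

/* Steps 1-3: one bit per line, on its first non-space
   (or on its '\n' if the line is all spaces). */
static uint64_t line_starts(uint64_t A, uint64_t B) {
    uint64_t M1 = ~A & ~B;            /* step 1: spaces that are not newlines */
    uint64_t M2 = M1 + (A << 1) + 1;  /* step 2: carries ripple through leading spaces */
    uint64_t M3 = M2 & ~M1;           /* step 3: drop leftover space bits */
    return M3;
}
```

For the 13-byte input "  ab\n   cd\nef", line_starts returns 0x904, i.e. bits 2, 8 and 11: the a, the c and the e.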
Note that this code treats newlines as non-spaces, meaning if a line comprises only spaces, the terminating NL character is returned. You can have it treat newlines as spaces (meaning a line of all spaces is not a match) by computing M4 = M3 & ~A.
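The M4 variant is one extra AND. A small sketch, assuming the same layout (bit i = byte i, A = newlines, B = non-spaces including newlines); the function name is hypothetical. For "   \nx" the masks are A = 0x8 and B = 0x18: M3 is 0x18 (the newline of the blank line and the x), while M4 is 0x10 (only the x).

```c
#include <stdint.h>

/* Same steps as before, then clear the newline bits so that a line made
   only of spaces produces no match at all. */
static uint64_t line_starts_skip_blank(uint64_t A, uint64_t B) {
    uint64_t M1 = ~A & ~B;                    /* spaces that are not newlines */
    uint64_t M3 = (M1 + (A << 1) + 1) & ~M1;  /* steps 2 and 3 */
    return M3 & ~A;                           /* M4 = M3 & ~A */
}
```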
Thanks for the suggestion. Indeed, that would be the first approach, and it's how I started. However, it doesn't take the state into account (am I inside a 'statement' or not).
Here 'statement' means the string from the first non-space up to the next EOL or EOF.
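One way the state at a chunk boundary could be carried, sketched here as an assumption rather than anything proposed in the thread: process the input in 64-byte chunks and replace the constant + 1 with a carry-in bit. That bit is set for the next chunk when a run of leading spaces reached the end of the current chunk (the addition overflowed) or when the chunk's last byte is a newline. scan_state and line_starts_chunk are hypothetical names; __builtin_add_overflow is a GCC/Clang builtin.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state carried between 64-byte chunks: 1 means the next
   chunk starts inside a line that has not seen a non-space yet. */
typedef struct {
    uint64_t carry;
} scan_state;

/* Same steps as the single-word version, but the "+ 1" becomes the
   carry-in from the previous chunk. A and B use the same layout
   (bit i = byte i, B counts '\n' as a non-space). */
static uint64_t line_starts_chunk(uint64_t A, uint64_t B, scan_state *st) {
    uint64_t M1 = ~A & ~B;  /* spaces that are not newlines */
    uint64_t t, M2;
    bool c1 = __builtin_add_overflow(M1, A << 1, &t);     /* newline-driven carries */
    bool c2 = __builtin_add_overflow(t, st->carry, &M2);  /* carry-in at bit 0 */
    /* Next carry-in: a leading-space run ran off the end of this chunk
       (overflow), or the last byte is a newline whose "A << 1" bit was
       shifted out. */
    st->carry = (uint64_t)(c1 || c2) | (A >> 63);
    return M2 & ~M1;        /* step 3 */
}
```

Starting with st.carry = 1 makes the first chunk behave like the single-word version. For a chunk of 64 spaces followed by a chunk starting "  z", the first call returns 0 and leaves the carry set, and the second call returns 4 (bit 2, the z).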
The problems start when you need to cover the "corner cases". Without the corner cases, the algorithm isn't really an algorithm.
So I've thought about it and I don't really feel like spending more time convincing you that this works. If you have questions I am happy to answer them, but please write your own code.
It's fine, and thank you! I am playing around with the idea; in theory it all works. The only thing is that steps like "first non ..." often involve branching, which defeats the CPU's branch prediction. That's why I asked you to show it in code.
You can find the first set bit in an integer with a single machine instruction; it's completely branch-free. gcc has __builtin_ctz() for this (__builtin_ctzll() for 64-bit masks). You'll either need to iterate over all set bits (so one branch per set bit) or use a compression instruction (requiring AVX-512) to turn the bit set into a set of integers.
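A sketch of the "one branch per set bit" option: turn a match mask into byte offsets by taking the lowest set bit with __builtin_ctzll, then clearing it with m &= m - 1. mask_to_offsets and its base parameter (the chunk's offset in the input) are hypothetical names chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes base + index of every set bit in m to out (which needs room for
   up to 64 entries) and returns how many were written. __builtin_ctzll is
   undefined for 0, which the loop condition rules out. */
static size_t mask_to_offsets(uint64_t m, size_t base, size_t *out) {
    size_t n = 0;
    while (m != 0) {                                  /* one branch per set bit */
        out[n++] = base + (size_t)__builtin_ctzll(m); /* index of lowest set bit */
        m &= m - 1;                                   /* clear the lowest set bit */
    }
    return n;
}
```

For the mask 0x904, the offsets written are 2, 8 and 11.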
That said, as you seem to actually want to do something with the results, you'll take a branch per match anyway, so I don't see the problem.
ARM already has most of what's required for this on board. Rosetta uses two proprietary extensions: one emulates the parity (rarely used) and half-carry (obsolete) flags, which can also be emulated conventionally. The other implements TSO memory ordering, which can either be ignored or implemented with explicit barriers; some other chips apparently have a similar setting.
The rest is all present in ARMv8.5, I think.
Had the same feeling browsing through the Haskell package collection. It felt like an amalgamation of PhD theses, none of which were maintained after the author got his degree. Every single one a work of art, but most engineered so badly that you would only use them begrudgingly.
My impression of Rust crates is that most are developed because a standardized solution to the problem didn't exist or didn't meet the author's needs, so they built their own. Many are well designed, but were never used by enough people to become truly usable or robust before they were abandoned.
It seems like outside certain problem domains, there isn't any effort to pool resources to keep projects alive. The few I did find were forks of forks where each subsequent maintainer stopped responding to proposed changes.