Thanks for the suggestion. Indeed, that would be the first approach, and it is how I started. However, it does not take the state into account (am I inside a 'statement' or not),
a statement being the string from the first non-space character up to the next EOL or EOF.
The problem starts when you need to cover the corner cases. Without the corner cases, the algorithm is not an algorithm.
So I've thought about it and I don't really feel like spending more time to convince you that this works. If you have questions I am happy to answer them, but please write your own code.
It's fine, and thank you! I am playing around with the idea, and in theory all is good. The only thing is that operations like "first non ..." often involve branching that defeats the CPU's branch prediction. That is why I kindly invited you to show it in code.
You can find the first set bit in an integer with a single machine instruction; it's completely branch-free. gcc has __builtin_ctz() for this. You'll either need to iterate over all set bits (so one branch per set bit) or use a compression instruction (requiring AVX-512) to turn the bit set into a set of integers.
That said, as you seem to actually want to do something with the results, you'll take a branch per match anyway, so I don't see the problem.
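For illustration, here is a minimal sketch of the "iterate over all set bits" variant. It assumes the matches have already been collected into a 32-bit mask (for example from a SIMD compare); the helper name set_bit_positions and the output array are hypothetical and not taken from anyone's actual code. Note that __builtin_ctz() is undefined for 0, which the loop condition guards against.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: stores the index of every set bit in mask into out,
 * lowest bit first, and returns how many indices were stored.
 * out must have room for 32 entries. */
static size_t set_bit_positions(uint32_t mask, unsigned out[32])
{
    size_t n = 0;

    while (mask != 0) {                           /* one loop branch per set bit */
        out[n++] = (unsigned)__builtin_ctz(mask); /* index of lowest set bit, branch-free */
        mask &= mask - 1;                         /* clear the lowest set bit */
    }
    return n;
}
```

With a mask of 0x29 (binary 101001) this stores 0, 3 and 5. The only branch is the loop condition, taken once per match, which is the same cost you pay anyway when you act on each match.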