My concern is that almost anything that fits in a C64 will run entirely from the L1 cache on most modern CPUs, yielding very unrealistic results compared to reasonable modern workloads.
I think that's a valid result in its own way, though. It's not the poor C64's fault that we insist on chewing through megabytes and gigabytes just to run a chat app.
Every benchmark means something. The problem is that you design a benchmark to measure whatever you are curious about. If it then produces a result that makes something you like look bad, you feel it is an unfair benchmark. Or, if you were curious about some other kind of performance, it is a bad benchmark.
Every benchmark is meaningless or unfair to somebody.