
Well, we’re talking about emulation here, not a bytecode interpreter for a high-level, dynamically typed language. CPython is fairly slow; it’s written in C in a fairly understandable way, effectively using a large switch statement. There are something like three jumps per executed bytecode, and it probably uses a jump table.

“All the tricks” for an interpreter would mean 1 jump per interpreted instruction, or perhaps 0.5 for some of them. It would probably mean no explicit jump tables. It would probably mean the interpreter is set up pipelined like a real CPU, loading instructions in parallel with executing previous instructions. You can get perhaps 5-15% of native speed for an interpreter using all the tricks. QEMU can maybe get 20-50% (not sure these days actually, I haven’t followed improvements for a while). Rosetta can get close to 100%, using a JIT but also hardware that is specifically designed to aid emulation (memory accesses that behave as they do on x86).
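To make the difference in dispatch structure concrete, here is a minimal Python sketch. Everything in it is an assumption made up for illustration: the toy stack machine, the opcodes PUSH, ADD, MUL and HALT, and all function names. It is not how CPython or QEMU are written. It contrasts a central decode-and-branch loop with a pre-decoded handler list, which is roughly the idea behind threaded code. Python can't show machine-level jump counts, so this shows only the shape, not the speed.

```python
# Hypothetical toy instruction set, invented for this sketch only.
PUSH, ADD, MUL, HALT = range(4)


def run_switch(code):
    """Central loop: fetch an opcode, then walk a branch chain to find its case."""
    stack, pc = [], 0
    while True:
        op = code[pc]
        if op == PUSH:
            stack.append(code[pc + 1])  # operand follows the opcode
            pc += 2
        elif op == ADD:
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
            pc += 1
        elif op == MUL:
            b = stack.pop()
            a = stack.pop()
            stack.append(a * b)
            pc += 1
        elif op == HALT:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op}")


def op_push(stack, arg):
    stack.append(arg)


def op_add(stack, arg):
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)


def op_mul(stack, arg):
    b = stack.pop()
    a = stack.pop()
    stack.append(a * b)


# opcode -> (handler, number of inline operands)
HANDLERS = {PUSH: (op_push, 1), ADD: (op_add, 0), MUL: (op_mul, 0)}


def predecode(code):
    """Decode once, up front: bytecode becomes a list of (handler, operand) pairs."""
    program, pc = [], 0
    while code[pc] != HALT:
        handler, n_operands = HANDLERS[code[pc]]
        arg = code[pc + 1] if n_operands else None
        program.append((handler, arg))
        pc += 1 + n_operands
    return program


def run_threaded(program):
    """No opcode decoding in the hot loop: each step is one indirect call."""
    stack = []
    for handler, arg in program:
        handler(stack, arg)
    return stack.pop()


# (2 + 3) * 4
example = [PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT]
```

Both `run_switch(example)` and `run_threaded(predecode(example))` evaluate the same program. The toy machine is straight-line only; a version with guest branches would need handlers that return the next program index.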

Given that CPUs nowadays execute something like 3-5 instructions per cycle, getting 5-15% of native performance for an interpreter would be very fast and very difficult to achieve, but possible if it's designed from the ground up for performance.
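A rough back-of-the-envelope version of that budget, as a small Python sketch. It assumes, purely for the sake of the arithmetic, that one guest instruction corresponds to about one native instruction of work and that the host sustains the quoted instructions per cycle; the function names are made up here.

```python
def cycles_per_guest_instruction(host_ipc, fraction_of_native):
    """Host cycles available per guest instruction at a given fraction of native speed.

    Assumption: native code retires `host_ipc` guest-equivalent instructions
    per cycle, so the interpreter retires host_ipc * fraction_of_native.
    """
    return 1.0 / (host_ipc * fraction_of_native)


def host_slots_per_guest_instruction(fraction_of_native):
    """Host instruction slots available per guest instruction.

    This is cycles * host_ipc, so host_ipc cancels out and only the
    fraction of native speed matters.
    """
    return 1.0 / fraction_of_native


# Print the budget for the corner cases of the ranges quoted above.
for ipc in (3, 5):
    for fraction in (0.05, 0.15):
        cycles = cycles_per_guest_instruction(ipc, fraction)
        slots = host_slots_per_guest_instruction(fraction)
        print(f"IPC {ipc}, {fraction:.0%} of native: "
              f"{cycles:.1f} cycles, {slots:.1f} host instruction slots")
```

Under those assumptions, 15% of native leaves fewer than 7 host instruction slots for fetch, decode, dispatch and the actual work of each guest instruction, which is why it is so hard to reach.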



i think it's unusual to hit 3-5 ipc sustained in normal code, but it seems like we're in broad agreement about quantitatively what kind of performance an interpreter can achieve. i mentioned three interpretive systems that do reach about 15% of native single-threaded performance with very different approaches: the ocaml bytecode interpreter, threaded interpretive code like gforth, and numpy

it's just that i call 15% of native performance 'very slow' and you call it 'very fast', because you're comparing the moped to the walker you're used to, and i'm comparing it to a sports car because that's what you're paying for

instruction set jitting getting a speedup instead of a slowdown goes back to last millennium with hp's dynamo https://dl.acm.org/doi/pdf/10.1145/349299.349303 and was central to transmeta's business plan. qemu's jit is sort of simpleminded to make it easier to maintain


A CPU emulator will only get to 15-20% with a lot of tricks. Usually a CPU interpreter would get something like 5%. It’s not like NumPy, where you can increase performance by offloading as much execution as possible into native code. When emulating CPU instructions, the individual instructions are tiny, so there is very little work to amortize the dispatch overhead over. But as I said, Python is a poor comparison because the interpreter is very slow and there is a lot of overhead.


although i've written compilers and interpreters, i don't think i've ever written an emulator for a real cpu. i'm interested to hear about your experience



