If you look at ML, Python is completely fine because the matrix multiplication work, even on CPUs, far, far outweighs all the setup stuff in volume of operations.
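To put rough numbers on that, here is a back-of-the-envelope sketch. The layer sizes and the per-call "setup" budget are made-up assumptions, only there to show the ratio, not measurements of any real framework.

```python
def matmul_ops(m: int, k: int, n: int) -> int:
    """Operation count for an (m x k) @ (k x n) product.

    Each of the m*n outputs needs k multiply-adds, counted here as 2 ops each.
    """
    return 2 * m * k * n


# Hypothetical dense layer: batch of 64, 1024 inputs, 1024 outputs.
layer_ops = matmul_ops(64, 1024, 1024)

# Hypothetical budget for the Python-side glue per call (argument checks,
# dispatch, and so on). The exact figure is a guess; the point is the ratio.
setup_ops = 1_000

ratio = layer_ops // setup_ops
print(f"matmul ops: {layer_ops}, roughly {ratio}x the assumed setup cost")
```

Even if the setup guess is off by a couple of orders of magnitude, the arithmetic inside the multiplication still dominates.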
On the other hand, if the majority of your application relies heavily on processing speed (i.e., you need compare/jump operations rather than just the add/multiply/load/store operations of GPUs), Python is going to be slow. In this case, if you want custom performant code, you write C extensions for the performance-critical code and launch them from higher-level Python code.
That being said, there is generally a library (like Taichi) that already does this for you.
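For a concrete picture of the compare/jump-heavy case, here is a small sketch in plain Python. The function names and the Collatz example are my own illustration, not something from a particular library. Every loop iteration is a comparison and a branch, so it can't be expressed as one big matrix operation, and the interpreter overhead is paid on each step.

```python
def collatz_steps(n: int) -> int:
    """Count steps until n reaches 1; each iteration is a compare and a branch."""
    steps = 0
    while n != 1:
        if n % 2 == 0:
            n //= 2
        else:
            n = 3 * n + 1
        steps += 1
    return steps


def longest_chain(limit: int) -> int:
    """Return the start value below `limit` with the longest Collatz chain."""
    best_start, best_steps = 1, 0
    for start in range(1, limit):
        steps = collatz_steps(start)
        if steps > best_steps:
            best_start, best_steps = start, steps
    return best_start


print(longest_chain(10_000))
```

A hot loop like `collatz_steps` is the sort of thing you would move into a C extension (or hand to a library that compiles it for you) while keeping `longest_chain` and the rest of the program in Python.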