This is seriously impressive — 6 BIPS by a single developer is insane.
The correctness-first approach and clean architecture really show.
But honestly, at this point, to make it truly useful, you probably just need to write your own OS next.
Thanks for sharing this; it’s inspiring to see what focused systems work can achieve.
How did you not burn out doing all this?
Honestly, in many systems projects it is often simpler and safer to minimize or even avoid malloc entirely: rely more on the stack, static buffers, or simple arenas. This usually leads to fewer bugs and more predictable behavior than building complex lifetime systems on top of the heap.
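To make that concrete, here is a minimal bump-pointer arena in C. The names (arena_t, arena_alloc, arena_reset) are invented for illustration and are not from any particular project:

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Minimal bump-pointer arena: every allocation shares one lifetime and is
       released together by a single reset. All names here are illustrative. */
    typedef struct {
        uint8_t *base;  /* backing buffer (static, stack, or one big malloc) */
        size_t   cap;   /* total capacity in bytes */
        size_t   used;  /* bytes handed out so far */
    } arena_t;

    static void *arena_alloc(arena_t *a, size_t size) {
        size_t aligned = (size + 15u) & ~(size_t)15u;  /* keep 16-byte alignment */
        if (a->used + aligned > a->cap) return NULL;   /* out of space: caller decides */
        void *p = a->base + a->used;
        a->used += aligned;
        return p;
    }

    static void arena_reset(arena_t *a) { a->used = 0; }  /* frees everything at once */

    int main(void) {
        static uint8_t scratch[64 * 1024];           /* no heap involved at all */
        arena_t req = { scratch, sizeof scratch, 0 };

        char *buf = arena_alloc(&req, 256);          /* per-request scratch space */
        assert(buf != NULL);
        strcpy(buf, "hello");

        arena_reset(&req);                           /* one call instead of N frees */
        return 0;
    }

The lifetime question collapses to one arena_reset per request or frame, which is why this pattern tends to be easier to get right than ad-hoc malloc/free bookkeeping.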
That is a valid point. When you are developing an application for an STM32, allocating on the heap is a bad idea. But if a programmer wants to build a high-performance back-end server in C, they will in many cases have to allocate unpredictable amounts of memory depending on user input. This project is not for embedded/hardware control; it aims at a 'C Revival for High-level Development'. A good example: many modern Rust applications do not statically fix memory regions, which is why Rust had to develop such a complex ownership system. In my personal opinion, with a bag of potato chips, you can do it better in C.

Honestly, this does not have a proper memory tracker yet, so, yes, it does not have any advantages for now. However, I am planning to develop a memory lifetime tracker that can catch issues even before you reach for Valgrind or GDB. You can also try developing your own memory manager; doing that helped me make my C-based web development framework safer through refactoring.
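For anyone curious what a lifetime tracker can look like in its simplest form, here is a toy sketch: a malloc/free wrapper that records the file and line of every live allocation, flags double or invalid frees, and prints leaks at exit. This is only a generic illustration of the idea, not the tracker described above; tracked_malloc, tracked_free, TMALLOC, and TFREE are invented names.

    #include <stdio.h>
    #include <stdlib.h>

    #define MAX_LIVE 1024

    typedef struct {
        void       *ptr;
        size_t      size;
        const char *file;
        int         line;
    } live_alloc;

    static live_alloc g_live[MAX_LIVE];  /* table of still-live allocations */
    static size_t     g_count;

    static void *tracked_malloc(size_t size, const char *file, int line) {
        void *p = malloc(size);
        if (p && g_count < MAX_LIVE) {
            g_live[g_count++] = (live_alloc){ p, size, file, line };
        }
        return p;
    }

    static void tracked_free(void *p, const char *file, int line) {
        for (size_t i = 0; i < g_count; i++) {
            if (g_live[i].ptr == p) {
                g_live[i] = g_live[--g_count];  /* remove by swapping with last */
                free(p);
                return;
            }
        }
        fprintf(stderr, "double or invalid free at %s:%d (%p)\n", file, line, p);
    }

    static void report_leaks(void) {
        for (size_t i = 0; i < g_count; i++) {
            fprintf(stderr, "leak: %zu bytes from %s:%d\n",
                    g_live[i].size, g_live[i].file, g_live[i].line);
        }
    }

    /* Call sites use macros so the file/line info comes for free. */
    #define TMALLOC(sz) tracked_malloc((sz), __FILE__, __LINE__)
    #define TFREE(p)    tracked_free((p), __FILE__, __LINE__)

    int main(void) {
        atexit(report_leaks);
        char *a = TMALLOC(64);
        char *b = TMALLOC(128);
        TFREE(a);
        (void)b;  /* b is never freed: report_leaks() prints it at exit */
        return 0;
    }

A real tracker would of course need a proper hash table, thread safety, and a way to hook existing code, but even this much catches leaks and double frees long before Valgrind enters the picture.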
This is an impressive piece of engineering, no doubt. The API is clean, performance work is serious, and it’s clear a lot of effort went into making this fast. But let’s be honest: without autograd and a real training ecosystem, this is not a PyTorch replacement, it’s a very nice numerical toolbox. Also, tying GPU acceleration mostly to Metal makes this far less useful outside the Apple ecosystem. Right now, it looks like a technically excellent project searching for its real-world niche. If you add proper differentiation, broader GPU support, and prove that this scales with real users, then it could become something truly important. Until then, it’s great work — but not a revolution.
I appreciate the advice. Right now, numerical coverage, absolute performance, and DX are my biggest priorities. I'm looking to get traction from OSS so scope creep doesn't catch up to me and some passionate devs can jump on board; autograd and CUDA are the next really big milestones for Axiom.