FWIW, only RCLI itself is MIT-licensed; their engine, MetalRT, is commercial. Not s...

shubham2802 · 2026-03-10T20:00:46 1773172846

Updating the README ASAP, and thanks for the feedback. Also, please check out a few things: https://www.runanywhere.ai/blog/metalrt-speech-fastest-stt-t... https://www.runanywhere.ai/blog/metalrt-fastest-llm-decode-e...

sanchitmonga22 · 2026-03-11T01:55:42 1773194142

Fair feedback on the README clarity. We've updated it to make the licensing distinction between RCLI (MIT) and MetalRT (proprietary) more prominent. That should have been clearer from day one.

On why we built MetalRT instead of using CoreML or MLX:

CoreML is optimized for classification and vision models, not autoregressive text generation. ANE is powerful for fixed-shape workloads but doesn't handle the dynamic shapes in LLM decode well.

MLX is much closer to what we need, and we respect what Apple has built. But MLX is a general-purpose array framework, so it carries abstractions for developer ergonomics and portability that add overhead. MetalRT is purpose-built for inference only, and the numbers reflect that: 1.1-1.2x faster on LLMs (same model files) and 4.6x faster on STT.

We also needed one unified engine for LLM + STT + TTS rather than stitching three separate runtimes together. That doesn't exist in any of the alternatives listed.

The libraries you mentioned (FluidAudio, mlx-swift-audio, sherpa-onnx) are good projects. RCLI actually uses sherpa-onnx as its fallback engine when MetalRT isn't installed. They solve different problems at different layers of the stack.

antipaul · 2026-03-10T19:56:27 1773172587

Nice list.

What about for on-device RAG use cases?

sanchitmonga22 · 2026-03-11T03:23:56 1773199436

RCLI includes local RAG out of the box. You can ingest PDFs, DOCX, and plain text, then query by voice or text:

rcli rag ingest ~/Documents/notes
rcli ask --rag ~/Library/RCLI/index "summarize the project plan"

It uses hybrid retrieval (vector + BM25 with Reciprocal Rank Fusion), and retrieval takes ~4ms over 5K+ chunks. Embeddings are computed locally with Snowflake Arctic, so nothing leaves your machine.
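
For anyone unfamiliar with Reciprocal Rank Fusion, here is a minimal Python sketch of the general technique. It is not RCLI's code. The function name, the chunk IDs, and k=60 (a commonly used default) are all assumptions for illustration. Each retriever returns a ranked list of chunk IDs, and every chunk scores 1 / (k + rank) per list it appears in:

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into one list, best first.

    rankings: list of lists, each ordered from most to least relevant.
    k: smoothing constant; 60 is a common default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based, so the top hit contributes 1 / (k + 1).
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by chunk ID for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))


# Hypothetical results from a vector search and a BM25 search.
vector_hits = ["c3", "c1", "c7"]
bm25_hits = ["c1", "c9", "c3"]

fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)  # ['c1', 'c3', 'c9', 'c7']
```

Chunks that both retrievers rank highly (c1 and c3 here) rise to the top. Only ranks are used, so the vector and BM25 scores never need to be put on a common scale.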