TurboQuant's specific benefit is compressing the KV cache at negligible cost to quality. That mainly means context lengths can go up for the same amount of memory; however, the KV cache only accounts for something like 20% of overall memory use, so this will not dramatically decrease memory demands in the way some of the more sensationalist reporting has stated.
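The arithmetic behind that claim can be sketched in a few lines (assuming the ~20% figure above and a hypothetical 4× compression ratio, neither of which is specific to TurboQuant):

```python
def total_memory_fraction(kv_fraction=0.20, compression_ratio=4.0):
    """Fraction of original memory still needed after compressing only the KV cache.

    The non-KV part (1 - kv_fraction) is untouched; only the KV slice shrinks.
    """
    return (1.0 - kv_fraction) + kv_fraction / compression_ratio

# Even an aggressive hypothetical 4x KV-cache compression only trims
# total memory use by ~15% when the cache is 20% of the footprint.
print(round(total_memory_fraction(0.20, 4.0), 2))  # 0.85
```

So the ceiling on total savings is set by the KV cache's share of memory, not by how hard the cache itself is compressed.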
Still active, but with many fewer resources than in the past. Many backends, like CUDA on Windows, have been dropped, and others pushed off to partners with varying levels of support. TensorFlow 2.19 is going to release soon without Python 3.13 support; it's hard not to imagine that resource constraints are at play.
Exciting concept! Note that the LLM-corrected version does drop a full paragraph from the output at the bottom of the second page (starting with an asterisk and "My views regarding inflationary possibilities"). I'm not sure if there is a simple way to mitigate this risk, but it would be nice to fall back on the uncorrected text if the LLM can't produce valid results for some region of the document.
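One way that fallback might look, as a rough sketch: align the raw OCR text against the corrected output and, wherever a large run of raw text has no counterpart in the corrected version, splice the uncorrected text back in. Everything here (the function name, the `min_gap` threshold) is hypothetical, not part of the project being discussed:

```python
import difflib

def patch_dropped_regions(raw_text, corrected_text, min_gap=200):
    """Re-insert long runs of raw OCR text that the corrected version omitted.

    Hypothetical helper: any chunk of raw_text longer than min_gap characters
    with no counterpart in corrected_text is assumed to be a dropped region,
    and the uncorrected text is used for it instead.
    """
    matcher = difflib.SequenceMatcher(a=raw_text, b=corrected_text, autojunk=False)
    pieces = []
    for tag, a1, a2, b1, b2 in matcher.get_opcodes():
        if tag == "delete" and (a2 - a1) >= min_gap:
            pieces.append(raw_text[a1:a2])      # dropped paragraph: keep raw OCR
        elif tag in ("equal", "replace", "insert"):
            pieces.append(corrected_text[b1:b2])  # otherwise trust the correction
    return "".join(pieces)
```

Short deletions stay deleted (those are usually genuine OCR noise the LLM cleaned up); only suspiciously large gaps trigger the fallback.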
I recently had occasion to evaluate a database of 1200+ NVIDIA GPUs and can tell you that the only thing consistent about the model numbers is their inconsistency. For example, what is an RTX 4000? It could be the 2018 Quadro RTX 4000, the Quadro RTX 4000 Max-Q, or the Quadro RTX 4000 Mobile (all Turing cards), but it could also be the RTX 4000 Mobile Ada Generation (an Ada Lovelace card released in 2023).
If you're revisiting the classics I can't recommend enough Doug Metzger's Literature and History podcast [1]. It covers literature starting with Mesopotamian stories, at about the level of an undergraduate course, but is entertaining and insightful throughout. It's clearly had deep research put into every episode, but at the same time takes great effort to make the material relatable. Great stuff.
Endpoints are the machines (desktops, servers, &c) in organizations which have Carbon Black installed. Their client continuously monitors process executions, network connections, file changes, and registry changes, and samples unique files on the machine; depending on configuration, it can upload these contents both within the enterprise and share them with Carbon Black's cloud platform. That's what is meant by "our technology uniquely collects complete, 'unfiltered' endpoint data by continuously recording endpoint activity and centrally storing the collected data for advanced analytics".