TurboQuant's specific benefit is compressing the KV cache at negligible cost to quality. That mainly means context lengths can go up for the same amount of memory; however, the KV cache only accounts for something like 20% of overall memory use, so this will not dramatically decrease memory demands in the way some of the more sensationalist reporting has stated.
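The arithmetic behind that claim can be sketched in a few lines (assuming the ~20% figure above and a hypothetical 4× compression ratio, neither of which is specific to TurboQuant):

```python
def total_memory_fraction(kv_fraction=0.20, compression_ratio=4.0):
    """Fraction of original memory still needed after compressing only the KV cache.

    The non-KV part (1 - kv_fraction) is untouched; only the KV slice shrinks.
    """
    return (1.0 - kv_fraction) + kv_fraction / compression_ratio

# Even an aggressive hypothetical 4x KV-cache compression only trims
# total memory use by ~15% when the cache is 20% of the footprint.
print(round(total_memory_fraction(0.20, 4.0), 2))  # 0.85
```

So the ceiling on total savings is set by the KV cache's share of memory, not by how hard the cache itself is compressed.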
Still active, but with many fewer resources than in the past. Many backends, like CUDA on Windows, have been dropped, and others pushed off to partners with varying levels of support. TensorFlow 2.19 is going to release soon without Python 3.13 support; it's hard not to imagine that resource constraints are at play.
Exciting concept! Note that the LLM-corrected version does drop a full paragraph from the output at the bottom of the second page (starting with an asterisk and "My views regarding inflationary possibilities"). I'm not sure if there is a simple way to mitigate this risk, but it would be nice to fall back on the uncorrected text if the LLM can't produce valid results for some region of the document.
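One way that fallback might look, as a rough sketch: align the raw OCR text against the corrected output and, wherever a large run of raw text has no counterpart in the corrected version, splice the uncorrected text back in. Everything here (the function name, the `min_gap` threshold) is hypothetical, not part of the project being discussed:

```python
import difflib

def patch_dropped_regions(raw_text, corrected_text, min_gap=200):
    """Re-insert long runs of raw OCR text that the corrected version omitted.

    Hypothetical helper: any chunk of raw_text longer than min_gap characters
    with no counterpart in corrected_text is assumed to be a dropped region,
    and the uncorrected text is used for it instead.
    """
    matcher = difflib.SequenceMatcher(a=raw_text, b=corrected_text, autojunk=False)
    pieces = []
    for tag, a1, a2, b1, b2 in matcher.get_opcodes():
        if tag == "delete" and (a2 - a1) >= min_gap:
            pieces.append(raw_text[a1:a2])      # dropped paragraph: keep raw OCR
        elif tag in ("equal", "replace", "insert"):
            pieces.append(corrected_text[b1:b2])  # otherwise trust the correction
    return "".join(pieces)
```

Short deletions stay deleted (those are usually genuine OCR noise the LLM cleaned up); only suspiciously large gaps trigger the fallback.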
I recently had occasion to evaluate a database of 1200+ NVIDIA GPUs and can tell you that the only thing consistent about the model numbers is their inconsistency. For example, what is an RTX 4000? It could be the 2018 Quadro RTX 4000, the Quadro RTX 4000 Max-Q, or the Quadro RTX 4000 Mobile (all Turing cards), but it could also be the RTX 4000 Mobile Ada Generation (an Ada Lovelace card released in 2023).
If you're revisiting the classics I can't recommend enough Doug Metzger's Literature and History podcast [1]. It covers literature starting with Mesopotamian stories, at about the level of an undergraduate course, but is entertaining and insightful throughout. It's clearly had deep research put into every episode, but at the same time takes great effort to make the material relatable. Great stuff.
Endpoints are the machines (desktops, servers, &c) in organizations which have Carbon Black installed. Their client continuously monitors process executions, network connections, file changes, and registry changes, and samples unique files on the machine; depending on configuration, it can upload these contents both within the enterprise and share them with Carbon Black's cloud platform. That's what is meant by "our technology uniquely collects complete, 'unfiltered' endpoint data by continuously recording endpoint activity and centrally storing the collected data for advanced analytics".