This "early velocity only" approach seems like a problem - how do you know with 5-minute training runs that you aren't affecting the overall asymptote?
e.g., what if the AI picks a quantizer that happens to be faster in the first five minutes, but has a big noise floor where it can't make more progress?
Yes, it's greedy, so it may hit local optima. You can fit learning curves and extrapolate them to mitigate that, run candidates long enough to be reasonably sure of a dead end, and periodically revive past candidates to run longer. See earlier hyperparameter-optimization work like freeze-thaw: https://arxiv.org/abs/1406.3896
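A minimal sketch of the extrapolation idea, assuming a power-law loss curve L(t) = a + b*t^(-c) (my model choice; freeze-thaw proper uses GP machinery) - rank candidates by the fitted asymptote `a` instead of raw early velocity:

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(t, a, b, c):
    # Loss model L(t) = a + b * t^(-c); `a` is the predicted asymptote.
    return a + b * np.power(t, -c)

def predicted_asymptote(steps, losses):
    # Fit the early learning curve and extrapolate its floor.
    (a, b, c), _ = curve_fit(
        power_law, steps, losses,
        p0=(losses[-1], losses[0] - losses[-1], 0.5),
        bounds=([0, 0, 0], [np.inf, np.inf, 5]),
    )
    return a

# Hypothetical example: two quantizer candidates after a short run.
steps = np.arange(1, 301)
fast_but_floored = 2.0 + 3.0 * steps**-0.9 + 0.01 * np.random.randn(300)
slow_but_deep = 0.5 + 4.0 * steps**-0.4 + 0.01 * np.random.randn(300)

print(predicted_asymptote(steps, fast_but_floored))  # ~2.0: high noise floor
print(predicted_asymptote(steps, slow_but_deep))     # ~0.5: worth running longer
```

Fits like this are noisy on short curves, which is exactly why freeze-thaw keeps a portfolio of paused candidates instead of committing to one.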
It's easy to lie to an OS about your age because it's a single-user experience, and if your parents allow you to lie (or don't know), that's all it takes. Social networks are so much better equipped to estimate age because they have a simple double-check, which is that most kids follow other kids in their grade level.
The patches on top of this are really bad. For instance, we are seeing "AI" biometric video detectors with a margin of error of 5-7 years (meaning the validation studies say that when the AI says you're 23-25, you can be considered 18+), which is totally inadequate for the job this new legislation demands.
They teach a lot of Taylor/Maclaurin series in math classes (and CORDIC, an old shift-and-add method for computing trig functions, still gets mentioned too), but these are not used much in actual FPUs and libraries. Maybe we should update the curricula so people know the better ways.
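(For the curious, CORDIC in a nutshell: rotate a vector through a fixed table of angles atan(2^-i), choosing the sign each step so the residual angle goes to zero. A toy float sketch - real implementations are fixed-point, shift-and-add only:)

```python
import math

def cordic_sincos(theta, iters=32):
    # Rotation-mode CORDIC; converges for |theta| <= ~1.74 rad.
    angles = [math.atan(2.0 ** -i) for i in range(iters)]
    K = 1.0
    for i in range(iters):
        K /= math.sqrt(1.0 + 2.0 ** (-2 * i))   # accumulated rotation gain
    x, y, z = K, 0.0, theta                      # start on x-axis, pre-scaled by 1/gain
    for i in range(iters):
        d = 1.0 if z >= 0 else -1.0              # rotate toward the residual angle
        x, y, z = (x - d * y * 2.0 ** -i,
                   y + d * x * 2.0 ** -i,
                   z - d * angles[i])
    return y, x                                  # (sin, cos)

print(cordic_sincos(0.7))
print(math.sin(0.7), math.cos(0.7))
```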
Taylor series make a lot more sense in a math class, right? They're straightforward, and (just for example) when you're thinking about whether or not a series converges in the limit, why care about the quality of the approximation after a fixed number of terms?
Not quite. The point of Taylor's theorem is that the n-th degree Taylor polynomial around a is the best n-th degree polynomial approximation near a. It doesn't say anything, however, about how good an approximation it is farther away from the point a. In fact, in math, when you use a Taylor approximation, you don't usually care about the infinite Taylor series, only a finite truncation of it.
Taylor series have quite different convergence behavior from a general polynomial approximation, or a polynomial fit for that matter. Many papers have been written that confuse the two.
For example, 1/(x+2) has a pole at x=-2, so its Taylor series around 0 does not converge for |x|>2. A polynomial approximation on, say, the range 0<x<L converges for any L.
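A quick numpy check of that distinction, with a Chebyshev fit standing in for "a polynomial approximation on a range":

```python
import numpy as np

def f(x):
    return 1.0 / (x + 2.0)

def taylor_poly(n):
    # Taylor series of 1/(x+2) around 0: sum_k (-1)^k x^k / 2^(k+1)
    return np.polynomial.Polynomial(
        [(-1.0) ** k / 2.0 ** (k + 1) for k in range(n + 1)])

L = 5.0                          # well past the radius of convergence (2)
xs = np.linspace(0.0, L, 1000)

for n in (5, 10, 20):
    taylor_err = np.max(np.abs(taylor_poly(n)(xs) - f(xs)))
    cheb = np.polynomial.Chebyshev.fit(xs, f(xs), deg=n)   # fit on [0, L]
    cheb_err = np.max(np.abs(cheb(xs) - f(xs)))
    print(f"deg {n:2d}: Taylor max err {taylor_err:.2e}, range fit {cheb_err:.2e}")
```

Past |x|=2 the Taylor error grows with the degree, while the range fit keeps improving.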
https://arxiv.org/pdf/2310.11453
The original paper [fig 1, bottom-right] seems to say it needs about 4-5x the parameters of an fp16 model. You can build it and run some models, but the selection is limited because such models have to be trained from scratch. I imagine inference speed is faster compared with modern PTQ (4- and 8-bit quants), though.
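For a sense of what the from-scratch training learns through, this is roughly the weight binarization the paper describes - a sketch of the quantizer only (the paper applies the scale after the matmul and trains through the sign with a straight-through estimator):

```python
import numpy as np

def binarize_weights(w):
    # 1-bit weights in the spirit of BitNet (arXiv:2310.11453):
    # center the tensor, keep signs, retain one fp scale per tensor.
    alpha = w.mean()
    w_bin = np.sign(w - alpha)            # values in {-1, +1}
    beta = np.abs(w - alpha).mean()       # per-tensor scale
    return w_bin, beta

rng = np.random.default_rng(0)
w = rng.standard_normal((1024, 1024)).astype(np.float32)
w_bin, beta = binarize_weights(w)
print(np.abs(w - beta * w_bin).mean())    # mean reconstruction error
```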
I wish there were more ways to specify whether the Windows filesystem /mnt/c should be mounted in a WSL2 instance - it's basically a global on/off switch. In cases where I want WSL2 to function as a "container" isolated from my desktop, I use a different Windows user just in case.
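For reference, the one switch that does exist is per-distro in /etc/wsl.conf - still all-or-nothing per instance, not per-launch:

```ini
# /etc/wsl.conf inside the distro; takes effect after `wsl --shutdown`
[automount]
enabled = false      # stop auto-mounting Windows fixed drives (/mnt/c, ...)
mountFsTab = true    # still honor /etc/fstab, if you want selective mounts back
```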
It's sad to see an LLM take over a blog, because you can see the line: before 2026 it's an interesting person you would like to talk to. After 2026, it's like generic LLM marketing-voice copy.
This made me laugh. I agree that's a plus! Auto-updates are mostly bad. Look at the state of extensions on VS Code: the permission model combined with silent updates is scary.
The days on which you move between age categories can establish your birthdate, which is a lot of bits if this is done at the individual level (basically, it's a great start on a supercookie).
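Back-of-the-envelope on "a lot of bits" (my numbers; the 100-year window is an assumption):

```python
import math

days = 365.25 * 100      # possible birthdates over a ~100-year window
print(math.log2(days))   # ~15.2 bits of identifying information
```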
"Noftsker also shared the hacker aversion to cigarette smoke, and would sometimes express his displeasure by shooting a jet of pure oxygen from a canister he kept for that purpose; the astonished smoker would find his or her cigarette bursting into a fierce orange blur."