This "early velocity only" approach seems like a problem - how do you know with 5-minute training runs that you aren't affecting the overall asymptote?
e.g., what if the AI picks a quantizer that happens to be faster in the first five minutes, but has a big noise floor where it can't make more progress?
Yes, it's greedy, so it may hit local optima. You can fit learning curves and extrapolate them to mitigate that, run candidates long enough to be reasonably sure of a dead end, and periodically revive past candidates to run longer. See earlier hyperparameter-optimization work like freeze-thaw: https://arxiv.org/abs/1406.3896
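A minimal sketch of the extrapolation idea, assuming a power-law loss curve L(t) = a + b*t^(-c) (my model choice; freeze-thaw proper uses GP machinery) - rank candidates by the fitted asymptote `a` instead of raw early velocity:

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(t, a, b, c):
    # Loss model L(t) = a + b * t^(-c); `a` is the predicted asymptote.
    return a + b * np.power(t, -c)

def predicted_asymptote(steps, losses):
    # Fit the early learning curve and extrapolate its floor.
    (a, b, c), _ = curve_fit(
        power_law, steps, losses,
        p0=(losses[-1], losses[0] - losses[-1], 0.5),
        bounds=([0, 0, 0], [np.inf, np.inf, 5]),
    )
    return a

# Hypothetical example: two quantizer candidates after a short run.
steps = np.arange(1, 301)
fast_but_floored = 2.0 + 3.0 * steps**-0.9 + 0.01 * np.random.randn(300)
slow_but_deep = 0.5 + 4.0 * steps**-0.4 + 0.01 * np.random.randn(300)

print(predicted_asymptote(steps, fast_but_floored))  # ~2.0: high noise floor
print(predicted_asymptote(steps, slow_but_deep))     # ~0.5: worth running longer
```

Fits like this are noisy on short curves, which is exactly why freeze-thaw keeps a portfolio of paused candidates instead of committing to one.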
It's easy to lie to an OS about your age because it's a single-user experience, and if your parents allow you to lie (or don't know), that's all it takes. Social networks are so much better equipped to estimate age because they have a simple double-check, which is that most kids follow other kids in their grade level.
The patches on top of this are really bad. For instance, we are seeing "AI" biometric video detectors with a margin of error of 5-7 years (meaning the validation studies say that when the AI says you're 23-25, you can be considered 18+), which is totally inadequate for the job this new legislation demands.
They teach a lot of Taylor/Maclaurin series in math classes (and CORDIC, an old shift-and-add method for computing trig functions, still gets mentioned too), but these are not used much in actual FPUs and libraries. Maybe we should update the curricula so people know the better ways.
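(For the curious, CORDIC in a nutshell: rotate a vector through a fixed table of angles atan(2^-i), choosing the sign each step so the residual angle goes to zero. A toy float sketch - real implementations are fixed-point, shift-and-add only:)

```python
import math

def cordic_sincos(theta, iters=32):
    # Rotation-mode CORDIC; converges for |theta| <= ~1.74 rad.
    angles = [math.atan(2.0 ** -i) for i in range(iters)]
    K = 1.0
    for i in range(iters):
        K /= math.sqrt(1.0 + 2.0 ** (-2 * i))   # accumulated rotation gain
    x, y, z = K, 0.0, theta                      # start on x-axis, pre-scaled by 1/gain
    for i in range(iters):
        d = 1.0 if z >= 0 else -1.0              # rotate toward the residual angle
        x, y, z = (x - d * y * 2.0 ** -i,
                   y + d * x * 2.0 ** -i,
                   z - d * angles[i])
    return y, x                                  # (sin, cos)

print(cordic_sincos(0.7))
print(math.sin(0.7), math.cos(0.7))
```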
Taylor series make a lot more sense in a math class, right? They're straightforward, and (just for example) when you're thinking about whether or not a series converges in the limit, why care about the quality of the approximation after a fixed number of terms?
Not quite. The point of Taylor's theorem is that the n-th degree Taylor polynomial around a is the best n-th degree polynomial approximation near a. It doesn't say anything, however, about how good an approximation it is farther away from the point a. In fact, in math, when you use a Taylor approximation, you don't usually care about the infinite Taylor series, only a finite truncation of it.
Taylor series have quite different convergence behavior from a general polynomial approximation, or a polynomial fit for that matter. Many papers have been written that confuse the two.
For example, 1/(x+2) has a pole at x=-2, so its Taylor series around 0 does not converge for |x|>2. A polynomial approximation on, say, the range 0<x<L converges for any L.
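A quick numpy check of that distinction, with a Chebyshev fit standing in for "a polynomial approximation on a range":

```python
import numpy as np

def f(x):
    return 1.0 / (x + 2.0)

def taylor_poly(n):
    # Taylor series of 1/(x+2) around 0: sum_k (-1)^k x^k / 2^(k+1)
    return np.polynomial.Polynomial(
        [(-1.0) ** k / 2.0 ** (k + 1) for k in range(n + 1)])

L = 5.0                          # well past the radius of convergence (2)
xs = np.linspace(0.0, L, 1000)

for n in (5, 10, 20):
    taylor_err = np.max(np.abs(taylor_poly(n)(xs) - f(xs)))
    cheb = np.polynomial.Chebyshev.fit(xs, f(xs), deg=n)   # fit on [0, L]
    cheb_err = np.max(np.abs(cheb(xs) - f(xs)))
    print(f"deg {n:2d}: Taylor max err {taylor_err:.2e}, range fit {cheb_err:.2e}")
```

Past |x|=2 the Taylor error grows with the degree, while the range fit keeps improving.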
https://arxiv.org/pdf/2310.11453
The original paper [fig 1, bottom-right] seems to say it needs about 4-5x the parameters of an fp16 model. You can build it and run some models, but the selection is limited because such models have to be trained from scratch. I imagine inference speed is faster compared with modern PTQ (4- and 8-bit quants), though.
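For a sense of what the from-scratch training learns through, this is roughly the weight binarization the paper describes - a sketch of the quantizer only (the paper applies the scale after the matmul and trains through the sign with a straight-through estimator):

```python
import numpy as np

def binarize_weights(w):
    # 1-bit weights in the spirit of BitNet (arXiv:2310.11453):
    # center the tensor, keep signs, retain one fp scale per tensor.
    alpha = w.mean()
    w_bin = np.sign(w - alpha)            # values in {-1, +1}
    beta = np.abs(w - alpha).mean()       # per-tensor scale
    return w_bin, beta

rng = np.random.default_rng(0)
w = rng.standard_normal((1024, 1024)).astype(np.float32)
w_bin, beta = binarize_weights(w)
print(np.abs(w - beta * w_bin).mean())    # mean reconstruction error
```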
I wish there were more ways to specify whether the Windows filesystem /mnt/c should be mounted in a WSL2 instance - it's basically a global on/off switch. In cases where I want WSL2 to function as a "container" isolated from my desktop, I use a different Windows user just in case.
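For reference, the one switch that does exist is per-distro in /etc/wsl.conf - still all-or-nothing per instance, not per-launch:

```ini
# /etc/wsl.conf inside the distro; takes effect after `wsl --shutdown`
[automount]
enabled = false      # stop auto-mounting Windows fixed drives (/mnt/c, ...)
mountFsTab = true    # still honor /etc/fstab, if you want selective mounts back
```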
It's sad to see an LLM take over a blog, because you can see the line: before 2026 it's an interesting person you would like to talk to. After 2026, it's like generic LLM marketing-voice copy.
This made me laugh. I agree that's a plus! Auto-updates are mostly bad. Look at the state of extensions on VS Code: the permission model combined with silent updates is scary.
The days on which you move between age categories can establish your birthdate, which is a lot of bits if this is done at the individual level (basically, it's a great start on a supercookie).
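Back-of-the-envelope on "a lot of bits" (my numbers; the 100-year window is an assumption):

```python
import math

days = 365.25 * 100      # possible birthdates over a ~100-year window
print(math.log2(days))   # ~15.2 bits of identifying information
```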
"Noftsker also shared the hacker aversion to cigarette smoke, and would sometimes express his displeasure by shooting a jet of pure oxygen from a canister he kept for that purpose; the astonished smoker would find his or her cigarette bursting into a fierce orange blur."