Hacker News | new | past | comments | ask | show | jobs | submit | rhogar's comments

This is very close to the thesis, or at least the theme, of the essays in The Mythical Man-Month by Fred Brooks. Some elements are dated (1975), but many feel timeless.

Brooks's law, "Adding manpower to a late software project makes it later," is just the surface of the metaphorical language that has most stuck with me: large systems and teams sinking deeper into the tar pit as they struggle against coordination costs that scale with size, conceptual integrity in design akin to preserving the architectural unity of Reims Cathedral, the roles and limits of expanding surgical teams, etc.

Love a good metaphor, even when its foundation is overextended or out of date. Highly recommend.


Congratulations on the launch! Personally I would love to see a rough estimate of the expected number of requests and tokens required to run tasks like synthetic data generation for different amounts of data. Though this is likely highly variable, I would like to have a loose idea of possible incurred costs and execution time.


Hey, this is a highly requested feature, and we will be implementing it soon. A rough estimate along those lines is what we are planning.
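In the meantime, a back-of-envelope calculator is easy to sketch. Every default below (tokens per row, rows per request, price per 1K tokens) is a placeholder assumption, not a measured value for any particular provider:

```python
def estimate_generation_cost(n_rows, tokens_per_row=500,
                             rows_per_request=1,
                             price_per_1k_tokens=0.002):
    """Rough estimate for a synthetic-data-generation run.

    All defaults are illustrative placeholders, not measured values.
    Returns (number of requests, total tokens, estimated cost in USD).
    """
    n_requests = -(-n_rows // rows_per_request)  # ceiling division
    total_tokens = n_rows * tokens_per_row
    cost = total_tokens / 1000 * price_per_1k_tokens
    return n_requests, total_tokens, cost

# e.g. 10,000 rows at the placeholder defaults
reqs, toks, cost = estimate_generation_cost(10_000)
```

Even a crude calculator like this would let users sanity-check a run before kicking it off.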


The report does not detail hardware, though it states that SDXL has 2.6B parameters in its UNet component, compared to 860M for SD 1.4/1.5 and 865M for SD 2.0/2.1. So SDXL has roughly 3x more UNet parameters. In January, MosaicML claimed a model comparable to Stable Diffusion 2 could be trained with 79,000 A100-hours in 13 days. Some sort of inference can be made from this information; I would be interested to hear someone with more insight provide perspective.
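For what it's worth, a naive linear scaling of MosaicML's figure by the UNet parameter ratio looks like this (a crude sketch only: training cost does not scale purely with parameter count, and SDXL's dataset, resolution, and recipe all differ):

```python
# Naive back-of-envelope: scale MosaicML's claimed A100-hours for an
# SD2-class model by the ratio of UNet parameters. This ignores data,
# resolution, and recipe differences, so treat it as a rough floor.
sd2_unet_params = 865e6    # SD 2.0/2.1 UNet parameters (from the report)
sdxl_unet_params = 2.6e9   # SDXL UNet parameters (from the report)
sd2_a100_hours = 79_000    # MosaicML's January claim for an SD2-class model

scale = sdxl_unet_params / sd2_unet_params   # ≈ 3.0
sdxl_a100_hours = sd2_a100_hours * scale     # ≈ 237,000 A100-hours
print(round(scale, 2), round(sdxl_a100_hours))
```

Even under this very generous assumption, SDXL would land somewhere in the low hundreds of thousands of A100-hours.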


Wouldn't that mean more VRAM is required to load the model? They are claiming it will still work on 8 GB cards.


Stable Diffusion 1/2 were made to run on cards with as little as 3GB of memory.

Using the same techniques, yes, this will fit in 8.


I am guessing 8 bit quantization will be a thing for SDXL.

It should be easy(TM) with bitsandbytes, or ML compiler frameworks.


bitsandbytes is only used during training with these models though (the 8-bit AdamW). Quantizing the weights and activations to a range of 256 values, when the model needs to output a range of 256 values, creates noticeable artifacts, as they are not going to map 1-to-1.


Draw Things recently released an 8-bit quantized SD model with output comparable to the FP16 version. It uses a k-means-based LUT and separates the weights into blocks to minimize quantization error.
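For readers unfamiliar with the technique, here is a rough sketch of block-wise k-means LUT quantization in NumPy. This is illustrative only, not Draw Things' actual implementation; the block size and level count are made-up defaults:

```python
import numpy as np

def kmeans_lut_quantize(weights, n_levels=16, block_size=64, iters=10):
    """Quantize weights block-by-block with a per-block k-means LUT.

    Each block stores n_levels centroids (the LUT) plus an index per
    weight, so values cluster around where the weights actually are,
    rather than on a uniform grid.
    """
    flat = weights.ravel()
    pad = (-flat.size) % block_size
    flat = np.concatenate([flat, np.zeros(pad, dtype=flat.dtype)])
    blocks = flat.reshape(-1, block_size)
    out = np.empty_like(blocks)
    for i, block in enumerate(blocks):
        # Initialize centroids from the block's quantiles, then run
        # a few rounds of Lloyd's algorithm.
        centroids = np.quantile(block, np.linspace(0, 1, n_levels))
        for _ in range(iters):
            idx = np.abs(block[:, None] - centroids[None, :]).argmin(axis=1)
            for k in range(n_levels):
                members = block[idx == k]
                if members.size:
                    centroids[k] = members.mean()
        out[i] = centroids[idx]  # dequantized block via the LUT
    return out.ravel()[:weights.size].reshape(weights.shape)
```

The per-block LUT is what keeps the error low: outlier-heavy blocks get their own centroids instead of stretching one global scale.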


I was going to search the internet for it, but then I realized you are the author (and I think there is nothing online). I imagine the activations are left in FP16 and the weights are converted to FP16 during inference, right?

Btw very cool


Yes, computation is carried out in FP16 (so there are no compute efficiency gains, though there might be latency reductions from the memory-bandwidth savings). Those savings are not realized yet because no custom kernels have been introduced.


Additional discussion from post two days ago: https://news.ycombinator.com/item?id=36443676


Though the 8B model is almost definitely not capable of near-real-time inference yet, we're approaching Babel fish territory. The main difference, perhaps, being that this is powered by burning massive amounts of carbon as opposed to a fish brain.


> Though the 8B model is almost definitely not capable of near-real-time inference yet

Google previously showed you could get the full-sized 540B-parameter PaLM-1 model down to "a low-batch-size latency of 29ms per token during generation (with int8 weight quantization)": https://arxiv.org/abs/2211.05102#google . How many tokens per 1000ms do humans speak? I'm guessing fewer than 34. The real question is who wants to pay for it.
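The ~34 figure is just 1000 ms divided by the quoted 29 ms/token latency:

```python
# Sanity check on the 29 ms/token figure: tokens per second of generation
# versus human speech.
per_token_ms = 29
tokens_per_sec = 1000 / per_token_ms  # ≈ 34.5 tokens/sec

# Conversational English runs roughly 2-3 words/sec; even at several
# tokens per word, speech stays comfortably below ~34 tokens/sec.
print(round(tokens_per_sec, 1))
```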


Performance on so few examples is impressive, and paired with generalizability across broader tasks and multiple embodiments + environments (and from just visual goals rather than complex verbal instructions), it's quite a jump from where we saw Gato last spring. If representative, it seems a strong step toward meaningful autonomous skill acquisition and transfer in realistic settings.


Seems an excellent opportunity to cross post recent reporting of insurance companies reevaluating their partnership with GRAIL after more than 400 patients were incorrectly told they may have cancer. Significant benefits, but only with thoughtful implementation. https://news.ycombinator.com/item?id=36176338


The broader context of the target audience of this language is very important here: it's oriented toward the familiarity and needs of a community most comfortable with Python.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact
