More

zdyn5 · on Jan 4, 2025

No, CMOS image sensor Bayer RAWs are not accurate to what we actually see. Just a few of the many things that need to be addressed to get an even “normal” looking image:

1) They’re overly green because the greens of the RGGB color filter array are more sensitive to light than the other colors. This needs to be corrected with nontrivial auto-white-balance algorithms (not only the green bias needs to be fixed but other scene-dependent factors)

2) The Bayer pattern of the color filter array creates a checkerboard color pattern that needs to be fixed with debayering/demosaicing algorithms - again nontrivial if you don’t want to create artifacts or overblur with simplistic interpolation approaches, and there are even ML algorithms that do this now.

3) Bayer RAWs are linear in photon intensity which is not accurate to how our visual system compresses the high dynamic range. Therefore, various tone mapping algorithms are required to reproduce a natural-looking tone/intensity map of the scene.

4) Small sensors can’t collect enough light and this inhererently results in noisy raw images that need to be denoised. There are a lot of different denoising approaches (including modern that are basically all ML), but care needs to be taken as this is one of the places where it’s easy to generate an overly-processed image.

There are a lot more steps that happen in a typical image processing pipeline, while yes can be tuned in non-ideal ways to produce overly-processed-feeling images, but are at the end of the day necessary if you don’t want something that looks like these: https://images.app.goo.gl/neJCHk5QsVt68XpL7

crazygringo · on Jan 4, 2025

But all that is just tuning, and demosaicing is only non-trivial if you want to "fake" a higher resolution -- otherwise just downsample. Same with denoising.

The main point is actually what you say in #3 -- that they are linear. That's what I mean by being accurate.

Compressing the dynamic range is an artistic choice that will not make an image look like what our eyes see. Film, for example, is more tolerant to overexposure because it has a nonlinear response -- but that's objectively incorrect. Our eyes already compress the dynamic range in a kind of ~logarithmic way. So you don't want to do that on image data, or it ends up being done twice! Which is precisely why bad HDR images can look so fake. Artistically you'll always have to deal with whether you want to compress dynamic range and how.

But my main point stands -- the data coming from CMOS is objectively accurate in a way that film is not. It's linear. It's not compressing range at the top or anything like that, or oversaturating certain colors, etc.

zdyn5 · on Nov 17, 2024

Can’t this cynical take be used to nullify any and all journalism? I get the need to be suspicious generally but this comment doesn’t add to the conversation in any substantive manner. At least give us some take on how the author and her content would be likely biased given this commercial influence.

osigurdson · on Nov 18, 2024

It does add to the conversation in my view. This content merely feeds into the fears that people have regarding AI without adding any new information.

Suggesting that this this prediction oriented material might be click bait is not an unreasonable position. My comment adds to the conversation by pointing this out.

zdyn5 · on Aug 24, 2024

Sorry I had misread your line - somehow the “not” didn’t register. Can’t remove/edit my previous comment for some reason.

zdyn5 · on Aug 24, 2024

Little doubt? By tautological reasoning sure…

zdyn5 · on June 27, 2024

I know it’s probably using < 1000x compute of the real Sora, but “pretty good” is stretching it

loudmax · on June 27, 2024

Depends on your frame of reference. Compared to anything else I've seen generated on a consumer grade GPU, I'd say these are are indeed pretty good.

Here's their example gallery: https://hpcaitech.github.io/Open-Sora/

Compared to the outputs from other models run on consumer grade GPUs, I'd say those are very good.

yreg · on June 27, 2024

Looking at it in low res (small rectangle within the webpage on mobile) they actually look great!

maxerickson · on June 27, 2024

What's the useful frame of reference?

Looking better than other things that are also bad is sort interesting in that it represents progress in some direction, but it isn't very interesting to people outside of the topic.

clwg · on June 27, 2024

For me I use the Will Smith video[0] from just over a year ago. Compared to the examples it's a pretty stark difference.

https://arstechnica.com/information-technology/2023/03/yes-v...

Lockal · on June 28, 2024

Yes, people learned not to generate other people eating. Current SOTA models still have no concept of walking (left leg, right leg, left leg, right leg; it's so complicated?), there is no reason to believe that they have learned the peculiarities of food consumption.

roenxi · on June 27, 2024

We seem to be in an exponential uptick phase of tech driven by hardware improvements; a few years ago this was impossible on consumer grade GPU. So in some sense there isn't a useful frame of reference, state of the art should improve out-of-sight about every 2 years and eventually I'd expect iPhones to be outgenerating Disney at movies.

forkerenok · on June 27, 2024

Not GP, but when I looked at the examples, I thought that those already look pretty useable in comic book-like storytelling to set the mood. I.e. in settings where smaller details of the scene are not relevant and are not taking away from the "larger product".

Flumio · on June 27, 2024

Good that this frame of reference is hn and not some random website where people have no connection to ml...

thriftwy · on June 27, 2024

Just run all key frames through stable diffusion and it should be quite good.

uh_uh · on June 27, 2024

I don't think there's such a thing as key frames here, just frames. And if you run SD through every frame, the output will be janky because SD doesn't know about temporal coherence.

thriftwy · on June 27, 2024

As soon as it's an MP4 it will have key frames all right. You could add AI upscaling to your encoder. People are making fun of "Just", but I believe I could take apart ffmpeg to add this feature (PoC) in two weeks or less. Provide somebody pays for my labor and for the HW.

ehsankia · on June 27, 2024

Honestly neither does OpenSora it seems, as it is pretty damn janky already.

mejutoco · on June 27, 2024

You can pass a few frames as a single image grid. Then you will get coherence, although it will be very limited by gpu ram.

resource_waste · on June 27, 2024

>Just

Adding the word 'just' doesnt make it any easier. Something I've noticed is that people who have never done something themselves and are telling someone to do an difficult task, will use:

"Just"

in-front of it.

This is particularly relevant in tech.

tetris11 · on June 27, 2024

Depends if it's already been done before, in which case "just" would then have been just used quite justly.

PhoenixFlame101 · on June 27, 2024

It's very likely the comment you replied to said it in a joking sense

sertraline · on June 27, 2024

It is extremely difficult to tell if the person is joking in a field full of people who think AI is some sort of magic.

nineteen999 · on June 27, 2024

Well I mean calling any of this diffusion/LLM stuff "AI" is a misnomer to begin with.

zdyn5 · on June 13, 2024

H100s are far from consumer video cards

stygiansonic · on June 13, 2024

Yeah, ops comment makes it seem like they are building racks of RTX 4090s, when this isn’t remotely true. Tensor Core performance is far different on the data center class devices vs consumer ones.

mike_d · on June 13, 2024

They are building racks of 4090s. Nobody can get H100s in any reasonable volume.

Hell, Microsoft is renting GPUs from Oracle Cloud to get enough capacity to run Bing.

TeMPOraL · on June 13, 2024

There are apparently some 400 of H100s sitting idle somewhere upthread. Yes, I'm having hard time imagining how's that possible too.

kkielhofner · on June 13, 2024

Who is "they"?

RTX 4090s are terrible for this task. Off the top of my head:

- VRAM (obviously). Isn't that where the racks come in? Not really. Nvidia famously removed something as basic as NVLink between two cards from the 3090 to the 4090. When it comes to bandwidth between cards (crucial) even 16 lanes of PCIe 4 isn't fast enough. When you start talking about "racks" unless you're running on server grade CPUs (contributing to cost vs power vs density vs perf) you're not going to have nearly enough PCIe lanes to get very far. Even P2P over PCIe requires a hack geohot developed[0] and needless to say that's umm, less than confidence inspiring for what you would lay out ($$$) in terms of hardware, space, cooling, and power. The lack of ECC is a real issue as well.

- Form factor. Remember PCIe lanes, etc? The RTX 4090 is a ~three slot beast when using air cooling and needless to say rigging up something like the dual slot water cooled 4090s I have at scale is another challenge altogether... How are people going to wire this up? What do the enclosures/racks/etc look like? This isn't like crypto mining where cheap 1x PCIe risers can be used without dramatically limiting performance to the point of useless.

- Performance. As grandparent comment noted 4090s are not designed for this workload. In typical usage for training I see them as 10-20% faster than an RTX 3090 at much higher cost. Compared to my H100 with SXM it's ridiculously slow.

- Market segmentation. Nvidia really knows what they're doing here... There are all kinds of limitations you run into with how the hardware is designed (like Tensor Core performance for inference especially).

- Issues at scale. Look at the Meta post - their biggest issues are things that are dramatically worse with consumer cards like the RTX 4090, especially when you're running with some kind of goofy PCIe cabling issue (like risers).

- Power. No matter what power limiting you employ an RTX 4090 is pretty bad for power/performance ratio. The card isn't fundamentally designed for these tasks - it's designed to run screaming for a few hours a day so gamers can push as many FPS at high res as possible. Training, inference, etc is a different beast and the performance vs power ratio for these tasks is terrible compared to A/H100. Now lets talk about the physical cabling, PSU, etc issues. Yes miners had hacks for this as well but it's yet another issue.

- Fan design. There isn't a single "blower" style RTX 4090 on the market. There was a dual-slot RTX 3090 at one point (I have a bunch of them) but Nvidia made Gigabyte pull them from the market because people were using them for this. Figuring out some kind of air-cooling setup with the fan and cooling design of the available RTX 4090 cards sounds like a complete nightmare...

- Licensing issues. Again, laying out the $$$ for this with a deployment that almost certainly violates the Nvidia EULA is a risky investment.

Three RTX 4090s (at 9 slots) to get "only" 72GB of VRAM, talking over PCIe, using 48 PCIe lanes, multi-node over sloooow ethernet (hitting CPU - slower and yet more power), using what likely ends up at ~900 watts (power limited) for significantly reduced throughput and less VRAM is ridiculous. Scaling the kind of ethernet you need for this (100 gig) comes at a very high per-port cost and due to all of these issues the performance would still be terrible.

I'm all for creativity but deploying "racks" of 4090s for AI tasks is (frankly) flat-out stupid.

[0] - https://github.com/tinygrad/open-gpu-kernel-modules

michaelt · on June 13, 2024

> The RTX 4090 is a ~three slot beast when using air cooling and needless to say rigging up something like the dual slot water cooled 4090s I have at scale is another challenge altogether... How are people going to wire this up? What do the enclosures/racks/etc look like?

A few years ago, if you wanted a lot of GPU power you would buy something like [1] - a 4/5U server with space for ten dual-slot PCIe x16 cards and quadruple power supplies for 2000W of fully redundant power. And not a PCIe riser in sight.

I share your scepticism about whether it's common to run >2 4090s because nvidia have indeed sought to make it difficult.

But if there was some sort of supply chain issue that meant you had to, and you had plenty of cash to make it happen? It could probably be done.

Some of the more value-oriented GPU cloud suppliers like RunPod offer servers with multiple 4090s and I assume those do something along these lines. With 21 slots in the backplane, you could probably fit 6 air-cooled three-slot GPUs, even if you weren't resorting to water cooling.

[1] https://www.supermicro.com/en/products/system/4U/4028/SYS-40...

mike_d · on June 13, 2024

> but deploying "racks" of 4090s for AI tasks is (frankly) flat-out stupid.

You seem to be trapped in the delusion that this was anyone's first, second, or third choice.

There is workload demand, you can't get H100s, and if you don't start racking up the cards you can get the company will replace you with someone less opinionated.

zdyn5 · on March 16, 2024

Not if they eject the trash above escape velocity

smallmancontrov · on March 16, 2024

Charge up the electromagnetic trash cannon!

greggsy · on March 16, 2024

You’d presumably need some kind of trash compactor to maximise the efficiency.

woleium · on March 16, 2024

Anything is a weapon if its going fast enough

KineticLensman · on March 16, 2024

This presumably was Humanity’s fall back if the fusion drives didn’t stop the Kzinti

brookst · on March 16, 2024

Aka the trash to kinetic weapon converter.

zdyn5 · on Feb 26, 2024

From a high-level design standpoint, wouldn’t the general-purposeness of NVIDIA’s GPUs (even if they do have some AI/LLM optimizations) put them generally at a disadvantage compared to more custom/dedicated inference designs? (Disregarding real-world issues like startup execution risks, assume competitors succeed at their engineering goals) Or is there some fundamental architectural reason why NVIDIA can/will always be highly competitive in AI inference? Is the general-purposeness of the GPU not as much of an overhead/disadvantage as it seems?

Also how critical is NVIDIA’s infiniband networking advantage when it comes to inference workloads?

p1esk · on Feb 26, 2024

Custom chips have to be much better than Nvidia to become attractive. Being 2x faster won’t be enough, 5x faster might be. Assuming perfectly functioning software.

zdyn5 · on Feb 26, 2024

Is software that important on the inference side, assuming all the key ops are supported by the compiler? Once the model is quantized and frozen the deployment to alternative chips while somewhat cumbersome hasn’t been too challenging, at least in my experience with Qualcomm NPU deployment (trained on NVIDIA)

p1esk · on Feb 26, 2024

Let me put it this way: if there’s even a slightest issue with my Pytorch code (training or inference) running on a non Nvidia chip it will be an automatic no from me. More than that - if I simply suspect there will be any issues I will not try it. Regardless of any promised speedups.

Whoever wants to sell me their chip better do an amazing demo of their flawless software integration.

tester756 · on Feb 26, 2024

What an approach.

It is very simple math

If the savings on hw/compute are greater than cost of adjustments, then it is probably worth it.

So, if you prefer to avoid spending e.g 1 month on adjusting and testing just to keep using e.g 1.x more expensive hw, then it is your loss in long run

p1esk · on Feb 27, 2024

The problem is I don’t know how much time it will take to make it work, or if it’s even possible to make it work for my specific situation.

I’ve wasted enough of my time trying to debug AMD and Graphcore chips to fall into this trap again.

zdyn5 · on Jan 27, 2024

Naive question: how is brute-force cracking still a thing in real-world systems? Aren’t there time-outs/bans for guessing wrong after like 3-5 guesses? How does one get the opportunity to try millions/billions/etc of times?

tetha · on Jan 27, 2024

Offline vs online brute forcing, as I like to call it.

As others have said, if you have the hashes, you can brute force them offline and there won't be any limits on how fast it can go besides your algorithms and compute resources.

But even online, attackers can be pretty smart. For example, something we detected was an attacker rotating both through a bunch of accounts and a bunch of IP addresses. That way you never saw many incorrect login tries per account and IP in a timeframe. It's not millions/billions of tries, but it can get around naive limits per IP or per account and you need some SIEM tooling to detect that.

stavros · on Jan 27, 2024

Modern KDF algorithms are designed to guard against offline attacks by massively increasing the cost per hash. Online or offline, brute forcing shouldn't be an issue nowadays.

Saying "there's no limit besides your resources" is basically saying "there's no limit besides the very real and insurmountable limit there is".

tetha · on Jan 27, 2024

Yeah, I fell into my usual security questionaire wording there.

I'm not even contradicting you there. You can go as fast as you can go. Even if every atom in the current estimation of the universe had a couple thousand computations available, we couldn't brute-force some passwords. Except, now customer security asks you "but what about millions of computations per atom? Checkmate!".

Being too concrete and absolute with these kinds of people ends up with so many stupid discussions.

stavros · on Jan 27, 2024

> You can go as fast as you can go.

This is true, it's just that, with modern KDFs, that's still too slow to matter (unless someone broke them and we don't know). If you use a modern KDF, you basically don't have to worry about brute forcing at all, even for fairly weak passwords.

tetha · on Jan 27, 2024

I know that. You're missing the second part there.

I have been asked by customers about the reliability of our software platform if major german cities have been hit with either nuclear, natural or military disaster. It's that level of silly you sometimes have to deal with.

Eventually I got fed up enough and told those kinda people that I'm volunteering in disaster prevention services and their systems wouldn't be my problem at that point.

stavros · on Jan 27, 2024

Huh, I didn't know people wanted that level of disaster planning.

PrimeMcFly · on Jan 27, 2024

Guess how many systems are using KDF algorithms in practice?

TillE · on Jan 27, 2024

Probably the vast majority of important systems. PBKDF2 has been around forever and is in very widespread use.

stavros · on Jan 27, 2024

The fact that they aren't implementing the solution doesn't mean that the solution doesn't exist or isn't effective, though.

Plus, nowadays, most (all?) big frameworks have used KDFs by default for years.

PrimeMcFly · on Jan 27, 2024

True. I don't think Windows or Linux do though, right?

stavros · on Jan 27, 2024

Linux uses bcrypt by default, AFAIK. Windows had NTLM last I looked, but I don't know what they have now.

zoolily · on Jan 27, 2024

Ubuntu Linux used to use a SHA2 hash repeated 5000 times, but my Ubuntu 22.04 system uses yescrypt, which is one of those KDFs.

PrimeMcFly · on Jan 27, 2024

nice!

Ellipsis753 · on Jan 27, 2024

This is for cracking password _hashes_. Most websites won't store a user's plain-text password but will only store the hash of it. Then a hack/exploit might later reveal the website's password hashes. This program helps you turn the hash back into the original password. Assuming you have a hash already, you own the hash, so it's not possible for anyone to impose a rate limit on how quickly you can attempt to break it.

Etheryte · on Jan 27, 2024

Databases get dumped, well, not all the time, but fairly often. See haveibeenpwned for example, they post a new breach once a week, if not more often [0].

[0] https://feeds.feedburner.com/HaveIBeenPwnedLatestBreaches

fullspectrumdev · on Jan 27, 2024

HIBP even is basically tip of the iceberg in terms of how much data is floating around - Troy and his team only get the ones that are publicly leaked, or privately shared with them.

They also have historically had a backlog of data to process - leaked databases can be a pain to parse and turn into something usable.

starmilk · on Jan 27, 2024

What you generally feed into password cracking software is hashes of passwords that you've found by listening on the network, dumping from memory, or obtained by chaining another vulnerability.

These are in a text file locally (offline), so there is no system that you are submitting hashes to for verification. It simply tries md5(your_password_guess) until it computes the same hash that you supplied.

This is oversimplified and you can replace md5 with any hash alg that you need, but i hope it makes it clear that guesses don't happen against the auth server.

kemotep · on Jan 27, 2024

Well if not setup properly, it is possible to dump the Windows password hashes (and linux too).

You take that list of hashes, and copy to your password cracking rig, where it can run for a few days to see how many password hashes you can find a match for. Then once you have identified a password hash match, you now know an account password.

However, if things aren’t properly secured where an attacker can dump password hashes, they likely can utilize “pass the hash” style attacks as well where you don’t even need to know the password to be able to sign in as a user.

px43 · on Jan 27, 2024

Windows networks are notoriously bad about this. If you find yourself on a Windows network, either because you found an active ethernet jack in the lobby, or you get on the wifi, phishing, or you land on a citrix box or whatever, you can run a tool called Responder.

Windows machines on a network are constantly scanning around, looking for new devices, and when they find them, they like to see if they can access them so they show up in network manager or whatever. They do this by trying to log in. Obviously logging in with a password would be insecure, so they try to log in with a hash. Responder pretends to be any sort of server that a Windows machine would try to log in to, so right when you run it, all the nearby machines hand over their hashes.

Crack even one of those hashes, and now you can log in to Active Directory. This will let you get the full list of all users, permissions, groups, machines, and sessions, etc, and basically tell you exactly what you need to do to get anywhere you want (Bloodhound is the main tool people use for this).

That AD account also lets you dump all the SPNs (service accounts) on the network, and because Windows is Windows, of course that gives you something like 20-30 password hashes, many of which are almost certainly Domain Admins on the network.

Crack a Domain Admin account, and you can basically do whatever you want on the network, including doing a dcsync, which is normally used to back up a domain controller, but also dumps every account and NTLM hash straight into your lap. These hashes can be used with pass-the-hash to impersonate any account, or you can just crack them and basically have free access to the network for the rest of your life.

The entire security of Windows networks is based on the premise that password crackers don't exist, which is why they have been fundamentally fucked for decades, and there's zero chance that any of this will ever get fixed.

coffeeri · on Jan 27, 2024

You don't need to do this online in many cases. For instance, the hash of WPA-secured Wi-Fi networks can be captured during the handshake of other devices.

einarfd · on Jan 27, 2024

If you manage to steal the file with the hashed passwords. Then none of that makes a difference.

zdyn5 · on Jan 27, 2024

Thanks everyone for these informative answers!

zdyn5 · on Jan 19, 2024

Yes per day, it’s still a minute percentage

clipsy · on Jan 19, 2024

If a child is on facebook for 5 years (the max possible without violating age requirements and assuming the age of majority is 18) and if harassment events are independently distributed, the chance of this particular child suffering sexual harassment is (at least) 1 - (1 - 0.00005)^(365 * 5) = ~8.7%.

Moreover, given that the 13-17 (inclusive) age range represents a small portion of FB's total user base, the 0.005% figure is a deep underestimate itself, meaning the 8.7% is as well.

Everyone is free to their opinion, of course, but "only 8.7% of children on FB are sexually harassed" seems a bit cavalier.

spencerchubb · on Jan 19, 2024

It's better than the rate of child sexual harrassment in real life. So facebook beats real life on this metric.

FeepingCreature · on Jan 19, 2024

Of course, there's no reason to assume it's independently distributed.