Stable Diffusion PR optimizes VRAM, generate 576x1280 images with 6 GB VRAM (github.com/basujindal)
192 points by avocado2 on Sept 4, 2022 | 143 comments


For everyone about to comment on the garbage in the commit:

It looks like the committer made their changes in the top commit, then merged the updated CompVis Stable Diffusion change set on top of it for some reason. That's where the license change, Rick Astley image, etc. come from.

And yes, StableDiffusion from the original repo will rick roll you if you try to generate something that triggers its NSFW filter.

Here's the code that does it:

https://github.com/CompVis/stable-diffusion/blob/main/script...
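From memory, the relevant bit of the demo script looks roughly like this (paraphrased sketch; exact names may differ, and the safety checker/feature extractor are loaded elsewhere in the script):

    # paraphrased sketch of the safety check in scripts/txt2img.py, not the exact code
    def check_safety(x_image):
        # run the CLIP-based safety checker over the batch of generated images
        safety_input = safety_feature_extractor(numpy_to_pil(x_image), return_tensors="pt")
        x_checked, has_nsfw_concept = safety_checker(
            images=x_image, clip_input=safety_input.pixel_values
        )
        for i in range(len(has_nsfw_concept)):
            if has_nsfw_concept[i]:
                # swap the flagged image for assets/rick.jpeg, resized to match,
                # so it still drops neatly into the output grid
                x_checked[i] = load_replacement(x_checked[i])
        return x_checked, has_nsfw_concept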

And here's what it looks like:

https://twitter.com/qDot/status/1565076751465648128


> And yes, StableDiffusion from the original repo will rick roll you if you try to generate something that triggers its NSFW filter.

It goes without saying that the authors of a piece of software have the right to make the software do whatever they want, but that shouldn't stop us from recognizing that AI engineers are starting to act like megalomaniac overseers who consider it part of their mission to steer humanity onto the "right" path.

Who exactly do these people think they are?

Imagine this behavior from a web browser. "The URL of the file you were trying to download triggered my NSFW classifier, so I'm going to replace the file with this funny image."

This isn't funny, it's creepy.


Nah it’s just basic due diligence for releasing an open source app of this nature. Turning it off is a simple one line change, because everything is obviously named. This is not some high wall to scale.

As someone who manages NSFW open source projects, this move seems fine to me.

And actually kinda hilarious.


If what StableDiffusion did was ask the user something like "The prompt you entered may result in the generation of content some people find objectionable. Are you sure you want to proceed?", then I would buy your argument.

As currently implemented (and the implementation took more work than a confirmation prompt would have!), it's an obvious attempt to control, rather than protect, users. They try their best to dress it up like a joke, but it clearly isn't.


They aren’t trying to “control” users, they’re trying to minimize the amount of bad press these AIs get. The approach they took was easier than your suggestion, because their system detects NSFW content on a per output image basis… so it’s easier to output some image (rickroll) rather than put up a prompt or even skip it completely (since they also generate a grid of the resulting images).

I should add that it’s also incredibly easy to comment it out.


Yeah, it's basic survival. Technology is inherently free and driven by porn, but that can be shocking to the uninitiated, so restrictions are often put in place to "protect the children" and avoid alerting the horde into reversing progress. I think those "enlightened" should build and share the understanding that these are the actual reasons, so we don't all end up in an actual dystopia where "protections" are enforced more strictly than intended.


The content it can produce is illegal in many countries. It's not something they just decided to do one day on their own.


The implementation as it stands is actually easier the way they do it. They replace the image by scaling Rick to the image size. That means they can still automatically put your output images in a grid without any extra code. They could have zeroed the pixels, but that could have led to confusion about whether things were working or not.

It's silly and you're able to patch it out if you so desire. But most importantly, they released the model to everyone so it's way more open than OpenAI's DALL-E 2 (for better or worse). I don't think the argument that they're trying to control you really makes a lot of sense given how widely accessible this model is.


Absolutely. The limit isn't in what you ask for, it's in what's returned. The whole point is to try and stop the user being shown NSFW images despite SFW prompts, and since you can generate multiple results per prompt, adding an interactive check on each one doesn't make sense.


That would absolutely be more work than just subbing in an image and would neither have worked in the original implementation nor the various frontends it's been used in.

This is quite a classic HN kind of comment. Immediately assumes specific problematic intent and proposes a solution that doesn't fit the API.


Now instead users have to spend hours upon hours editing docker images to remove it. Much easier.


That's neither true nor would it have been helped by making the original code require input during generation.

They put in a simple check that tries to avoid returning NSFW images, so that you don't get them back despite a more 'innocent' prompt. It's trivial to remove and is only part of the demonstration scripts; it is not part of the model or anything fundamental to its workings.


It's open-source. Patch it out yourself. It's not hidden.


That's not the point at all.


It's the point of open source.


You have no point whatsoever. You're fighting windmills.


I disagree that the rick roll is inappropriate, but I also hate the argument that just because it's open source, the programmers are above criticism. I think it's reasonable that people can dislike or even morally object to the decisions open source maintainers make (even if I don't particularly object to this specific instance).


They're not above criticism, but what they've done is a very small layer that's easily removed. You can do it, and release a version without it, in minutes. You can do it in less time than it takes to write a long comment decrying it.

It's not fundamental to SD in any way, and their suggestion wouldn't have worked.


You know you can download the code yourself, right? You need not be reliant on their web interface if you're having these control issues.

It's also clearly in place because they don't have an age disclaimer, which would likely open them up to some sort of liability (or additional liability). Again, it's open source and freely available, so, like... this feels like a you problem, sorry.


As currently implemented, it’s a hilarious Easter egg. I wouldn’t read too much into it.


From your previous comment:

> AI engineers starting to act like megalomaniac overseers who consider it part of their mission to steer humanity onto the "right" path.

You've outdone your own comment. You have an idea of how this should work, and you're the one trying to "steer" humanity onto a path. It's just projection.

This kind of preachy, self-righteous control tactic belongs elsewhere (fork it). The devs want to convey that something is NSFW, and they'll do it however they want.


I don’t read it that way. I think their intent is open to interpretation.


Fork it and fix it, see if your branch gains traction.


What would the average 11 year old answer?


Yeah, but when it comes to generating NSFW content, it is illegal to distribute pornographic software in some countries, and doing so can lead to domain blocking and the like (e.g. South Korea, believe it or not). This is something you have to understand.

Also, if the author really wanted to block NSFW content, I'm pretty sure one could make the filter inseparable from the main network. That isn't the case here, AFAIK.


That makes no sense. You can draw whatever you like with Photoshop. You can search for whatever you want on Google. That doesn't make the software "NSFW", and no country will block Photoshop's domain just because someone used Photoshop to create NSFW content.

Don't let people who are clearly motivated by a moralistic desire to control others hide behind BS pseudo-legal excuses.


1. Google does have a feature called "SafeSearch"[1] to filter explicit content, and they have a DMCA-like system for processing violation reports from governments and legal entities. That is, Google is already cooperating.

2. Photoshop's sole function is to let people draw, so drawing porn with Photoshop is 100% the user's responsibility. In the case of image generators, the AI itself has the capability to create NSFW content. If you're talking about the interactive nature of the "prompt", that logic would also make any adult game SFW: it's safe until the user clicks something.

[1]: https://support.google.com/websearch/answer/510


You'd be right about Photoshop, but...

https://helpx.adobe.com/photoshop/cds.html


Explain this to [insert regime] when they block your GitHub repo or make using your software a felony. You are proposing to act on ideology, while such mitigations are rooted in pragmatism. You often can't have both.


SK and Germany are notorious for that. They don't care how the pixel arrays came into existence; any shapes, forms, and representations, even links to such information, that are interpretable as belonging to certain categories are plainly illegal there. I believe SK doesn't even recognize copyright for such "illegal" information.


Yes, the megalomania of someone who makes their code and models freely available.


StableDiffusion isn't freely available, in the "Free Software" sense. They use the highly uncommon "CreativeML Open RAIL-M License" which is a wall of text composed of weasel words describing how the software is so incredibly advanced and dangerous that despite the authors' earnest wish to do so, they cannot in good conscience make it genuinely Free Software.

These people wrote a bunch of Python code that pipes images and text through a GPU, and they're acting as if they had created a secret weapon that somehow humanity must be protected from. If that's not megalomania, I don't know what is.


If they think it's a secret weapon humanity must be prevented from using, giving it away for free with examples on how to use it with a note on the side saying "pls don't misuse thanks" seems like a very odd thing to do.


Oh please, all the creators of these image AIs (OpenAI, Google, Midjourney, SD, etc) are being very very cautious with this stuff. It’s not going to end humanity, but it could easily lead to some really gross content that would paint the responsible organization in a bad light.


Can you elaborate? Would Adobe be painted in a bad light if someone committed a crime using Photoshop?

Honestly it just feels weird. I've read the license restrictions and I can't see why they are there or what they're preventing.


Photoshop doesn't have an automatic "Generate image of Obama selling weapons to Putin" function.


But anyone with enough time to learn (or money) can do it anyways without much effort.

(I'm trying to understand the exact difference here.) So it boils down to democratization? That anyone can do it regardless of skills acquired?


It boils down to who's doing it. If an artist does it in Photoshop, the artist is responsible. If an AI is asked to do it, it's likely the people that made the AI who are responsible (until AI can think for itself).

Take it out of drawing. If you write a program to control elevators and it breaks the elevators, aren't you, the person who wrote the program, responsible? Why would Adobe be responsible for something someone else draws?

Or let's take an easier, closer example. Say you just made a character generator. Here's one:

https://www.numuki.com/game/mii-creator/

And there was a "randomize" button that, one time out of 100, made a very pornographic image. Who would get the blame? The person that pushed the button or the person that created the project?


Perhaps it's the use of the AI moniker but this thing is a computer program under the control of a human, who is the person responsible. It takes time and effort to use and is very much not like automated elevator control. In order of difficulty:

1) figuring out how to rejig the prompt to get what you'd like, adjusting seeds and tuning configuration options. The more control you want, the more complex and manual the pipeline of backing software will be. All it does is amplify everyone's innate artistic talent.

2) Coming up with a good prompt. This relies on the person's imagination, facility with words and familiarity of limitations of the image generating software and its training datasets.

3) Selection. This can be a frustrating experience that tries one's patience.

> made a very pornographic image. Who would get the blame?

You would; it's not like the software automatically distributes all its generations. The vast majority of images this software generates are not good enough to be shared, and aren't. You made the conscious decision to share it.

Even if it were an AGI, you would be responsible. It's very much possible to commission a ninja from a human artist and get back something very pornographic featuring a famous celebrity, and you would be held responsible for choosing to share it; you had the choice not to.


True, they are all very cautious not to let bad actors generate bad content.

Where "bad actors" are defined as "people who disagree with us", and "bad content" is defined as "things we don't want to see".

Needless to say, the list of bad actors never includes the authors themselves, and the list of unacceptable applications never includes anything the authors had in mind.


They are not cautious at all to prevent bad actors. It is very very easy to bypass their filters. They are just doing the basic due diligence to make sure that the average casual user doesn’t make something that grosses themself out.


You may be right in the larger sense, but the word you're looking for is conceited, maybe.

Megalomania: obsession with the exercise of power, especially in the domination of others


Devil's advocate, this output could have unexpected results, making a filter possibly desirable. Browsers do filter by default with safe browsing URL lists for malware, and they gate access to sites with bad certificates. But agreed users should have control, leaving such filters as safe defaults only.


> Browsers do filter by default with safe browsing URL lists for malware

Yes, for malware. As in, software created by criminals to damage your computer and/or steal your personal information.

That's not even remotely comparable to what StableDiffusion does here. What they do is refuse to generate content you requested, based on opaque criteria that no doubt are ultimately influenced by quasi-religious sentiments retained from the Bronze Age.


The concept is quite similar to the NSFW filter found on Google and other image searches for many years. Like the “mature” search settings, this filter is active by default but simple to disable.


> Who exactly do these people think they are?

Software engineers who took the time to think about the ways that their work could be used.

How many times have we seen on HN people pleading with developers to think about the ethical dimensions of their work?


What's creepy is the very real world that we're approaching where anyone can create porn of anyone else given a few pictures without their consent. That can cause all sorts of lasting psychological harm and other societal negative consequences. Even if this probably can't be entirely avoided, that doesn't mean that the folks developing these models are obligated to make it easy or are incorrect to take steps to avoid it.


> What's creepy is the very real world that we're approaching where anyone can create porn of anyone else given a few pictures without their consent.

Why is that a problem, exactly?

> That can cause all sorts of lasting psychological harm and other societal negative consequences.

Like what? To me this sounds like a problem with how some people want to have absolute control over what others do and want everyone else to play along. That's what I find really creepy.

Someone wants to generate porn of me fucking a donkey? Let them. They already could do that with ms-paint and I don't get magically harmed if they do.

This reminds me of the old saying: "Sticks and stones may break my bones, but AI-generated/photoshopped/glued-together pictures of me fucking a donkey will never hurt me" :)


Revenge porn, rumors, and defamation are already huge problems in society. Being able to generate such things on demand with AI models will make it significantly worse.

https://www.theguardian.com/lifeandstyle/2018/may/07/many-re...


To play devil's advocate, being able to generate this kind of content on demand with AI may actually make things better, not worse.

The reason why revenge porn, rumours, and defamation are all possible online is because there's a level of plausibility to these things -- especially revenge porn.

Now that plausibility goes out the window because of AI generated content. Someone posts your actual nudes? Give a nervous laugh and say that AI made them.

Someone says they heard a rumour about you and asks if it's true? Say 'don't believe everything you read on the internet, you know they have AI that writes anything you want now, right?"


Perhaps, but there comes a point where the technology is out there whether we like it or not. We should criminalize bad uses, not engage in some futile attempt to try to stuff the proverbial cat back in the bag


Don't let perfection be the enemy of good. Anyone with a basic machine shop can modify an AR15 for fully automatic fire, and a bump stock can be fabricated even easier. That doesn't mean we should just give up and make these things legal and easily accessible to everyone.


An automatic rifle has almost no positive use to it. This software is not like that and more like something that makes everyone better at a skill, uniformly raising the artistic talent floor. The large majority benefit but some would use their new earned skill for bad. It would be like denying literacy and the printing press because there will always be some people who use it to write very bad distressing things, manipulate more easily en masse and spread misinformation.


They're not refusing to distribute the software, they're restricting it for NSFW uses, of which the danger is more significant than any positive uses - much like automatic fire.


"restricting it for NSFW", according to whose standard? "which the danger", like what?

I'm glad this technology is finally out in the open where people like you no longer have a say in how it is used. The sooner people accept its existence (like they have with Photoshop), the better off and healthier we'll be as a society.


>Why is that a problem, exactly?

Do you really need someone to spell out to you why being able to create realistic-looking porn of anyone might lead to some issues...?

>Someone wants to generate porn of me fucking a donkey? Let them. They already could do that with ms-paint and I don't get magically harmed if they do.

What an absolutely wild take. Do you expect this line of thinking to be convincing when you're comparing AI-generated images to what someone can whip together in MS Paint? Come on. You, me, and everyone else reading here knows that's not even close to a valid comparison.


> Do you really need someone to spell out to you why being able to create realistic-looking porn of anyone might lead to some issues...?

Yes. I honestly have no idea.

> What an absolutely wild take. Do you expect this line of thinking to be convincing when you're comparing AI-generated images to what someone can whip together in MS Paint? Come on. You, me, and everyone else reading here knows that's not even close to a valid comparison.

Wild or not, my point stands. With minimal skills, you can photoshop anyone's face onto any pornographic image out there. It's a spot-on comparison.


> Do you really need someone to spell out to you why being able to create realistic-looking porn of anyone might lead to some issues...?

A less creative next generation, as they don't have to close their eyes and imagine; they can just generate porn of anyone.

The second problem will be finding people you would even want generated porn of.


That proverb isn’t true and stems from a generation that didn’t understand mental health


The proverb describes a state of mind one should aspire to. "Words will never hurt me" not because saying mean things can never do harm (it obviously can), but because I make an effort to control my emotions and reactions. Just as physical health is improved by a nutritious diet and regular exercise, mental health is improved by proper habits of mind.


Refusing to acknowledge the effect of hurtful words on yourself is pretty much the opposite of "proper habits of mind".

Denial does not make healthy people.


It's not denial so much as it is not caring to the point where you get upset. I think that's the strength of mind that GP is referring to.


I agree. Just because we have more advanced diagnosis and detection tools these days doesn't mean everything has to be a problem. The internet has really shaken up social norms and signaling and the kids are the first ones to enter the new world. I believe once we have a generation of fully formed 30-50 year olds who grew up with this strange social media, then we'll have the proper understandings and social knowledge to handle it well. Kinda like how parents these days can attempt to relate to their kids in high school. We need the elder wisdom there.

But I do believe that right now we're kinda off track. We almost venerate the act of being hurt. Everyone likes attention and nothing gets such protection by certain classes as having been offended or wronged by some other class. Social signals are currently built to display virtue and so people will go out of their way to display their support of the wronged. I _do_ believe that this is the correct direction to move from where we were, but I think it's gone a little too far and needs time to rebound.

Being a victim is the fastest way to go from zero to hero (reach millions of people) these days and it's also seemingly the least likely way to backfire. People are much more hesitant to bring up the wrongs of someone who's currently being defended for fear of ending up being placed in the out group and ostracized from the signaling group.


This all contributes to a social custom of looking to be wronged, so that you may point out the evil of the person/generation/world. However, people aren't idiots and can _sometimes_ tell if you're just faking for clout. This leads to a feedback loop of needing to be really hurt, seeing other people be really hurt, believing you're really hurt, and finally internalizing that pain and trauma. Things do hurt; bad words are called bad for a reason. Society might be better if we were all nice. But every bad word and every microaggression does not need to become such a large roadblock to personal freedom. People are chaining themselves to the road with this stuff. Dying on hills that require them to have been personally wronged, using their own pain as a way to shut down criticism. Yes, people hurt, and things can be bad. But it's also entirely possible to see something hurtful and continue life without it hurting you. It's 100% doable to actually not be hurt, not just ignore it, but to construct a self-esteem and understanding that allows you to not be shanked by every half-difficult social interaction that occurs.

Mental health awareness is good. But social signals lead individuals to believing they must be hurt to be a part of the in group. Virtue through suffering is an incredibly effective signaling mechanism


I'm not saying you should broadcast your woes to the world for fake internet points. Just that people should be OK with having a bad day sometimes, and that people who think they are too "strong" to ever be sad usually are the most screwed up mentally.


I would suggest that anyone who cares so little that there is literally nobody who can say anything that hurts them is basically dead inside.

Part of living is opening yourself up to people. That includes the risk of getting hurt. That's normal and part of the human experience.


I don't think anyone in this thread is denying this.


It is almost like a human being able to imagine. If you think the human mind is creepy because it can create any thought, and you start worrying about some psychological harm, then you have created a false prison within your own mind where scary thoughts will creep in.


Deepfakes have already existed for several years, and let you paste any face onto any video, in a much more convincing manner than Stable Diffusion, which is very limited in video2video domains.


Photoshop doesn’t let you work on an image of a dollar bill.

I would argue the authors went out of their way to make it very easy to "decensor" their model, in a way that lets them wash their hands of things. So regardless of the outrage over their Puritanism, they have actually made it easier to make pornography than ever before. I don't think applying the tiny speedbump of their model being censored by default is unreasonable in the interests of conservatism.

In the long run these men will go down in the history of pornography.


Read the tweets of the founder of Stability AI, the company behind Stable Diffusion. He agrees with you. The image is purely to protect themselves against criticism, takedowns, possible legal action, stuff like that.

https://mobile.twitter.com/emostaque


It's literally one line of code so you don't generate unexpected boobs. Save your ire for the nannies that won't let you generate NSFW images at all.


I’m sure you’re great at parties. A number of people have explained how easy this is to turn off (I did it myself in minutes without outside help by literally commenting out a line of code and changing a variable name). Getting offended at everything doesn’t solve anything and just makes half the US think we’re all snowflakes.


Remember the Terminator movie? It’s about an AI that went too far.


Don’t worry, intentionally warped faces and NSFW filters have like a month, tops


There are no intentionally warped faces in any of these models; SD is just the first model technically advanced enough to generate decent faces. People have very high standards for those.

(What is silly is that DALLE2 won't let you edit a picture with a face in it, so you can't outpaint one of its own generations. But actually you can, if you crop it carefully.)


>StableDiffusion from the original repo will rick roll you if you try to generate something that triggers its NSFW filter.

Just replace in scripts/txt2img.py:

- x_checked_image, has_nsfw_concept = check_safety(x_samples_ddim)

+ x_checked_image = x_samples_ddim

And be done with it.
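Or, if you'd rather leave the call site alone, a no-op stub works too (untested sketch):

    # untested alternative: keep the call in place but make check_safety a no-op,
    # returning the images unchanged and a "not NSFW" flag for every sample
    def check_safety(x_image):
        return x_image, [False] * len(x_image)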


And you save some more RAM too!


You don't, since the model is loaded earlier in the script.


You can comment those lines too.


But what is the correct git command to ignore all that?


There's no git command to ignore it, the main repo should merge their changes and then the other person can make a PR on the updated changes.


Where are PRs in Git?


Do a git-cherry-pick.


Almost impossible to pinpoint what changed thanks to thousands of lines of completely irrelevant changes and shitty commit messages. It seems the only changeset that might be relevant out of +2,273 -1,531 is the +11 -7 from https://github.com/basujindal/stable-diffusion/pull/103/comm...? Does it even work?


> Does it even work?

It has an effect, but nothing like what's claimed in the submission title. On a 1070Ti (8GB), I managed to go up from 512x576 to 576x640.


Has anyone gotten it to run on a 4 GB card?


As a learning opportunity for people like me, what does a good PR look like for a large change?


I would say large contributions from non-members generally work rather terribly in open source. If you absolutely have to,

- Communicate ahead of time; don't surprise maintainers with sweeping architectural changes or huge features no one wants or would like to review;

- Try to break up changes into logical units that can be understood and reviewed independently;

- Write useful and detailed commit messages (some bad examples: "Update attention.py", "various clean-ups, code now beautified");

- Don't sneak in anything unrelated to the PR; don't sneak anything unrelated into a commit;

- Absolutely don't use a code formatter to format the entire code base if the repo wasn't already using one. You can suggest that separately. And changes like that are best done by a trusted member.


This applies just as much if not more in a closed team setting.

Personally I blame juniors who insist on using the git CLI instead of one of a million visual Git GUIs. They get told to just do a git commit, have no idea what they changed so they can't produce a meaningful commit message, and because it's the command line, good luck getting a visual representation of what you're committing.


> and because it's the command line good luck getting a visual representation of what you're committing

I don't get the point here. I've never met a developer who knew about `git commit` but didn't know about `git diff` (sometimes I pointed out `git diff --cached`).

I don't think you can blame the CLI on this. It's about the lack of proofreading, in my opinion.
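For what it's worth, the whole review loop before a commit is just a handful of standard commands:

    git status          # what's changed, what's staged
    git diff            # unstaged changes
    git add -p          # stage hunks interactively instead of whole files
    git diff --cached   # exactly what will go into the commit
    git commit -v       # shows the staged diff again while you write the message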


I'm exaggerating a bit of course, but the point still stands. git diff and git status, even with color, are practically impossible to read and understand past a couple of changes, and if it's across multiple files with code being rearranged, it's hell. And getting a junior to wire up a GUI diff tool to display those items? Impossible. Also, getting juniors to commit parts of a file or only a subset of files? Again, impossible.

Maybe you work with a higher/better caliber of juniors than I do. At this point, I'm seriously contemplating being a mean dictator and forbidding commits outside of a dedicated GUI until they can prove their adeptness at using the CLI, which they should learn on their own time.


I had good success with this flow for major changes (both in-house as well as contributing to FOSS). The target was clear & discussed upfront (sans details that pop up during the actual work), and achieved by a series of consecutive PRs. It's important to be constructive, communicate the intent properly and give the other party enough time to digest the info and think properly about it. Obviously this also depends on the maintainers and their mentality.


Follow this: https://cbea.ms/git-commit/

and this: https://www.aleksandrhovhannisyan.com/blog/atomic-git-commit...

That way your "large change" becomes a collection of small, easy to reason about and well explained changes.


In this case, a lot of the large change is actually "upstream".

We have Neon, who is making a PR.

We have Upstream, SD source.

We have basu, repo maintainer.

Neon pulled in upstream changes and is trying to merge those into basu's repo, AND then, on top of that, apply Neon's edits to the code.

Ways to make this cleaner:

- Basu pulls in upstream changes, Neon just puts theirs on top. (Slow, you have to wait for Basu)

- Neon makes two PRs: one to pull in upstream changes, another for his edits on top of those changes. (More work and coordination, but each PR is "one unit of work"; see the sketch below.)

- Better commit messages: https://github.com/basujindal/stable-diffusion/pull/103#comm... <- notice the repeated and non-descript commit messages? That makes it hard for non-experts to cherry pick out the bits that are really relevant. (Fastest)
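For the second option, the branch setup would look something like this (remote and branch names are made up, just to illustrate the split):

    git remote add upstream https://github.com/CompVis/stable-diffusion
    git fetch upstream
    git checkout -b sync-upstream basu/main       # PR 1: nothing but the upstream merge
    git merge upstream/main
    git checkout -b vram-optimization sync-upstream
    git cherry-pick <the actual optimization commits>   # PR 2: only the real change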


You don’t make large change PRs.


Branching in git is very cheap, so the typical path a large change takes to the main branch is by a series of smaller, incremental PRs. This keeps your work close to the main branch as much as possible.


It looks like the first commit is the important change.


In case anyone is confused by the clashing repos, here is how I was able to easily run this updated code.

Clone the original SD repo, which is what this code was built off of, and follow all the installation instructions:

https://github.com/CompVis/stable-diffusion

In that repo, replace the file ldm/modules/attention.py with this file:

https://raw.githubusercontent.com/neonsecret/stable-diffusio...

Now run a new prompt with a larger image. Note that the original model was trained on 512x512, which may lead to repetition, especially if you try to increase both dimensions (this is mentioned in the SD readme), so just run with one dimension increased.

For example try the following example:

python scripts/txt2img.py --prompt "a person gardening, by claude monet" --ddim_steps 50 --seed 12000 --scale 9 --n_iter=1 --n_samples=1 --H=512 --W=1024 --skip_grid

I confirmed that if I run that command with the original attention.py, it fails due to lack of memory. With the new attention.py, it succeeds.

That said, this still uses 13 GB of RAM on my system.

I suppose you can check out the full repo with the updated code, which seems to have other changes, if you want to give that a try.

https://github.com/neonsecret/stable-diffusion/

I have already been using the original SD repo so I found benefit by just changing attention.py


I was under the impression that generating images where both dimensions are larger than 512 doesn't merely require a lot of resources, but also doesn't work well, since the model was trained exclusively on 512x512 images. While it sort of works OK to stretch one dimension a bit, you get weird repetitions when making the overall canvas too large (as this isn't going to generate higher-DPI images, merely ones with more square inches).


This is accurate in my experience. Changing the res. to anything but 512x512 produces inferior results.


The diff that apparently does the RAM optimisation, quite simple and something to learn from:

https://github.com/basujindal/stable-diffusion/commit/47f878...
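The gist (paraphrasing, not the exact diff) is just freeing the big intermediates as soon as they've been consumed, so the allocator can reuse that VRAM within the same forward pass:

    import torch

    # rough illustration of the pattern in that commit, not the exact code
    def attention(q, k, v):
        scale = q.shape[-1] ** -0.5
        sim = torch.einsum('b i d, b j d -> b i j', q, k) * scale
        del q, k                       # projections no longer needed
        attn = sim.softmax(dim=-1)
        del sim                        # the pre-softmax scores are the largest tensor
        out = torch.einsum('b i j, b j d -> b i d', attn, v)
        del attn
        return out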


Yeah, Stable Diffusion's PyTorch code is not optimized for inference memory usage to begin with. I am looking at the code now, and it seems that if it were converted to a static graph, there are probably a few more opportunities (I've only looked at the CLIP model and the UNet model it uses today, not sure about the autoencoder yet).


I've been using the HuggingFace diffusers repo[1] with 6 GB of VRAM just fine.

It's well engineered, maintainable, and has a decent installation process.

The branch in this PR[2] adds M1 Mac support with a one-line patch, and it runs faster than the CompVis version (1.5 iterations/sec vs 1.4 for CompVis on a 32 GB M1 Max).

I highly recommend people switch to that version for the improved flexibility.

[1] https://github.com/huggingface/diffusers

[2] https://github.com/huggingface/diffusers/pull/278


I wouldn’t recommend using that as-is. MPS doesn’t give deterministic random number generation, which means that seeds become meaningless and you won’t ever be able to reproduce something. You can work around it by generating random numbers on the CPU and then moving them to MPS, but that probably requires a fix in PyTorch.

The MPS support issue for diffusers is here:

https://github.com/huggingface/diffusers/issues/292

…and it links to the relevant PyTorch issue here:

https://github.com/pytorch/pytorch/issues/84288


https://github.com/magnusviri/stable-diffusion/commit/d0b168...

Copying this change fixed seeds on M1 for me.


That’s a fix for the CompVis forks, not the diffusers system we are talking about in this thread.

Also, that’s only a partial fix that doesn't really work properly. It doesn’t affect img2img and it still gets things wrong on the first render. Since txt2img starts from scratch each time, that means you’re always getting an incorrect render, it just happens to be the same incorrect render each time.


That's really annoying, even though I hadn't noticed it until now. I can confirm this is an issue.

But the situation seems to be the same on the CompVis derived repos, right?

So this is no worse off, but with better engineered and faster code.


Yeah, but with the CompVis derived repos, it’s pretty easy to go in and change all the calls to PyTorch random number generators.

Having said that, the last comment [0] on the PyTorch issue gave me the idea of monkey patching the random functions. The supplied code assumes you're always passing in a generator, which is not true in this case, but if you monkey patch the three rand/randn/randn_like functions to do nothing but swap out the device parameter for 'cpu' and then call to('mps') on the return value, it's enough to get stable seed functionality for the CompVis derived repos without modifying their code, so I'm guessing it will probably work for diffusers as well.
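Something along these lines (untested sketch; edge cases like explicit generators aren't handled):

    import torch

    # draw random numbers on the CPU RNG (deterministic for a given seed),
    # then move the result to whatever device was requested, e.g. MPS
    _orig_rand, _orig_randn, _orig_randn_like = torch.rand, torch.randn, torch.randn_like

    def _cpu_first(fn):
        def patched(*args, **kwargs):
            target = kwargs.pop("device", None)
            if target is None and args and isinstance(args[0], torch.Tensor):
                target = args[0].device   # the *_like variant defaults to the input's device
            out = fn(*args, device="cpu", **kwargs)
            return out.to(target) if target is not None else out
        return patched

    torch.rand = _cpu_first(_orig_rand)
    torch.randn = _cpu_first(_orig_randn)
    torch.randn_like = _cpu_first(_orig_randn_like)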

Also, it’s probably a bug in the CompVis code, but even after you fix the random number generator, the very first run in a session uses an incorrect seed. The workaround is to generate an image once to throw away whenever you start a new session.

[0] https://github.com/pytorch/pytorch/issues/84288#issuecomment...


This branch seems to fix these issues: https://github.com/huggingface/diffusers/tree/mps

(Annoyingly I just went and made similar changes and was about to create a PR for them. But they have a fix for a "warm-up" issue I wasn't aware of too)


Yes, the warm-up issue is what I meant by “the very first run in a session uses an incorrect seed”. It looks like they’ve backed away from that now and expect developers to warm up themselves. Not a big deal, you can just use a single step to do that.


Unfortunately the author of this PR decided to include garbage like this [1] in it, so this PR is pretty useless.

[1] https://github.com/basujindal/stable-diffusion/pull/103/file...


Seems like the author did his work on a different fork and pushed to this one, which included all the changes from the other fork...


How is that garbage? That license is part of the original SD repository, the creator Emad even talks about it in his initial post, about the OpenRAIL M License [0]:

> i) The model is being released under a Creative ML OpenRAIL-M license [https://huggingface.co/spaces/CompVis/stable-diffusion-licen...]. This is a permissive license that allows for commercial and non-commercial usage. This license is focused on ethical and legal use of the model as your responsibility and must accompany any distribution of the model. It must also be made available to end users of the model in any service on it.

[0] https://stability.ai/blog/stable-diffusion-public-release


It’s worth to point out that there’s no consensus yet on whether ML models (as in the weights) are actually copyrightable. If you’re using them, you should probably assume that they are. If you’re distributing them, you should probably assume that they aren’t.


They mean the entire thing has been run through some sort of "beautifier", so there are thousands of irrelevant whitespace changes across things unrelated to the commit, which itself is around 6 lines.


You're still able to run it locally even if the license change prevents it from being merged, and it seems to be a proof of concept that may inspire other people to optimize it in different ways


I was just reminded of the "clean room design" technique: https://en.m.wikipedia.org/wiki/Clean_room_design


Yeah, reminds me of the old freedoom WADs situation


This is the worst type of PR. Just submit a different PR for the code format changes.


I've seen people claim that it's better to generate 512x512 images since that's what it's trained on, and then upscale, rather than generating higher resolution images directly. Anyone tried any systematic investigation of this?


Wild how fast this is moving


Just wait until the AI is writing the code too. You won't be able to `git pull` fast enough.


The power of open source


Slightly off on a tangent, but has anyone got it running on an AMD card? Sad 6600 XT Windows user reporting in :(


Not sure about the 6600, but there is a guide for Linux at least:

https://m.youtube.com/watch?v=d_CgaHyA_n4&feature=emb_logo

And this is somehow relevant (possibly), as I kept the link open.

https://github.com/RadeonOpenCompute/ROCm-docker/issues/38


Here is a guide for AMD, but I don't have such a card, so haven't tried.

https://rentry.org/sdamd


Started using this branch and it works on my old-ass 1060 GPU. Maybe I'll do a blog post on the whole process.


Interesting that `del` helps here - could e.g. torch.jit.script/trace be more of a "sledgehammer" solution?


I wonder if GTT (GART size) wouldn't help with limited VRAM amounts, at the cost of some performance?


How much VRAM was previously required?


Looks like 10GB in the README[0] for a 512x512 image - though the fork[1] from the repo in this PR claims to be able to work with as little as 4GB VRAM.

[0] https://github.com/CompVis/stable-diffusion#stable-diffusion [1] https://github.com/basujindal/stable-diffusion#txt2img


I can generate a 768x896 pixels image on an RTX 3090, using 23.4/24GB


How does the content of the images look compared to the same prompt and seed at 512x512? I ask because when I make 768x896 with an M1 Max, the resulting image is less coherent, more noisy, and/or has repeating subject matter; nothing really usable, unlike 512x512.


Same here.


I am able to generate a maximum resolution of 512x768 on my 11GB 1080Ti. This seems to use almost 100% of the available RAM.


That's weird. I cap out at 512x512 on my 16GB Ampere card. Even stepping down precision doesn't help. I wonder what's different.

I use it directly from Python.


You might have the n_samples (aka batch size) set to a number greater than 1? That basically multiplies the amount of VRAM you’re using.

I can generate a 512x512 on my 10GB 3080 no problem (or three 384x384 at a time).
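i.e. keep the batch at one and loop instead, something like this (flags reused from the command upthread, prompt is a placeholder):

    python scripts/txt2img.py --prompt "your prompt here" --H=512 --W=512 --n_samples=1 --n_iter=3 --skip_grid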


Try the hlky fork: https://github.com/hlky/stable-diffusion

It measures memory usage as well.


Perhaps a different kind of chipset that isn’t optimized yet? Just a guess based on other graphics processes.


512x706 on an RTX 2060S with 8 GB of VRAM.


This is one of the worst PRs i've ever seen.

If I were involved, this PR would be closed and the conversation locked, simply because it's a ton of formatting changes. The only "memory efficiency" changes are like 6 lines of Python where the author uses `del`. That's it. Everything else is formatting bullshit, and some other merge stuff. Yikes.

Also, some of those PR comments are written by kids or something. Wtf




