Stable Diffusion PR optimizes VRAM, generate 576x1280 images with 6 GB VRAM (github.com/basujindal)
192 points by avocado2 on Sept 4, 2022 | 143 comments


For everyone about to comment on the garbage in the commit:

It looks like the committer made their changes in the top commit, then merged the updated CompVis Stable Diffusion change set on top of it for some reason. That's where the license change, Rick Astley image, etc. come from.

And yes, StableDiffusion from the original repo will rick roll you if you try to generate something that triggers its NSFW filter.

Here's the code that does it:

https://github.com/CompVis/stable-diffusion/blob/main/script...
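From memory, the relevant bit of the demo script looks roughly like this (paraphrased sketch; exact names may differ, and the safety checker/feature extractor are loaded elsewhere in the script):

    # paraphrased sketch of the safety check in scripts/txt2img.py, not the exact code
    def check_safety(x_image):
        # run the CLIP-based safety checker over the batch of generated images
        safety_input = safety_feature_extractor(numpy_to_pil(x_image), return_tensors="pt")
        x_checked, has_nsfw_concept = safety_checker(
            images=x_image, clip_input=safety_input.pixel_values
        )
        for i in range(len(has_nsfw_concept)):
            if has_nsfw_concept[i]:
                # swap the flagged image for assets/rick.jpeg, resized to match,
                # so it still drops neatly into the output grid
                x_checked[i] = load_replacement(x_checked[i])
        return x_checked, has_nsfw_concept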

And here's what it looks like:

https://twitter.com/qDot/status/1565076751465648128


> And yes, StableDiffusion from the original repo will rick roll you if you try to generate something that triggers its NSFW filter.

It goes without saying that the authors of a piece of software have the right to make the software do whatever they want, but that shouldn't stop us from recognizing that AI engineers are starting to act like megalomaniac overseers who consider it part of their mission to steer humanity onto the "right" path.

Who exactly do these people think they are?

Imagine this behavior from a web browser. "The URL of the file you were trying to download triggered my NSFW classifier, so I'm going to replace the file with this funny image."

This isn't funny, it's creepy.


Nah it’s just basic due diligence for releasing an open source app of this nature. Turning it off is a simple one line change, because everything is obviously named. This is not some high wall to scale.

As someone who manages NSFW open source projects, this move seems fine to me.

And actually kinda hilarious.


If what StableDiffusion did was ask the user something like "The prompt you entered may result in the generation of content some people find objectionable. Are you sure you want to proceed?", then I would buy your argument.

As currently implemented (and the implementation took more work than a confirmation prompt would have!), it's an obvious attempt to control, rather than protect, users. They try their best to dress it up like a joke, but it clearly isn't.


They aren’t trying to “control” users, they’re trying to minimize the amount of bad press these AIs get. The approach they took was easier than your suggestion, because their system detects NSFW content on a per output image basis… so it’s easier to output some image (rickroll) rather than put up a prompt or even skip it completely (since they also generate a grid of the resulting images).

I should add that it’s also incredibly easy to comment it out.


Yeah, it's basic survival. Technology is inherently free and driven by porn, but that can be shocking to the uninitiated, so restrictions are often put in place to "protect the children" and avoid alerting the horde into reversing progress. I think those "enlightened" should build and share the understanding that these are the actual reasons, so we don't all end up in an actual dystopia where "protections" are enforced more strictly than intended.


The content it can produce is illegal in many countries. It's not something they just decided to do one day on their own.


The implementation as it stands is actually easier the way they do it. They replace the image by scaling Rick to the image size. That means they can still automatically put your output images in a grid without any extra code. They could have zeroed the pixels, but that could have led to confusion about whether things were working or not.

It's silly and you're able to patch it out if you so desire. But most importantly, they released the model to everyone so it's way more open than OpenAI's DALL-E 2 (for better or worse). I don't think the argument that they're trying to control you really makes a lot of sense given how widely accessible this model is.


Absolutely. The limit isn't in what you ask for, it's in what's returned. The whole point is to try and stop the user being shown NSFW images despite SFW prompts, and since you can generate multiple results per prompt, adding an interactive check on each one doesn't make sense.


That would absolutely be more work than just subbing in an image and would neither have worked in the original implementation nor the various frontends it's been used in.

This is quite a classic HN kind of comment. Immediately assumes specific problematic intent and proposes a solution that doesn't fit the API.


Now instead users have to spend hours upon hours editing docker images to remove it. Much easier.


That's neither true nor would it have been helped by making the original code require input during generation.

They put in a simple check that tries to avoid returning NSFW images, so that you don't get them back despite a more 'innocent' prompt. It's trivial to remove and is only part of the demonstration scripts; it is not part of the model or anything fundamental to its workings.


It's open-source. Patch it out yourself. It's not hidden.


That's not the point at all.


It's the point of open source.


You have no point whatsoever. You're fighting windmills.


I disagree that the rick roll is inappropriate, but I also hate the argument that just because it's open source, the programmers are above criticism. I think it's reasonable that people can dislike or even morally object to the decisions open source maintainers make (even if I don't particularly object to this specific instance).


They're not above criticism, but what they've done is a very small layer that's easily removed. You can do it, and release a version without it, in minutes. You can do it in less time than it takes to write a long comment decrying it.

It's not fundamental to SD in any way, and their suggestion wouldn't have worked.


You know you can download the code yourself, right? You need not be reliant on their web interface if you're having these control issues.

It's also clearly in place because they don't have an age disclaimer, which would likely open them up to some sort of liability (or additional liability). Again, it's open source and freely available, so, like... this feels like a you problem, sorry.


As currently implemented, it’s a hilarious Easter egg. I wouldn’t read too much into it.


From your previous comment:

> AI engineers starting to act like megalomaniac overseers who consider it part of their mission to steer humanity onto the "right" path.

You've outdone your own comment. You have an idea of how this should work, and you're the one trying to "steer" humanity onto a path. It's just projection.

This kind of preachy, self-righteous control tactic belongs elsewhere (fork it). The devs want to convey that something is NSFW, and they'll do it however they want.


I don’t read it that way. I think their intent is open to interpretation.


Fork it and fix it, see if your branch gains traction.


What would the average 11 year old answer?


Yeah, but when it comes to generating NSFW content, it is illegal to distribute pornographic software in some countries, and doing so can lead to domain blocking and the like (e.g. South Korea, believe it or not). This is something you have to understand.

Also, if the author really wanted to block NSFW content, I'm pretty sure one could make the filter inseparable from the main network. That isn't the case here, AFAIK.


That makes no sense. You can draw whatever you like with Photoshop. You can search for whatever you want on Google. That doesn't make the software "NSFW", and no country will block Photoshop's domain just because someone used Photoshop to create NSFW content.

Don't let people who are clearly motivated by a moralistic desire to control others hide behind BS pseudo-legal excuses.


1. Google does have a feature called "SafeSearch"[1] to filter explicit content, and they have a DMCA-like system for processing violation reports from governments and legal entities. That is, Google is already cooperating.

2. Photoshop's sole function is to let people draw, so drawing porn with Photoshop is 100% the user's responsibility. In the case of image generators, the AI itself has the capability to create NSFW content. If you're talking about the interactive nature of the "prompt", that logic would also make any adult game SFW: it's safe until the user clicks something.

[1]: https://support.google.com/websearch/answer/510


You'd be right about Photoshop, but...

https://helpx.adobe.com/photoshop/cds.html


Explain this to [insert regime] when they block your GitHub repo or make using your software a felony. You are proposing to act on ideology, while such mitigations are rooted in pragmatism. You often can't have both.


SK and Germany are notorious for that. They don't care how the pixel arrays came into existence; any shapes, forms, and representations, even links to such information, that are interpretable as belonging to certain categories are plainly illegal there. I believe SK doesn't even recognize copyright for such "illegal" information.


Yes, the megalomania of someone who makes their code and models freely available.


StableDiffusion isn't freely available, in the "Free Software" sense. They use the highly uncommon "CreativeML Open RAIL-M License" which is a wall of text composed of weasel words describing how the software is so incredibly advanced and dangerous that despite the authors' earnest wish to do so, they cannot in good conscience make it genuinely Free Software.

These people wrote a bunch of Python code that pipes images and text through a GPU, and they're acting as if they had created a secret weapon that somehow humanity must be protected from. If that's not megalomania, I don't know what is.


If they think it's a secret weapon humanity must be prevented from using, giving it away for free with examples on how to use it with a note on the side saying "pls don't misuse thanks" seems like a very odd thing to do.


Oh please, all the creators of these image AIs (OpenAI, Google, Midjourney, SD, etc) are being very very cautious with this stuff. It’s not going to end humanity, but it could easily lead to some really gross content that would paint the responsible organization in a bad light.


Can you elaborate? Would Adobe be painted in a bad light if someone committed a crime using Photoshop?

Honestly it just feels weird. I've read the license restrictions and I can't see why they are there or what they're preventing.


Photoshop doesn't have an automatic "Generate image of Obama selling weapons to Putin" function.


But anyone with enough time to learn (or money) can do it anyways without much effort.

(I'm trying to understand the exact difference here.) So it boils down to democratization? That anyone can do it regardless of skills acquired?


It boils down to who's doing it. If an artist does it in Photoshop, the artist is responsible. If an AI is asked to do it, it's likely the people that made the AI who are responsible (until AI can think for itself).

Take it out of drawing. If you write a program to control elevators and it breaks the elevators, aren't you, the person who wrote the program, responsible? Why would Adobe be responsible for something someone else draws?

Or let's take an easier, closer example. Say you just made a character generator. Here's one:

https://www.numuki.com/game/mii-creator/

And there was a "randomize" button that, one time out of 100, made a very pornographic image. Who would get the blame? The person that pushed the button or the person that created the project?


Perhaps it's the use of the AI moniker but this thing is a computer program under the control of a human, who is the person responsible. It takes time and effort to use and is very much not like automated elevator control. In order of difficulty:

1) figuring out how to rejig the prompt to get what you'd like, adjusting seeds and tuning configuration options. The more control you want, the more complex and manual the pipeline of backing software will be. All it does is amplify everyone's innate artistic talent.

2) Coming up with a good prompt. This relies on the person's imagination, facility with words and familiarity of limitations of the image generating software and its training datasets.

3) Selection. This can be a frustrating experience that tries one's patience.

> made a very pornographic image. Who would get the blame?

You would; it's not like the software automatically distributes all its generations. The vast majority of images this software generates are not good enough to be shared, and aren't. You made the conscious decision to share it.

Even if it were an AGI, you would be responsible. It's very much possible to commission a ninja from a human artist and get back something very pornographic featuring a famous celebrity, and you would be held responsible for choosing to share it; you had the choice not to.


True, they are all very cautious not to let bad actors generate bad content.

Where "bad actors" are defined as "people who disagree with us", and "bad content" is defined as "things we don't want to see".

Needless to say, the list of bad actors never includes the authors themselves, and the list of unacceptable applications never includes anything the authors had in mind.


They are not cautious at all to prevent bad actors. It is very very easy to bypass their filters. They are just doing the basic due diligence to make sure that the average casual user doesn’t make something that grosses themself out.


You may be right in the larger sense, but the word you're looking for is conceited, maybe.

Megalomania: obsession with the exercise of power, especially in the domination of others


Devil's advocate, this output could have unexpected results, making a filter possibly desirable. Browsers do filter by default with safe browsing URL lists for malware, and they gate access to sites with bad certificates. But agreed users should have control, leaving such filters as safe defaults only.


> Browsers do filter by default with safe browsing URL lists for malware

Yes, for malware. As in, software created by criminals to damage your computer and/or steal your personal information.

That's not even remotely comparable to what StableDiffusion does here. What they do is refuse to generate content you requested, based on opaque criteria that no doubt are ultimately influenced by quasi-religious sentiments retained from the Bronze Age.


The concept is quite similar to the NSFW filter found on Google and other image searches for many years. Like the “mature” search settings, this filter is active by default but simple to disable.


> Who exactly do these people think they are?

Software engineers who took the time to think about the ways that their work could be used.

How many times have we seen on HN people pleading with developers to think about the ethical dimensions of their work?


What's creepy is the very real world that we're approaching where anyone can create porn of anyone else given a few pictures without their consent. That can cause all sorts of lasting psychological harm and other societal negative consequences. Even if this probably can't be entirely avoided, that doesn't mean that the folks developing these models are obligated to make it easy or are incorrect to take steps to avoid it.


> What's creepy is the very real world that we're approaching where anyone can create porn of anyone else given a few pictures without their consent.

Why is that a problem, exactly?

> That can cause all sorts of lasting psychological harm and other societal negative consequences.

Like what? To me this sounds like a problem with how some people want to have absolute control over what others do and want everyone else to play along. That's what I find really creepy.

Someone wants to generate porn of me fucking a donkey? Let them. They already could do that with ms-paint and I don't get magically harmed if they do.

This reminds me of the old saying: "Sticks and stones may break my bones, but AI-generated/photoshopped/glued-together pictures of me fucking a donkey will never hurt me" :)


Revenge porn, rumors, and defamation are already huge problems in society. Being able to generate such things on demand with AI models will make it significantly worse.

https://www.theguardian.com/lifeandstyle/2018/may/07/many-re...


To play devil's advocate, being able to generate this kind of content on demand with AI may actually make things better, not worse.

The reason why revenge porn, rumours, and defamation are all possible online is because there's a level of plausibility to these things -- especially revenge porn.

Now that plausibility goes out the window because of AI generated content. Someone posts your actual nudes? Give a nervous laugh and say that AI made them.

Someone says they heard a rumour about you and asks if it's true? Say 'don't believe everything you read on the internet, you know they have AI that writes anything you want now, right?"


Perhaps, but there comes a point where the technology is out there whether we like it or not. We should criminalize bad uses, not engage in some futile attempt to try to stuff the proverbial cat back in the bag


Don't let perfection be the enemy of good. Anyone with a basic machine shop can modify an AR15 for fully automatic fire, and a bump stock can be fabricated even easier. That doesn't mean we should just give up and make these things legal and easily accessible to everyone.


An automatic rifle has almost no positive use to it. This software is not like that and more like something that makes everyone better at a skill, uniformly raising the artistic talent floor. The large majority benefit but some would use their new earned skill for bad. It would be like denying literacy and the printing press because there will always be some people who use it to write very bad distressing things, manipulate more easily en masse and spread misinformation.


They're not refusing to distribute the software, they're restricting it for NSFW uses, of which the danger is more significant than any positive uses - much like automatic fire.


"restricting it for NSFW", according to whose standard? "which the danger", like what?

I'm glad this technology is finally out in the open where people like you no longer have a say in how it is used. The sooner people accept its existence (like they have with Photoshop), the better off and healthier we'll be as a society.


>Why is that a problem, exactly?

Do you really need someone to spell out to you why being able to create realistic-looking porn of anyone might lead to some issues...?

>Someone wants to generate porn of me fucking a donkey? Let them. They already could do that with ms-paint and I don't get magically harmed if they do.

What an absolutely wild take. Do you expect this line of thinking to be convincing when you're comparing AI-generated images to what someone can whip together in MS Paint? Come on. You, me, and everyone else reading here knows that's not even close to a valid comparison.


> Do you really need someone to spell out to you why being able to create realistic-looking porn of anyone might lead to some issues...?

Yes. I honestly have no idea.

> What an absolutely wild take. Do you expect this line of thinking to be convincing when you're comparing AI-generated images to what someone can whip together in MS Paint? Come on. You, me, and everyone else reading here knows that's not even close to a valid comparison.

Wild or not, my point stands. With minimal skills, you can photoshop anyone's face onto any pornographic image out there. It's a spot-on comparison.


> Do you really need someone to spell out to you why being able to create realistic-looking porn of anyone might lead to some issues...?

A less creative next generation, as they don't have to close their eyes and imagine; they can just generate porn of anyone.

The second problem will be finding people you would even want generated porn of.


That proverb isn’t true and stems from a generation that didn’t understand mental health


The proverb describes a state of mind one should aspire to. "Words will never hurt me" not because saying mean things can never do harm (it obviously can), but because I make an effort to control my emotions and reactions. Just as physical health is improved by a nutritious diet and regular exercise, mental health is improved by proper habits of mind.


Refusing to acknowledge the effect of hurtful words on yourself is pretty much the opposite of "proper habits of mind".

Denial does not make healthy people.


It's not denial so much as it is not caring to the point where you get upset. I think that's the strength of mind that GP is referring to.


I agree. Just because we have more advanced diagnosis and detection tools these days doesn't mean everything has to be a problem. The internet has really shaken up social norms and signaling and the kids are the first ones to enter the new world. I believe once we have a generation of fully formed 30-50 year olds who grew up with this strange social media, then we'll have the proper understandings and social knowledge to handle it well. Kinda like how parents these days can attempt to relate to their kids in high school. We need the elder wisdom there.

But I do believe that right now we're kinda off track. We almost venerate the act of being hurt. Everyone likes attention and nothing gets such protection by certain classes as having been offended or wronged by some other class. Social signals are currently built to display virtue and so people will go out of their way to display their support of the wronged. I _do_ believe that this is the correct direction to move from where we were, but I think it's gone a little too far and needs time to rebound.

Being a victim is the fastest way to go from zero to hero (reach millions of people) these days and it's also seemingly the least likely way to backfire. People are much more hesitant to bring up the wrongs of someone who's currently being defended for fear of ending up being placed in the out group and ostracized from the signaling group.


This all contributes to a social custom of looking to be wronged, so that you may point out the evil of the person/generation/world. However, people aren't idiots and can _sometimes_ tell if you're just faking for clout. This leads to a feedback loop of needing to be really hurt, seeing other people be really hurt, believing you're really hurt, and finally internalizing that pain and trauma. Things do hurt; bad words are called bad for a reason. Society might be better if we were all nice. But every bad word and every microaggression does not need to become such a large roadblock to personal freedom. People are chaining themselves to the road with this stuff. Dying on hills that require them to have been personally wronged, using their own pain as a way to shut down criticism. Yes, people hurt, and things can be bad. But it's also entirely possible to see something hurtful and continue life without it hurting you. It's 100% doable to actually not be hurt, not just ignore it, but to construct a self-esteem and understanding that allows you to not be shanked by every half-difficult social interaction that occurs.

Mental health awareness is good. But social signals lead individuals to believing they must be hurt to be a part of the in group. Virtue through suffering is an incredibly effective signaling mechanism


I'm not saying you should broadcast your woes to the world for fake internet points. Just that people should be OK with having a bad day sometimes, and that people who think they are too "strong" to ever be sad usually are the most screwed up mentally.


I would suggest that anyone who cares so little that there is literally nobody who can say anything that hurts them is basically dead inside.

Part of living is opening yourself up to people. That includes the risk of getting hurt. That's normal and part of the human experience.


I don't think anyone in this thread is denying this.


It is almost like a human being able to imagine. If you think the human mind is creepy because it can create any thought, and you start worrying about some psychological harm, then you have created a false prison within your own mind where scary thoughts will creep in.


Deepfakes have already existed for several years, and let you paste any face onto any video, in a much more convincing manner than Stable Diffusion, which is very limited in video2video domains.


Photoshop doesn’t let you work on an image of a dollar bill.

I would argue the authors went out of their way to make it very easy to "decensor" their model, in a way that lets them wash their hands of things. So regardless of the outrage over their Puritanism, they have actually made it easier to make pornography than ever before. I don't think applying the tiny speedbump of their model being censored by default is unreasonable in the interests of conservatism.

In the long run these men will go down in the history of pornography.


Read the tweets of the founder of Stability AI, the company behind Stable Diffusion. He agrees with you. The image is purely to protect themselves against criticism, takedowns, possible legal action, stuff like that.

https://mobile.twitter.com/emostaque


It's literally one line of code so you don't generate unexpected boobs. Save your ire for the nannies that won't let you generate NSFW images at all.


I’m sure you’re great at parties. A number of people have explained how easy this is to turn off (I did it myself in minutes without outside help by literally commenting out a line of code and changing a variable name). Getting offended at everything doesn’t solve anything and just makes half the US think we’re all snowflakes.


Remember the Terminator movie? It’s about an AI that went too far.


Don’t worry, intentionally warped faces and NSFW filters have like a month, tops


There are no intentionally warped faces in any of these models; SD is just the first model technically advanced enough to generate decent faces. People have very high standards for those.

(What is silly is that DALLE2 won't let you edit a picture with a face in it, so you can't outpaint one of its own generations. But actually you can, if you crop it carefully.)


>StableDiffusion from the original repo will rick roll you if you try to generate something that triggers its NSFW filter.

Just replace in scripts/txt2img.py:

- x_checked_image, has_nsfw_concept = check_safety(x_samples_ddim)

+ x_checked_image = x_samples_ddim

And be done with it.
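Or, if you'd rather leave the call site alone, a no-op stub works too (untested sketch):

    # untested alternative: keep the call in place but make check_safety a no-op,
    # returning the images unchanged and a "not NSFW" flag for every sample
    def check_safety(x_image):
        return x_image, [False] * len(x_image)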


And you save some more RAM too!


You don't, since the model is loaded earlier in the script.


You can comment those lines too.


But what is the correct git command to ignore all that?


There's no git command to ignore it, the main repo should merge their changes and then the other person can make a PR on the updated changes.


Where are PRs in Git?


Do a git-cherry-pick.


Almost impossible to pinpoint what changed thanks to thousands of lines of completely irrelevant changes and shitty commit messages. It seems the only changeset that might be relevant out of +2,273 -1,531 is the +11 -7 from https://github.com/basujindal/stable-diffusion/pull/103/comm...? Does it even work?


> Does it even work?

It has an effect, but nothing like what's claimed in the submission title. On a 1070Ti (8GB), I managed to go up from 512x576 to 576x640.


Has anyone gotten it to run on a 4 GB card?


As a learning opportunity for people like me, what does a good PR look like for a large change?


I would say large contributions from non-members generally work rather terribly in open source. If you absolutely have to,

- Communicate ahead of time; don't surprise maintainers with sweeping architectural changes or huge features no one wants or would like to review;

- Try to break up changes into logical units that can be understood and reviewed independently;

- Write useful and detailed commit messages (some bad examples: "Update attention.py", "various clean-ups, code now beautified");

- Don't sneak in anything unrelated to the PR; don't sneak anything unrelated into a commit;

- Absolutely don't use a code formatter to format the entire code base if the repo wasn't already using one. You can suggest that separately. And changes like that are best done by a trusted member.


This applies just as much if not more in a closed team setting.

Personally I blame juniors who insist on using the git CLI instead of one of a million visual Git GUIs. They get told to just do a git commit, have no idea what they changed so they can't produce a meaningful commit message, and because it's the command line, good luck getting a visual representation of what you're committing.


> and because it's the command line good luck getting a visual representation of what you're committing

I don't get the point here. I've never met a developer who knew about `git commit` but didn't know about `git diff` (sometimes I pointed out `git diff --cached`).

I don't think you can blame the CLI on this. It's about the lack of proofreading, in my opinion.
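For what it's worth, the whole review loop before a commit is just a handful of standard commands:

    git status          # what's changed, what's staged
    git diff            # unstaged changes
    git add -p          # stage hunks interactively instead of whole files
    git diff --cached   # exactly what will go into the commit
    git commit -v       # shows the staged diff again while you write the message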


I'm exaggerating a bit of course, but the point still stands. git diff and git status, even with color, are practically impossible to read and understand past a couple of changes, and if it's across multiple files with code being rearranged, it's hell. And getting a junior to wire up a GUI diff tool to display those items? Impossible. Also, getting juniors to commit parts of a file or only a subset of files? Again, impossible.

Maybe you work with a higher/better caliber of juniors than I do. At this point, I'm seriously contemplating being a mean dictator and forbidding commits outside of a dedicated GUI until they can prove their adeptness at using the CLI, which they should learn on their own time.


I had good success with this flow for major changes (both in-house as well as contributing to FOSS). The target was clear & discussed upfront (sans details that pop up during the actual work), and achieved by a series of consecutive PRs. It's important to be constructive, communicate the intent properly and give the other party enough time to digest the info and think properly about it. Obviously this also depends on the maintainers and their mentality.


Follow this: https://cbea.ms/git-commit/

and this: https://www.aleksandrhovhannisyan.com/blog/atomic-git-commit...

That way your "large change" becomes a collection of small, easy to reason about and well explained changes.


In this case, a lot of the large change is actually "upstream".

We have Neon, who is making a PR.

We have Upstream, SD source.

We have basu, repo maintainer.

Neon pulled in upstream changes and is trying to merge those into basu's repo, AND then, on top of that, apply Neon's edits to the code.

Ways to make this cleaner:

- Basu pulls in upstream changes, Neon just puts theirs on top. (Slow, you have to wait for Basu)

- Neon makes two PRs: one to pull in upstream changes, another for his edits on top of those changes. (More work and coordination, but each PR is "one unit of work"; see the sketch below.)

- Better commit messages: https://github.com/basujindal/stable-diffusion/pull/103#comm... <- notice the repeated and non-descript commit messages? That makes it hard for non-experts to cherry pick out the bits that are really relevant. (Fastest)
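For the second option, the branch setup would look something like this (remote and branch names are made up, just to illustrate the split):

    git remote add upstream https://github.com/CompVis/stable-diffusion
    git fetch upstream
    git checkout -b sync-upstream basu/main       # PR 1: nothing but the upstream merge
    git merge upstream/main
    git checkout -b vram-optimization sync-upstream
    git cherry-pick <the actual optimization commits>   # PR 2: only the real change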


You don’t make large change PRs.


Branching in git is very cheap, so the typical path a large change takes to the main branch is by a series of smaller, incremental PRs. This keeps your work close to the main branch as much as possible.


It looks like the first commit is the important change.


In case anyone is confused by the clashing repos, here is how I was able to easily run this updated code.

Clone the original SD repo, which is what this code was built off of, and follow all the installation instructions:

https://github.com/CompVis/stable-diffusion

In that repo, replace the file ldm/modules/attention.py with this file:

https://raw.githubusercontent.com/neonsecret/stable-diffusio...

Now run a new prompt with a larger image. Note that the original model was trained on 512x512, which may lead to repetition, especially if you try to increase both dimensions (this is mentioned in the SD readme), so just run with one dimension increased.

For example try the following example:

python scripts/txt2img.py --prompt "a person gardening, by claude monet" --ddim_steps 50 --seed 12000 --scale 9 --n_iter=1 --n_samples=1 --H=512 --W=1024 --skip_grid

I confirmed that if I run that command with the original attention.py, it fails due to lack of memory. With the new attention.py, it succeeds.

That said, this still uses 13 GB of RAM on my system.

I suppose you can check out the full repo with the updated code, which seems to have other changes, if you want to give that a try.

https://github.com/neonsecret/stable-diffusion/

I have already been using the original SD repo so I found benefit by just changing attention.py


I was under the impression that generating images where both dimensions are larger than 512 doesn't merely require a lot of resources, but also doesn't work well, since the model was trained exclusively on 512x512 images. While it sort of works OK to stretch one dimension a bit, you get weird repetitions when making the overall canvas too large (as this isn't going to generate higher-DPI images, merely ones with more square inches).


This is accurate in my experience. Changing the res. to anything but 512x512 produces inferior results.


The diff that apparently does the RAM optimisation, quite simple and something to learn from:

https://github.com/basujindal/stable-diffusion/commit/47f878...
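The gist (paraphrasing, not the exact diff) is just freeing the big intermediates as soon as they've been consumed, so the allocator can reuse that VRAM within the same forward pass:

    import torch

    # rough illustration of the pattern in that commit, not the exact code
    def attention(q, k, v):
        scale = q.shape[-1] ** -0.5
        sim = torch.einsum('b i d, b j d -> b i j', q, k) * scale
        del q, k                       # projections no longer needed
        attn = sim.softmax(dim=-1)
        del sim                        # the pre-softmax scores are the largest tensor
        out = torch.einsum('b i j, b j d -> b i d', attn, v)
        del attn
        return out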


Yeah, Stable Diffusion's PyTorch code is not optimized for inference memory usage to begin with. I am looking at the code now, and it seems that if it were converted to a static graph, there are probably a few more opportunities (I've only looked at the CLIP model and the UNet model it uses today, not sure about the autoencoder yet).


I've been using the HuggingFace diffusers repo[1] with 6 GB of VRAM just fine.

It's well engineered, maintainable, and has a decent installation process.

The branch in this PR[2] adds M1 Mac support with a one-line patch, and it runs faster than the CompVis version (1.5 iterations/sec vs 1.4 for CompVis on a 32 GB M1 Max).

I highly recommend people switch to that version for the improved flexibility.

[1] https://github.com/huggingface/diffusers

[2] https://github.com/huggingface/diffusers/pull/278


I wouldn’t recommend using that as-is. MPS doesn’t give deterministic random number generation, which means that seeds become meaningless and you won’t ever be able to reproduce something. You can work around it by generating random numbers on the CPU and then moving them to MPS, but that probably requires a fix in PyTorch.

The MPS support issue for diffusers is here:

https://github.com/huggingface/diffusers/issues/292

…and it links to the relevant PyTorch issue here:

https://github.com/pytorch/pytorch/issues/84288


https://github.com/magnusviri/stable-diffusion/commit/d0b168...

Copying this change fixed seeds on M1 for me.


That’s a fix for the CompVis forks, not the diffusers system we are talking about in this thread.

Also, that’s only a partial fix that doesn't really work properly. It doesn’t affect img2img and it still gets things wrong on the first render. Since txt2img starts from scratch each time, that means you’re always getting an incorrect render, it just happens to be the same incorrect render each time.


That's really annoying, even though I hadn't noticed it until now. I can confirm this is an issue.

But the situation seems to be the same on the CompVis derived repos, right?

So this is no worse off, but with better engineered and faster code.


Yeah, but with the CompVis derived repos, it’s pretty easy to go in and change all the calls to PyTorch random number generators.

Having said that, the last comment [0] on the PyTorch issue gave me the idea of monkey patching the random functions. The supplied code assumes you're always passing in a generator, which is not true in this case, but if you monkey patch the three rand/randn/randn_like functions to do nothing but swap out the device parameter for 'cpu' and then call to('mps') on the return value, it's enough to get stable seed functionality for the CompVis derived repos without modifying their code, so I'm guessing it will probably work for diffusers as well.
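Something along these lines (untested sketch; edge cases like explicit generators aren't handled):

    import torch

    # draw random numbers on the CPU RNG (deterministic for a given seed),
    # then move the result to whatever device was requested, e.g. MPS
    _orig_rand, _orig_randn, _orig_randn_like = torch.rand, torch.randn, torch.randn_like

    def _cpu_first(fn):
        def patched(*args, **kwargs):
            target = kwargs.pop("device", None)
            if target is None and args and isinstance(args[0], torch.Tensor):
                target = args[0].device   # the *_like variant defaults to the input's device
            out = fn(*args, device="cpu", **kwargs)
            return out.to(target) if target is not None else out
        return patched

    torch.rand = _cpu_first(_orig_rand)
    torch.randn = _cpu_first(_orig_randn)
    torch.randn_like = _cpu_first(_orig_randn_like)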

Also, it’s probably a bug in the CompVis code, but even after you fix the random number generator, the very first run in a session uses an incorrect seed. The workaround is to generate an image once to throw away whenever you start a new session.

[0] https://github.com/pytorch/pytorch/issues/84288#issuecomment...


This branch seems to fix these issues: https://github.com/huggingface/diffusers/tree/mps

(Annoyingly I just went and made similar changes and was about to create a PR for them. But they have a fix for a "warm-up" issue I wasn't aware of too)


Yes, the warm-up issue is what I meant by “the very first run in a session uses an incorrect seed”. It looks like they’ve backed away from that now and expect developers to warm up themselves. Not a big deal, you can just use a single step to do that.


Unfortunately the author of this PR decided to include garbage like this [1] in it, so this PR is pretty useless.

[1] https://github.com/basujindal/stable-diffusion/pull/103/file...


Seems like the author did his work on a different fork and pushed to this one, which included all the changes from the other fork...


How is that garbage? That license is part of the original SD repository, the creator Emad even talks about it in his initial post, about the OpenRAIL M License [0]:

> i) The model is being released under a Creative ML OpenRAIL-M license [https://huggingface.co/spaces/CompVis/stable-diffusion-licen...]. This is a permissive license that allows for commercial and non-commercial usage. This license is focused on ethical and legal use of the model as your responsibility and must accompany any distribution of the model. It must also be made available to end users of the model in any service on it.

[0] https://stability.ai/blog/stable-diffusion-public-release


It’s worth to point out that there’s no consensus yet on whether ML models (as in the weights) are actually copyrightable. If you’re using them, you should probably assume that they are. If you’re distributing them, you should probably assume that they aren’t.


They mean the entire thing has been run through some sort of "beautifier", so there are thousands of irrelevant whitespace changes across things unrelated to the commit, which itself is around 6 lines.


You're still able to run it locally even if the license change prevents it from being merged, and it seems to be a proof of concept that may inspire other people to optimize it in different ways


I was just reminded of the "clean room design" technique: https://en.m.wikipedia.org/wiki/Clean_room_design


Yeah, reminds me of the old freedoom WADs situation


This is the worst type of PR. Just submit a different PR for the code format changes.


I've seen people claim that it's better to generate 512x512 images since that's what it's trained on, and then upscale, rather than generating higher resolution images directly. Anyone tried any systematic investigation of this?


Wild how fast this is moving


Just wait until the AI is writing the code too. You won't be able to `git pull` fast enough.


The power of open source


Slightly off on a tangent, but has anyone got it running on an AMD card? Sad 6600 XT Windows user reporting in :(


Not sure about the 6600, but there is a guide for Linux at least:

https://m.youtube.com/watch?v=d_CgaHyA_n4&feature=emb_logo

And this is somehow relevant (possibly), as I kept the link open.

https://github.com/RadeonOpenCompute/ROCm-docker/issues/38


Here is a guide for AMD, but I don't have such a card, so haven't tried.

https://rentry.org/sdamd


Started using this branch and it works on my old-ass 1060 GPU. Maybe I'll do a blog post on the whole process.


Interesting that `del` helps here - could e.g. torch.jit.script/trace be more of a "sledgehammer" solution?


I wonder if GTT (GART size) wouldn't help with limited VRAM amounts, at the cost of some performance?


How much VRAM was previously required?


Looks like 10GB in the README[0] for a 512x512 image - though the fork[1] from the repo in this PR claims to be able to work with as little as 4GB VRAM.

[0] https://github.com/CompVis/stable-diffusion#stable-diffusion [1] https://github.com/basujindal/stable-diffusion#txt2img


I can generate a 768x896 pixels image on an RTX 3090, using 23.4/24GB


How does the content of the images look compared to the same prompt and seed at 512x512? I ask because when I make 768x896 with an M1 Max, the resulting image is less coherent, more noisy, and/or has repeating subject matter; nothing really usable, unlike 512x512.


Same here.


I am able to generate a maximum resolution of 512x768 on my 11GB 1080Ti. This seems to use almost 100% of the available RAM.


That's weird. I cap out at 512x512 on my 16GB Ampere card. Even stepping down precision doesn't help. I wonder what's different.

I use it directly from Python.


You might have the n_samples (aka batch size) set to a number greater than 1? That basically multiplies the amount of VRAM you’re using.

I can generate a 512x512 on my 10GB 3080 no problem (or three 384x384 at a time).
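i.e. keep the batch at one and loop instead, something like this (flags reused from the command upthread, prompt is a placeholder):

    python scripts/txt2img.py --prompt "your prompt here" --H=512 --W=512 --n_samples=1 --n_iter=3 --skip_grid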


Try the hlky fork: https://github.com/hlky/stable-diffusion

It measures memory usage as well.


Perhaps a different kind of chipset that isn’t optimized yet? Just a guess based on other graphics processes.


512x706 on an RTX 2060S with 8 GB of VRAM.


This is one of the worst PRs i've ever seen.

If I were involved, this PR would be closed and the conversation locked, simply because it's a ton of formatting changes. The only "memory efficiency" changes are like 6 lines of Python where the author uses `del`. That's it. Everything else is formatting bullshit, and some other merge stuff. Yikes.

Also, some of those PR comments are written by kids or something. Wtf




