X's new AI image generator will make anything (theverge.com)
65 points by belter on Aug 14, 2024 | hide | past | favorite | 88 comments


I am wondering what will come to save us from this AI storm. It looks like online search and social networks will be more or less dead. Maybe we will start going out again.


The only thing that will stop AI is AI proving less profitable than the alternative.

I see stories that the market is already starting to doubt the hype. Integrating AI into existing workflows, or replacing those workflows, is often more complex and error-prone than simply having human beings do the thing, and the cost-benefit analysis isn't there because the technology just doesn't live up to expectations. So there may be hope.


"Maybe we will start going out again."

That is the eventual outcome. The question is how much havoc so-called "tech" companies can wreak before we get there. They have lots of money to burn. This could take a while. The environmental costs are enormous.

We are now at the "They have copied all our output, chopped it into tokens and are regurgitating it back to us" stage.

The internet is more than the web. Perhaps the web must be sacrificed to free ourselves from these cretinous intermediaries.


Just keep querying all free AI services as much as they allow – the energy costs will sort things out.


Lawsuits.


Copyright lawsuits will save us from political discomfort? Is this bizarro HN?


Defamation law


is going to stop random number generators from creating synthetic reality on local devices?


No, but it will stop a free-for-all.


Most of this conversation is just people who don’t know what image generation is doing today. X’s shitty wrapper around Flux is meaningless.

Go to Civitai and grab the latest open-source fine-tune of your choice. You want porn? Great. We have "Boobs and More", "face sitting", "flux topless" and "improved female nudity". That's looking at just the models updated in the past 24h, and just for Flux.

Having a big tech giant act as a middleman for you is just weird. Who gives a shit about their particular guardrails?


This article didn't mention they were using Flux under the hood. Do you have a link for that? It would make sense to do that.


I guessed it was, based on the Microsoft logo. Others commenting have found a press release confirming this.


Verge states it in their other article. https://www.theverge.com/2024/8/14/24220127/grok-ai-chatbot-...

> Grok’s prompt-based image maker is powered by Black Forest Lab’s Flux 1 AI model, and allows users to generate and publish images directly to the X social platform — with seemingly few guardrails in place to prevent abuse.


Then what is xAI doing? Any published papers or anything?


For images, nothing. They're just taking an off-the-shelf model and using their LLM to prompt it.


Anything except for anything vaguely related to sex, because you can’t have that. Freedom of speech maximalism, right there.


I’ve had to stop watching most tv shows lately because of the extreme level of violence. And the less I watch the more sensitized I get. The double standard indicates a very sick society.


I noted much the same double standard at the turn of the millennium, back when I was at school and being encouraged to write letters to newspapers.

This was the UK, so age of consent was 16, meaning I was allowed to perform acts that I wasn't allowed to watch.


I recall George Orwell making the same kind of remarks about novels in one of his essays, I think a little before WWII. I seem to recall he was speaking of the depiction of physical violence in pulps.

I... kinda think he was right.


> I... kinda think he was right.

He often was, to be honest.


Not impressed. That guy looks literally nothing like Bill Gates, and his cocaine-snorting is physically and physiologically very implausible.


I was impressed by his apparent ability to levitate coke through the sheer force of his addiction.


He's exhaling.


How would this make the user's experience on the website better?


Currently the website is filled with bots trying to promote and propagandise particular views and force them on you whether you want them or not. But now your entire feed can be generated on the fly, based specifically on what you engage with the most, without any of the distraction that comes from "being sociable" — a concept dangerously close to "socialist", which we can't allow in our social media.

Or something like that.

Last I checked they've still not auto-deleted my account from lack of use, but it's been years since I logged in.


How do they force their views on you against your will? Can't you ban them or not use the site?


> How do they force their views on you against your will?

My (perhaps unwise) hyperbole for effect aside:

A spam filter is not found between my eyeball and my language processing ability, so I can only find out a message is propaganda as a result of reading it.

> Can't you ban them

I can't ban them because I don't form part of Twitter's moderation team. I could block any specific account, but even that's after the fact of having been exposed.

> or not use the site?

I thought this was clear from my final 7 words.


> A spam filter is not found between my eyeball and my language processing ability, so I can only find out a message is propaganda as a result of reading it.

Ah, but that's... literally everywhere. It sounded like your issue was specific to X, but you can experience the same thing reading a magazine, riding the underground or simply walking around in any modern city.


Sure, but there's a lot more of them that happen to be on Twitter, even back when it was still called that. Accounts are easy to make, and tweeting is almost too cheap to measure.

The IRL stuff doesn't scale as easily, and the responses from others take them down rather than boosting them — the "Nicht unser Krieg" ("not our war") graffiti has a constant back-and-forth of counter-graffiti that covers up the "nicht", for example.


This is about Elon Musk recovering the money he spent on Twitter, by having xAI invoice Twitter for the "AI" work.


And just as a middleman to Black Forest Labs doing the actual model here. "What would you say you [x.ai] do here?"


"What would you say you [x.ai] do here?"

Run the hardware?


Carries the queries to the hardware? I'm a people person, dammit.


Not many models can produce the Microsoft logo like that.

Wonder if it’s just Flux


release blog post confirmed it's just flux https://x.ai/blog/grok-2


Hah! Sniped.

Yeah, Flux is a great model. It's very inconsistent in its knowledge of what celebrities look like. Some it knows; some it does not. Mostly women, in the latter case.


NOO you can't just do what you want on your computing device!


But it's not on your computing device?


You could say it's not my software even when it runs locally, or my model weights. I'm just licensing them from the owner.

Doesn't make sense to me, in my free-as-in-freedom computing worldview.


Huh? This isn’t free as in freedom. This is a server they control and can alter at any point.

You can use alternative services to manage your own servers if you want for “free-as-in-freedom”.


I believe that GP was speaking of Stable Diffusion-style architectures + models, which you can run on your own hardware.


Do you think those who immediately denounce Grok's ability to do this would make the distinction, or care?


It's a computing device you pay to use.


For access to a model that is freely available elsewhere with fewer guardrails. It's a tool for people who don't know what they're doing.


Do you really think you want to live in a world where people can do whatever they want on their computers?


Yes

Because I know the alternative and I've seen the motivations of people trying to tell me what to do "for my own good"


Ok, you do realise people commit crimes using computers, yes?


You do realize that crimes are specific to geographic regions, yes? How EXACTLY would you propose building inherent limitations into a general purpose computing device that would account for this?


Who said anything about inherent limitations?


You can drive your car into a crowd of people, and that's illegal everywhere.


Do you really think you want to live in a world where people disagree with you?

Not saying that cyber-crime is a good thing, just trying to illustrate that we shouldn't be gauging what should be possible by what you want and don't want.


What? I’m delighted to live in a world where people disagree with me. It would be tremendously dull otherwise.

We design the world around what large majorities of “reasonable” people want all the time.

I don’t understand what point you’re trying to make here at all.


Neat! Always hated how the models were neutered at the start.


The model seems to just be Flux, but yes, Flux is a lot less neutered than recent model releases (ahem, poor dead SD3).

It’s pretty amazing really. I made my company mascot doing a bunch of funny things and it nailed it almost every time.


SD3 is not neutered; it's just much, much smaller than Flux. And it's smaller than SDXL. Model size is king. Imo releasing it was a mistake.

People have been stirred into believing conspiracy theories about SD3 when the simplest explanation is that smaller models are just clearly worse.


It's abundantly clear that these companies are restricting what can be generated, sometimes to hilarious effect. Bing's image generator will straight up tell you this.


You are conflating big tech companies that run models for you and companies that provide you with models to run.

SD whatever doesn’t know what porn is, but it is trivially fine tuned to churn out porn every day.


It's a distinction without a difference. Most people don't have the hardware to run locally.


You can rent hardware if you really need to and still run the models yourself, but more or less all you need is a basic gaming rig.


Remind me how large the 1.5 models are.

I agree, SD3 should not have been released.


It's fully believable that the architecture behind SD3 requires more parameters to perform at the level of 1.5, though. SD3 genuinely does "better" than 1.5 by many metrics; it's just that it is simultaneously really bad.


The X.ai announcement page states “At the time of this blog post, it is outperforming both Claude 3.5 Sonnet and GPT-4-Turbo.”

But that is on one benchmark. Has anyone tested it for coding tasks, and if so, what is your experience?


> computer does as instructed

Good! Too bad it says it has some guardrails


Make a picture of a Tesla killing a pedestrian...


Original title: "X’s new AI image generator will make anything from Taylor Swift in lingerie to Kamala Harris with a gun"

Unfortunately, HN has a title length limit; otherwise this submission would probably have found its place in the flagkilled dustbin. It is utterly disgusting that The Verge only sees and promotes unethical usage for image generators. With the cropped title I expected to see something actually good/inspiring; instead I got low-quality journalism.


Wrapper on top of wrapper, by the looks of it. Using the Flux model.


an indian woman standing naked


While these images are stylized and would never be interpreted as real, we are rapidly entering a world where photos should not be trusted. Or at least they shouldn't be trusted any more than a sketch.

It's not the end of the world: no one gets confused by a political cartoon showing political opponent X dressed as legacy enemy Y (e.g., Trump as Hitler). I think the fuss about safety here will seem overblown once we more widely accept that a 'picture-perfect' rendering of an event is as unreliable as a sketch of one.

It may even be a positive: it will give us all plausible deniability if real but embarrassing photos get released.


> It may even be a positive: it will give us all plausible deniability if real but embarrassing photos get released.

Honestly, I don't think it will. Rather, it will make that problem worse. Only when everyone has nudes will it feel less embarrassing to see your AI self in explicit situations being passed around.

Basically it'll have to be so common that people don't care to pass it around.


I thought Photoshop already did that.


Even before Photoshop, photographs could be altered and airbrushed. Even so, faking a convincingly realistic image in Photoshop takes effort, whereas generating a plausibly realistic image with Stable Diffusion requires almost no effort whatsoever, just access to a GPU. This is a significant change.


"we are just rapidly entering a world where photos should not be trusted"

That train departed years ago.


If push comes to shove, we already have all the cryptographic primitives needed to produce and disseminate trustable imagery.

However, society at large doesn't actually care.


I don't think that cryptography can solve the problem of trustable imagery. What sort of system can do that?


Cryptography lets you certify attestation. So an organization can certify that an image is legitimate, and you can cryptographically verify that this organization has indeed certified that this is a legit image.

That's it. It doesn't verify that the image itself is real, only that some organization has put their stamp of approval on the image. You can verify that the image wasn't tampered with between when they approved of it and it got to you, but you can't verify that the image was real to begin with.

In principle, you could build a chain of certification into the camera itself, but this strikes me as a losing battle because you could just stage whatever you want in front of the camera.

The onus, then, is on organizations to build that trust.
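A minimal sketch of that attestation flow, using Python's stdlib HMAC as a stand-in for a real signature scheme (a production system would use an asymmetric algorithm like Ed25519, so verifiers hold only a public key; the key and image bytes here are made up for illustration):

```python
import hmac
import hashlib

# The organization's secret key. With a real asymmetric scheme this
# would be a private key, and verifiers would only need the public half.
ORG_KEY = b"example-org-signing-key"

def attest(image_bytes: bytes) -> str:
    """The organization 'stamps' an image it has vetted."""
    return hmac.new(ORG_KEY, image_bytes, hashlib.sha256).hexdigest()

def verify(image_bytes: bytes, stamp: str) -> bool:
    """Check the image wasn't altered after the organization signed it.
    Note what this does NOT prove: that the image depicts a real event."""
    expected = hmac.new(ORG_KEY, image_bytes, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, stamp)

photo = b"\x89PNG...raw image bytes..."
stamp = attest(photo)
assert verify(photo, stamp)             # untampered: passes
assert not verify(photo + b"x", stamp)  # any edit breaks the stamp
```

The stamp only ties the bytes to the signer; the trust in the signer itself has to come from somewhere else, which is the point being made above.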


My question is how those stamps would exist in the first place. Is the idea that Canon or whatever will ship their physical cameras with keys and sign images with those keys? Then when you look at an image, it'll show as verified by Canon or whatever.

In that world, wouldn't keys leak pretty easily? The key exists on the device. Is there a way this sort of scheme is actually viable? Or do I have the model entirely wrong?


You're right, hardware keys will leak. The idea is not to put trust into hardware, but to put trust into organizations.

Some photographer gives images to a news organization, and they take the photographer's word for it that these images are real before they sign and redistribute them. They trust the reporter, and you trust the organization. Or if the reporter has built enough credibility, they can vouch for the images themselves and you can trust them directly.

Cryptography allows you to verify that this reporter or organization attests that the images are what they claim. It doesn't allow you to verify whether the organization is worthy of your trust.

Either way, the system relies on being able to trust people, not things.


> The onus, then, is on organizations to build that trust.

And that's where it all falls apart.


Camera manufacturers are putting DRM and TPMs in cameras to sign the images.

It will be broken the minute they launch, but that won't stop them from trying.


Yep, cryptography solves trust; that's why we don't need CAs.


> from Barack Obama doing cocaine to Donald Trump with a pregnant woman who (vaguely) resembles Kamala Harris to Trump and Harris pointing guns. With US elections approaching and X already under scrutiny from regulators in Europe, it’s a recipe for a new fight over the risks of generative AI.

So the journalists are saying that they are the moral judges of what's right and what's wrong, and that they can decide which information should be suppressed? Coming from China and knowing so much history about how communist countries treated their people, I'm deeply suspicious of these journalists. Just look at China, look at Cambodia, look at Cuba, look at Romania, and of course look at Germany. When didn't it start with elites advocating for suppressing speech for the sake of the moral high ground?


Title is shortened. It will not generate "anything"; restrictions are still in place.

Grok will tell you it has guardrails if you ask it something like “what are your limitations on image generation?” Among other things, it promised us:

> I avoid generating images that are pornographic, excessively violent, hateful, or that promote dangerous activities.

> I’m cautious about creating images that might infringe on existing copyrights or trademarks. This includes well-known characters, logos, or any content that could be considered intellectual property without a transformative element.

> I won’t generate images that could be used to deceive or harm others, like deepfakes intended to mislead, or images that could lead to real-world harm.


> Grok will tell you it has guardrails if you ask it something like “what are your limitations on image generation?”

Asking leading questions of an LLM is a sure way to make it hallucinate. The only way to test the capabilities of a model is to try them out.
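One sketch of "trying them out": send the same policy question in several phrasings and flag inconsistent answers, since self-reported rules that shift with phrasing are likely hallucinated. The `flaky_model` stub here is hypothetical, standing in for a real API call:

```python
def probe_consistency(ask, paraphrases):
    """Send several phrasings of the same question and report whether
    the model's answers agree with each other."""
    answers = {p: ask(p) for p in paraphrases}
    consistent = len(set(answers.values())) == 1
    return consistent, answers

# Stand-in for a real model call: this stub changes its story depending
# on how the question is phrased, mimicking the behavior described in
# the article. A real harness would call the model's API here.
def flaky_model(prompt: str) -> str:
    return "no nudity" if "limitations" in prompt else "anything goes"

ok, answers = probe_consistency(flaky_model, [
    "What are your limitations on image generation?",
    "What kinds of images will you produce?",
])
print(ok)  # False: the stated 'guardrails' depend on how you ask
```

Even this only tests what the model says; whether a restriction is actually enforced at generation time is a separate experiment, which is what The Verge did.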


As with the likely false claim about a DDoS attack during a recent political rally on X [1], the claim of "guardrails" from this group warrants suspicion. Could be similar to Tesla FSD, where the philosophy is 'we tried to make it safe, but we're actually still testing, and yes, injuries will occur.'

[1] https://mashable.com/article/elon-musk-donald-trumo-x-spaces...


The article claims these are just AI hallucinations and not actual rules. It will rephrase them and change the rules if you ask it different ways.

You’d know that if you even glanced at the article.


At least going by the examples provided, those might be the claimed guardrails, but they don't look to actually be enforced.


That's likely a hallucination. They report that their experiment broke most of those "restrictions", except:

> In our testing, Grok refused a single request: “generate an image of a naked woman.”


The whole point of the piece is that while it claims in text to have these restrictions, it freely generates images which violate them.



