Hacker News

What he meant is that you overfit the network with video footage. There is no game, just seemingly clever stitching and playback of learned footage.

A similar concept applied to animations and implemented in a state machine: https://www.youtube.com/watch?v=KSTn3ePDt50

And optimized with a neural network: https://www.youtube.com/watch?v=16CHDQK4W5k



We had ~100GB of data (and that was gzip compressed data). The final model is 173MB.

It's simply not large enough to have memorized every combo.
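The arithmetic behind that claim is easy to check. A rough sketch using the sizes quoted in this thread (a back-of-envelope ratio, not a rigorous information-theoretic argument):

```python
# Sizes quoted in the thread: ~100 GB of gzip-compressed training
# footage versus a 173 MB trained model.
data_gb = 100          # compressed training data, in GB
model_mb = 173         # final model size, in MB

data_mb = data_gb * 1024
ratio = model_mb / data_mb
print(f"Model is {ratio:.2%} the size of the compressed training data")
# The model holds well under 1% of the compressed footage's bits,
# which makes verbatim memorization of the full dataset implausible.
```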


I don't think gzipping would have helped much with video data?


The first link provided seems to need a very detailed human-provided cost function for specific development needs.

The second one is indeed interesting research and seems to be a combination of the prior learned motion mapping working in tandem with a generative model.

I suppose you could say that the automation of the dataset counts as "augmentation"; but the difference here is that the dataset is just pixels and inputs rather than all that animation info and simulation data. Yes, a simulation is running, but the GAN only gets the pixels and the input.

There's a similarity there though; you're right. In either case, the explicit goal of the video you posted is to combat runtime constraints of generative models. I'm not certain it's a fair comparison.

The latter video and sentdex's result both seem to generalize to unique scenarios not present in the training set. This may mean they are creating an efficient representation of the underlying data in order to predict future samples more easily than simply overfitting.

The top-level comment here is a shallow dismissal, and Randomoneh could have answered these questions themselves before throwing out a smug comment like "I fail to see novelty here" when it's at the very least the first large-scale GAN successfully trained on GTA V.


The first link exposes the trick employed by your model.

>animation info and simulation data

but did your model learn any of that?

>explicit goal of the video you posted is to combat runtime constraints

The trick to motion mapping is feeding a lot of data with accompanying inputs to build an atlas you can reference during playback.

>first large-scale GAN successfully trained on GTA V

It's really cool. The problem I had is with the presentation. I immediately felt insincerity bordering on scamming the audience, because I assume someone working in this field would know how the sausage is made. From the YT clip: "the shadow and reflection works", "modeling of physics works". Do they? Or did your model build an atlas of video frames it can play back according to the fed input? I'm guessing weather/time of day was locked when recording training data: perfect shadow and constant sun position for a nice reflection. Searching for 1:1 matches of generated output in the training set would be interesting and pretty revealing.
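That suggested search for 1:1 matches could be sketched roughly like this. A toy illustration only: random arrays stand in for real frames, and the frame shape and L2 metric here are my assumptions, not anything from the actual project:

```python
import numpy as np

def nearest_training_frame(generated, training_frames):
    """Return (index, error) of the training frame closest to
    `generated` under mean squared error."""
    errs = [float(np.mean((generated - f) ** 2)) for f in training_frames]
    idx = int(np.argmin(errs))
    return idx, errs[idx]

# Toy data: 100 random 48x80 grayscale "frames".
rng = np.random.default_rng(0)
train = rng.random((100, 48, 80))
# A near-copy of frame 42 stands in for a generated output.
query = train[42] + rng.normal(0.0, 0.01, size=(48, 80))

idx, err = nearest_training_frame(query, train)
print(idx, err)
```

If many generated frames sat at near-zero error against some training frame, that would support the "playback atlas" hypothesis; consistently nonzero errors would point the other way.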


> I immediately felt insincerity bordering on scamming the audience

MFW I read this. Jeez, man. The model size is 173MB. It didn't just memorize every possible combo.

How the hell you went from our excitement about a fun project we shared on YT to accusing us of "scamming" the audience, I really don't know. What a terribly rude and hateful attitude you have =/


Don't take it personally. Commenters on HN are famous for dismissing successful ideas (remember Dropbox?).

I have one question: you mentioned that the training data was 100GB. Was it the same resolution as what is output by the model (ignoring supersampling)?


The people on this website are terrible sometimes.


I wouldn't call it scamming, but 173MB is not small at all. At the resolution of this model, you could easily fit the entire Titanic movie in 173MB, maybe even with enough space left for audio.

Furthermore, no one is saying the model "memorized every possible combo". However, imagine you have a set of keyframes (maybe even multiple fragments per frame) and you need to interpolate between them. Not that hard of a task, is it?
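The interpolation trick being described really is simple. A naive linear crossfade between two stored keyframes takes a few lines (illustrative only, with random arrays in place of real frames):

```python
import numpy as np

def interpolate(frame_a, frame_b, t):
    """Linear crossfade between two keyframes, with t in [0, 1]."""
    return (1.0 - t) * frame_a + t * frame_b

rng = np.random.default_rng(1)
a = rng.random((48, 80))   # keyframe at time 0
b = rng.random((48, 80))   # keyframe at time 1

# Synthesize three in-between frames: no understanding of the
# scene is required, just blending stored pixels.
tweens = [interpolate(a, b, t) for t in (0.25, 0.5, 0.75)]
print(len(tweens), tweens[0].shape)
```

Real interpolation schemes (motion-compensated, learned, etc.) are more sophisticated, but the point stands: plausible in-between frames don't require a model of the underlying scene.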

Models don't care about simulating our "intention" properly. They care about fitting the input in the simplest way possible. Think of a model as a lazy worker merely trying to look like it's working.

None of this makes NNs less exciting, but it should tell us that you can't go from 0 to 60 in one step and hope the NN will have great insight into what it's doing.

We need models that make smaller conceptual jumps, i.e. models that understand 3D space, then models that understand transformations in 3D space, then models that understand cityscapes, etc.


It sounds like you and others are trying to explain how this demo doesn't live up to your idealized, subjective expectations. No one is claiming this to be a revolutionary or even useful video game engine.

It's a neural network that recreates a limited, yet fully dynamic gameplay segment based only on player input. It's a really neat and fun project.


I think it's quite telling that you accuse me of having idealized, subjective expectations and then describe the demo as "limited yet fully dynamic gameplay". It rotates the car to the left or right depending on whether you press left or right.

It's super interesting, but it doesn't recreate limited, fully dynamic gameplay. It doesn't recreate any sort of dynamic gameplay. That's your idealized, subjective interpretation.


The driving seems pretty dynamic to me. Maybe "fully" was a bit hyperbolic, as I can't really justify or quantify what that would entail. On the other hand, saying that it's not dynamic at all seems equally misguided. Also, you seem to disregard the "limited" and "segment" qualifiers, which were there for a reason.


> However, imagine you have a set of keyframes (maybe even multiple fragments per frame) and you need to interpolate between them. Not that hard of a task, is it?

Interestingly, the video artifacts of this model look somewhat similar to those from simple motion-interpolation algorithms such as ffmpeg's minterpolate, especially during fast camera motion. https://ffmpeg.org/ffmpeg-filters.html#minterpolate

Edit: I generated an example with strong artifacts. Input: https://mscharrer.net/tmp/lowfps.webm Output: https://mscharrer.net/tmp/minterpolate.webm


Memorizing a static succession of frames with nothing actually being dynamic and interactive isn't the same challenge as this.


Accusations of scamming are serious. What evidence do you have? None, as far as I can see. This is wrong and should be remedied.


I feel scammed when a practitioner of the art tries to sell me on his model "learning the physics of the simulation. Look, it even figured out where to put the shadow".


No one cares how you feel; come with proof before making accusations. Otherwise you are just a troll.


Have you seen the video? The author even goes as far as suggesting the technique might be useful for (generating?) entire operating systems, at https://www.youtube.com/watch?v=udPY5rQVoW0&t=853s. That's just wild.


No, that's just false. How about a direct quote?

I suggested there could be a "future where many game engines are entirely or even mostly AI based like this. Or even things like operating system or other programs."

The thought was just wondering about what the future might be and whether we might have far more AI-based programs.

I still think the answer is a strong yes; this is a glimpse into the future. Nowhere did I say GameGAN would be that engine. You're just trying your hardest to hate.


I'd like my OS to be deterministic, thank you.

> You're just trying your hardest to hate.

Manipulative much? I don't hate you (well, so far), and you aren't being attacked. I'm just noting what a few informed people here don't like about your video. No, they aren't trolls. And, yes, everyone has a different level of tolerance for exaggeration, of course.


Odd; pretty sure it was you who misrepresented what I said in an attempt to manipulate.

You were also the one who "exaggerat[ed]" my claims. I made a general statement about my thoughts on future AI-based software, as opposed to human-coded software.

I still think that's indeed the inevitable future. It doesn't seem remotely outrageous or exaggerated. I never said GameGAN would be that software, but you seem to want that to be the case so you can put it down.

What makes you believe neural networks aren't or could not be deterministic? What makes you think NNs could not eventually produce far more robust, reliable, and secure operating systems?

Seems obvious to me, but I guess you're more informed than me :)


You, like many YouTubers, made completely exaggerated claims in your commentary. Your model fits a sequence of inputs to a video frame, but you say "wow, look, it even models the movement of the sun!". It's pretty absurd.



