Or... Google supplies some kind of local LLM tool which processes your videos before they're uploaded. You pay for the GPU/electricity costs. Obviously this would need to be done in a way that can't be hacked/manipulated. It might need to be tightly integrated with a backend service that tracks the frames analyzed on the local machine and verifies hashes/tokens after the video is fully uploaded to YouTube.
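A minimal sketch of what that hash verification might look like, assuming a backend-held key and invented function names (the hard part, keeping the key out of the attacker's hands on the local machine, is exactly the "can't be hacked" problem above and isn't solved here):

```python
import hashlib
import hmac
import secrets

# Hypothetical key held by the backend; in reality the local tool would
# need some attestation scheme, since a key on the client can be extracted.
SERVER_KEY = secrets.token_bytes(32)

def sign_frame_hashes(frames: list[bytes], key: bytes) -> list[tuple[str, str]]:
    """Hash each analyzed frame and attach an HMAC tag so the backend
    can later check that these exact frames were the ones processed."""
    report = []
    for frame in frames:
        digest = hashlib.sha256(frame).hexdigest()
        tag = hmac.new(key, digest.encode(), hashlib.sha256).hexdigest()
        report.append((digest, tag))
    return report

def verify_after_upload(uploaded_frames: list[bytes],
                        report: list[tuple[str, str]],
                        key: bytes) -> bool:
    """Server side: re-hash the uploaded frames and confirm they match
    the signed report produced during local analysis."""
    if len(uploaded_frames) != len(report):
        return False
    for frame, (digest, tag) in zip(uploaded_frames, report):
        if hashlib.sha256(frame).hexdigest() != digest:
            return False
        expected = hmac.new(key, digest.encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, tag):
            return False
    return True
```

If the uploaded video doesn't match the locally analyzed one, verification fails and the upload could be rejected or queued for server-side analysis instead.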
I guess moderation could also be tied to views per time period to optimize further. If the video is interesting, people will share it and views will accumulate quickly.
People assume that we can scale the capabilities of LLMs indefinitely; I, on the other hand, strongly suspect we're getting close to diminishing-returns territory.
There's only so much you can do by guessing the next probable token in a stream. We will probably need something else to achieve what people think LLMs will soon be able to do.
Much like Elon Musk presumably realizing that computer vision alone is not enough for full self-driving, I expect we will soon reach the limits of what can be done with LLMs.
You could also stagger the moderation to reduce costs. E.g.
Text analysis: 2 views
Audio analysis: 300 views
Frame analysis: 5,000 views
I would be very surprised if even 20% of content uploaded to YouTube ever passes 300 views.
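The staggering above could be a simple threshold table. A sketch, using the view counts from the list (tier names and function are invented for illustration):

```python
# View-count thresholds from the comment above; stage names are made up.
ANALYSIS_TIERS = [
    (2, "text"),        # titles, descriptions, transcripts: cheap
    (300, "audio"),     # speech/audio analysis: moderate cost
    (5_000, "frames"),  # per-frame visual analysis: expensive
]

def due_analyses(view_count: int, already_run: set[str]) -> list[str]:
    """Return the analysis stages a video has newly qualified for,
    skipping any that were already performed at a lower threshold."""
    return [stage for threshold, stage in ANALYSIS_TIERS
            if view_count >= threshold and stage not in already_run]
```

So a video sitting at 5 views only ever gets the cheap text pass, and the expensive frame analysis is reserved for the small fraction of uploads that clear 5,000 views.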