I spoke too soon. The deploy itself will bring up an instance, so your first request won't trigger a cold start. To force one, you could set concurrency to '1' and send two concurrent requests. You should see a log entry such as the following when a new instance starts up:
"This request caused a new container instance to be started and may thus take longer and use more CPU than a typical request."
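A minimal sketch of the two-concurrent-requests trick, assuming your service is deployed with concurrency set to 1 (only the helper below is concrete; the commented usage with `SERVICE_URL` is a placeholder for your own service):

```python
from concurrent.futures import ThreadPoolExecutor


def fire_concurrently(request_fn, n=2):
    """Run n requests at the same time and return their results.

    With "Maximum Requests per Container" set to 1, two overlapping
    requests cannot share an instance, so one of them should land on a
    freshly started (cold) instance - look for the "new container
    instance" log entry.
    """
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(request_fn) for _ in range(n)]
        return [f.result() for f in futures]


# Example usage (SERVICE_URL is a placeholder for your service's URL):
# from urllib.request import urlopen
# fire_concurrently(lambda: urlopen(SERVICE_URL).status)
```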
Alternatively, you could set up an endpoint that shuts down the server (which will shut down the instance - not advised for production code).
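A sketch of such a shutdown endpoint using only the standard library (the `/shutdown` path is made up, and, as noted above, this is for experimentation only, not production):

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/shutdown":
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"shutting down\n")
            # shutdown() must run in another thread: it blocks until
            # the serve_forever() loop exits, which would deadlock if
            # called from within this request handler.
            threading.Thread(target=self.server.shutdown).start()
        else:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok\n")

    def log_message(self, *args):  # keep the demo quiet
        pass


def run(port=8080):
    server = HTTPServer(("", port), Handler)
    server.serve_forever()  # returns once /shutdown is hit
```

Once the server process exits, Cloud Run tears down the instance, so the next request is guaranteed a cold start.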
As an aside, the "K_REVISION" environment variable is set to the current revision. You can log or return this value to test whether traffic has migrated to a new version (instead of waiting a minute).
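For example, a handler could return something like this (a trivial sketch; the "unknown" fallback is just for running outside Cloud Run):

```python
import os


def current_revision():
    # K_REVISION is set by Cloud Run to the name of the serving
    # revision; returning or logging it from a handler shows which
    # revision actually served a given request.
    return os.environ.get("K_REVISION", "unknown")
```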
I'd encourage you to test your particular app, but you should expect similar cold start times in Cloud Run.
You can set "Maximum Requests per Container" when deploying, so you control whether a container handles one request at a time (i.e. "Maximum Requests per Container = 1"). If your app is not CPU-bound and you allow multiple concurrent requests (the default), you should see fewer cold starts.
Thanks very much! Could you answer the following questions about cold start times in Cloud Run or point me to a good resource:
1. I think I have a pretty good understanding of what's going on with the lifecycle of Cloud Functions that leads to the cold start times. What happens with Cloud Run? Does it need to download the whole Docker image to a machine to run it? Seems like that would take longer.
2. App Engine has 'warmup requests', which I think are great. Is there any equivalent on Cloud Run, or plan to add?
3. Is the time that an instance is kept warm during idle similar between Cloud Functions and Cloud Run?
1. Both cases grab the image and run it. Better per-layer caching (including very aggressive caching of common layers) is coming soon, so stay tuned.
2. No current equivalent, though there are thoughts on exposing more scaling control knobs (e.g. max-instances, min-instances). Max is easy; min is harder because of the cost implications. GAE was billed on "instance hours", but Run bills on CPU time, so with "min-instances=1" you'd effectively be paying for an always-on VM. Something like Run on GKE (where you're already paying for the compute) is probably the more natural place to expose these controls.
3. Yes, though since Run can be multi-concurrent, for certain (most?) load profiles, you're going to have way fewer cold starts because the instance is already handling requests.