Plenty of crawlers, including Facebook's, are operating out of huge pools of IPs usually smeared across multiple blocks. If you have an idea on how to ratelimit a crawler doing 2 req/s from 300+ distinct IPs with random spoofed UAs let me know when your startup launches.
There are many features that you can pick up on. TCP fingerprints, header ordering, etc. You build a reputation for each IP address or block, then block the anomalous requests.
If you don't want to do this yourself and you're already using Cloudflare... congratulations, this is exactly why they exist. Write them a check every month for less than one hour of engineering time, and their knowledge is your knowledge. Your startup will launch on time!