I did something similar (http://markolson.github.io/storyboard/) earlier this year using ffmpeg and ImageMagick to generate either GIFs around lines of dialog (like this project), or PDFs where each page is a frame of text or a new scene. Optimizing GIFs is by far the least enjoyable part.
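For anyone curious what "a GIF around a line of dialog" looks like in practice, here's a rough sketch. It is not the actual storyboard code. It assumes you already have subtitle timestamps in SRT format, and the function names, padding, frame rate and width are all made up for illustration. It only builds the ffmpeg command line; the flags used (`-ss`, `-t`, `-i`, `-vf` with the `fps` and `scale` filters) are standard ffmpeg options.

```python
def parse_srt_time(stamp):
    """Convert an SRT timestamp like '00:01:02,500' to seconds."""
    hms, millis = stamp.split(",")
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds + int(millis) / 1000


def gif_command(video, start, end, out, padding=0.5, fps=10, width=480):
    """Build an ffmpeg command that clips a GIF around one subtitle line.

    `padding` (hypothetical default) adds a little lead-in and lead-out
    so the clip doesn't cut off mid-word.
    """
    begin = max(0.0, parse_srt_time(start) - padding)
    duration = parse_srt_time(end) + padding - begin
    return [
        "ffmpeg", "-y",
        "-ss", f"{begin:.3f}",      # seek to just before the line starts
        "-t", f"{duration:.3f}",    # keep only the padded duration
        "-i", video,
        # lower the frame rate and scale down to keep the GIF small
        "-vf", f"fps={fps},scale={width}:-1:flags=lanczos",
        out,
    ]
```

You'd pass the resulting list to `subprocess.run(cmd, check=True)`. This does nothing clever about palettes or file size, which is exactly the optimization work that isn't much fun.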
Yahoo and others either rate-limit you or impose temporary IP bans if you access too many pages too quickly. Distributing the tools as a VM spreads the workload across many machines (and IP addresses) in a way that's pretty easy to set up.
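Even with the work spread out, each copy probably still wants to pace itself. A minimal sketch of a client-side throttle, assuming a fixed minimum gap between requests is enough; the `Throttle` class and the 2-second interval are hypothetical, and the real limits of any given site are unknown to me:

```python
import time


class Throttle:
    """Enforce a minimum interval between successive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        self._clock = clock      # injectable so it can be tested without waiting
        self._sleep = sleep
        self._last = None        # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self._min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()
```

Usage is just `throttle = Throttle(2.0)` once, then `throttle.wait()` before every page fetch.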