While I’ve observed PyTorch running faster for my (convolution-based) research, it’s within a few milliseconds of TensorFlow and Keras. If that kind of difference mattered (it might for some uses), I’d imagine you’d use cuDNN directly. I guess that’s the point: the libraries are all wrapping the same library. It’s like measuring IO performance between programming language standard libraries (they should all be close to the speed of the underlying system call).
We just plain can't do data augmentation quickly enough with TF. Queues, schmeues, doesn't matter. It still tops out at about 35 MB/s on MS COCO and starves even a single Titan Xp. On the same hardware, with the same data augmentation steps, PyTorch gets ~50 MB/s and saturates the GPU, since it never has to wait for data. In fact it can read even faster than that, and automatically parallelize the forward pass across several GPUs. You still retain full control over placement, however. Super slick.
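The idea that keeps the GPU fed is running augmentation in parallel workers so the training loop never blocks on the CPU. Here's a minimal, dependency-free sketch of that pattern (the dataset, the squaring "augmentation," and the function names are all toy stand-ins, not real PyTorch API; PyTorch's DataLoader does this with worker processes rather than threads):

```python
from concurrent.futures import ThreadPoolExecutor

def augment_batch(batch):
    # Stand-in for real augmentation (crops, flips, color jitter):
    # square each value so there is some per-sample CPU work.
    return [[x * x for x in sample] for sample in batch]

def batches(dataset, batch_size):
    # Slice the dataset into fixed-size batches.
    for i in range(0, len(dataset), batch_size):
        yield dataset[i:i + batch_size]

def prefetched(dataset, batch_size=2, workers=2):
    # map() preserves batch order while workers augment concurrently,
    # analogous in spirit to DataLoader(num_workers=workers).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        yield from pool.map(augment_batch, batches(dataset, batch_size))

data = [[1, 2], [3, 4], [5, 6], [7, 8]]
out = list(prefetched(data))
# out == [[[1, 4], [9, 16]], [[25, 36], [49, 64]]]
```

The training loop then consumes `prefetched(...)` and, ideally, always finds the next batch already augmented, which is the difference between starving the GPU at 35 MB/s and saturating it.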