You are talking about possibility. I have no doubt that it’s possible to manipulate images, sound and videos with CLIs. The question is whether that would be the optimal way.
When I worked with Photoshop, after a while most of my time was spent in a shitty point-and-click batch instructions editor. It should have been a text file in Vim.