This is a harder problem than you think it is. It's a classic reinforcement learning problem, and it is incredibly difficult in the real world. It's comparatively easy in games like Go (the recent advance there), which are turn-based and have a discrete state space and perfect information. It's very difficult in robotics, where those simplifications generally don't hold.