Thanks for the great feedback! I didn't know OR-Tools CP-SAT worked best with at least 8 threads; I'll definitely try that out.
In my local tests, Chuffed is often faster on the simpler instances. However, I've noticed on the harder puzzles, Chuffed can take over 10 minutes, while OR-Tools (with just 4 threads) solves them in about 5.
You're also right about the records. I started with tuples for simplicity, but as the model grew more complex, it's clear that records would be a much better approach for readability.
One thing that might be interesting (and might be useful for some solvers) is to change from checking if there is a solution and requiring each part to be used in the solution, to find solutions and minimize the number of unused parts.
One additional constraint that you could add that I don't think I've seen is distance to goal. In essence, for each grid position (without considering any tracks) there is a shortest time it would take from that position to get to the final position. If a train is in (a, b) and would take at least time t to get to the exit, there is no way to solve the problem if it is in (a, b) at time MAX_TIME-t. Might be useful for some solvers, worth testing.
Finally, adding some simple heuristic might help some solvers (I'm thinking of Gecode in particular here). My initial guess would be to add an interleaving over time steps of the train_row, train_col, and train dir for each train and search in order. That would as a basic heuristic represent a search from the start snaking out. Value choice is trickier to specify in MiniZinc of course.
The "distance to goal" constraint is a clever idea, though it's tricky to implement because the tunnels act as teleporters, so a simple distance metric doesn't apply. I'd likely need to pre-calculate a shortest-path graph first.
I'm benchmarking Huub and Pumpkin now. My main goal is raw performance, as a friend's imperative brute-force solver (https://github.com/FoxtrotOnce/RailboundSolver) solves most levels in under 2 seconds. It's a high bar to meet with a declarative model, but I'm keen to see how close I can get with these optimizations.
In my local tests, Chuffed is often faster on the simpler instances. However, I've noticed on the harder puzzles, Chuffed can take over 10 minutes, while OR-Tools (with just 4 threads) solves them in about 5.
You're also right about the records. I started with tuples for simplicity, but as the model grew more complex, it's clear that records would be a much better approach for readability.