We had the problem you're describing for awhile, but have since figured out how to avoid processes going down and interrupting other reqs. Essentially, you attach a global domain, and when that domain catches an error you stop accepting new connections (obviously you have to be load-balancing between procs) and start a countdown. Some reasonable amount of time later (I think we wait 30 seconds?) you assume that any in-progress request is done and restart the process. We've found this to be very successful.
We do this too, attaching req and res objects to a domain, as well as databases and other network related objects (like smtp clients, etc). This is a huge improvement, but I'm still seeing occasional uncaught error events in our logs on a very large codebase and only in production. Some of them are just ECONNRESET events with no details given, so their origins are REALLY hard to track down. Have you got some magic for catching everything without explicitly having to find all objects that could be emitting? I'd love to hear it if so...
As soon as possible during startup, create a domain and enter it. Because entered domains form a stack, this will be a fallback if an error occurs at a place that isn't covered by any other domains.