I agree with half of that. The half that says that most exceptions thrown from l...

rmanolis · on Dec 6, 2024

No, you don't need to just log and return a 500; you can make the software handle these kinds of errors.

You could make the software call the system administrator and return a message to the user: "Try again in an hour, the system administrator is fixing it now."

Or, if it is a timeout, the software will call Amazon to buy a new machine to scale the database and send the user a message: "Try again in an hour while we scale the system."

The developer's job is to automate error handling, not to be the error handler.

If you can see the stack trace tree, you can plan far ahead, but to do that we need to destroy this "agile" mindset that is always in a hurry and doesn't let you think that far ahead.

tharkun__ · on Dec 6, 2024

    you can make the software handle these kinds of errors.

No, you cannot do this in all cases. I already described several cases in my comment in which you can't reasonably do anything automatically. Let me explain.

    You could make the software call the system administrator

But why would I do that for every `StatementTimeoutException`, at any hour of the night, and why would I need to bake that into my error handling? That isn't actually handling the error.

    "Try again in an hour, the system administrator is fixing it now"

Please never, ever do this to your users, who may believe it, or to your "system administrators", who do value their sleep or have more pressing matters to attend to.

    Or, if it is a timeout, the software will call Amazon to buy a new machine to scale the database

I described a case in which automatically adding resources would actually be wrong, because it would completely mask the actual problem: the developer did not think enough about the access patterns of the software they wrote and did not add the right index to the database. If you keep scaling your database automagically, you'll pay AWS until you run out of money without having solved anything. Believe me, a missing index can eat up a lot of resources before anything gets better. Until then, your software will just keep failing and keep adding resources. And in this case it won't help at all, because your new machine will not even be used by the query. It'll still be only one node handling your read, and that read is constrained by actually reading data from disk, which is super slow because you forgot the index!
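To make the missing-index point concrete, here is a small sketch using Python's built-in `sqlite3` module. SQLite only stands in for whatever database you actually run, and the `orders` table and `idx_orders_customer` index are made up for illustration. `EXPLAIN QUERY PLAN` shows the same query going from a full table scan to an index search once the index exists, which is a change no amount of extra hardware gives you.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN detail text, one plan step per line."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return "\n".join(row[-1] for row in rows)  # last column is the readable detail


conn = sqlite3.connect(":memory:")
# Hypothetical schema and data, for illustration only.
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, i * 1.5) for i in range(10_000)],
)

lookup = "SELECT * FROM orders WHERE customer_id = 42"

before = query_plan(conn, lookup)  # full table scan: every row gets read
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = query_plan(conn, lookup)   # index search: only matching rows get read

print("before:", before)
print("after: ", after)
```

On a server database the equivalent tool is `EXPLAIN` (or `EXPLAIN ANALYZE`) in PostgreSQL or MySQL; the output format differs, but the scan-versus-index distinction is the same thing to look for.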

    The developer's job is to automate error handling, not to be the error handler.

I've yet to see anything that the developer should do here based on an individual error in the software. One case where something should happen automatically on the database would be if the database were running out of space. You should have monitoring in place that takes care of that. And no, it should not be your software doing that scaling because it received a `SQLError -> DatabaseOutOfDiskSpace` error. If it gets that far, all of your calls to the database will fail. Which of your error handlers should be the one handling it, and why should we let things get that far in the first place?

Have monitoring set up outside of your actual software that adds disk space automatically before you run out, and then make it scream very loudly to your system administrators and developers, so that they can take a look and figure out what happened. Maybe it was legitimate ("well, I guess we got a lot more paying customers now, this is OK"). Maybe the last update that went out has a bug that keeps filling up the database with BS data and you need to make an emergency bug fix. Or maybe you're deliberately being DDoS'd, it got past your DDoS protection, and you'd better do something about that or the DDoS'er is gonna make your AWS bill go crazy.
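A minimal sketch of that kind of out-of-band check, assuming a hypothetical 80% threshold and hypothetical action names (`grow_volume`, `page_humans`) that stand in for whatever your provisioning and paging tools actually do. What matters is the shape: a separate monitor looks at disk usage and decides both to add space and to alert people, and the application's error handlers are not involved at all.

```python
import shutil
from dataclasses import dataclass

# Hypothetical threshold: act once the volume is 80% full, well before writes fail.
GROW_AT = 0.80


@dataclass
class DiskStatus:
    used_bytes: int
    total_bytes: int


def status_for(path: str) -> DiskStatus:
    """Read current usage for the filesystem containing `path`."""
    usage = shutil.disk_usage(path)
    return DiskStatus(used_bytes=usage.used, total_bytes=usage.total)


def plan_disk_actions(status: DiskStatus) -> list[str]:
    """Decide what the external monitor should do. Action names are hypothetical."""
    if status.used_bytes / status.total_bytes < GROW_AT:
        return []
    return [
        "grow_volume",  # add space automatically so the database keeps working
        "page_humans",  # and make sure someone finds out *why* it filled up
    ]


if __name__ == "__main__":
    print(plan_disk_actions(status_for("/")))
```

Running something like this from cron or a monitoring agent, rather than from inside a `catch` block, is what keeps it independent of whether any particular query happens to fail.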

    If you can see the stack trace tree

Again, nothing and nobody cares about up-stack. I care about down-stack when handling such errors.

    we need to destroy this "agile" mindset that is always in a hurry and doesn't let you think that far ahead.

This has nothing to do with agile at all. You should think about error handling every time you code anything; for example, as we've seen above, think ahead about your database running out of disk space. But don't make every developer think about it for every single piece of code they ever write. It makes no sense to have them try to handle these situations. It would actually make things worse, and those people would never get any actual work done either.

rmanolis · on Dec 7, 2024

System administrators are paid to wake up at night to fix things.

Ask your boss: "Do you want your users to wait until the next morning for the system administrator to wake up and fix the issue? Or do you want the software to tell the user, within seconds, how long the fix will take and start calling the system administrator to wake up and fix it?"

Because you are answering based on your own preferences, as a worker who wants to avoid the extra work of making the system perfect, not on your boss's preferences.

tharkun__ · on Dec 7, 2024

I guess that settles it. You are conveniently ignoring the cases I described in which it makes absolutely zero sense whatsoever for your software to do anything about this.

And yes, there's an on-call person who does get paged when something happens that likely needs immediate attention. A page for every single time there's any error? Not bloody likely, mate.
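One common way to get "page when something likely needs immediate attention, not on every error" is to alert on the error *rate* over a recent window instead of on individual errors. A minimal sketch, assuming a hypothetical window of 100 requests and a hypothetical 5% threshold; in practice this rule usually lives in your monitoring system rather than in application code.

```python
from collections import deque


class ErrorRatePager:
    """Page only when the failure rate over the last `window` requests reaches `threshold`.

    Window size and threshold are hypothetical tuning knobs, not recommendations.
    """

    def __init__(self, window: int = 100, threshold: float = 0.05) -> None:
        self.results = deque(maxlen=window)  # True = request failed; oldest entries drop off
        self.threshold = threshold

    def record(self, failed: bool) -> bool:
        """Record one request outcome and return True if someone should be paged."""
        self.results.append(failed)
        if len(self.results) < self.results.maxlen:
            return False  # not enough data yet to call it a trend
        rate = sum(self.results) / len(self.results)
        return rate >= self.threshold
```

A single `StatementTimeoutException` barely moves the rate (1 failure in 100 is 1%), while a broken deploy that fails a large share of requests crosses the threshold quickly.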

To pick up your last point: my boss is not in the business of paying you to spend countless extra days building useless error "handling" for stuff that has already been handled, while you try to get out of the responsibility of writing resilient software by paging someone else to "pick up the tab".

A "boss" never wants to pay for a perfect system. That would take way too long, and nobody has figured out how to actually build one anyway (no, Odin is not the answer). They want to pay for the "slightly less than good enough" system, because that's cheaper and still gets the job done. And especially after hearing what you've said here, I'm with them: perfect is the enemy of good enough. We just have to ensure that it really is good enough and not less (because many a boss will happily take way less than good enough if it gets them to market faster).