
I agree with half of that: the half that says most exceptions thrown from libraries today, and from much application code as well, are way too generic and bury the details that might allow handling them in a `message` String.

However, that's still about the exceptions thrown from down thread, not from the call path part of the "stack trace".

I.e. your situation would never happen.

    Authentication_Filter_Error -> SQL_Verify_Account_Error -> SQL_Error.Closed_Conn
This stack / call path is impossible, because when the AuthenticationFilter notices that the token is invalid, it returns a 401 or 403 or whatever is appropriate and my REST resource is never actually called. There's no SQL being run and very definitely no "connection closed" error occurred.

But let's say there was a distinction made with proper exception types and instead of `SQLException("Connection closed")` and `SQLException("Statement timeout")`, I actually received `SQLException(ConnectionClosedException())` vs. `SQLStatementTimeoutException`. Now, without string parsing, I can know that either the connection just closed or that the statement was aborted due to timeout. If these are checked exceptions, I have to declare that I'm aware these can happen and what I want to do with them: Handle or rethrow.
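To make that concrete, here is a minimal Java sketch of what catching by type instead of by message could look like. It assumes the two exception types are plain subclasses of `java.sql.SQLException` (the comment above wraps one of them as a cause instead); `ConnectionClosedException`, `SQLStatementTimeoutException`, `runQuery` and `classify` are hypothetical names for illustration, not part of any real driver.

```java
import java.sql.SQLException;

// Hypothetical exception types, named after the ones in the comment above.
// They are NOT part of java.sql; a driver would have to provide something like them.
class ConnectionClosedException extends SQLException {
    ConnectionClosedException() { super("Connection closed"); }
}

class SQLStatementTimeoutException extends SQLException {
    SQLStatementTimeoutException() { super("Statement timeout"); }
}

class TypedExceptionsDemo {
    // Hypothetical stand-in for a database call that always fails in a chosen way.
    static void runQuery(boolean timeout) throws SQLException {
        if (timeout) throw new SQLStatementTimeoutException();
        throw new ConnectionClosedException();
    }

    // The catch clauses select on the exception type; getMessage() is never parsed.
    static String classify(boolean timeout) {
        try {
            runQuery(timeout);
            return "ok";
        } catch (SQLStatementTimeoutException e) {
            return "timeout";
        } catch (ConnectionClosedException e) {
            return "closed";
        } catch (SQLException e) {
            return "other";
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(true));
        System.out.println(classify(false));
    }
}
```

Because both types are checked, any caller of `runQuery` has to either catch them or declare them in its own `throws` clause, which is exactly the "handle or rethrow" decision described above.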

However, a myriad of such exceptions can happen. I would probably have to declare 20-50 exceptions way up in a REST resource layer. Not only can these two happen, but many other situations on the network or database side and on the JSON parsing side for the payload I receive, some exceptions from my business logic etc.

And for most of these, what can I do? If the connection to the database closed, all I can do is to log the error and return a `500 Internal Server Error` to my caller. Guess what I can do when a statement timeout occurs? I log the error and return a `500 Internal Server Error` to my caller. For a statement timeout I can't even return a `400 Bad Request`, because it's not knowable whether the statement timeout occurred because the database was simply overloaded in that moment or because the request itself was created with such parameters as to always cause a statement timeout. We won't know until we see the logs and figure out through investigation that it wasn't a bad request after all: we were missing an index and the table had finally grown large enough for that to matter.

So yeah, I'm good with `RuntimeException` and handling only very few specific ones ever.
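A rough sketch of that approach in plain Java, without any particular REST framework: one boundary that handles a single specific exception and turns everything else into a logged 500. `ResourceBoundary`, `handle` and `InvalidPayloadException` are hypothetical names chosen for this example, and the 400 mapping is only an assumption about which case might be worth handling specifically.

```java
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical exception for the one case this sketch handles specifically.
class InvalidPayloadException extends RuntimeException {
    InvalidPayloadException(String message) { super(message); }
}

class ResourceBoundary {
    private static final Logger LOG = Logger.getLogger(ResourceBoundary.class.getName());

    // Runs the resource logic and translates the outcome into an HTTP status code.
    static int handle(Supplier<String> resource) {
        try {
            resource.get();
            return 200;
        } catch (InvalidPayloadException e) {
            // One of the very few cases where the caller can fix the request.
            return 400;
        } catch (RuntimeException e) {
            // Closed connection, statement timeout, and so on: log it and give up.
            LOG.log(Level.SEVERE, "Unhandled error in resource", e);
            return 500;
        }
    }
}
```

With this shape, `ResourceBoundary.handle(() -> { throw new IllegalStateException("Connection closed"); })` yields 500 no matter which unchecked exception was thrown, and no intermediate layer has to declare anything.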

Also nobody will be screaming when the token is invalid and I definitely don't call any system administrator. That's something you as a developer look into. Same with the statement timeout.



No, you don't need to just log and return 500; you can make the software handle these kinds of errors.

You could make the software call the system administrator and return a message to the user "Try again in an hour, the system administrator is fixing it now"

Or if it is a timeout, the software will call amazon to buy a new machine to scale the database and send a message to the user "Try again in an hour until we scale the system".

Developer's job is to automate error handlers and not be the error handlers.

If you can see the stack trace tree, then you can plan far ahead, but to do that we need to destroy this "agile" mindset, that is always in a hurry and doesn't let you think that far ahead.


    you can make the software handle these kinds of errors.
No, you cannot do this in all cases, and my comment already described multiple cases in which you can't reasonably do anything automatic. Let me explain.

    You could make the software call the system administrator
But why would I do that for every `StatementTimeoutException` at every moment of the night and why do I need to bake that into my error handling? That isn't actually handling the error.

    "Try again in an hour, the system administrator is fixing it now"
Please never ever do this to either your users, who may believe it, or your "system administrators", who do value their sleep or have other, more pressing matters to attend to.

    Or if it is a timeout, the software will call amazon to buy a new machine to scale the database
I described a case in which automatically adding resources would actually be wrong, because it would completely mask the actual problem: the developer did not think enough about the access patterns of the software they wrote and did not add the right index to the database. If you keep scaling your database automagically, you'll pay AWS until you run out of money and you will not have solved anything. Believe me, a missing index can eat up a lot of resources before anything gets better. Until then your software will just keep failing and keep adding resources. And in this case it won't help at all, because your new machine will not even be used by the query. It'll still only be one node handling your read, and that read is constrained by actually reading data from the disk, which is super slow because you forgot the index!

    Developer's job is to automate error handlers and not be the error handlers.
I've yet to see anything that the developer should do here based on an individual error in the software. One case where something should happen automatically on the database would be if the database was running out of space. You should have monitoring in place that takes care of that. And no, it should not be your software doing that scaling because it received a `SQLError -> DatabaseOutOfDiskSpace` error. If it gets that far, all of your calls to the database will fail. Which of your error handlers should be the one handling it, and why should we let things get that far in the first place? Have monitoring set up outside of your actual software that adds disk space automatically before you run out of space, and then make it scream very loudly to your system administrators and developers about it, so that they can take a look and determine what happened. Maybe it was a legitimate "well, I guess we got too many more paying customers now, this was OK". Maybe the last update that went out has a bug that keeps filling up the database with BS data and you need to make an emergency bug fix. Or maybe you're deliberately being DDoS'd, it got past your DDoS protection, and you'd better do something about that or the DDoS'er is gonna make your AWS bill go crazy.

    If you can see the stack trace tree
Again, nothing and nobody cares about up-stack. I care about down-stack when handling such errors.

    we need to destroy this "agile" mindset, that is always in a hurry and doesn't let you think that far ahead.
Nothing to do with agile at all. You should think about error handling every time you code anything. For example, as we've seen above, think ahead about your database running out of disk space. But don't make every developer think about it for every single piece of code they ever write. It makes no sense to have them try to handle these situations. It'd actually make things worse, and these people would never get any actual work done either.


System administrators are paid to wake up at night to fix things.

Ask your boss, "Do you want your users to wait until the next morning for the system administrator to wake up and fix the issue? Or do you want the software to inform the user within seconds how long it will take to fix the issue and start calling the system administrator to wake up and fix it?"

Because you are answering based on your preferences as a worker who wants to avoid the extra work of making the system perfect, not on your boss's preferences.


I guess that settles it. You are conveniently ignoring the cases I described in which it makes absolutely zero sense whatsoever for your software to do anything about this.

And yes, there's an on-call person that does get paged when something happens that likely needs immediate attention. A page for every single time there's any error? Not bloody likely mate.

To pick up your last point: My boss is not in the business of paying for you, who will spend countless extra days building useless error "handling" for stuff that has already been handled and who is trying to get out of the responsibility of writing resilient software by paging someone else to "pick up the tab".

A "boss" never wants to pay for a perfect system. That would take way too long and nobody has figured out how to actually build that anyway (no, Odin is not the answer). They want to pay for the "slightly less than good enough" system, because that's cheaper and still gets the job done. And especially when I hear you talk here, I'm with them: Perfect is the enemy of good enough. We just have to ensure that it really is good enough and not less (coz many a boss will happily take way less than good enough if it gets them to market faster).



