
I like this a lot and it’s well worth reading over.

For a lot of programming tasks, it's best to consider these kinds of lists not as falsehoods but as edge cases or assumptions that should be made explicit.

Handling every “falsehood” gracefully yields a combinatoric explosion in the complexity of your data model. Sometimes handling them anyway is the right or ethical thing to do, but other times it's by far easiest to document the assumption and move on.



> Handling every “falsehood” gracefully yields a combinatoric explosion in the complexity of your data model.

Sometimes this only happens if you started off with a model that made too many assumptions in the first place. For instance, regarding the "Falsehoods programmers believe about names" list, a naïve American programmer might just go with a simple pair of fields for first and last name, running into many of those falsehoods; if they stick with that approach, they'll be left with a mountain of edge cases to (potentially) handle. However, if you instead loosen up the format and provide a single field asking something like "What name would you prefer we use when referring to you?", you avoid almost all of the "edge cases" in that list (many are so widespread that they can hardly be called edge cases) and end up with an implementation that is even simpler than the one the hypothetical naïve programmer would have written. Of course, this might cause some difficulty if you need to interact with external systems that require a specific name format, but at that point the issue is that someone else has made similar assumptions, not an inherent difficulty of using a less constrained format.


Often the secret is not so much to 'handle every "falsehood" gracefully', but rather to simply not impose prior cultural assumptions.

Take names, for example. Spanish naming conventions are "(given name){1,n} (first surname) (second surname)". For instance, the full name of the current acting Prime Minister of Spain is Pedro Sánchez Pérez-Castejón. However, for the overwhelming majority of non-official purposes, Spaniards use only the first surname, so in this case he is simply Pedro Sánchez.

Now, the problem arises when Spaniards use software (or businesses, or bureaucracies) designed to parse their full name according to the English convention of "(given name) (middle name)? (surname)".

If Spain's PM is asked to fill in his first name and then his last name, and the system was designed with Anglophone expectations about how to display his name (e.g. on a credit card), guess what it will be? Hello, MR PEDRO S PEREZ-CASTEJON! [0] Or maybe it's an account for an online service that generates an avatar from "his initials". Will they be PS or PSP, as he expects? No. Much to his chagrin, they are PP [1]... and there's no option to change them [2].
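To make the failure concrete, here is a hypothetical reconstruction (not code from any real card issuer or online service) of the Anglophone assumption at work: join the two fields, treat the last space-separated token as "the" surname, abbreviate anything in between as a middle name, and strip accents for an ASCII-only output. The function names `naive_card_name` and `naive_initials` are made up for illustration.

```python
import unicodedata


def to_ascii_upper(s: str) -> str:
    # Strip accents by decomposing (NFKD) and dropping combining marks,
    # then uppercase, as many card-embossing systems do.
    decomposed = unicodedata.normalize("NFKD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).upper()


def naive_card_name(first: str, last: str) -> str:
    # Hypothetical Anglophone assumption: the last whitespace-separated token
    # is the surname; everything between it and the first token is a
    # "middle name" to abbreviate. Needs at least two tokens.
    tokens = (first + " " + last).split()
    given, *middles, surname = tokens
    middle_initials = " ".join(m[0] for m in middles)
    parts = [given, middle_initials, surname] if middle_initials else [given, surname]
    return "MR " + to_ascii_upper(" ".join(parts))


def naive_initials(first: str, last: str) -> str:
    # Same assumption: first letter of the first token plus first letter
    # of the last token.
    tokens = (first + " " + last).split()
    return to_ascii_upper(tokens[0][0] + tokens[-1][0])


print(naive_card_name("Pedro", "Sánchez Pérez-Castejón"))  # MR PEDRO S PEREZ-CASTEJON
print(naive_initials("Pedro", "Sánchez Pérez-Castejón"))   # PP
```

Both bugs come from the same single line of reasoning: splitting on spaces and assuming the last token is the family name.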

And if he ever has to provide his "last name" to log in to something (e.g. a magazine subscription [3] or a UK Council Tax authority [4]), he will be frustrated to find that the system not only refuses to recognise the last name he entered in the surname text box as written ("Sánchez Pérez-Castejón"), but also rejects it in pure ASCII ("Sanchez Perez-Castejon") and rejects his first surname alone ("Sánchez" or "Sanchez")!

It is only when he asks his half-Spanish, half-British friend for help that he is finally told the answer: "Try logging in with just 'Pérez-Castejón' as your last name." He does so successfully, with a sigh of relief.

What I'm getting at is that it doesn't take additional logic for a system to be compatible with both Anglo-American and Spanish naming conventions; it takes less logic! All that is needed is for the system to either accept the customer data as given, rather than making assumptions about the meaning of space-separated words, or ask for the customer's display preference. If the customer said their last name is two words long, then that is their last name. If you need to display the full name, display it as provided. If you must use initials, let the user choose them.
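A minimal sketch of that alternative in Python, assuming a hypothetical `PersonName` record: the full name is stored exactly as entered, and the display name and initials are whatever the user chose. The single-character fallback in `avatar_initials` is an assumption of this sketch, for users who haven't picked initials yet; it avoids guessing which words are surnames.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PersonName:
    # Stored exactly as the user typed it; never split on spaces.
    full_name: str
    # What the user wants to be called (e.g. in greetings); optional.
    display_name: Optional[str] = None
    # Chosen by the user, not derived from the name's "parts".
    initials: Optional[str] = None

    def display(self) -> str:
        # Use the user's preference, else the full name verbatim.
        return self.display_name or self.full_name

    def avatar_initials(self) -> str:
        # Hypothetical fallback: one character of the display name,
        # used only until the user picks their own initials.
        return self.initials or self.display()[:1].upper()


pm = PersonName(
    full_name="Pedro Sánchez Pérez-Castejón",
    display_name="Pedro Sánchez",
    initials="PS",
)
print(pm.display())          # Pedro Sánchez
print(pm.avatar_initials())  # PS
```

There is no parsing at all: nothing splits on spaces, so two-word surnames, compound given names, and mononyms all go through the same code path.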

[0][2][3][4] Based on true stories; only the names of the characters have been changed.

[1] "PP" (Partido Popular) is the main opposition party to the acting Prime Minister.


Indeed. Or, more generally, don't over-define your data model. Keep it as simple as possible (but no simpler).



