
Every Line Of Business codebase I've worked on has been the worst "there, I fixed it" copypasta spaghetti, and has never made it to the point where "maybe we shouldn't add a parameter to this existing, cleanly abstracted method to handle this new similar-but-distinct use case" was anywhere near my radar for abstraction.

I would love to have developers where my problem was "maybe you piggybacked on existing code too much, in this case you should've split out your own function".



In contrast, every junior developer I've ever worked with has wanted to abstract too early and often, and been slow to recognise that abstraction has costs too (often far higher over time than is initially obvious).

There are costs to copying code, and costs to abstraction, and there's a balance somewhere in between where the most resilient and flexible code lives. The costs of both are paid later, which makes it very hard to judge when starting out where that balance lies, and hard to assign blame later on when problems manifest. Was it too little abstraction, or too much, or the wrong abstraction?

Note that the article claims that duplication is cheaper than the wrong abstraction. The problem is not abstraction in itself, but that abstraction is very hard to get right and is better done after code has been written and used.


What I run into with juniors is that yes, they want to abstract the new problem, and that's good... But they show little interest in learning the existing abstractions and the existing problems and how their new code would fit into that. Given that approach, you end up with a million individual "frameworks", each only solving a single specific case of a series of overlapping similar problems.

Because reading code is harder than writing it. And the only thing worse than "there, I fixed it" code is "there, I fixed it with this massive cool new framework I've built".


> yes, they want to abstract the new problem, and that's good...

I'm not sure that is good. I started off this way too, but now I like to think carefully about abstractions and avoid introducing them until I'm sure they will not hinder understanding, hide changes/bugs, bury the actual behaviour several layers deep, or, worst of all, make things hard that should be easy later (the problem in the article).

Building abstractions is world-building; it's adding to the complicated structure other developers (including your future self) have to navigate and keep in their heads before they can understand the code. So perhaps because of your second point (that people rarely like other people's abstractions), it's better to keep abstractions simple and limited.


Every failed IT project that I have worked on in the last 20 years (except those where the cause was non-technical, such as bad planning or bad requirements) failed because it used too many layers of abstraction.


Counter: Every failed IT project that I have worked on in the last 20 years had too much code. Code is bad. Delete code mercilessly.

Seriously though, the problem is bad abstractions, not abstractions per se. A total lack of abstraction is typically spaghetti you need to read in full to understand.


When I read these threads I feel like I must be working on another planet to the people commenting in them.

In almost every front-end project I join, there's a positive correlation between number of abstractions and code size. Everything is so "best practice one size fits all"-y that you can usually start with halving the size of the code base by removing 50% of the dependencies.

Even once you've done that, you can usually speed up development by cutting 50% of the remaining dependencies. All the ones where the API surface area is more complex than the bloody implementation. Code is bad, sure, but at least it speaks for itself. A lot of the time the choice is between 3 lines of code that are well written and self-explanatory, and 1 line of code that is incomprehensible without trawling through 800 lines of documentation first.

I agree that code size is probably the most important metric for measuring complexity, but it's not an absolute thing. If you're too merciless with culling code, you can easily code yourself into a shitstorm of required context that makes hiring and onboarding impossible. Having said that, I think the problem is mostly contained to using other people's abstractions. I can't think of many times I've walked onto a project and gotten lost in the mud because someone there had coded up something that was too convoluted. Only one springs to mind and it was a back-end system.

And that probably gives some clue as to why nobody can agree on this stuff. I'm guessing different ecosystems lie on different points on the scale, and so different approaches are going to be more successful. I've seen probably 20 projects grind to a halt because of overabstraction, and none because of code duplication. But I'm sure there's other programmers involved in other communities and industries that have seen the opposite. So if we're talking to a faceless crowd on a forum, we're going to give vastly different advice, under the assumption that the people we're talking to are somewhere around the average of everyone we've ever worked with.


I think it depends on what you define as an abstraction. We're often really counting only the wrong or plainly heavy-handed abstractions here.

There are many abstractions you can cut to reduce code size, e.g.:

* Complex frameworks which do not fit your case
* Overused GoF-style design patterns which have no place in this day and age
* Magic ORMs generated with annotation processors
* Universal Tool Factory Factory Factories[1]

The thing is, you're usually not replacing these abstractions with plain old code duplication (let alone the dreaded "fixed A here, fixed B there" copypasta). You usually replace dependencies (frameworks, ill-fitted libraries and factory-factory-factories) with your own implementation, which is a better fit for your needs, and that can be viewed as duplication, sure. But you'd usually still only have ONE implementation in the code base.

In short, in most cases I've seen where we eliminated bad abstractions and saved on code, we replaced them with good abstractions, not (a lot of) duplication.

[1] https://medium.com/@johnfliu/why-i-hate-frameworks-6af8cbadb...


> there's a positive correlation between number of abstractions and code size

What really matters to me is how much code I need to read to understand how it works. I prefer to work in codebases where I need to read 100 lines out of 1200 instead of 1000 of 1000.

Good abstractions are not about code duplication - they are about simplifying conceptual models - so I don't need to understand the details of everything, but I can still understand the system as a whole. And this is the hard part to get right.

Many developers mistake indirection for abstraction. The worst codebases I've worked on had plenty of indirection, many components and many dependencies (often cyclic!), but they actually lacked abstraction or mixed different abstraction levels. A component so universal that it can render webpages, solve differential equations and wash the dishes all in one function is not a good example of abstraction.
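For what it's worth, a minimal TypeScript sketch of the difference (the User/UserService/UserRepository names are invented for illustration, not taken from any codebase discussed here):

    interface User { id: string; name: string; }

    interface UserService {
      getUser(id: string): Promise<User>;
      saveUser(u: User): Promise<void>;
    }

    // Indirection: a layer that merely forwards calls. Readers get one
    // more hop to follow, and the conceptual model is no simpler.
    class UserServiceWrapper {
      constructor(private svc: UserService) {}
      getUser(id: string) { return this.svc.getUser(id); }
      saveUser(u: User) { return this.svc.saveUser(u); }
    }

    // Abstraction: callers reason about "where users live" and nothing
    // else; HTTP, SQL or caching details stay behind the boundary.
    interface UserRepository {
      find(id: string): Promise<User | null>;
      save(user: User): Promise<void>;
    }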


There is truth in this! I've also seen that some of the most successful projects with the highest performers are the ones most full of duplicated code.

The operating theory is to be first to market in order to capture the largest market share and be the market leader. Programs are just tools that can be rewritten later. That's similar to any large tech company today that "innovates" then apologizes later.


If a company has been running long enough to be making money, chances are their codebase will be crap.


I just had to debug code that had seven layers of classes on top of Dapper to call a stored procedure in SQL Server.


So much this. I've encountered many codebases (in science and in tech) where the coder did not even use basic abstractions. In one case there was a lot of

    plot('graph1')
    plot('graph2')
    ....
    plot('graph100')
because somebody didn't know how to create strings at runtime in C++. Another codebase did complex vector calculations component by component; I was able to reduce a 500-line function to 50 lines (including comments, and with bugs fixed).
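For illustration, the fix for the plot copypasta above is a short loop that builds the name at runtime; sketched here in TypeScript rather than the original C++, with a hypothetical plot declaration standing in for the real call:

    // Hypothetical stand-in for the plotting call in the snippet above.
    declare function plot(name: string): void;

    // Build each graph name at runtime instead of hand-writing 100 calls.
    for (let i = 1; i <= 100; i++) {
      plot(`graph${i}`);
    }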

I can sympathize with this a bit, I started programming with BASIC - you could not return structs, you could not use indirect variables (no pointers/references)... but at least you had the FOR loop :-P

People often get called out for over-abstracting (rightly so), but I've rarely seen somebody criticized for copypasta or for overly stupid code. Probably because we're too afraid to accidentally imply somebody can't code.


This comes up very often and is probably a big part of the distaste many people have for jQuery. You see so much copypasta $(selector) that queries the entire DOM over and over again instead of storing the initial query in a variable, querying children based on a ParentNode, etc. This duplication is wasteful at best, and can hurt performance at worst.
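A minimal sketch of the caching being described, assuming jQuery (the #menu/.item selectors and handleClick are made up for the example):

    import $ from "jquery";

    const handleClick = () => { /* ... */ };

    // Copypasta style: every $() call re-parses the selector and
    // re-scans the entire document.
    $("#menu .item").addClass("active");
    $("#menu .item").on("click", handleClick);

    // Cached style: store the initial query once, then query children
    // from it instead of from the whole DOM.
    const $items = $("#menu").find(".item");
    $items.addClass("active");
    $items.on("click", handleClick);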

But as others noted, this is usually a sign that the creator is either green, or puts little focus on furthering their programming because they normally do other things. It's not malice or carelessness.


I saw a post on here recently about the “proportionality of code” (I think this was the term used) - as in, how much one line of code translates to in terms of work for the machine. Python was used as an example, in contrast with Go (list comprehensions vs Go’s verbose syntax).

I think a similar line of thinking is applicable here. $ hides a lot of work behind short syntax. The syntax isn’t “proportional” to the work. Not only that, but the amount of work depends on the argument. Perhaps it’s better that we’re forced to put the effort in and type out “document.getElementById” - it makes us think about what we’re doing.
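To make the proportionality point concrete, a hedged comparison (the selectors here are invented for the example): the two lines look similar in size but hide very different amounts of work.

    import $ from "jquery";

    // One short call hides selector parsing, a full-document query and
    // the allocation of a wrapper collection behind a single "$".
    const $rows = $("table.report tr:not(.header)");

    // The verbose built-in is one lookup, and its name says so.
    const report = document.getElementById("report");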


> I've rarely seen somebody criticized for copypasta or for overly stupid code.

Do you think that is in the realm of what the article is concerned with?


Code like you describe is often the result when a program is written by someone who does not have programming as their main profession. I have seen code like this written by scientists (in disciplines other than computer science).

They may have very deep knowledge in their field, and have written a program to solve some problem they have, but are unfortunately not very good programmers. This often results in quite naive code that still tries to solve an advanced problem.

In code written by professional programmers, I have seen the pattern described in the article far more often than the naive style you describe. After all, programmers are trained to avoid duplication and to find abstractions, and will often add one abstraction too many rather than one too few.


> but I've rarely seen somebody criticized for copypasta or for overly stupid code. Probably because we're too afraid to accidentally imply somebody can't code.

It's because it's a far more benign problem than too much abstraction.

Sure, it's easy to poke fun at that code and lol at how the programmer can't even use the most basic kind of abstraction, but that code is still clear and easy to read. More importantly, it is trivial to fix that kind of error.

I would take code like this any day over code written by an experienced programmer too keen on abstraction.


    plot('graph1')
    plot('graph2')
    ....
    plot('graph100')

I've done a lot of that myself. What you might not be seeing is the for loop in a scripting language that was used to generate that text. It probably took less effort than looking up and implementing it the "right" way. It might make your eyes bleed but if you need to change "plot" to another function, that's just a find-and-replace-all away. Most importantly, the code works fine and doesn't actually need abstraction.
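Something like this throwaway generator, sketched in TypeScript here though any scripting language works, with its output pasted into the real source:

    // Throwaway generator: emit the hundred plot() calls, then paste
    // the printed output into the real source file.
    const lines: string[] = [];
    for (let i = 1; i <= 100; i++) {
      lines.push(`plot('graph${i}')`);
    }
    console.log(lines.join("\n"));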


Yes, writing a for loop in another language to generate code instead of just writing the same loop in the language you're already using? Common technique, nothing wrong with it whatsoever.


Yes, a lot of scientists use their computers in ways that horrify software developers. For example, learning exactly enough of a compiled language to do some wicked fast integer / floating point arithmetic, and not bothering to waste time on the mundane crap you find obvious. And that might mean falling back to a familiar language that makes string formatting easy.

If it ain't broke, don't fix it.


> If it ain't broke, don't fix it.

But scientific programming is deeply broken. Code presented along with publications often doesn't work, or is an incomplete subpart/toy example that's supposed to be invoked within some larger framework. That sounds great until you realize that "some larger framework" doesn't refer to a standardized tool, but some deeply customized setup (a la the one you're responding to, that uses e.g. ad hoc code generators across two--or sometimes more--languages because the original authors didn't know how to format a string in one of them).

Even if you do get lucky enough to find a paper with all requisite code included, in many cases it was only ever invoked on extensively customized, hand-configured environments. And that configuration was done by non tech folks with a "just get it to where I can run the damn simulation" attitude, so configs are neither documented nor automated. And when I say configs, I'm talking about vital stuff--e.g. env vars that control whether real arithmetic or floating point is used.

Often as not, you hack your way to try to get something--anything--running, and it either fails catastrophically or produces the wrong result. Now you have to figure out which of several situations you're in: is the research bad? Were the authors just so non-technical they accidentally omitted a vital piece of code? Was the omission deliberate and profit-motivated (e.g. the PI behind the paper plans on patenting some of the software at some point, so didn't want to publish a special sauce)? Was the omission deliberate and shame-motivated (i.e. researchers didn't want to publish their insane pile of hacks written to backfill an incomplete understanding of the tools being used)? Is it an environment-dependent thing?

And all of that is just as pertains to code in published work--usually the higher-quality stuff. Assuming ownership of in-house code from other scientific programmers is much, much worse.

This isn't abstract moaning about best practices. The failure of labs, companies, publications, and universities to combat this phenomenon has direct, significant, and negative effects on the quality of research and scientific advancement in many fields.

TL;DR: it is "broke". When programmers complain about reproducibility crises in soft-science fields, they're throwing rocks from glass houses.


You're bringing in a whole host of issues inapplicable to the snippet OC found questionable. Don't disagree with ya, but "lack of obvious abstraction" isn't one of these "extreme sensitivity to environment vars" cases.

In fact, vociferously complaining about such cases is a great way to turn scientists away from code review as a concept. Fold the code away in your head (or edit your local copy), and dig for subtle issues like numerical sensitivity, environment, etc. That's the way to bring actual value to the process.

For the code in question, "oh by the way this can be done simpler", with the simplified snippet, is an appropriate approach to the review. But in my experience it's best to save your breath for actual problems.


> the code works fine and doesn't actually need abstraction

Well, maybe it works fine. We didn't see the other 97 lines to verify that they actually include all the integers from 3-99 without skipping or duplicating any. (NB with a loop this verification would be trivial.)


Maybe they deleted 57 because it triggers an edge case. Put it back if you dare. ;)

(no, that's the bad kind of tech debt that's unfortunately common and I actually hate)


This is fine for code that belongs in the trash, i.e. just testing stuff, prototypes, debugging, learning the language/framework, etc.


The business codebase I'm working on now was written by OOP crazy people who thought inheritance was the solution to every line of duplicated code. When they hit roadblocks, they filled the base class with things like if(this.GetType() == typeof(DerivedClass1)){...

I would do anything to have the duplication instead.


If you're truly OOP crazy you will always find ways to avoid resorting to branching on types or even avoid branching altogether (just on the language level of course). "There's a design pattern for that" :-)


Once you ask what the class is you're no longer even "OOP crazy".

You've just capitulated to the complexity and do whatever it takes.

I don't want to sound (too) condescending. I know how easy the best intentions can lead a project there. This job is hard.


Checking for the type is the exact opposite of OO.

The correct OO approach would be to think about what the check represents, perhaps abstract it into a base interface with pure abstract methods, and derive from that interface.

What you describe is what people without an understanding of OO do when they come from a language without OO.
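A sketch of that refactoring in TypeScript (Report/PdfReport are invented names; the original snippet was C#): the type check in the base class becomes an abstract method each subclass supplies.

    // The anti-pattern: the base class branches on the concrete type.
    class BadReport {
      render(): string {
        if (this instanceof PdfBadReport) return "<pdf header>...";
        return "<plain header>...";
      }
    }
    class PdfBadReport extends BadReport {}

    // The fix: name what the check represents and make it an abstract
    // method that each subclass supplies.
    abstract class Report {
      protected abstract header(): string;
      render(): string { return `${this.header()}...`; }
    }
    class PdfReport extends Report {
      protected header(): string { return "<pdf header>"; }
    }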


Very relatable. And they even have the guts to call this code "SOLID"


Then the very same people learn that inheritance is bad and composition is good, and they'll create abstractions with no meaning on their own, which call 10 other vague abstractions (but hey, no inheritance!). Figuring out what happens there is even worse than with inheritance. Some people grow out of it, fortunately (mostly after having to deal with shit like that once or twice).


> ...they'll create abstractions with no meaning on their own...

As if that doesn't happen with inheritance!

The dark pattern is using inheritance as an alternative way of implementing composition. Anyone who thinks that "inheritance bad, composition good" is the proper response to this is probably as confused about the issue as those making the mistake in the first place.

To be clear, you are not making that claim yourself, but you are invoking it to make a straw man argument.


> they filled the base class with things like if(this.GetType() == typeof(DerivedClass1)){

That defeats the purpose of polymorphism.


Wow, just reading that term "line of business" makes me anxious. I used to work on a global payments platform that supported "multiple LOBs", and it was a nightmare of ifs and switch statements all the way down. The situation was made more difficult by the fact that our org couldn't standardize the LOBs into a common enum.


There's nothing I hate more than seeing two or more files sharing 90% of the same code. No matter what justification one attempts to use, there's a mistake somewhere in the design / development process.

I can see a case for what the OP is saying, but I feel it should always be seen as a temporary measure.


It’s been the exact opposite for me. The spaghetti code has always come from poorly conceived abstractions and the massive problem of inverting an API to reimplement functionality through the API that should be extensible within the API (but fails to be because of poor choices in abstraction or abstracting prematurely).

Later on, that spaghetti code gets labeled as lacking abstraction, similar to what you are saying, despite the actual problem being too much abstraction: poorly designed abstraction that became load-bearing, where everyone decides that living with API inversion is the lesser evil and figures they'll quit the company and move on to greener pastures before it becomes their headache to deal with.

https://en.m.wikipedia.org/wiki/Abstraction_inversion


Absolutely this. I'd rather look at 200 lines of linear, inline-documented code than a spaghetti mess of "helper" functions that do nothing better than obfuscate everything going on.

I’ve had a strict rule with my team of “1, 2, N”. I don’t want to see an abstraction until we’ve solved a problem similarly at least two times, and even then an abstraction may still be a poor idea.

Abstraction is an especially poor idea early in a project because often you only half know what you’re making (I’m in games). Requirements change, or a special case needs to be added, and all of a sudden you are trying to jam new behavior into “generic” helpers without breaking the house of cards built around them.


I agree that over-engineered helper function hell can be a real problem.

I disagree strongly with strictly enforcing the 3x rule. The right abstraction can be helpful even if it is used only once. The right abstraction will communicate its purpose clearly and make it easier to reason about the program, not harder. Obfuscating implementation details is a feature, not a bug, as long as the boundaries of the abstraction are obvious. Another benefit is that it makes it easier to test the logical units of your codebase.

"It’s nice to pretend that a four word function name can entirely communicate the state transformation it does, but in reality you need to know what it does." Are you suggesting you are cognizant of every line of code of every library you use in your work?


Actually yes, you should know to at least depth=1 what your magic incantations are doing when you call them.

And that’s part of my point, if you go that one level of depth and find an excessive amount of DRY, you’ll find it that much harder to know what the hell is going on.


Yes, you should understand what a function does when you call it. Not everyone who looks at a codebase is modifying the codebase or adding new function calls. The person referencing the code may already be 1-level deep in parsing the implementation.

Not all abstractions will seem like magic incantations when you use them. Something like "convertToCamelCase" conveys its purpose clearly enough that the reader can assume what the low-level operations are. They don't need to look at these operations every time they need to reference the code.
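For instance, a sketch of such a self-describing helper (this particular implementation is just an assumption of what the name implies):

    // Converts snake_case or kebab-case to camelCase; the name alone
    // tells the reader what to expect.
    function convertToCamelCase(s: string): string {
      return s.replace(/[-_]+(\w)/g, (_m: string, c: string) => c.toUpperCase());
    }

    convertToCamelCase("first_name");       // "firstName"
    convertToCamelCase("border-top-color"); // "borderTopColor"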


200 lines of code means that you have to comprehend all 200 lines simultaneously, since any line could potentially interact with any other line in that code block. Using functions where the state is passed as parameters limits the potential for code interactions through functional boundaries. The point of abstractions is to limit complexity by limiting potential interactions. Helper methods do a fine job of this.
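A minimal sketch of such a boundary (the cart example is invented): the only state that can flow in is the parameters, and the only effect is the return value, so the reader can ignore the body at the call site.

    interface Item { price: number; qty: number; }

    // Only `items` can flow in; only the returned number flows out.
    function subtotal(items: Item[]): number {
      return items.reduce((sum, it) => sum + it.price * it.qty, 0);
    }

    function applyDiscount(amount: number, rate: number): number {
      return amount - amount * rate;
    }

    // The possible interactions here are just these values, not 200
    // surrounding lines of shared mutable state.
    const total = applyDiscount(subtotal([{ price: 5, qty: 2 }]), 0.1);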


That’s a gross over-generalization to assume that 200 lines is always a self-referential mess. Functions fundamentally transform data, and often that transformation is a linear process. If it’s not, sure, break it up in a more sensible manner.

Regardless, helper methods have a significant cognitive cost as well. It’s nice to pretend that a four word function name can entirely communicate the state transformation it does, but in reality you need to know what it does and mentally substitute that when reading the function using it. No free lunch.


I worked on a webapp that our team inherited, which had 400-800 line controllers (and one that was a little over 1200 lines). When I first started looking at the code I was horrified, but then I realized that everything was self-contained and, due to the linear flow, pretty easy to understand. You just had to get used to scrolling a lot!

The issue that we started having is that pull requests, code reviews, and anything that involved looking at diffs was a lot of work. There were two main issues:

1) Inadvertently creating a giant diff with a minor change that affected indentation, such as adding or removing an `if` statement.

2) Creating diffs that had insufficient context to understand: if your function is large enough, changes can be separated by enough other lines of code that the diff isn't standalone. You end up having to read a lot of unchanged code to understand the significance of the change (it would be an ideal way for a malicious developer to sneak in security problems).


>That’s a gross over-generalization to assume that 200 lines is always a self-referential mess.

The point is that you don't know this until you look. You have to look at all 200 lines to understand the function of even one line. When you leverage functional boundaries you generally can ignore the complexity behind the abstraction.


You're fooling yourself, in a mature codebase, if you think you can modify code and not look past function boundaries.

That assertion would be more credible in a language that captures side effects in the type system, but that's not what most people use.


I'm not sure what point you're making. If you are just assuming that functional boundaries tend to not be maintained in practice then you're not contradicting anything I have said. Whether or not functional boundaries are easy/hard to observe depends on the language and coding conventions.


I have had exactly and overwhelmingly the opposite experience. I wonder if it's a function of our fields, or what...



