lg5689 18 hours ago [-]
I believe that "single source of truth" is a principle that should always be followed. If there's duplicated code where it'd be a bug if they diverge, then you should refactor. It creates a long-distance coupling in your code that may be invisible to future developers until a bug emerges.
But with that in mind, I mostly agree with the article: if it's not a violation of "single source of truth", then abstractions are just a convenience. If an abstraction starts being inconvenient, then it's not doing its job and there's no reason to use it. It's a serious code smell if a function needs several flags for custom behavior; that means it's probably the wrong abstraction or violating the single responsibility principle. If there is a legit need for lots of customization, an often-good way to handle it is to take a function/functor as an argument for the customization. E.g., rather than `solve(f:double -> double, max_iters = 99, x_abs_tol = 1e-15, x_rel_tol = 1e-15, ...)` you can do `solve(f:double -> double, stopping_criteria: StoppingCriteriaClass)`
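A minimal Python sketch of that last suggestion, for illustration only. The names `StoppingCriteria`, `tolerance_or_max_iters` and the fixed-point form of `solve` are assumptions made for the example, not anything from the article or from a real solver library.

```python
import math
from typing import Callable

# Hypothetical type: a predicate that decides when to stop, given the
# iteration count and the previous and current estimates.
StoppingCriteria = Callable[[int, float, float], bool]


def tolerance_or_max_iters(max_iters: int = 99, x_abs_tol: float = 1e-15) -> StoppingCriteria:
    """Bundle the knobs that would otherwise be separate solver flags."""
    def should_stop(iteration: int, x_prev: float, x_curr: float) -> bool:
        return iteration >= max_iters or abs(x_curr - x_prev) <= x_abs_tol
    return should_stop


def solve(f: Callable[[float], float], x0: float, should_stop: StoppingCriteria) -> float:
    """Fixed-point iteration x = f(x); the solver itself knows nothing about tolerances."""
    x_prev, x_curr, iteration = x0, f(x0), 1
    while not should_stop(iteration, x_prev, x_curr):
        x_prev, x_curr = x_curr, f(x_curr)
        iteration += 1
    return x_curr


# Fixed point of cos(x), with the stopping behavior supplied by the caller.
root = solve(math.cos, 1.0, tolerance_or_max_iters(max_iters=200, x_abs_tol=1e-12))
```

A caller with unusual needs (say, a wall-clock budget) can pass a different predicate without `solve` growing another parameter.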
jonahx 17 hours ago [-]
> I believe that "single source of truth" is a principle that should always be followed
Fundamentally, the article addresses cases where it's not clear yet how many sources of truth there will be. Are the two spots in the code using the same algorithm, or slightly different versions? More importantly, will they change for the same sorts of reasons?
The title adage (correctly, imo) argues that making two different things the same will cause you more pain than making two same things different via duplication. In the latter case, the "damage" is just having to make the same changes twice, or doing a refactor to introduce the abstraction. In the former case, you have to keep adding to your abstraction, or undo it. Most crucially, it breaks "locality", which is the only property you really care about when making changes. I just want to make this change and not worry about side effects to unrelated parts of the system.
stanmancan 16 hours ago [-]
The issue with not having a single source of truth is not the fact that you have to update code in 2-3 places, it’s that you have to know to update code in 2-3 places.
Accidental divergence is the problem, not intentional.
jonahx 15 hours ago [-]
Yes, this is true. And is a bigger problem on large teams. One mitigation is a comment by the original author at both sites that there may be a coupling in the future.
But, again, the point is that you don't know yet whether you have a single source of truth or not. It's a question of the relative badness of duplication vs premature abstraction in cases where the code may diverge or converge in the future. There is no generic answer. But as a heuristic, based on my personal experience, I have always found premature abstractions to be more painful to work with. Even more so when someone else has authored them.
Maxion 5 hours ago [-]
A lot of the time in my experience this comes down to coders thinking the logic is the same and abstracting something to a central source, when from a business perspective the rules are similar but actually different.
So many times I've had to untangle these types of abstractions when business asks for changes to case X but not case Y. Or worse, business asks for changes to case X, but it also affects case Y due to abstractions. Business sees X and Y as different things, so they did not even think to mention that the new suggested behavior should only affect case X, but to coders they're the same.
ytoawwhra92 11 hours ago [-]
For pure logic I find refactoring to enable divergence much easier than implementing convergence.
usrusr 6 hours ago [-]
Not only easier finding call sites than finding copies, also more intuitive to start looking. "Which callers will be affected by the change?" is the most natural question to ask. "Which places should have this same change applied?", not so much.
pjio 2 hours ago [-]
You forgot the fourth place.
dspillett 13 hours ago [-]
This sometimes falls under “be cautious with what you output, but generous (i.e. flexible) or very careful (full validation, good logging, making sure you fail safe upon receiving any/all unexpected input) with what you accept”. This usually makes duplication the worst choice because you could have to do a lot more thinking (and maybe coding) down the line to make sure all is well everywhere, and you need to document (or at least comment) so that others know these requirements when they make future changes, but it can be a valid approach especially in related but loosely coupled parts.
ketozhang 14 hours ago [-]
This assumes the bug exists in both places which might not be true at all even if they both are dependent on the same duplicated code.
If you only spot the bug in path A and not path B, why fix the bug for B?
dspillett 13 hours ago [-]
You still need to know to assess B to make sure that it is not affected, and verify that it is not adversely affected if it interacts with the output of A after you have changed it.
fragmede 14 hours ago [-]
Why bother having to reason out if path B is or is not buggy? Instead of potentially getting that analysis wrong, DRY, fix it in the one place, be sure that it's fixed for that case, and move onto the next bug.
bluGill 13 hours ago [-]
No, the issue is when there are not two or three places, it's when there's hundreds or even thousands of different places. Two or three is annoying, but not a big deal. However, as you get into the hundreds and thousands, it becomes a real problem. In real world code, this is an all too common case.
dilyevsky 12 hours ago [-]
Seems like this is a problem almost entirely solved by llm+vector database setup.
throwaway2037 11 hours ago [-]
I don't follow. Will this help to identify duplicate code? FYI: JetBrains' IntelliJ has had this feature built in for years now.
pydry 13 hours ago [-]
Sometimes it is genuinely easier to duplicate when that happens - e.g. if three teams maintain an enum with 4 values and there is no existing mechanism for sharing code between the projects.
QuadmasterXLII 15 hours ago [-]
One killer life hack I’ve found is, if extreme duress pushes software into two sources of truth, add a CI test that won’t let anything merge into main until the sources match. The canonical case of this actually being the best solution is pyproject.toml / requirements.txt synchronization, but I suspect it has broader applicability. A precondition is that things have already gone off the rails far enough that a single source of truth is unattainable; this is more harm reduction than cure.
uberex 3 hours ago [-]
I know it is just an example but I'd generate one of those files from the other in that case.
namelosw 4 hours ago [-]
> I believe that "single source of truth" is a principle that should always be followed
Theoretically and conceptually I agree. But in practice a lot of programming languages aren’t that expressive. People prefer codebases with duplication over visitor patterns everywhere. In essence, the visitor pattern is a tool to solve multi-dimensional abstraction problems, just like type classes in Haskell or CLOS in Common Lisp. But it’s so verbose and non-straightforward that more often than not it’s not worth it, even if conceptually it’s a legit case for “single source of truth”.
jamiejquinn 11 hours ago [-]
> it'd be a bug if they diverge
That's a very nice rule of thumb. I've often overabstracted when two pieces of code look similar at one point in time and then they diverge.
mdavid626 16 hours ago [-]
Of course, in theory this is true. In practice people tend to avoid ANY duplication no matter what. Especially junior developers, as if duplication would be the root of all evil.
jihadjihad 16 hours ago [-]
> as if duplication would be the root of all evil
And instead it gets replaced with the actual root of all evil, complexity.
Akronymus 16 hours ago [-]
To be more specific, incidental complexity.
Many problems have tons of inherent complexity already.
ttoinou 15 hours ago [-]
We still need a way to track that there’s some common pattern in the code. So that when we update one pattern we wonder about the other places in code with the same pattern. Avoiding duplication doesn’t solve that.
Akronymus 15 hours ago [-]
My metric for that is "does that code MEAN the same thing" or "does it just look the same". Has worked quite well for me so far. I frequently find myself making a copy of some code rather than adding a parameter (most commonly done with code that would get some flag added)
ttoinou 15 hours ago [-]
Me too! I don't follow DRY that much. I'm aware that copy-pasting is good enough for a few weeks or months to see how things evolve, and I refactor when it's really needed. That said, how do you know if they mean different things? For GUI code, for example, they do mean the same thing, but there's a good chance the code will evolve in the future, so premature refactors are wasted time.
pocksuppet 15 hours ago [-]
GUI code changes as fast as your GUI does. If you have two buttons, call makeButton twice. If they have totally different sizes, don't calculate the size inside makeButton. If tomorrow you want a button and a checkbox, don't call makeButton twice with isCheckbox=true the second time.
Fun fact: Win32 checkboxes are buttons with a bitflag that says they are actually checkboxes.
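A tiny Python sketch of that advice; `make_button`, `make_checkbox` and the two dataclasses are hypothetical names for illustration, not a real GUI toolkit API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Button:
    label: str
    width: int
    height: int


@dataclass(frozen=True)
class Checkbox:
    label: str
    checked: bool = False


def make_button(label: str, width: int, height: int) -> Button:
    # Size comes from the caller rather than being calculated here, so two
    # buttons with totally different sizes can share this function.
    return Button(label, width, height)


def make_checkbox(label: str, checked: bool = False) -> Checkbox:
    # A separate function instead of make_button(..., is_checkbox=True).
    return Checkbox(label, checked)


ok = make_button("OK", width=80, height=24)
banner = make_button("Start", width=320, height=96)
remember = make_checkbox("Remember me")
```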
Akronymus 15 hours ago [-]
Mostly by looking at the calling site where the code is already used and the calling site where I want to reuse it. If both of those mean the same (calculate the tax on x products, for the purpose of applying to the shopping cart, vs for applying to generating reports) then I'll reuse it, if it can be achieved without adding stuff like flags, in most cases. In other cases, it just looks the same (sum some field + calculate a percentage of that, for example, for discounts vs taxes on products) where it's obvious that they don't mean the same. (Though, I do heavily rely on a good type system to deal with future evolutions of that copied code)
TL;DR: Vibes
ttoinou 14 hours ago [-]
It's always about how far into the future you plan. And sometimes this future thinking is wasted time.
Brian_K_White 15 hours ago [-]
This right here.
Here we're loading the customer record and updating their discount %
Here we're loading the broker record and updating their commission %
They will have 99% identical code.
It's possible but exceedingly unlikely we have found 2 things that should be a load_record_and_update_percent(file,id,field,val)
Tomorrow the business logic behind one of those will no longer be a simple % and now you have a real mess.
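To make that concrete, here is a hedged Python sketch of the "keep them separate" option. The record types and function names are invented for the example; only the discount and commission scenario comes from the comment.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    id: int
    discount_percent: float


@dataclass
class Broker:
    id: int
    commission_percent: float


def update_customer_discount(customers: dict[int, Customer], customer_id: int, percent: float) -> None:
    # Load the customer record and update their discount %.
    customers[customer_id].discount_percent = percent


def update_broker_commission(brokers: dict[int, Broker], broker_id: int, percent: float) -> None:
    # Nearly identical today, but it answers to different business rules.
    # If commission stops being a simple %, only this function changes.
    brokers[broker_id].commission_percent = percent
```

The duplication costs a few lines; in exchange, a change to how commissions work never has to thread a new flag through a shared `load_record_and_update_percent`.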
t-3 15 hours ago [-]
> when we update one pattern we wonder about the other places in code with the same pattern. Avoiding duplication doesn’t solve that
It can, that's all about how aggressively you factor and structure your code, eg. combinators make it easy to reuse code in different application patterns without rewriting.
ttoinou 14 hours ago [-]
In which language do you use combinators for that?
Even in that case the refactor can introduce mental overhead when there are too many different variable/property names.
t-3 14 hours ago [-]
Any language that I can write a combinator in. It's quite easy in C, for example.
mdavid626 16 hours ago [-]
Exactly!
throwaway2037 11 hours ago [-]
The hardest part is two algorithms or business logic routines that are nearly identical. What to do? Frequently, all solutions look equally bad!
the_af 16 hours ago [-]
This is something I've seen repeated time and time again as a criticism of (misused) abstraction and DRY, yet I've never seen ONCE -- and this is not hyperbole, I mean it literally -- a junior making an abstraction with any thought to reuse, generalizing anything, or caring about not repeating code. Most juniors I've worked with are content to just churn new code without paying attention to the codebase at all. This all before the AI deluge, mind you.
Very similar with patterns. I've often read people protesting that juniors overuse design patterns, yet I've seldom seen a junior (mis)use anything more complex than a singleton, and when they use any pattern, it's usually forced upon them by an opinionated Java framework.
dasil003 16 hours ago [-]
This smells more like the fluidity of what people mean by “junior” than anything else. Journeymen engineers in their over-engineering phase, or even very “senior” expert programmers, can suffer from overfitting the product to their own mental model. The most senior judgment is to understand when an abstraction makes sense at a customer level, because that defines the durability of a business-logic abstraction.
the_af 16 hours ago [-]
I do agree this happens with the senior overengineering phase, but the comment I replied to mentioned "especially juniors" and I've heard this trope specifically about juniors, with the implication they want to apply what they learned in college, but this hasn't been my experience at all.
lmm 11 hours ago [-]
> Very similar with patterns. I've often read people protesting that juniors overuse design patterns, yet I've seldom seen a junior (mis)use anything more complex than a singleton, and when they use any pattern, it's usually forced upon them by an opinionated Java framework.
I've seen it occasionally. There was one junior whose code I saw littered with DTOs that were exact copies of the business objects and DAOs where every method was just a wrapper for a Hibernate method. But yeah, it's rare.
robotresearcher 16 hours ago [-]
In the early 2000s I often saw juniors and students make staggeringly deep class hierarchies. The equivalent of:
"Intro to OOP" lectures/articles made a deep impression on some people in not quite the right way :)
throwaway2037 11 hours ago [-]
I was probably that guy! It was all the rage 20 years ago, including worrying about the diamond inheritance problem. What is the equivalent in the current generation? ORM that no one can maintain? Unnecessary dev ops complexity? Anything "web scale"?
the_af 10 hours ago [-]
Are ORMs still a thing? I've been away from OOP for some years now, but just when I was leaving it, there was a trend firmly against ORMs... my guess was that they were on their way out, replaced by more lightweight libs and frameworks? Or did they make a comeback?
Regarding OOP itself, I also remember when "favor composition over inheritance" became a thing. Was this reversed too?
digitaltrees 6 hours ago [-]
I love an ORM. I think much of the problems people experience with ORM, OOP, Restful routes, is because they get the domain model wrong. When you model the data correctly you don’t need to have complex queries that push ORM beyond their breaking point.
dbalatero 9 hours ago [-]
> Regarding OOP itself, I also remember when "favor composition over inheritance" became a thing. Was this reversed too?
I think this is generally still the advice, when working in OOP contexts.
the_af 13 hours ago [-]
I was working at that time and never saw this from juniors. Overeager seniors and architecture astronauts, sure. But juniors? They mostly copy pasted code without even taking a second look at the codebase, and without bothering to break functions in any sensible way.
Mind you, I mean enterprise and line of business software, not hobbyists. I also mean of their own volition, not the kind of nonsense that Java frameworks often forced on them (all the patterns under the rainbow, factory abstract method factory of abstract methods).
throwaway2037 11 hours ago [-]
Were you the same when you were a junior? I was. I didn't have the experience to understand the impact of my changes. The norm reply on HN: "You need more mentoring or code review.". Sometimes (usually?) that is in short supply.
the_af 10 hours ago [-]
Absolutely. I made all the usual mistakes, and had to be mentored and learn from more experienced programmers.
(Alas! Sometimes you pick up bad habits from experienced people, and being a junior, you don't know better)
wellpast 16 hours ago [-]
Definitely the hallmark of junior. Obsession with code deduplication as the highest pri when it’s quite low among others.
ozim 13 hours ago [-]
Well I have seen a lot of „expert beginners” who have years of experience on paper but fight tiny duplications like their life depends on it.
„How Software Groups Rot: Legacy of the Expert Beginner”.
I have recently fallen into a job at a small company that really seems to have this culture. Thankfully, I'm only going to be here for a year and a half or so (fixed term job for working holiday visa), but I'm trying to be really aware of how it's impacting my career development.
There is no automated testing, no meetings, seemingly no code review process, no standardization of schemas for files that are passed between different applications, all jobs are run on on-prem desktop workstations.
janpmz 6 hours ago [-]
> If they diverge
This is the key, if they are very similar but used by different consumers the chance that they will diverge in the future is very high. And once they do they will break the abstraction.
fpoling 16 hours ago [-]
With LLMs the cost of duplication is much lower and LLMs
cluckindan 16 hours ago [-]
> and LLMs
… sometimes duplicate things unnecessarily.
at_compile_time 16 hours ago [-]
or stop midsentence
storus 16 hours ago [-]
When you run out of tokens, you run out of tokens!
sscaryterry 16 hours ago [-]
The struggle is real.
robotresearcher 15 hours ago [-]
Would you like me to outline some concrete steps for dealing with the struggle?
lossyalgo 15 hours ago [-]
We would love to, but we ran out of tokens.
alberto467 16 hours ago [-]
Code duplication differs from single source of truth applied to data in the sense that data is data but two pieces of code may functionally be the same (they do the same thing) but they might be semantically different in their usage (they’re advertised to achieve different things), in that case coupling them together with deduplication and forcing them to do the same thing doesn’t really make sense, and may make the codebase more difficult to work on in the future (especially in companies where different teams have responsibilities over different parts).
threethirtytwo 14 hours ago [-]
code and data are the same thing.
but that's too philosophical to talk about or for you to understand.
Put it this way. You're implying code can be duplicated as long as they are advertised to do different things. But can't that conceptually be applied to data as well? I have the number 5 representing age, and I also have the number 5 duplicated somewhere else representing cost. 5 is duplicated because they are "advertised" to do different things.
Because code and data are philosophically the "same" the properties of "single source of truth" applies to both in the same way.
shhshahja 6 hours ago [-]
If you knew in advance which source of truth is important to isolate you don’t have this problem.
The problem is not knowing which of the hundreds or thousands of potential truth sources is worth abstracting. The only real way of finding out is not abstracting them and seeing how it works out.
If the problems in SWE boiled down to solve(f -> MagicallyNoProblemAnymore) we wouldn’t have this discussion.
infinitebit 15 hours ago [-]
i don’t think anything in the article advocates for not prioritizing “single source of truth”, as in, if we know that there are multiple sources of truth for something, it should absolutely be deduped. the article is more saying “be a bit more skeptical of any two pieces of code actually representing the same thing” and “be more willing to break apart an abstraction that is trying to represent multiple truths.”
jackbucks 16 hours ago [-]
I have always believed what the article more or less states. But you have to remember, the primary and maybe only source of duplication in software is situational dependency (the other word escapes me for this). If there was a universal tree of software functions that could be accessed over a network, no function would ever be duplicated and every function would be reused from a central tree. When you put 2+2 inside a method or function body you just duplicated code. The same goes for any code inside a method or function body.
This is why we have programs that duplicate code by doing anything like adding two numbers together, or complex logic that is easy to get wrong, even though someone wrote it better 40 years ago. Because code reuse is mostly done on a very small scale.
Given that's the case, when you start on a new React project, as an example, you are not reusing application code; you are duplicating the React framework so you can duplicate every other web app in every sense except maybe the visual.
There is no such thing as full reuse, and until we get to a universal network-invocable function tree that can be extended only when something is truly unique, we never will. Maybe AI will do this. People cannot.
At the end of the day code duplication needs to exist to optimize for local correctness (or incorrectness) and speed, and abstraction's goal is not to provide pure reuse. It's to provide a place to "put your logic" that may be similar and has access to typical state that some kind of widget might typically need.
I think about this on occasion. Most recently I ran into an issue during a personal project: 2d sprites for RTS units were packed on spritesheets in a consistent manner: 5 sprites for 8 directions (you mirror 3). Packed in order of: stand, move, attack, die. So I made a loader that understands how to take action + direction and offer an array of sprites to play through.
But then I came across more cases: sprites with no directionality (an explosion), and corpse sprites (which were only 4 directions, 2 mirrors, and most except the first four were shared by both orcs and humans).
I agonized for a little bit on what the hell the common abstraction is for all this. In the end, I factored out some of the loading code, and made a UnitLoader, CorpseLoader, EffectLoader and moved on. Now, there's probably a better abstraction in there because all 3 loaders have to reason about the same things a little bit. But I will discover that abstraction later on and it's easier to just de-duplicate the code then, rather than try to identify the abstraction now and make some complicated EverythingLoader that handles all those cases.
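For illustration, a loose Python sketch of the "small shared helper, separate loaders" shape described above. The frame counts, the direction numbering (0 to 7 with directions 5 to 7 mirroring columns 3 to 1), and all names are assumptions; the corpse case is left out because its layout isn't spelled out.

```python
from typing import NamedTuple


class Frame(NamedTuple):
    row: int        # position of the frame within the sheet
    column: int     # stored direction column
    mirrored: bool  # draw flipped horizontally


# Hypothetical frame counts per action, packed in the order: stand, move, attack, die.
UNIT_ACTIONS = {"stand": 1, "move": 4, "attack": 4, "die": 3}


def action_rows(actions: dict[str, int], action: str) -> range:
    """Shared helper: rows occupied by one action in a sheet packed in dict order."""
    start = 0
    for name, count in actions.items():
        if name == action:
            return range(start, start + count)
        start += count
    raise KeyError(action)


def unit_frames(action: str, direction: int) -> list[Frame]:
    """Units: 8 directions from 5 stored columns; directions 5..7 mirror columns 3..1."""
    if not 0 <= direction < 8:
        raise ValueError(direction)
    mirrored = direction > 4
    column = 8 - direction if mirrored else direction
    return [Frame(row, column, mirrored) for row in action_rows(UNIT_ACTIONS, action)]


def effect_frames(frame_count: int) -> list[Frame]:
    """Effects such as explosions: no directionality, one column, never mirrored."""
    return [Frame(row, 0, False) for row in range(frame_count)]
```

Each loader stays simple and only the genuinely common piece (`action_rows`) is shared, which leaves room to discover a better abstraction later.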
andai 15 hours ago [-]
I like this quote, "things should be made as simple as possible, but no simpler."
I think the natural instinct with programming is to try and simplify the code by means of generalization. But we often over-simplify, and reality is messy. Or as TFA mentions, time passes and new requirements arise, so it turns out that we have simplified prematurely!
Sounds like this should be an aphorism. Premature abstraction is the root of much suck!
dahart 14 hours ago [-]
You probably already have the common abstraction factored - the code to load pixels for a single sprite, and to display it? It makes sense to me that the level above that, interpreting the sprite sheet layout and modes of playback, come in different flavors and don’t have a common abstraction that fits all cases.
Personally I prefer what you’re doing over trying to come up with a non-obvious abstraction or trying to make an imperfect abstraction fit. Waiting til the abstraction is totally obvious and the need is crystal clear is a good thing.
The flipside (antidote?) of DRY is WET - write everything twice/thrice. More important, IMO, is to abstract only over things I have an actual, demonstrated use case for, usually demonstrated first via duplication, and not speculate about possible future uses I might want. Code written for future use cases we don’t have is so often the code that gets in the way of abstracting the things we do have, and it cracks me up when that happens.
Waterluvian 14 hours ago [-]
> Waiting til the abstraction is totally obvious and the need is crystal clear is a good thing.
I discovered this after a few early years of my career being a bit of a “best practices” zealot. The thing I say often at work is, “let’s get this shipped to prod so we can start learning all the things we don’t yet know about it.”
galleywest200 15 hours ago [-]
This is the way. Making games is supposed to be fun. You can do the hard boring stuff when you get to the final 10% of the project.
Besides, sometimes your duplication creates "bugs" which may turn out to be fun features that players enjoy.
bhouston 19 hours ago [-]
I used to struggle with abstractions back in my OOP days but since moving pretty much to a purely functional approach I find that code duplication is rare. Just have a function and call it in two parts. The main abstraction issue is then data structures but with TypeScript interfaces being duck typing essentially I run into few problems there as well.
So code duplication because of abstraction issues is rare. Code duplication because of siloed developers is so much more common.
kccqzy 17 hours ago [-]
Developers do not really have to be siloed to experience code duplication. When the team size grows past a certain point such that each person is not aware of what every one else is working on, code duplication is quite inevitable. This is the case even if everyone writes functional style code. In fact this just happened last month at work: I wrote a new functional and pure helper function and placed it at the beginning of the file; a week later a colleague told me a similar helper function with substantially the same functionality with a different signature had been written and placed near the end of the same file.
ikety 18 hours ago [-]
For hobby, I use functional languages, and I find the techniques are the important bits to remember. Most modern languages let you easily stand on functional programming theory. You don't need to know Haskell. Everyone's brain works differently, but the idea of small, simple and occasionally flexible parts building a whole works for me. As opposed to the large complex do it all shape shifting machine.
yCombLinks 10 hours ago [-]
Not seeing how this makes sense in terms of the article? A function is an abstraction. Extracting duplicate code into a function is the same concept.
throwaway2037 11 hours ago [-]
> but since moving pretty much to a purely functional approach
What language?
platz 18 hours ago [-]
what exactly is 'calling a function in two parts'
saghm 18 hours ago [-]
I assume they mean to call the function from two (or more) parts of the code (i.e locations). It's not immediately apparent why this is meaningfully different than what would be possible in Java though, since ostensibly a function is the same as a method by just moving the callee to the list of parameters. (There are some things in a Java method that you can do that don't translate to most functional languages, like invoking the version of the method from a superclass, but there's nothing forcing you to do any of those from the language perspective, so it seems a bit strange to claim that the language itself is the issue rather than maybe the specific patterns that were chosen, maybe by their coworkers or just not common in the ecosystem).
bhouston 17 hours ago [-]
You can do functional in any language. I haven’t designed a new class in years in TypeScript and I’ve been more productive as a result.
lysium 18 hours ago [-]
I read it as „calling it from two places“
Akronymus 16 hours ago [-]
I assume to split the overall behaviour (loop through all elements, transform some value, etc) and the specific one (apply this function to all elements, transform it in this way, etc) into multiple functions and combine those to achieve the actual intended behaviour.
At least that's my interpretation
tosh 16 hours ago [-]
apparently it is not what the author meant but:
using projection you can "call a function in two parts"
add: {x+y}
add4: add[4] / gives {4+x} by fixing 4 as first argument to {x+y}
add4[2] / gives 6
this is a useful pattern that you can use to first 'fix' data or behaviour
to produce another function
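For readers who don't use array languages, roughly the same projection can be written in Python with `functools.partial` from the standard library (the `add` and `add4` names simply mirror the snippet above):

```python
from functools import partial


def add(x: int, y: int) -> int:
    return x + y


# Fix 4 as the first argument, producing a new one-argument function.
add4 = partial(add, 4)

result = add4(2)  # 6
```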
Calling a function from two locations is what I meant.
Basically since moving to a functional approach in typescript I find I do not fight abstractions as I used to when I used classes and inheritance.
odo1242 18 hours ago [-]
I believe they’re referring to callbacks / dependency injection / higher order functions to customize the behavior of a function?
bhouston 17 hours ago [-]
Mostly just function calling to reduce duplicate code. Dependency injection does start to get abstraction costs again. I use it when necessary but it is annoying and costly when I do.
strongpigeon 19 hours ago [-]
Echoing the article, anyone who has experienced both will agree: it’s far easier to work with an under engineered code base than an over engineered one.
tarcon 4 hours ago [-]
Contrary to that. The saying - Better to have a bad abstraction than none - was born from spaghetti code pain.
Rendello 18 hours ago [-]
Two talks come to mind here: Mike Acton's Data-Oriented Design and C++ [1] and Bryan Cantrill's The Complexity of Simplicity [2].
Mike's talk argues that code solutions need not be modelled on the real world, and that different data creates different problems, which need different solutions. I can't do the talk justice, but it's had a big impact on me.
Bryan's talk is about abstraction generally, and how it's difficult to find the "right" abstraction.
> Mike's talk argues that code solutions need not be modelled on the real world, and that different data creates different problems, which need different solutions.
I've always found it odd when even fairly smart engineers sometimes prioritize real-world metaphors over the actual needs of the codebase. Years ago when I was only a few years out of school, I was implementing a connection pool in Rust, and the most reasonable way to implement it was to have the connection hold a weak reference to the pool so that it could get checked back in automatically when dropped. My manager (an extremely experienced engineer) didn't like this idea because "a library holds library books, not the other way around". I didn't feel like this was a compelling reason to design things differently, but he refused to engage with the issue in any way other than through the lens of that metaphor. Eventually the impasse was solved by one of the other managers in my department suggesting that while library books don't contain libraries, they do have the name of the library stamped in the back as a reference to where they should be returned, and I guess my manager found this to be a reasonable extension of the analogy. If I were more experienced, maybe I would have recognized that I could find a way to engage with the analogy like the other manager did without ceding the point, but even today I still feel that it was completely bizarre to insist on that as the canonical way to frame things rather than just considering the ramifications of the abstraction in the code and the experience of using the library based on it.
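The design in question can be sketched in a few lines. This is a Python approximation, not the Rust code from the story: a context manager stands in for Rust's `Drop`, the "connections" are plain strings, and every name is hypothetical.

```python
import weakref


class Pool:
    def __init__(self, size: int) -> None:
        # Strings stand in for real sockets or database handles.
        self._idle = [f"conn-{i}" for i in range(size)]

    def checkout(self) -> "PooledConnection":
        return PooledConnection(self._idle.pop(), self)

    def checkin(self, raw: str) -> None:
        self._idle.append(raw)

    @property
    def idle_count(self) -> int:
        return len(self._idle)


class PooledConnection:
    def __init__(self, raw: str, pool: Pool) -> None:
        self.raw = raw
        # The "library stamp": a weak reference that does not keep the pool alive.
        self._pool_ref = weakref.ref(pool)

    def __enter__(self) -> "PooledConnection":
        return self

    def __exit__(self, *exc_info) -> None:
        pool = self._pool_ref()
        if pool is not None:  # the pool may already be gone; then just drop the connection
            pool.checkin(self.raw)


pool = Pool(size=2)
with pool.checkout() as conn:
    idle_while_borrowed = pool.idle_count
idle_after_return = pool.idle_count
```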
Rendello 16 hours ago [-]
This is somewhat related:
I mention this a lot, but in researching Data-Oriented Design (what Mike was talking about), I came across Richard Fabian's DoD book [1] which talks a lot about database normalization and the like. I found that odd, because the low-level high-performance game code he was talking about certainly wasn't going to marshal data into a DB to run SQL queries on it.
It turns out the relational model has a lot of advantages though. Programmers use trees all the time, in OO, in structs containing structs, in objects pointing to other objects. It's easy to forget that trees are just a special case of graphs (ie. networks), and that there are many ways to represent networks that don't rely on encoding a tree structure directly.
So, I've been doing what Richard Fabian suggested and I lay out my data (on paper) into tables, then attempt to normalize it and see the connections. I really like this way of designing things.
My big issue is that doing DB-like operations is hellish in most programming languages, and if you really want to try and marshal your data into a real DB (say, SQLite or DuckDB via a library), then you have a big messy translation layer where you're trying to match things to SQL types and you have giant SQL strings everywhere.
I see C# has LINQ, which is a query language embedded in the language. I wonder if that approach is best, and why hasn't it been adopted more broadly? It seems like there's a lot for programming language designers to explore in this dimension, though I wonder if it even matters now with the superintelligence tidal wave.
> My big issue is that doing DB-like operations is hellish in most programming languages, and if you really want to try and marshal your data into a real DB (say, SQLite or DuckDB via a library), then you have a big messy translation layer where you're trying to match things to SQL types and you have giant SQL strings everywhere.
Have you heard of the JOOQ library for Java? It is a godsend because you can write guaranteed type-safe SQL using pure Java -- no syntax sugar. I expect that LINQ can do the same in C#.
skydhash 9 hours ago [-]
> My big issue is that doing DB-like operations is hellish in most programming languages, and if you really want to try and marshal your data into a real DB (say, SQLite or DuckDB via a library), then you have a big messy translation layer where you're trying to match things to SQL types and you have giant SQL strings everywhere.
I prefer having that translation layer, especially when it's domain oriented. All the SQL strings are collected in one isolated module, and the only exported symbols are a set of functions.
From Domain-Driven Design, what I learned is to be comfortable having different representations of the same data in different layers/subdomains. Something may be a fat object from the API, but I prefer having a collection of functions that each use a different part and have a caching layer to not actually do the expensive network call. That network call and the caching layer will be encapsulated in one module and the collection of functions will be the only thing visible.
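A minimal sketch of that kind of module, using Python's standard `sqlite3`. The `users` table and the function names are assumptions made up for illustration.

```python
import sqlite3

# All SQL lives in this one module; callers only ever see the functions below.
_CREATE = "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
_INSERT = "INSERT INTO users (name) VALUES (?)"
_FIND_BY_NAME = "SELECT id, name FROM users WHERE name = ?"


def open_store(path=":memory:"):
    """Open the database and make sure the (hypothetical) schema exists."""
    conn = sqlite3.connect(path)
    conn.execute(_CREATE)
    return conn


def add_user(conn, name):
    """Insert a user and return the new row id."""
    cur = conn.execute(_INSERT, (name,))
    conn.commit()
    return cur.lastrowid


def find_user(conn, name):
    """Return a domain-shaped dict, or None if there is no such user."""
    row = conn.execute(_FIND_BY_NAME, (name,)).fetchone()
    return None if row is None else {"id": row[0], "name": row[1]}
```

The rest of the codebase deals in plain dicts and function calls; the SQL types and strings never leak past this file.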
saghm 11 hours ago [-]
This is something I've thought about a lot over the years, not in small part because the connection pooling work that I mentioned above was during my first few years out of college where I worked at MongoDB on some of their database client libraries. I know MongoDB gets a lot of criticism on these parts of the internet (which at least in terms of technical opinions is in my opinion a mix of stuff that's warranted, stuff that's a bit more nuanced than internet arguments might make it seem like, and some stuff that's mostly just holdovers from the very early days that hasn't applied to any version of the database people have used in the past decade), but one of the things I always found interesting about it is how it changes the experience from what you describe to one where the bulk of the work is figuring out the best way to model the data (where you have to care about things like "how 'many' is this 'one to many' relation" and "when I access this data, is there any other data I'd almost always expect to need to access at the same time?"), and if you've done that right, the queries themselves end up being a lot more straightforward to come up with (either single operations like "find this" or a pipeline of transformations starting from "find this" and then "do this to the output of the last stage", compared to the "inside out" way you sometimes have to wrap up subqueries in SQL with outer queries).
It's a reasonable take that changing the entire way that the database modeled everything under the hood is an overkill solution to the specific problem you mention compared to something like LINQ that can work on top of existing databases, but I can't help but wonder if there's a bit of inertia in how willing people are to challenge their usual ways of thinking about how data modeling might be possible to improve because a lot of people don't get exposed very much to anything other than the raw, string-like handling that you mention (which is annoying but at least SQL injections are a well-known thing nowadays and tend to be possible to avoid) or a full-blown ORM (which quite often ends up either being wildly inefficient or needing to drop back down into the raw SQL in some places to avoid the performance bottlenecks, which kinda defeats the entire point). A startup I worked at a few years ago actually had what I thought was a pretty clever solution to this problem, with their product generating OpenAPI/GraphQL APIs for a given database by inspecting the schema (with optional parameters to get back EXPLAIN data in the responses to verify that the query was what you wanted, and the ability to define custom routes with raw queries that were checked into shared version control with the schema migrations if you weren't happy with the query it generated as a way to properly separate concerns as an improvement over the traditional ORM workflow), but despite the idea seeming quite enticing to me from a technical standpoint, I guess it didn't show enough traction to be able to survive.
znkr 19 hours ago [-]
+1 The worst code I had to maintain was code that tried to follow DRY (without trying to understand what the original intention of that principle was). The only way out of that mess was widespread code duplication.
tomjakubowski 15 hours ago [-]
It'll be fine, don't worry about it: just add a couple more obscure boolean parameters to that reusable function to support your new use case and ship it.
zadikian 16 hours ago [-]
Yep. Keyword "tried," as in they did it for a while then hit a point where it's impossible to faithfully follow the abstractions because they're wrong.
aftbit 16 hours ago [-]
Similarly, I've seen some developers who seem to think that any inline string or numeric constant is evil. In one PR, I saw:
I don't understand what they think this is buying, other than just cargo culting "don't embed constants." And of course, the constant definitions were at the top of the file and the url building code was hundreds of lines away.
regularfry 38 minutes ago [-]
That particular example doesn't quite fit, but I've certainly seen cases where otherwise perfectly ordinary fixed strings needed to be broken up to meet linting rules.
preg_match 16 hours ago [-]
I’m a big fan of closeness in code. I prefer defining things as closely to where it’s used as possible. This is a big pet peeve for me!
Do not put regex at the top of the file either! Put it where you use it. Languages are smart, they’ll probably be able to tell that it’s constant anyway.
Also for tiny functions just use a lambda. Please don’t make a one line function a million miles away that you use once or twice.
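A small sketch of what I mean, in Python. The function names and the `ORD-` pattern are made up for illustration. As far as I know, Python's `re` module keeps a cache of recently compiled patterns, so the inline pattern is not recompiled on every call.

```python
import re


def extract_order_ids(text):
    # The pattern sits right where it is used; re caches the compiled form.
    return re.findall(r"ORD-(\d+)", text)


def sort_for_display(words):
    # One-off ordering rule written inline instead of a helper far away.
    return sorted(words, key=lambda w: (len(w), w))
```

If a second caller ever needs the same pattern, that is the moment to decide whether the two uses really mean the same thing.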
robotresearcher 15 hours ago [-]
Amen! The existence of 'helpers.js', 'utils.cc', makes me twitch.
what 7 hours ago [-]
If multiple things use the same regex, which one should it be close to? Or do you propose duplicating it?
dahart 14 hours ago [-]
Having the constants at the top is more easily customizable, especially should this file get duplicated. If devs need to switch to http instead of https for testing or staging, it makes sense to separate the scheme from the domain and put the constants up top or even in another file. It also matters whether ‘url’ was constructed in multiple places or a single place. Having named constants at the top of the file is a very common style, and sometimes is part of the group coding standards.
Anyway, maybe there are other reasons too, so see Chesterton’s Fence. In any case, it’s never a good idea to assume cargo culting. Someone could easily say the same thing about using inline literals. If it looks weird, ask around and maybe you’ll find out there are good reasons, or maybe you’ll find out nobody cared and that people will like it if you refactor and embed the constants.
mrkeen 15 hours ago [-]
I ran into this as well. If an Event has a name, you can instantly grep across a giant monolith (or a big folder of microservice repos) and find every file that is concerned with that event.
If you pull it out into a constant, you're back to opening up projects one-by-one to 'find usages'
jwpapi 3 hours ago [-]
Whilst I understand cases in which duplication is preferred, I generally think abstractions are underused. Sometimes I would abstract something away that is only done once, not because I want to have less code, but because it allows me to solve bigger problems, and when I look at a function I don’t have to worry about it. It allows me to create systems. Obviously your abstractions should be good.
cryo32 19 hours ago [-]
You can do both with microservices!
mystifyingpoi 16 hours ago [-]
I get the joke, but in ideal world, in microservices, there is no such thing as code duplication across services. As a maintainer of a service, I should not give a crap about code present in some other service - it's some other team's code, why would I care? I don't have to even know that the other team exists. In big systems, it happens that I can't even feasibly know the existence of all the applications.
throwaway2037 11 hours ago [-]
In [an] ideal world, in monoliths, there is no such thing as code duplication across subsystems.
cryo32 16 hours ago [-]
You mean you don’t have 300 versions of your badly developed rapidly evolving platform services rotting away underneath the bit you didn’t duplicate?
mystifyingpoi 16 hours ago [-]
I don't. Other teams - maybe they do, maybe they don't. Who cares, not me. I have responsibility for services of my team, I think we are doing a good job.
Being selfish is the core principle of microservice architecture.
cryo32 16 hours ago [-]
Until half your company gets laid off and you have to adopt other people’s shit.
zephen 19 hours ago [-]
But wait! There's more!
For $19.95, you can replace your single single point of failure with multiple single points of failure!
flawn 19 hours ago [-]
Or for 100$, get a 5x increase on all failure points - maximum vibes, maximum excitement.
mohamedkoubaa 18 hours ago [-]
Please, stop it
DJBunnies 19 hours ago [-]
Except 9/10 times microservices end up wildly dependent on each other, yielding a distributed monolith. Better to use service oriented architecture and just ship the monolith, you can test easier and skip the extra layers of serialization / deserialization.
mystifyingpoi 16 hours ago [-]
> end up
So it just happens, right? There is no remedy to this? You know the answer :)
BTW I'm all for monolith.
loevborg 19 hours ago [-]
I think you missed GP's point
zephen 17 hours ago [-]
Poe's Law FTW!
irishloop 18 hours ago [-]
Too many abstractions are bad. Too much code duplication is bad.
Part of being a good engineer is finding the right balance.
I know engineers who would gladly duplicate code all over the code base to avoid creating a new abstraction.
I know engineers who create polymorphic abstractions for a single caller with a very obvious set of parameters.
So much of wisdom is in finding balance and not being dogmatic about rules.
lokar 18 hours ago [-]
I feel like the balance has shifted over the last 30 years, and is speeding up. Semi-automatic and fully automatic re-factoring has made dealing with duplicated code much faster, cheaper and safer. Changing abstraction is still high risk.
crazygringo 17 hours ago [-]
Isn't it the opposite?
Automated re-factoring means you can refactor duplicated code only as long as it is exactly duplicate.
Whereas the whole problem is that when somebody changes 3 out of 10 of the duplicate cases in a simple way so that they are no longer exact duplicates, and then somebody fixes a bug in one of the other 7/10 cases, they can propagate the fix across the 7 "duplicate" cases but they'll miss the 3 that aren't.
The problem with duplicate code is always when some of the instances get changed/fixed but not all of them. And that when somebody edits one instance, they often aren't even aware of all the other instances.
Abstractions are low-risk, because you know where the code is. If it's the wrong abstraction, you can fix that and know what you're fixing. Whereas with duplicated-yet-modified code, you've now lost the connections between them.
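A toy sketch of the 7-out-of-10 scenario above, assuming the "tool" can only match copies that are still textually identical. The snippets are just strings made up for illustration.

```python
BUGGY = "return price * qty"            # the original snippet, pasted 10 times
TWEAKED = "return price * qty * rate"   # 3 copies were later adjusted by hand
FIXED = "return round(price * qty, 2)"  # the bug fix for the original

copies = [BUGGY] * 7 + [TWEAKED] * 3


def fix_exact_duplicates(snippets):
    """Replace only the copies that still match the original text exactly."""
    return [FIXED if s == BUGGY else s for s in snippets]


patched = fix_exact_duplicates(copies)
# The hand-tweaked copies no longer match, so they keep the bug.
still_unfixed = [s for s in patched if "round(" not in s]
```

Here `still_unfixed` ends up holding the 3 tweaked copies, and nothing in the code tells you they were ever related to the other 7.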
throwaway2037 11 hours ago [-]
> Automated re-factoring means you can refactor duplicated code only as long as it is exactly duplicate.
This has not been true in JetBrains' IntelliJ for more than 10 years. It can parameterize refactoring multiple blocks of code.
chuckadams 18 hours ago [-]
I have regularly watched agents forget to update one duplicated pattern after changing it somewhere else. If it's within a single file or related class, it'll catch it, but if it's off in some other package in the monorepo, it's a crapshoot.
heisenbit 18 hours ago [-]
Changing abstraction is a high risk unlike agents refactoring scores of almost identical code.
lokar 18 hours ago [-]
I thought this discussion was limited to situations where you care about code quality
andix 17 hours ago [-]
Duplication is often less harmful than abstraction.
Duplications can often be cleaned up over time; bad abstractions can quickly become a bottleneck that severely slows down everyone working on the project.
baublet 17 hours ago [-]
The most difficult codebases are those with every little thing some bespoke abstraction that went through 3 rounds of committee reviews that results in having to click through 12 files to figure out what anything is doing. Factory factory factories each with their own little frankenframework to understand before using anything.
ultim8k 18 hours ago [-]
Nobody wants to listen. Nobody. In 90% of the companies there are some so called senior devs that get ecstatic when they create a new abstraction.
Overengineering, abstractions and premature optimisation are the 3 worst plagues of engineering.
At the same time I’m happy they exist because it means we’ll always have a job.
rzmmm 3 hours ago [-]
There are codebases out there with enormous amounts of duplication, filled with implicit dependencies. You just haven't encountered them to appreciate good abstraction.
throwaway2037 11 hours ago [-]
The part that no one wants to say out loud: Making boring technology decisions makes your job boring and does not help to build your resume. This is the core reason why over-engineering exists.
skydhash 9 hours ago [-]
Making the job boring is a great way to get free time to browse HN.
globular-toast 6 hours ago [-]
Remember, everyone else's job is simple and pointless, only your job is difficult and important. Therefore only your job could possibly need abstractions. Everyone else is just over engineering.
codemog 11 hours ago [-]
Yep Kubernetes, more micro services than engineers, some complicated protocol that saves a few bytes of overhead, cloud everything, and tons of classes that could have been simple functions.
dang 17 hours ago [-]
I dislike duplicate code as much as anyone, but agree with the OP that bad abstractions can be worse. They add confusion and complexity which compounds over time, since people are forced to build on top of them in ways that (by definition) don't suit the underlying domain and ultimately become self-referential. This leads to contortions, workarounds and even more bad abstractions which ought not to be there—they're reactions to the code not fitting the problem, or as Fred Brooks called it, accidental complexity. You end up in an evolutionary dead end where the system is hard to extend because it's too hard to understand.
I've learned to tolerate a small amount of duplicate code for this reason. If the duplication remains small, it's not that harmful, and if it starts to grow, one has a better shot at finding a good abstraction for it. Bad abstraction is premature abstraction.
One thing I'm not sure this thread has mentioned yet is how LLMs alter the cost-benefit curve of this. They are much better at managing duplication than humans are, and much better at noticing inconsistencies - the sort of small bugs which duplication traditionally leads to. I don't know if this is enough to count as a different kind of good abstraction; I doubt it. It reminds me of a petroleum economist I once knew who had 200 duplicate spreadsheets analyzing different projects and who hired a junior analyst to keep them all consistent. An LLM would be like the junior analyst.
agentifysh 19 hours ago [-]
i recall very early in my career i did exactly this. i took what worked and duplicated it, my reasoning being that it was far safer to reuse what has been battle tested and leave refactoring for a later stage
it wasn't received well and senior developer told me that 'good developers know exactly what patterns to use all the time before writing any piece of code and that he will clean up my mess'
long story short his refactoring caused what was otherwise a stable system into a complete mess and it reminded me of Nassim Taleb's book
nicoburns 19 hours ago [-]
It's definitely an "it depends" thing. It's easy to overabstract. On the other hand, I've also met junior developers who just didn't know how to use function parameters.
dimgl 16 hours ago [-]
> long story short his refactoring caused what was otherwise a stable system into a complete mess
Yeah that totally happened
rf15 15 hours ago [-]
"use the right pattern" coming from a senior smells like a senior who can't freely design new patterns. Established wisdoms are a starting point, not the go-to solution.
dofm 19 hours ago [-]
No it's not. This has always been a needlessly iconoclastic rather than sensible suggestion.
At the very least it is not once you're working at the wrong kind of scale.
Once you have an awkward number of customers (more than five and less than a hundred), maintaining duplicated code that should have been abstracted and modularised will only seem cheap if you don't mind that you burn through even junior employees at a pace.
And in the LLM era the wrong kind of scale appears in different ways; code generated and duplicated without proper abstraction and then maintained by an LLM that cannot be trusted to do the same modification each time it encounters a pattern or to have enough of an overview to slowly rescue duplicated code through good abstractions.
I would go as far as to say that any abstraction you can maintain (that is in active maintenance, I mean) is better than code duplication once you are past a de minimis threshold.
ubertaco 17 hours ago [-]
I'd recommend clicking through the headline to watch the talk. Metz talks a lot about types of similarity: similarity by coincidence vs similarity due to an actual semantic or functional equivalence.
Code that is coincidentally similar very often diverges in either the short or long term, and DRYing it up aggressively tends to result in functions that have many boolean parameters that each trigger disjoint sets of behavior - which is a bit of a nightmare to maintain due to the high cognitive overhead of remembering how all the interleaved-but-actually-unrelated behaviors should work.
This outcome is low-cohesion code.
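A small sketch of that outcome, with made-up functions (none of this is from the talk). The first version is the aggressively DRYed one, the second keeps the coincidentally similar code apart.

```python
# Over-DRY version: one function, flags that each switch on unrelated behavior.
def render(name, amount, as_invoice=False, as_receipt=False, include_tax=False):
    if as_invoice:
        line = f"INVOICE for {name}: {amount:.2f} due"
        if include_tax:
            line += " (plus tax)"
        return line
    if as_receipt:
        # include_tax is silently ignored here; the flag combination is meaningless.
        return f"RECEIPT for {name}: {amount:.2f} paid"
    return f"{name}: {amount:.2f}"


# Duplicated version: two small functions that are free to diverge.
def render_invoice(name, amount, include_tax=False):
    suffix = " (plus tax)" if include_tax else ""
    return f"INVOICE for {name}: {amount:.2f} due{suffix}"


def render_receipt(name, amount):
    return f"RECEIPT for {name}: {amount:.2f} paid"
```

Every new flag on `render` doubles the number of combinations a reader has to think about, and most of those combinations mean nothing. The separate functions repeat an f-string but each one can be read on its own.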
It's a useful concept to be aware of - worth clicking through to the actual content of the talk rather than just the headline.
dofm 16 hours ago [-]
> I'd recommend clicking through the headline to watch the talk. Metz talks a lot about types of similarity: similarity by coincidence vs similarity due to an actual semantic or functional equivalence.
I've seen this article and AFAIR the video before, and FWIW having been a Rails developer from the very early days and fitfully until maybe even 2014, I now interpret the phrase "my Railsconf talk…" quite negatively.
ETA: nice to be back to disagreeing with people on HN about coding principles again though. Hopefully this is a sign.
coldtea 18 hours ago [-]
Hardly iconoclastic, it's a very sensible suggestion.
It would be iconoclastic if the common sense basic approach were to start with abstraction. It's not: the common sense default is to write possibly duplicate behavior until you actually discover several cases to abstract away, until you develop a sensible idea of which functionality unites them and which doesn't carry over to all of them.
>Once you have an awkward number of customers (more than five and less than a hundred), maintaining duplicated code that should have been abstracted and modularised will only seem cheap if you don't mind that you burn through even junior employees at a pace
Maintaining the wrong abstraction, or, god help, abstractions, would be even worse.
dofm 18 hours ago [-]
> Maintaining the wrong abstraction, or, god help, abstractions, would be even worse.
Hard disagree. When you've had to chase through a change in untold and actually unknown numbers of duplications of code in different permutations and fix them because they are all on fire simultaneously, you'd disagree too. A bad abstraction would at least have had one fire in one place.
trimbo 17 hours ago [-]
> Hard disagree. When you've had to chase through a change in untold and actually unknown numbers of duplications of code in different permutations and fix them because they are all on fire simultaneously, you'd disagree too.
The other end of this spectrum is dealing with the architecture astronaut's up-front abstraction. Totally overengineered for solving the initial requirements, but then constantly needing new hacks to make it cope with new requirements as they come up in the normal course of work.
That's why there's a balance in there, it's somewhere between "always duplicate code even when you know a lot about the problem" and "always write abstractions even when you know very little about the problem."
davidee 18 hours ago [-]
Good faith question: would it?
Wouldn't most large codebases with poor abstractions just have engineers engineer around them with their own solutions? In a large enough codebase you'd have both the bad abstractions and all the not-quite-duplicate implementations ignoring the bad abstraction?
I'm using bad here loosely, it could be buggy, incorrect, incomplete, insufficient and more; while being owned by someone or some team that's a challenge to work with for various reasons (overloaded, under-resourced, overbearing, etc., etc.).
dofm 18 hours ago [-]
> Wouldn't most large codebases with poor abstractions just have engineers engineer around them with their own solutions?
Obviously, yes. But it is my experience that this happens more slowly and that API invocations that break when the abstraction is changed are much easier to identify than broader duplicated patterns of code that span many lines and subtly diverge.
And even then those divergences are better because each wrapper around the abstraction is documenting the problem with it. But the abstraction can generally be replaced by one with the same API surface.
(Even if you take into account the fact that any API behaviour ultimately gets relied upon even if undocumented. Which is true.)
To be fair my experience is that of a freelancer and contractor who arrives trying to fix things that have been through many such hands. And I think if these developers had it drummed into their head that any attempt at abstraction would be better than copy and paste, these situations would be more knowable.
jcgrillo 17 hours ago [-]
> engineers engineer around them with their own solutions
When that happens there's a major engineering leadership failure currently in progress, even if engineering leadership isn't aware of it.
sodapopcan 17 hours ago [-]
Yep, this is why I find talking about this tiring. No matter what you say, many people are going to keep reading it as "duplication is always better than abstracting."
jcgrillo 16 hours ago [-]
It's more nuanced than either extreme. But regardless of the root cause, if you have engineers duplicating work left and right something has gone wrong. Their labor is not being used efficiently.
EDIT: LLM or not, this is still true. If you have LLMs pumping out tons of duplicate code you're wasting tokens, and probably more importantly wasting engineer hours reviewing duplicate code.
In some cases it might be a fair trade, in moderation. In general it's certainly wrong.
sodapopcan 14 hours ago [-]
Oops, I think I actually replied to the wrong comment, lol.
swader999 17 hours ago [-]
The article isn't saying don't dry, it's saying don't force dry. Very big difference and you get ideal maintainability when you ease off a bit but still use it.
ordu 7 hours ago [-]
> When you've had to chase through a change in untold and actually unknown numbers of duplications of code in different permutations and fix them because they are all on fire simultaneously, you'd disagree too
Doesn't it mean that you are in a good place to start DRYing code? I mean, the code was written in a way to avoid bad abstractions. You can't generalize from 1-2 samples, but now you have "unknown numbers" (more than two?), so you can start looking at it and see patterns. It means you can create a perfect abstraction. It is the basis of the WET (Write Everything Twice) principle.
It would be frustrating, and I mean really frustrating. People easily generalize over two things, but they struggle to generalize over three. Pick two random words and think of a common category they fall into. It is an easy task for a 5-year-old. Pick three random words and try to generalize over them, and you have a very hard cognitive task.
This frustration stems from the inherent complexity of the task. It is not because people before you wrote duplicating code, it is because it is hard to generalize. People before you didn't do it being afraid of missing things and creating a bad abstraction, but you have hard data, you can create an abstraction without missing a thing.
SkiFire13 17 hours ago [-]
> A bad abstraction would at least have had one fire in one place
That's true only for "good" abstractions. Bad abstractions will often require you to change code in all the places using it, requiring you to understand how all of them work and what are their requirements, _all at the same time_.
grayclhn 17 hours ago [-]
IME a bad abstraction results in the same thing, just with a lot of wasted effort coming up with the abstraction first, and a lot more resistance to fixing it because people are too emotionally invested. I’d rather have something clearly chosen for expedience and that no one likes.
ted_dunning 16 hours ago [-]
A bad abstraction would have caused many updates in many places because the API would never quite stabilize due to having been a force-fit from the start.
A uses the abstraction, but finds the API doesn't work. Fixes that.
That causes B to have to make a tracking change which induces a bug. B realizes that the API isn't quite right. Fixes it.
That causes A and C to make tracking changes. These induce more bugs. C fixes the abstraction to avoid these cases.
This breaks A and B so they decline to update.
And so on. This is what a bad abstraction looks like. API "fixes" bouncing around the code as they reflect off of the bad abstraction.
ted_dunning 16 hours ago [-]
I, on the other hand, have had to burn through countless cycles of security alerts because I used a library for JSON parsing that had all kinds of other features that I didn't need or want.
The security bugs were all in features I never wanted.
A bit of simple duplication would have been golden.
anygivnthursday 17 hours ago [-]
Both are bad, what you describe is very real, but so is the opposite. That one fire in one place can end up in a total rewrite of numerous layers because the abstraction never anticipated certain things to happen.
coldtea 18 hours ago [-]
>A bad abstraction would at least have had one fire in one place.
On the contrary: that's precisely what a bad abstraction would not offer.
Instead it would spread its assumptions to different parts of the system, as every caller, sub-service, etc. would have to change shape to fit in that abstraction's box, however unnatural it is (and we know it would be unnatural, because we already said it's a bad abstraction).
Abstraction is not the same as encapsulation.
dofm 17 hours ago [-]
> instead it would spread its assumptions to different parts of the system,
But so does duplication, in practice, and it diverges more as it does.
coldtea 17 hours ago [-]
Duplication is just code doing the same thing in several places, and as such it's much easier to make DRY (and much easier after you have N copies to see what should be shared and what should not), compared to re-architecting the whole system to remove a bad abstraction.
cjfd 17 hours ago [-]
No. The duplication is seldom that clean. It has started to diverge in subtle ways where the question becomes whether that was the intention or not. In the worst possible cases it has resulted in 8000-line functions full of duplication. 're-architecting the whole system to remove a bad abstraction' sounds fear mongering. That never happens.
svieira 17 hours ago [-]
Au contraire, mon ami, I am currently in the process of doing just that in many places in my current codebase.
coldtea 13 hours ago [-]
>'re-architecting the whole system to remove a bad abstraction' sounds fear mongering. That never happens.
Oh, it happens all the time.
rpdillon 18 hours ago [-]
In your mind, what's the cost of the wrong abstraction?
dofm 18 hours ago [-]
The major risk/cost is breakage if you must change it but cannot maintain its whole surface even with a shim, right?
But any abstraction ends up with a signature and a name that can quickly be found in code.
The risk of a long-lived duplication losing its shape and being hard to find is much greater. Especially if the code is going through multiple hands.
I once had to pick up a project — a working, fully functional website. I could see, pretty clearly, the work of several people. All but one of them terrible.
The one was a diligent developer who was fully wrong in their abstraction (in fact significantly) but was consistent in how they used it.
The rest had simply worked around that code, copied and re-copied their own modified duplications and let things lose any shape. The result was error-prone stuff.
Clearly either the budget (or the client's capriciousness — a separate issue and arguably the bigger one) scared away the one guy, who I actually wanted to talk to but could not track down. He possibly had the origin story, and I wanted to know why his particular abstraction, which was at odds with the framework, was there. It was good code in the wrong shape, and it clearly used to do more, and that is interesting.
All the expedient people who had decided to avoid his code and just patch in duplicated pieces around it were the problem. There was no form to their solution at all. And that had clearly happened over some time (because you could see several different code styles)
rubyn00bie 17 hours ago [-]
I am confused by this comment. The root problem was the wrong abstraction was implemented. Then it was duplicated. Had there been no abstraction, it would not have been duplicated so readily? Am I missing something?
dofm 17 hours ago [-]
I will reword it slightly, I typed too fast.
rubyn00bie 18 hours ago [-]
The same problem exists, and I think is unfathomably worse, when the wrong abstraction is used throughout a code base.
Abstractions are a form of coupling, and coupling can be good if the components are truly interdependent and have a well defined domain. The problem with most abstractions, and I’ve seen this time and time again, is that they become brittle, are overused, and the cost of maintaining them grows exponentially with the size of the code base. The reason the cost balloons is that the system has disparate components that look interrelated but are absolutely not. Once you give someone a hammer they tend to assume everything is a nail.
The biggest problem, IMHO, is that abstractions are often used where a pattern would be more effective, easier to maintain, and easier to iterate on. And the primary difference between a pattern and an abstraction really comes down to coupling. Patterns remain decoupled, abstractions are tightly coupled.
And to be clear, I will and do use abstractions, when and where they make sense. But only after clear patterns emerge, and it’s been proven that components are truly coupled.
I will gladly die on the hill, that abstractions are measurably worse than duplication an overwhelming amount of the time. They’re often nothing more than a form of premature optimization.
zingar 17 hours ago [-]
What’s the difference between a pattern and an abstraction?
shinycode 18 hours ago [-]
At work there was a huge amount of duplication at the start of the company and no solid abstraction. So no tests either. We introduced tests in the current architecture, but rewriting code carries a huge cost to make sure there is no regression. When we talk about a SaaS, it’s non-trivial: with many customers relying on this tool daily as part of their workflow, regressions caused by a rewrite could be really painful for them. So we must budget more time to make sure nothing major breaks. So there is a debt that compounds over time as code is added. Duplication is bad, and weird/purist abstraction could make the architecture so rigid that rewriting things could generate bugs that are hard to understand and catch.
It’s hard to find a good balance and it depends on the kind of business and scale of project. Hard to make that a generic advice.
ghosty141 18 hours ago [-]
I think all these comments here are kinda talking past each other.
It all depends on the amount of duplication and the complexity of the abstraction. Like you said, no generic advice is possible that clearly separates it into "abstract here" and "duplicate here".
In your example it sounds like we aren't talking about 2-3 places where duplicate code existed that just needed to be refactored into separate units. It sounds more like a complete disregard for abstraction to move on quickly.
If you see duplicate code and have a good understanding of how to solve that, then it's totally a good thing. The real problem comes in if you add abstractions without knowing whether they will hold up. And this is where the blog post comes in. In my opinion 2 duplicates are fine; at 3 you should start thinking about or implementing an abstraction, if you have a good understanding of the code and use cases.
chairmansteve 18 hours ago [-]
"It’s hard to find a good balance and it depends on the kind of business and scale of project".
Exactly. The abstraction purists are not working in the messy, deadline-driven real world.
pfannl 17 hours ago [-]
The real rule is probably: duplicate until the abstraction stops looking like a horoscope.
bluefirebrand 18 hours ago [-]
Yeah, "Write Everything Twice" is a pretty common and sensible direction for any codebase
marcosdumay 17 hours ago [-]
It's sensible if you have strict control of your duplications. You do have strict control of what is duplicated and where, right?
Write everything twice quickly becomes write everything 4 times once a new change appears, just as quickly as it becomes write everything 8 times, and so on.
I'm afraid there's no sensible soundbite developers can follow blindly.
coldtea 17 hours ago [-]
>Write everything twice quickly becomes write everything 4 times once a new change appears, just as quickly as it becomes write everything 8 times, and so on.
That's a good problem to have. Getting to 4 or 8 or 12, and then pruning it to 1 or maybe 2 or 3 clearly different cases, is better than shoehorning multiple cases into the wrong abstraction, having everything that speaks with them coupled to that and dancing around their assumptions, and then having to untangle that.
Duplicated code is by definition LESS coupled.
cwmoore 18 hours ago [-]
Yeah, ~"Write Everything Twice"~ “Copy and Paste Working Code” is a pretty common and sensible direction for any codebase
lanstin 17 hours ago [-]
In C I used to make it so my standard per-file and per lib code could be cut and pasted to other files/libs without modification. (E.g. every file had a mLocal variable that was file-visibility symbols, every module had a #module define for logging, there was always a mLocal.stats member, etc. ) I think some of this duplicate vs. abstract depends on your language’s expressiveness - Rust or Lisp with good compile-time power make it possible to squeeze out a lot of duplication that in less expressive languages are just idioms - here’s the five lines to make a syscall, or here’s the skeleton of parsing a portable network buffer into a native object.
Having a lot of if/else in your code is definitely a cost. My weakness isn’t so much the libraries and APIs, but the actual binary - once I have a service that does A very well, and I run into needing A’ I mostly just add in a config line “op_mode = A|A’” and have the else/if chains in the server driving code. Moreso for CLIs that I use myself than production services, but I have added tunables for consistency and replication to datastores to allow new use cases and expand my footprint in the data center.
fny 18 hours ago [-]
Code duplication is cheaper than the wrong abstraction. If you have a good abstraction, you should run with it.
If you haven't figured out a good abstraction at 5-100 customers, God help you.
feoren 18 hours ago [-]
A good abstraction? As in one? I'd go so far as to say the process of discovering and refining abstractions is the most important part of software engineering. A large project has dozens of abstractions, and some of them are "wrong" at any time, as you discover over time. None are ever perfect. If you wait to stop duplicating code until you have the "right" abstraction, you are just putting off the hard part of developing software and taking on tech debt.
Half of your abstractions are wrong. The hard part is knowing which half.
lanstin 17 hours ago [-]
I once worked at a place with abstractions I found to be beautifully perfect. The people that wrote the base framework had done similar things two or so times previously and got it right the third time. You couldn’t write slow or hard to operate code there without really trying hard.
meerita 18 hours ago [-]
Good abstraction does one single thing and does it well. Bad abstraction starts from the premise of becoming a dumping ground. If that is the case, the best and ideal scenario is splitting the abstraction into many ones to make the job better.
stymaar 18 hours ago [-]
> Code duplication is cheaper than the wrong abstraction
This is tautological though, it's like saying “starving is much better than eating the wrong food” (for instance: eating quick lime).
Of course you'll always find a way to do things wrong in a way that is costlier than not doing anything.
blauditore 17 hours ago [-]
Sure, but obviously that sentence implies that wrong abstractions are fairly common.
enos_feedler 18 hours ago [-]
What if there is no good abstraction for the entire stack of software on each of computers? What if we built a common one because we had to? What if now we get to all make our own with natural language?
dofm 18 hours ago [-]
I disagree.
But also it's very possible to not realise you needed an abstraction until it catches fire in multiple places.
And quite often it's not you that got the codebase to a hundred customers, is it? Sometimes it is a sequence of fresh-faced young developers who didn't have the authority to say "this duplication is bullshit" and were instead compelled to repeat it.
I think a lot of these discussions happen in nice little blog-post vacuums of progressive thinking, where people can go "mmm, object oriented coding obscures intent and clarity, mmm", blog posts with "an X is a Y", "the unreasonable effectiveness of foobar" etc.
In the real world, every duplication that works sticks for good; there is rarely budget to electively replace code that isn't broken. Until one day it doesn't work. And then… how many times is it actually duplicated? How many of the duplicates diverged? How many of these do we no longer need?
chairmansteve 18 hours ago [-]
> I disagree.
So... the wrong abstraction, no matter how bad, is better than code duplication?
dofm 18 hours ago [-]
If you read my original comment I said pretty much this, yes.
> I would go as far as to say that any abstraction you can maintain (that is in active maintenance, I mean) is better than code duplication once you are past a de minimis threshold.
I appear to be in a solid minority thinking this. But I'm OK with it. I'm probably not going to write a blog post.
locknitpicker 17 hours ago [-]
> If you haven't figured out a good abstraction at 5-100 customers, God help you.
This blend of opinion is very naive. Every single project is a business requirement away from having the wrong abstraction in place.
Good one. See, we did make some serious progress, all you AI haters.
mytydev 18 hours ago [-]
It sounds to me like you are describing a good abstraction. This article does not claim that code duplication is better than any abstraction. It claims that code duplication is better than the wrong abstraction. I'm sure this author would agree that a good abstraction is better than code duplication.
dofm 18 hours ago [-]
I'm afraid this comment reads in a rather gnomic way.
Of course it's a truism if you just say any abstraction that works is a good abstraction.
That is not what I am saying at all. Bullshit abstractions at least let you control the problem. Duplication doesn't.
vlunkr 18 hours ago [-]
But it’s never going to be 1:1 duplication is it? Sometimes it’s better to copy code as a template for something new, rather than try to immediately force a new abstraction.
I agree with you that it’s a truism, but it’s useful advice for people who have a habit of trying too hard to DRY their code. IIRC the author comes from the Ruby world, where DRY was a big thing, and this talk was part of the pendulum swinging back away from this DRY obsession that sometimes just resulted in convoluted code.
agumonkey 18 hours ago [-]
You seem to have experience. I don't mind factoring / unifying logic when it's done sensibly, with enough history in the trenches. It pains me more whenever a young dev comes in and barks "we must merge these two things!" repeatedly, without planning for more than two cases, and starts adding more and more boolean variables. Crystal makers. Then the obvious issue comes: the two variants weren't that close, and now there's one god class trying to handle all forces in one big state.
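To make the boolean-variable creep concrete, here is a minimal sketch in Python. Every name in it (`export_merged`, `export_csv`, `export_tsv_no_header`) is hypothetical and invented for illustration; it only shows how a merged function collects flags while the duplicated versions stay small.

```python
# Hypothetical example: two exporters that looked alike were merged into one
# function, and each later difference between them became a boolean flag.
def export_merged(rows, as_csv=True, with_header=True, quote_all=False):
    sep = "," if as_csv else "\t"

    def cell(value):
        text = str(value)
        return f'"{text}"' if quote_all else text

    lines = []
    if with_header and rows:
        lines.append(sep.join(cell(key) for key in rows[0]))
    for row in rows:
        lines.append(sep.join(cell(value) for value in row.values()))
    return "\n".join(lines)

# Three flags already mean 2**3 = 8 combinations to reason about and test.

# The duplicated alternative: each caller owns a small, flag-free function.
def export_csv(rows):
    lines = [",".join(rows[0])] if rows else []
    lines += [",".join(str(value) for value in row.values()) for row in rows]
    return "\n".join(lines)

def export_tsv_no_header(rows):
    return "\n".join("\t".join(str(value) for value in row.values()) for row in rows)
```

With `rows = [{"a": 1, "b": 2}]`, `export_merged(rows, as_csv=False, with_header=False)` and `export_tsv_no_header(rows)` produce the same text, but only the second can change without touching the CSV caller.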
I agree that LLMs are naturally anti abstraction machines.. I'm often trying to find a way to reverse that.
dofm 18 hours ago [-]
> I agree that LLMs are naturally anti abstraction machines.. I'm often trying to find a way to reverse that.
I am a bit of an LLM cynic but I am trying to learn it all, and I have to say I have spent most time trying to work out: how do you explain how a brown-field codebase actually works, in such a way that the LLM won't pervert it through misunderstanding.
It does encourage you towards the "conventional" coding standard for any new project, because you want to use a pattern that it will have seen in its training set.
But for example there are differences of opinion in how wordpress plugins (which have a very complex control flow) should be structured. LLMs are incredible at knowing how WP works, actually, but what is difficult is explaining how your methodology for a large plugin is going to work.
It is a battle — but a useful one because it can be used for, er, studying the comparative belief systems of the LLMs.
jbeninger 17 hours ago [-]
The gold standard is code samples. I've got 1000-line convention documents with very simple rules like "Early returns on a single line". LLMs sometimes ignore these or misinterpret them in unusual ways.
But if I tell it "read these files that use the same conventions" first, there's no misunderstanding, and the agent also picks up the general "tone" of the code. I have very little to tweak if I've defined the problem well.
dofm 16 hours ago [-]
> But if I tell it "read these files that use the same conventions" first, there's no misunderstanding, and the agent also picks up the general "tone" of the code. I have very little to tweak if I've defined the problem well.
Oh that is a bloomin' great idea, and I can fully see how it might work better.
Can't tell you how valuable this comment has been to me and now I feel so much better about evidently kicking a hornet's nest ;-) Thank you so much.
jbeninger 13 hours ago [-]
Glad I could help. I've been trying to use coding agents more than makes sense this year to get a feel for the tech. There's no good set of guidelines yet and everything feels like secret knowledge.
If you're using a coding agent like codex or claude code, I've also seen marked improvement by telling the agent to keep a journal of decision points, and every file read or written. And then, here's the important part, read the last five journals before starting. It primes the context with whatever you were working on and keeps a new session more focused than if it has to go searching for keywords through the whole codebase. It can also be an interesting read.
wonnage 18 hours ago [-]
They don’t have a useful belief system, one of the rookie mistakes of using LLMs is asking them what you “should” do
dofm 17 hours ago [-]
Absolutely. I think the bit I still struggle with is finding a way to get them to join my team (which is a team of one very tired person).
A story I like is that in the now lost era of handwriting recognition on PDAs, Jef Raskin concluded that the easiest way to solve the problem was to change handwriting so as to meet the algorithm in the middle.
That is, to find a noticeable simplification of handwriting that people could learn quickly and that eliminated hard-to-process quirks.
I feel I am there with the LLM at the moment, trying to work out what the common ground is.
ChrisMarshallNY 18 hours ago [-]
In my experience, the answer is always "It Depends." That's about the only thing that I can hang "always" on.
It really depends on the exact type of code we're working with, and what our objectives are.
In my case, I often use object inheritance. It's a damn cheap way to DRY. However, when people hear "inheritance," they often think "polymorphism." There's a really big difference between the two, but popular culture has jammed them into one ball, and it's not worth the agita, to try to explain the difference.
But if you are doing optimization, long stacks can be your enemy, and inheritance tends to have long, windy stacks.
In these cases, the copy/pasta method may well be the best approach.
Like I said, "It Depends."
tomjakubowski 16 hours ago [-]
> In my case, I often use object inheritance. It's a damn cheap way to DRY. However, when people hear "inheritance," they often think "polymorphism." There's a really big difference between the two, but popular culture has jammed them into one ball, and it's not worth the agita, to try to explain the difference.
I agree that we should think of inheritance and polymorphism separately. If we want to express this intent in object-oriented code, how can we use inheritance to deduplicate code, while preventing misuse of the resulting object hierarchy i.e. the use of base classes in a polymorphic context?
In C++, IIRC private inheritance would do the trick (you cannot static_cast DerivedWidget * to BaseWidget * if DerivedWidget : private BaseWidget), but most OO languages don't support private inheritance. It's also not possible, as far as I know, to "lock down" BaseWidget * so it cannot be used as a base class pointer from any derived class: instead, you have to apply the private inheritance to every derived class to enforce this rule.
Another approach is to use has-a instead of is-a: i.e. instead store a BaseWidget object as a member of DerivedWidget. This allows for re-use without supporting polymorphism.
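A rough sketch of the has-a approach, written in Python rather than C++ just to keep it short. `BaseWidget` and `DerivedWidget` are the names from above; the `describe` method and the `colour` field are made up for illustration.

```python
class BaseWidget:
    """Shared behaviour we want to reuse (describe() is a hypothetical method)."""

    def __init__(self, name):
        self.name = name

    def describe(self):
        return f"widget:{self.name}"


class DerivedWidget:
    """Reuses BaseWidget via has-a; it is deliberately NOT a subclass."""

    def __init__(self, name, colour):
        self._base = BaseWidget(name)  # stored as a member, not inherited
        self.colour = colour

    def describe(self):
        # Delegate explicitly to the shared implementation.
        return f"{self._base.describe()} ({self.colour})"
```

Because `DerivedWidget` does not inherit from `BaseWidget`, `isinstance(DerivedWidget("ok", "red"), BaseWidget)` is false, so code written against the base type cannot be handed a derived object by accident. The cost is that every reused method has to be forwarded by hand.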
ChrisMarshallNY 14 hours ago [-]
Or...we could hire folks that actually know how to write code, without screwing the pooch.
This is especially true, with languages like C++. Someone (I have heard it attributed to Bjarne, but I don't think he said it) said "With C, you can shoot yourself in the foot. With C++, you can blow your whole leg off."
But there's stuff that can basically, only be done in C++. It's a very powerful, mature, and storied tool; meant to be used by competent grownups.
In tech, we have folks that seem to be absolutely convinced that we can have tools, so marvelous, that we can hire total incompetents, and that they will magically write good code. I know of no other engineering discipline, or craft, where people think like this. They usually have rigorous career ladders, with lots of gates.
Maybe Finance sometimes lets knuckleheads behind the wheel, but then, you get things like the Barings Bank disaster.
"What's Barings Bank?" you ask. "It doesn't exist! Is it a hallucination?"
No, it is not. Unfortunately, they let a rather junior trader, named Nick Leeson, behind the wheel...
It's possible that LLMs may finally give us something like what people want, but I suspect that we'll be seeing folks stumping around on one leg...
nfw2 18 hours ago [-]
Over-engineering and "abstraction hell" are very much not iconoclastic concepts
mawadev 18 hours ago [-]
I think you applied this idea to the era of LLMs, but consider an abstraction that takes in multiple god structs for branches it may or may not call in the case you are looking at, and that has a lot of if conditions that explode in combinatorial complexity across a deep call chain. Now the bottleneck is that you need to call this function 144 times a second. That is where you start to have clusters of hot code paths where the latency stacks depending on the angle the god structs come in.
Not sure what LLMs do here, I don't vibe code
dofm 18 hours ago [-]
I am applying it to LLMs on the basis of twenty years of seeing smaller programming shops tie themselves in knots by using duplication to avoid developing an abstraction that would help them because they were unsure of it.
Everyone always thinks duplication is fine when you can bill the modifications by the hour. But they never think to understand that the reason they've had so many employees is that they've turned their change process into firefighting all the different versions of the same code and all these young developers burn out from the sheer anxiety of not knowing where all the little fires are.
I once had to rescue a site that had become a victim of its own popularity, that was written by subcontractors who clearly believed that duplication is better than the wrong abstraction.
Until one day, along came a change — MySQL 4 to MySQL 5 — and a significant duplicated query no longer worked due to its new, proper strictness.
The problem was compounded; not only was the broken pattern in hundreds of places where it had sat, stable and predictable, but the pattern was broken because it, itself, was avoidance of another abstraction that would solve it.
They quit: they said they couldn't and wouldn't fix it. It had always worked how they had done it, and it would have to stay on MySQL 4 (which the hosting provider refused to accommodate).
I don't think it helped that they were severely misguided in their understanding of SQL, but the code had become beholden to duplication and then crippled by a new problem in the duplicated pattern.
I had to first find all the contexts in which that pattern appeared (which required me to spend half a day on a bespoke script) and then work out a new pattern and as few variations of it as possible to fix the duplicated code in each place, because there was no proper budget to rewrite the whole thing. And then I sat at my desk, for days, working through each one, figuring out how to change it to fit the slightly different expression of the pattern.
Even a total bullshit abstraction would have saved that client both time and money. And this is only one of dozens of times I've seen small firms simply duplicate and change code that would later become unmaintainable because of a straw breaking a camel's back.
Capricorn2481 18 hours ago [-]
Again, this is the opposite of what the author argues for, which is waiting for a couple instances before committing to an abstraction. Not duplicating a SQL query across hundreds of places.
I would be curious if the previous coders you're talking about actually cited duplication as a good thing. You seem to be implying they are. But almost every instance I've seen of massive code duplication was just from bad programmers shooting from the hip, not from some ideological stance.
dofm 18 hours ago [-]
> Again, this is the opposite of what the author argues for, which is waiting for a couple instances before committing to an abstraction. Not duplicating a SQL query across hundreds of places.
Right. But this is a hypothetical, in-a-vacuum situation.
In the real world, your two, three duplicates are in production.
"We really should now de-duplicate this"
"There is not the time or budget, just copy it again; we'll replace all this one day".
Capricorn2481 11 hours ago [-]
I don't run into that because the people I work with physically cringe at copy-pasting more than two blocks of code in multiple places. If anything, we revisit old code and realize how overly abstracted it is. I don't know of anyone that duplicates code to save time, but I believe it happens.
nextaccountic 12 hours ago [-]
The trouble with the wrong abstraction is that sometimes you really do want to change one and not the other. It's code that superficially looks the same, but only temporarily (taking a snapshot at the current time) - the pieces are meant to be distinct in the long run.
a-dub 17 hours ago [-]
i agree with the author. i argue a preference for loose coupling over centralized abstractions. sure it's pleasing to compress the code, but if the use cases actually are sufficiently divergent (as well as bugs and externally driven changes) ultimately it becomes brittle, littered with edge cases behind if fences and both challenging and daunting to change.
ideal case: support libraries and then very simple duplicated code that is easy to read and modify. critically the core control flow should remain duplicated, but simplified by the support libraries.
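a minimal python sketch of that ideal case, with every name (`parse_amount`, `format_line`, `invoice_lines`, `refund_lines`) made up for illustration: the helpers carry no control flow of their own, and each use case keeps its own loop.

```python
# Support library: small helpers with no control flow (all names hypothetical).
def parse_amount(text):
    return round(float(text.strip()), 2)

def format_line(label, amount):
    return f"{label}: {amount:.2f}"

# Two use cases keep their own control flow, even though it looks similar.
def invoice_lines(raw_items):
    out = []
    for label, text in raw_items:
        out.append(format_line(label, parse_amount(text)))
    return out

def refund_lines(raw_items):
    out = []
    for label, text in raw_items:
        amount = parse_amount(text)
        if amount == 0:  # refunds skip empty entries; invoices do not
            continue
        out.append(format_line(label, -amount))
    return out
```

the refund-only rule lives in `refund_lines` as a plain `if`, so it never shows up as an edge case behind a fence in the shared code.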
Capricorn2481 18 hours ago [-]
> I would go as far as to say that any abstraction you can maintain (that is in active maintenance, I mean) is better than code duplication once you are past a de minimis threshold.
Pretty much everyone arguing for duplication has argued what you are saying, which is wait to see a few instances of it before committing to an abstraction. No one is saying duplicate everything 100 times. So I don't think this discussion was ever iconoclastic.
dofm 18 hours ago [-]
The point is it sounds all smart and sophisticated and principled in the abstract environment of a code discussion in a blog post.
In the real world, duplication happens in an emergent way, there isn't the time each time to judge whether it's really time to just quietly abstract that code, you may not get the permission, budget or window to do it, and if you don't stop the rot really early you are locked into the pattern.
jbeninger 16 hours ago [-]
But... it shouldn't. People are arguing that a bad abstraction is better than none at all. Badly-implemented abstraction is the same. If you hit code that is duplicated organically a dozen times, you don't make it a baker's dozen. You spend a bit of extra time at least stubbing out the abstraction so future organic duplication can at least share an entry point. Abstractions grow organically too, in well-tended codebases.
cjfd 17 hours ago [-]
100% agree. 'Code duplication is far cheaper than the wrong abstraction' is a very good candidate for the worst programming article ever.
yowlingcat 16 hours ago [-]
One of the most challenging kinds of thought to work through with my engineers in professional communication is nuance. For example, they may say something like this, but actually mean "For a particular situation, this is wrong."
The context in which a decision is evaluated is particularly important for "rules of thumb" like this. There's the rule of 3 (which many senior engineers imparted to me earlier on in my career) - don't refactor until you've actually duplicated it thrice, but even so, what they speak of is a catch-22 that's pretty important to reason about carefully.
On one hand, if you overcorrected on the fear of abstraction, you could easily end up with 500 duplicates that are slightly different and need to be maintained 500 different ways, slowly causing slightly wrong behavior some of the time, data corruption, combinatoric explosion. Surely, once there is such a situation, some degree of abstraction is the only right decision.
On the other hand, if you overcorrected on the fear of duplication early on, you could easily end up with a premature optimization and complexity -- complexity which, most importantly, could be rooted in a gap of understanding of how the code will be used and what direction it may go in over time (often based on which direction the business will go over time).
The only answer that actually works, of course, is "somewhere in the middle." Obviously, that's pretty vague and not very useful. Where, exactly, in the middle IS the right place?
As the years have gone by, I've become more and more steadfast that the answer to that question is and must be an art and not a science. Of course, it must always be rooted in practicality, the actual context of the code around it and where the code/business was in the past and where it will be in the future.
But just as importantly, some of it must be based around beliefs in the face of imperfect information about what you want to invest in for the sake of the technology, the team that develops it, and the business that relies on it. It could be that for your team, your values make it make sense to go a little bit further than "good enough" on normalizing your data modeling, because the way you like to run your business requires that normal form to do the analytics and make decisions productively. It could be that for your team, your values make it make sense to go a little bit further than "good enough" on splitting service boundaries and ensuring clean queues and message passing infrastructure, because you have seasonal spikes where you need to scale up to a ton of load and then scale down after without constantly doing a song and dance or pre-provisioning fragile infrastructure.
But the most common thread there is - art, not a science. Every single decision depends on YOUR team, YOUR business, YOUR needs - and like any art, there is no universal rule or discovery or best practice in the industry that will magically work for your needs without working through the details of whether it appropriately fits your situation or not.
So with that said - I can't really agree with you. At any place I've ever worked with a competent team, maintaining duplicate code is just not that hard and follows the same process for being dealt with. Build a robust test suite that encodes the actual differences and the shared structure. Pull out the pieces that have a good reason to be abstracted and redesign the pieces that encode the true differential structure in a way that is intuitive. Lather, rinse, repeat. It's always straightforward because it's known - by the time you are doing this process, you've had tons of repetitions and data on what is driving you to develop the abstraction, so when you make the decision, you are making it empirically.
Conversely, I have seen many otherwise competent teams slowed to a halt with premature abstraction. Frameworks that were well intended and reduced duplication, but encoded coupling between components that at a certain point in the business's progression, fought with reality rather than aided, and all because they were frozen into place before anyone empirically had really clear data about whether the abstraction would be worth it long term. Well intended "clean code" refactors that were meant to solve the old "bad duplication" but instead created a far more difficult to reason about "abstracted base" of code that didn't really solve any of the domain modeling problems and was just as difficult to maintain without introducing buggy behaviors (if not more so) than before.
The biggest problem is that premature abstraction is sexy and fun. There are incentives and dopamine hits from doing it extraneously. But fixing legacy duplication is not fun. And so when it gets done, it tends to get done in a pragmatic way to relieve pain rather than to elicit pleasure. That, I believe is one of the biggest confounding sociological aspects of this whole discussion.
cyberax 17 hours ago [-]
My general rule is to start refactoring once you have three copies of the code.
Starting with abstraction when you are only beginning something rarely works well and leads to code bases littered with interfaces having only one implementation.
Abstracting the code when you have two copies does not always pay off, especially when you end up not needing more than just two copies anyway.
But once you have three copies, it's indeed time to start generalizing.
thinkloop 18 hours ago [-]
The key lesson is that duplicate code is not necessarily "code duplication" - it was always really about abstraction duplication. If two unrelated variables happen to momentarily share a value, it doesn't mean that value should be made common between them, they are fundamentally different things. It would be a confusing lie and error-prone if the code implied they were the same and that efforts should be made for them to be in sync.
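A tiny Python illustration of that point. The names `MAX_LOGIN_ATTEMPTS` and `GRID_COLUMNS` are hypothetical; the two constants are equal today only by coincidence.

```python
# Two unrelated settings that happen to share a value today (hypothetical names).
MAX_LOGIN_ATTEMPTS = 3
GRID_COLUMNS = 3

# "Deduplicating" them into one shared constant would claim that a change to
# the login policy must also change the page layout, which is not true.

def is_locked_out(failed_attempts):
    return failed_attempts >= MAX_LOGIN_ATTEMPTS

def rows_needed(item_count):
    # Ceiling division: number of grid rows needed to show item_count items.
    return -(-item_count // GRID_COLUMNS)
```

Keeping both constants means the security team can move to 5 attempts without anyone having to ask why the grid just got wider.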
dofm 17 hours ago [-]
I guess any blog post can remain true if you can optionally take one of the key terms and redefine it so it can also mean the opposite?
Thaxll 18 hours ago [-]
So you centralize 3 liners?
dofm 17 hours ago [-]
I said "beyond a de minimis threshold".
But in one of the scenarios I mention earlier, I earned a chunk of money once fixing an issue that emerged in a subcontractor's four or five line duplication that had ended up rippled through a long-lived codebase. A ground truth (MySQL version) changed, and the pattern broke everywhere, including places where it had evolved.
So I tend towards thinking, yes, any three-line pattern that is likely to appear everywhere should, perhaps, be centralised.
It's certainly worthy of serious consideration. Usually pretty easy to maintain the surface of such an abstraction.
tracerbulletx 18 hours ago [-]
Huh? If anything having lots of customers makes the argument for duplication stronger. The issue is almost always once you get huge and 5 product teams are trying to achieve 5 different goals by using the same overwrought abstraction instead of just copying and decoupling. The abstractions that are actually stable end up becoming libraries or platform team owned systems that no one ever really touches.
jimmypk 18 hours ago [-]
[flagged]
skreem 10 hours ago [-]
Nice to see Sandi mentioned! If anyone liked her philosophy / writing style I highly recommend you check out her books
I read “Practical Object-Oriented Design (POODR)” ages ago at this point, but it reshaped how I approached OOP
Granted… OOP as a default paradigm has fallen out of favor (at least for me), but it’s still everywhere & won’t be going away. She gives a great framework for making it sane
fjfaase 18 hours ago [-]
I once used code duplication to implement a fourth type of dialog that looked somewhat similar to the others, which were sharing a lot of code, because I felt that although it looked much the same as the others, there was some fundamental difference. Took me about a day to implement. When some other engineer saw this, he spent the next three weeks trying to integrate all of them with some shared class. His work was not completely worthless, because he did find some small bug during all his efforts to avoid any possible code duplication. I had already predicted that it would take a lot of effort, but I did not object, because I hoped that he would learn something from it and the next time think twice before always trying to avoid code duplication.
ketozhang 13 hours ago [-]
I like to think most seniors know to not blindly follow DRY. However, I can tell many of us are uncomfortable with the idea of needing to maintain multiple duplicated sources of code.
To help with that, I think the simple model of two callers depending on a common code needs to be scrutinized. If the common code needs to change because only one of the callers needs it, then it doesn’t belong in the common code.
The wrong goal for DRY is attempting to do it with encapsulation. Encapsulation shifts the refactoring work from the caller to the common code. However, this is not what you want because there’s a lot more consequence in updating the common code than the caller.
You can avoid encapsulation and still be DRY: having multiple thin abstractions that the caller needs to be aware of is better. In OOP you are taught SRP and IoC for this. In procedural programming, this just comes naturally as code calling a series of helper functions.
platz 19 hours ago [-]
2016 (up to 2018 or so) may have been the peak of such varied activity in the developer ecosystem, including articles like this, whether it was discussion, ideation, OSS variety, language development.
There has been growth since but it's been concentrated into fewer channels and somewhat industrialized.
ninkendo 13 hours ago [-]
To me it’s distracting to think about duplicating vs creating an abstraction, because the answer is always “it depends”, which is not really an answer.
To me, the question is: can you look at this abstraction and understand why it exists, without knowing who’s calling it? If so, it’s probably fine.
If an abstraction only makes sense because of the particular weird details of these 3 callers that have to pass mutually exclusive arguments to it to get their desired behavior, it’s probably wrong. An abstraction needs “a place to live” in your architecture. It needs to be self-evident in justifying its existence.
If you find yourself repeating code, but de-duping it would create these sort of weird non-self-justifying abstractions, your architecture is probably a bad fit for the problem you’re trying to solve. Maybe that’s because the problem changed since the software started (which is a bit of a pickle: do you re-architect, or do you continue writing weird inscrutable code?) or maybe it’s because you just picked the wrong abstraction in the first place. But you should recognize it: duplicating vs wrong-abstraction is about choosing the lesser evil. If the abstraction was a natural fit for the problem, you wouldn’t need to answer this question in the first place.
"If you have a procedure with ten parameters, you probably missed some." -- Alan Perlis
christophilus 19 hours ago [-]
Yes. I’m dealing with a graphql, urql, Next, Prisma stack at the moment. Something that would be a handful of lines of code in a different stack ends up being hundreds in this one.
The Node ecosystem is full of wrong abstractions.
Rohansi 19 hours ago [-]
The problem is self-inflicted. You do not need to keep jumping to the next trendy framework.
christophilus 18 hours ago [-]
I inherited this one. My preferred SPA stack these days involves Porsager’s Postgres library, a simple RPC stack with Zod schemas, and Preact.
Even better is an old school MPA with progressive enhancement.
RussianCow 19 hours ago [-]
I don't know about you, but I generally don't write code in a vacuum. Other people may have touched it before me. Those other people may have made poor decisions.
Not that I'm immune from choosing the wrong abstraction sometimes. More than once the "other people" was me. We all make mistakes.
Rohansi 18 hours ago [-]
Of course, but we should all be doing our best to push back against unnecessary framework churn.
em-bee 18 hours ago [-]
if the majority of the team agrees, sure. but if i am in the minority then i'll appear uncooperative, and that may not be a position i want to be in.
Capricorn2481 18 hours ago [-]
Do you want us to call the previous company and explain what their framework choices did?
em-bee 18 hours ago [-]
my paycheck needs me to.
cpursley 18 hours ago [-]
Nextjs in particular is a dumpster fire, it's a shame that it's the default stack many LLMs slop out.
stcg 17 hours ago [-]
This is like saying "A slow leak is cheaper than a burst pipe"
Yes, okay. But with both you will have a bad time cleaning up.
There is a third option: good abstractions.
I did see this pattern described in the blog in practice a lot (and fell victim to it myself) and I think that in general this comes down to inexperienced programmers. Object oriented programming makes it worse.
Teaching these programmers that they should not abstract is not the solution. It is blocking their growth.
Teach them how to make better interfaces instead.
dang 16 hours ago [-]
The OP is aware of good abstractions and is describing a procedure for finding them, or for increasing one's chances of finding them.
zadikian 16 hours ago [-]
The saying is you build better abstractions if you don't build them too early.
jbvlkt 18 hours ago [-]
It depends if duplication is accidental or real. I.e. if two taxes are using the same formula, it is accidental. If you use the same physics formula in multiple places, it is real duplication.
Verdex 17 hours ago [-]
Cheaper is skipping a step.
Code duplication and 'wrong' abstractions both count themselves amongst the other foibles of programming. But they don't directly produce a cost which can be cheap or expensive.
They produce some other high dimensional intermediate value which can then produce highly variable cost dependent on the domain, goals, and scenario.
As ever, it depends.
The "depends" is quantifiable, but it doesn't fit in a blog post. Think more along the lines of War and Peace.
time4tea 17 hours ago [-]
You don't know immediately if something that superficially seems the same actually is.
Copy and paste once is fine, twice, not so much.
Often I've seen two totally different things exist in one bit of code, no overlap!
Premature generification is bad, and leads the developer to believe that two things are the same, making it harder to see they are not.
Also, can make it much harder to see that a different abstraction would give a cleaner outcome....
corysama 13 hours ago [-]
I always liked the advice "Abstract for replacement. Not for reuse."
If you have code that is reusable, you'll want it to have a nice interface. But, you don't need an abstraction on top of a nice interface. Just use it.
For abstraction, what you need to focus on is "What is most likely to change in the future?" You want to put in abstractions that will make those changes low-cost.
Ex: At work there was a small debate about which C++ JSON parser to use with no stand-out winner for our framework's needs. So, we picked one and I put a thin layer over it for everyone to use. We have since then swapped out the parser and swapped it back over the years of a hundred devs using it in our framework and no one noticed the swaps.
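A minimal sketch of that kind of thin layer, in Python rather than the C++ described above. The names `parse_json` and `dump_json` are hypothetical, and the standard `json` module stands in for whichever parser was picked; the point is only that callers depend on the two wrapper functions and never on the parser itself.

```python
import json
from typing import Any

# Hypothetical thin layer: callers import these two functions and never
# import the underlying parser directly. Swapping the parser later means
# editing only this module.


def parse_json(text: str) -> Any:
    """Parse a JSON document into plain Python objects."""
    return json.loads(text)


def dump_json(value: Any) -> str:
    """Serialize plain Python objects to a compact JSON string with sorted keys."""
    return json.dumps(value, separators=(",", ":"), sort_keys=True)


config = parse_json('{"retries": 3, "hosts": ["a", "b"]}')
print(config["retries"])
print(dump_json(config))
```

The surface is deliberately tiny: the fewer parser-specific types leak through it, the cheaper the swap.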
antonymoose 19 hours ago [-]
Twice a coincidence, thrice a pattern.
MrGando 18 hours ago [-]
I once had to work with a system that was refactored and abstracted away heavily to use Redux. It didn't work then, the implementation had way too many abstractions, doing any change meant you had to touch dozens of files. It was insanity. Left me with a bitter taste regarding the redux pattern for ever (probably not the pattern's fault).
SoftwareMaven 18 hours ago [-]
Over-abstraction is as much of a problem as under-abstraction. If the abstraction isn’t improving your ability to produce good code, it’s a bad abstraction. I’ve worked with a lot of abstraction patterns in a lot of languages over the 30 years of my career. Any of them can be good or bad. Unthinkingly applying them is always a problem.
Unsurprisingly, that goes for just about any idea in software development. I worked in one code base that heard small functions are best, so every function was less than three lines long. You don’t gain anything by replacing `lst.get(0)` with `get_first_item_in_list(lst)` (in fact, understanding becomes much more difficult), but breaking down functions into the smallest units that make sense independently within the business domain can be very helpful, both for understanding and testing.
MrGando 18 hours ago [-]
100% agreed. It's interesting since I see over-abstraction often abused by "clever" engineers (sometimes quite experienced actually). Sometimes I wonder if they do that to make themselves indispensable on purpose and create their silos in the codebase.
originalcopy 18 hours ago [-]
While I see the point, I think I more often encounter the opposite. Duplication, but not exactly duplication.
Then the "sunk cost fallacy" is not an issue but there is huge maintenance cost and no-one feels like refactoring it. I'd rather refactor bad abstraction than 10x duplication.
em-bee 18 hours ago [-]
but those are exactly the cases where the distinction matters. when you have a situation where you can't duplicate the code exactly, then you really have to look carefully if this is actually the right place for a shared abstraction. i tend to wait and see if i can refactor one or the other to get them to be exact duplicates and only then see if i can fit in a common abstraction. and yes, finding that i later need to make the same change in both places is a sign that a common abstraction is probably the right call.
northisup 18 hours ago [-]
Duplication is fine, triplication and above is the issue.
mjevans 18 hours ago [-]
Triplication tends to be where it becomes more clear what the correct thing to abstract or de-duplicate is.
It's of course possible to functional-ize segments of logic, but then the question of state mutation must be brought up. How isolated are these changes from other parts of the code / system state. Can this be run in parallel or is it something that must be serial? What potential race conditions exist?
esailija 1 hours ago [-]
Another way to put it:
Things that should be tightly coupled but are not is preferable to things that should not be tightly coupled but are.
I agree especially because coupling things is easier than uncoupling things.
infinitebit 15 hours ago [-]
I feel this deeply. Although abstraction isn’t a one way door, “deduplicating” logic tends to be much easier than breaking big functions back down, and so these days I tend to leave a comment with the date wondering if it is too similar to some other code. then if i come across it again months later and it still is, then maybe it is safe to make it DRY.
I think DRY is the first heuristic for “good code” that most junior devs actually grasp, and so they become very dogmatic about it for a while
codr7 12 hours ago [-]
I certainly learned this the hard way.
When I started writing code 40 years ago, I used to overestimate my understanding and abstraction skills a lot. As a result I created overly complicated and difficult to maintain/evolve solutions.
Turns out I need to see more examples of patterns before making good choices, which means becoming comfortable with seeing and tracking duplication over time.
bazoom42 19 hours ago [-]
Depends. If the abstraction is just a level of indirection, then it is usually pretty simple to eliminate - just hit “inline function” in the refactoring tool a few times.
On the other hand it is pretty difficult and error prone to consolidate duplicated code which have drifted apart over time.
If in doubt, chose the approach which is simplest and least risk to revert if you discover in the future you made the wrong choice.
I do agree a bad abstraction can cause huge problems. But it’s usually not the kind of abstractions introduced to eliminate code duplication, but the kind of top-down “architecture astronaut” abstractions, where a model is chosen which does not fit the complexity of the problem.
tetha 18 hours ago [-]
I watched a talk by her about this, and this post is missing half of the equation, which is really important:
Having a wrong abstraction means you end up with a class/function/module with a huge amount of configurations through boolean/enum parameters. It's not even clear that all combinations of configurations are even valid. This situation may be simplified by duplicating, and then eliminating code, thus creating more streamlined code for each use case. This may require fixing similar or cross-cutting bugs in multiple places (eg: JSON serialization is stupid, need to hack a workaround), but keeps the business logic changes simple. Maybe a bit more numerous, but the code is able to raise all the scenarios to consider.
Having no abstraction means you may have to change business logic consistently in multiple places, or you have to fix exactly the same misconception (aka a bug) in multiple cases. e.g. tax rate management in a multi-national context. This is also terrible, because you may fix an important problem in one place and forget other places with the same issue. Now you missed 12 potential bugs by fixing one. This can however allow you to discover a true abstraction. Maybe these 12 places should call just one place?
But for code evolving across a team understanding this tension, a bit of duplication while waiting for confirmation that these pieces of code break together and change together is better than just shoving the same 3 if-statements into a function to avoid "line duplication". Concept duplication is more important.
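A rough illustration of the "duplicate, then eliminate" move, as a Python sketch. Everything here is made up for the example (`format_price` and its flags, `invoice_price`, `email_price`); it is not taken from the talk.

```python
# Hypothetical "wrong abstraction": one function, one flag per caller.
# Which flag combinations are valid is not obvious from the signature.
def format_price(amount, with_symbol=False, for_invoice=False, for_email=False):
    text = f"{amount:.2f}"
    if with_symbol or for_invoice:
        text = "$" + text
    if for_invoice:
        text = text.rjust(12)
    if for_email and not for_invoice:
        text = text + " USD"
    return text


# After inlining the function into each caller and deleting the branches
# that caller never takes, each use case is a small function of its own.
def invoice_price(amount):
    return ("$" + f"{amount:.2f}").rjust(12)


def email_price(amount):
    return f"{amount:.2f} USD"


print(repr(invoice_price(5)))
print(repr(email_price(5)))
```

The two small functions still share the `:.2f` formatting. Whether that shared piece deserves a name of its own is exactly the "do they break together and change together" question.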
davnicwil 16 hours ago [-]
One way I like to think about it is that often abstraction is an automation for a task that doesn't need automating.
You hardly ever change the thing and if you do, changing it in two or three places 'manually' is really not a big deal.
Now changing something fairly often, that affects logic in 50+ places? Then it makes sense to automate with an abstraction so it all flows through the same lines of code.
I know I've personally spent way more time over the years debugging bad abstractions than changing things in a few places.
omoikane 18 hours ago [-]
> Programmer A sees duplication.
This step should also be parameterized by how many times the duplication has occurred. Refactoring preemptively may lead to poor abstractions, but not refactoring after seeing the exact same thing tens of times would also be weird. See also:
I always try to design in a way, that using abstractions/shared logic is optional.
I've worked in too many projects, where every new feature needs to be built on top of existing abstractions, that often lead to severe restrictions if something slightly different is required. I always try to create reusable units/components, that can either be used as intended or replaced by something that behaves slightly different if needed.
Components are not necessarily frontend components, this extends also to backend logic.
luckystarr 18 hours ago [-]
How I see this:
Refactoring code to reduce the number of lines is _compression_, akin to RLE coding.
Refactoring the code to lift conceptually coherent parts is _abstraction_.
Less compression, more abstraction. Then you're fine.
lericzhang 9 hours ago [-]
Of course, as you said it's "wrong abstraction".
The real problem is it's hard to tell if an abstraction is correct before you see enough duplication.
danpalmer 9 hours ago [-]
That's exactly what the article is about.
gb2d_hn 18 hours ago [-]
Interface over inheritance is the paradigm I try and stick to. I'd rather maintain orthogonal code than code with overuse of inheritance because of over adherence to DRY.
joshmoody24 18 hours ago [-]
I've seen the pendulum swing between duplication and abstraction a few times in my career, and I'm currently on team "it's usually not that hard to find a good abstraction up front."
IMO it's easier to inline a bad abstraction than it is to consolidate a bunch of subtly different things that should have been abstracted from the beginning.
But I expect people's opinions on this differ wildly based on their personal experiences. Just my anecdotal take.
bob1029 19 hours ago [-]
If you work backward from the schema these sorts of things tend to evaporate before they can become a problem.
Some of the biggest rabbit holes come from naming conventions not aligning across the business and technology silos. If everyone agrees that Customer has exactly 34 attributes, then it is possible to move to the next step of sharing libraries of types across the team. Getting your POCOs/DTOs 1:1 across the board is when the duplication really starts to melt away.
LunicLynx 17 hours ago [-]
The generic repository "pattern" is the prime example of this. There might be CRUD operations shared between repositories, but they should not be that base of every repository.
I've seen this so many times, because in the beginning the CRUD stuff is what you code over and over again and then suddenly business logic emerges and everything breaks down, but the repository prevails because sunk cost fallacy ...
hakunin 16 hours ago [-]
I think what this advice is really getting to, is that you should prefer everything generally build-time/hardcoded/static rather than runtime/dynamic. Wrote about this in 2013 (calling it a CMS trap[1], back then seemed pertinent).
With LLMs the cost of duplication is much lower both to write and maintain. So abstractions need much higher justification.
ninalanyon 16 hours ago [-]
You have some evidence for that assertion?
bogrollben 17 hours ago [-]
One thing I don't see talked about often is the fact that not all duplication is equal. Duplicated html/xml/markup does not equal template-based boiler plate, which does not equal almost everything else. I'm far more forgiving of duplicate html/markup because that code is so cheap.
felooboolooomba 16 hours ago [-]
The bad thing with abstractions is when you start it too early in your code base for things. It's also a bad thing when you start it too late, although not as bad. If you start it way, way, way too late it's very, very, very bad.
Of course, the worst abstractions are the ones you don't need at all.
DmitryOlshansky 17 hours ago [-]
I would argue that _premature_ abstraction is worse than _some_ duplication of code.
Also I’ve seen the kind of codebase that seems to be LZW packed due to the sheer desire to DRY everything out. Not a pleasant thing; by the time you goto 10 layers deep on some “helper” function you forgot why you’re in there.
zadikian 16 hours ago [-]
I always felt like I abstract and modularize things way less eagerly than other programmers. Was pleasantly surprised to find that LLMs do it mostly my way by default, then again they're also bad at abstracting when it's actually needed.
dan-robertson 16 hours ago [-]
I think LLMs are trained to not refactor. I think it’s either that you would need to do something in training to make them want to do it and the labs don’t do that, or that the labs correctly guess that it would be very annoying for LLMs to go and refactor your existing code as they go. This creates bad effects (eg crazy hacks to avoid refactoring and, much worse, not refactoring the code they only just wrote as required) but I think the alternative would be worse – it’s not something you always want to read and the refactoring is often done incorrectly, restructuring the code to the best shape for the current task rather than something that balances many different needs.
throwatdem12311 12 hours ago [-]
My company paid to have her do one of her day-long workshops at HQ. Don’t know how much they paid but it was worth every penny. Changed my life.
KHRZ 19 hours ago [-]
This is the biggest lesson I got from LLMs. I have a 1 million LOC vibe coded project that I can only imagine would fit in a few hundred thousand lines. But it's still holding up, I expected some kind of development collapse long before this point.
cassianoleal 19 hours ago [-]
I don't think that's a good lesson.
OP is right that code duplication is far cheaper than the wrong abstraction, but the opposite is also true - the right abstraction is far cheaper than code duplication.
gavmor 19 hours ago [-]
Well sooner or later I would expect a developer who intimately understands their code base to feel compelled to start refactoring and extracting fitting, meaningful well-leveraged abstractions.
imhoguy 17 hours ago [-]
I don't think that will happen anytime soon. Prompts are the code now, and programming languages code is compilation product. Almost nobody optimizes compiled assembly code.
Perhaps "recompilation" - rewrite by replaying all prompts in strict code quality context (linters, complexity & dedup checks) would make better abstractions.
The only problem now is that LLMs are non-deterministic.
gavmor 17 hours ago [-]
> Almost nobody optimizes compiled assembly code.
Compiled assembly code is not an input to the next compilation; source code is an input to the LLM's next inference.
Sure, maybe "prompts are the code," but you must realize that code is also the prompt.
gb2d_hn 18 hours ago [-]
It's made me wonder the same, but most LLM generated codebases haven't been around long enough to judge maintainability. I have noticed issues in some of my more LLM heavy code when I expect a change to be replicated in multiple areas, assuming common code / styling was reused, only to find it wasn't. It's for that reason I can't use LLMs for client codebases without heavy scrutiny of every line generated (for my own hobby projects I'm a lot more lenient)
anon-3988 19 hours ago [-]
The problem with coming up with a rule that works for everyone is that everyone has a different idea of what makes a good abstraction.
Do you want to iterate using for loop or using .iter().step(2).map()?
I would rather have consistency than a mixed bag of levels of abstractions.
doix 19 hours ago [-]
> Do you want to iterate using for loop or using .iter().step(2).map()?
This isn't really a good example, assuming both can be used to represent the same thing.
The problem with the wrong abstraction is when your abstraction doesn't let you represent something. Then, because you've already invested so heavily into it, you start contorting the problem to fit your abstraction and it becomes a shit show.
metaltyphoon 19 hours ago [-]
> Do you want to iterate using for loop or using .iter().step(2).map()?
I don’t think it matters, especially for short loop scopes
digitaltrees 6 hours ago [-]
Sandi is amazing. I learned so much from her.
williadc 18 hours ago [-]
The "99 Bottles of OOP" book mentioned at the bottom was an excellent introduction to refactoring. I highly recommend it if you struggle with finding the right data models for the problems you work on.
he0001 14 hours ago [-]
If you can’t fit into the same abstraction, is it really a “duplication” or is it just a slightly, but incomparable, function?
ozgrakkurt 18 hours ago [-]
The discussion around this topic would be nicer if the title had "can be" instead of "is".
Otherwise what is better is better and we don't know what we don't know
jstimpfle 19 hours ago [-]
Code duplication is the wrong abstraction too -- unless it's not really code duplication but code that only happens to be similar for some really "unstable" reason.
dofm 18 hours ago [-]
I would agree that there are good "de minimis" reasons not to abstract code that isn't ready to be abstracted at all. If the pattern has not settled it shouldn't be forced into an abstraction (beyond those that make sure it is e.g. not vulnerable)
But beyond that, any stable abstraction is better than duplicated code.
danpalmer 9 hours ago [-]
AI: Why not both?
dmos62 18 hours ago [-]
If it's duplication, it's the same abstraction by definition. The fundamental unit of programming is intent, not code.
_pdp_ 16 hours ago [-]
The biggest mistake young engineers make is working out a problem from the bottom up... i.e. building frameworks and libraries, rather than exploring the problem space which is more chaotic.
You cannot find the edges of the system with structure you don't understand because once the abstractions are set in place solutions often have the same shape as the frameworks which leads to ultimately really bad systems.
The best way is often not the obvious way. Once you reach the edges then you can think how to program the abstraction but that is many versions down the line from the original.
jeffypoo 17 hours ago [-]
I've always told engineers to duplicate until the abstraction is punching them in the face.
aappleby 18 hours ago [-]
The smallest amount of simple code that solves the problem wins. Everything else is irrelevant.
alkhimey 16 hours ago [-]
The disadvantages of duplication are greatly reduced in the world of AI. From my experience it can easily detect the duplicates and refactor code safely. On the other hand, code without abstractions is easier to read and easier for AI.
With AI, we really need to rethink the clean code principles.
more-coffee 16 hours ago [-]
Seems to me like the last thing you want to do is worry whether the LLM has a large enough context window to keep an eye on all duplicates. So I'd argue to deduplicate directly, where possible.
aarjaneiro 16 hours ago [-]
Personally I've seen way more duplication as a result of AI in large codebases
Generalizing this in the abstract is a wrong abstraction.
hedora 18 hours ago [-]
I’ve seen code bases that evolved like that. The problem is almost always outside the abstraction that has a pile of conditionals.
Usually, some moron decided to copy paste things a few levels up and then the top half of the system metastasized into two parallel universes of broken garbage.
For instance, one might decide to perform auth later in the flow so unauthorized handlers can run and set a “this requires auth” bit that defaults to false, and the other flow could add a forged auth header before the auth step.
Now, the auth handler needs an “allow forged header” flag and an “already authenticated” flag.
I’ve seen that grow to a half dozen cases until massive production dataloss occurred. A buggy client tried to delete something local to their account without specifying a userid as a parameter (this codebase was garbage!) and deleted the something for all users instead.
I can’t remember how the dataloss was “fixed”, but it definitely wasn’t “all requests go through a simple auth check, and all handlers declare/implement their auth requirements in the same way”.
Getting a design approved to require a user id be specified exactly once for account-level operations was fantasy land for that team. (Most hires with any sort of engineering talent bounced in under a year.)
Anyway the “abstractions are hard so copy paste” approach did provide job security for the lifers on that product. I can’t imagine them holding a job elsewhere, but they were completely immune to layoffs (hostage style).
This is a pretty valid approach if you’re an agent hired to perform industrial sabotage, or if you keep replacing keyboards after you gnaw through the corner.
TexanFeller 18 hours ago [-]
> Code duplication is far cheaper than the wrong abstraction
Very true in some sense, but I continue to encourage DRY-bias because I've literally never seen teams duplicate code responsibly and later dedupe it when it's the right time. 95% of the time this sentiment is quoted to justify shipping quick slop and stable reusable bits are never extracted into a shared lib later.
bluefirebrand 18 hours ago [-]
In my experience if your organization can't commit to doing WET (write everything twice) code then it probably also will fail at doing DRY (don't repeat yourself) code
Maybe this is an area where AI can help identify duplicate code though to show opportunities for de-duping
mcculley 18 hours ago [-]
Yes, if your programming language/environment is weak.
mohamedkoubaa 18 hours ago [-]
Duplication is often a small price to pay for isolation
andai 15 hours ago [-]
See also: Muratori, Semantic Compression ("Compression-Oriented Programming")
I don’t mind duplication at all. I mind undiscoverable duplication.
But if I have an interface and three subclasses with duplicated or almost duplicated code, this is quite easy to find.
That’s much nicer than an abstract base class where only some children override the methods, because now I need to check which ones actually do.
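A small Python sketch of the "interface plus near-duplicate implementations" shape. The `Exporter` protocol and the three exporter classes are hypothetical names chosen for the example.

```python
from typing import Protocol, Sequence


class Exporter(Protocol):
    """The interface: every implementation must provide export()."""

    def export(self, rows: Sequence[Sequence[object]]) -> str: ...


# Three implementations with almost identical bodies. The duplication is
# easy to find: search for "def export" and all three sit side by side.
class CsvExporter:
    def export(self, rows: Sequence[Sequence[object]]) -> str:
        return "\n".join(",".join(str(cell) for cell in row) for row in rows)


class TsvExporter:
    def export(self, rows: Sequence[Sequence[object]]) -> str:
        return "\n".join("\t".join(str(cell) for cell in row) for row in rows)


class PipeExporter:
    def export(self, rows: Sequence[Sequence[object]]) -> str:
        return "\n".join(" | ".join(str(cell) for cell in row) for row in rows)


def run(exporter: Exporter, rows: Sequence[Sequence[object]]) -> str:
    """Callers depend only on the interface, not on a shared base class."""
    return exporter.export(rows)


print(run(CsvExporter(), [[1, 2], [3, 4]]))
```

No class inherits behavior here, so there is no question of which children override what; each implementation reads top to bottom on its own.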
nullbio 17 hours ago [-]
Code duplication is terrible in the age of LLMs, unless you want to maximize on drift.
ilvez 18 hours ago [-]
Just three words: rule of three.
hyperpallium2 14 hours ago [-]
prefer semantic abstraction even when it creates duplication
gaigalas 10 hours ago [-]
I like the mantra: "prefer duplication over the wrong abstraction".
Combined with another interesting idea "whatever I dislike is wrong", it makes me _always be right_, which is awesome. I can never lose a discussion about abstraction with this powerful combo.
jongjong 14 hours ago [-]
Also, I prefer having all the code inside a single 5000-lines file than split up into many small files representing incorrect abstractions.
The urge to split the code up from the beginning is generally a bad idea; it forces early abstraction, which is more likely to be wrong.
threethirtytwo 15 hours ago [-]
Duplication and the wrong abstraction are looking more and more to be implementation details handled by AI. A possible future may be a place where none of this matters.
lazide 16 hours ago [-]
Yes, but counterpoint - code duplication is also the wrong abstraction.
Pro tip - which is the least bad abstraction? Answer: it depends!
vcryan 17 hours ago [-]
The sweet spot is really duplicating the wrong abstraction: I see you Claude!
johnwheeler 17 hours ago [-]
These are not mutually exclusive.
slopinthebag 18 hours ago [-]
I prefer the go mantra: a little copying is better than a little dependency.
Abstraction is a vague term when used here. Is a shared function an “abstraction”? It’s more like implementation hiding, maybe some data hiding. But you definitely have a dependency on it now.
Acronyms like DRY are for beginners. Once you get good you know when to break the “rules” (and when not to).
Accidental divergence is the problem, not intentional.
But, again, the point is that you don't know yet whether you have a single source of truth or not. It's a question of the relative badness of duplication vs premature abstraction in cases where the code may diverge or converge in the future. There is no generic answer. But as a heuristic, based on my personal experience, I have always found premature abstractions to be more painful to work with. Even more so when someone else has authored them.
So many times I've had to untangle these types of abstractions when business asks for changes to case X but not case Y. Or worse, business asks for changes to case X, but it also affects case Y due to abstractions. Business sees X and Y as different things, so it did not even think to mention that the new behavior should only affect case X, but to coders they're the same.
If you only spot the bug in path A and not path B, why fix the bug for B?
Theoretically and conceptually I agree. But in practice a lot of programming languages aren’t as expressive. People prefer codebases with duplications rather than visitor patterns everywhere. In essence, the visitor pattern is a tool to solve multi-dimensional abstraction problems, just like type classes in Haskell or CLOS in Common Lisp. But it’s so verbose and non-straightforward that more often than not it’s not worth it, even when conceptually it’s a legit case for “single source of truth”.
That's a very nice rule of thumb. I've often overabstracted when two pieces of code look similar at one point in time and then they diverge.
And instead it gets replaced with the actual root of all evil, complexity.
Many problems have tons of inherent complexity already.
Fun fact: Win32 checkboxes are buttons with a bitflag that says they are actually checkboxes.
TL;DR: Vibes
Here we're loading the customer record and updating their discount %
Here we're loading the broker record and updating their commission %
They will have 99% identical code.
It's possible but exceedingly unlikely we have found 2 things that should be a load_record_and_update_percent(file,id,field,val)
Tomorrow the business logic behind one of those will no longer be a simple % and now you have a real mess.
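A rough Python sketch of that scenario, kept duplicated on purpose. The in-memory dictionaries, the field names, and the "tiered commission" requirement are all made up for illustration.

```python
# Hypothetical in-memory "files"; a real system would load these from storage.
customers = {1: {"name": "Acme", "discount_pct": 5.0}}
brokers = {7: {"name": "Bo", "commission_pct": 2.5}}


def update_customer_discount(customer_id: int, pct: float) -> None:
    record = customers[customer_id]
    if not 0.0 <= pct <= 100.0:
        raise ValueError("discount must be a percentage")
    record["discount_pct"] = pct


def update_broker_commission(broker_id: int, pct: float) -> None:
    # 99% identical to the function above, and that is fine for now.
    record = brokers[broker_id]
    if not 0.0 <= pct <= 100.0:
        raise ValueError("commission must be a percentage")
    record["commission_pct"] = pct


def update_broker_commission_tiered(
    broker_id: int, tiers: list[tuple[float, float]]
) -> None:
    # Assumed future requirement: commission is no longer a single percentage
    # but a list of (sales threshold, percentage) tiers.
    record = brokers[broker_id]
    record.pop("commission_pct", None)
    record["commission_tiers"] = sorted(tiers)
```

Because the two paths never shared a `load_record_and_update_percent`, the tiered change touches only broker code and the customer path cannot be affected by it.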
It can, that's all about how aggressively you factor and structure your code, eg. combinators make it easy to reuse code in different application patterns without rewriting.
Even in that case the refactor can introduce mental overhead when having too many different variable / properties names
Very similar with patterns. I've often read people protesting that juniors overuse design patterns, yet I've seldom seen a junior (mis)use anything more complex than a singleton, and when they use any pattern, it's usually forced upon them by an opinionated Java framework.
I've seen it occasionally. There was one junior whose code I saw littered with DTOs that were exact copies of the business objects and DAOs where every method was just a wrapper for a Hibernate method. But yeah it's rare.
Shape::Polygon::ConvexPolygon::FourSidedConvexPolygon::Square::BlueSquare...
"Intro to OOP" lectures/articles made a deep impression on some people in not quite the right way :)
Regarding OOP itself, I also remember when "favor composition over inheritance" became a thing. Was this reversed too?
I think this is generally still the advice, when working in OOP contexts.
Mind you, I mean enterprise and line of business software, not hobbyists. I also mean of their own volition, not the kind of nonsense that Java frameworks often forced on them (all the patterns under the rainbow, factory abstract method factory of abstract methods).
(Alas! Sometimes you pick up bad habits from experienced people, and being a junior, you don't know better)
„How Software Groups Rot: Legacy of the Expert Beginner”.
https://daedtech.com/how-software-groups-rot-legacy-of-the-e...
I have recently fallen into a job at a small company that really seems to have this culture. Thankfully, I'm only going to be here for a year and a half or so (fixed term job for working holiday visa), but I'm trying to be really aware of how it's impacting my career development.
There is no automated testing, no meetings, seemingly no code review process, no standardization of schemas for files that are passed between different applications, all jobs are run on on prem desktop workstations.
This is the key, if they are very similar but used by different consumers the chance that they will diverge in the future is very high. And once they do they will break the abstraction.
… sometimes duplicate things unnecessarily.
But that's too philosophical to talk about or for you to understand.
Put it this way. You're implying code can be duplicated as long as they are advertised to do different things. But can't that conceptually be applied to data as well? I have the number 5 representing age, and I also have the number 5 duplicated somewhere else representing cost. 5 is duplicated because they are "advertised" to do different things.
Because code and data are philosophically the "same" the properties of "single source of truth" applies to both in the same way.
The problem is not knowing which of the hundreds or thousands of potential truth sources is worth abstracting. The only real way of finding out is not abstracting them and seeing how it works out.
If the problems in SWE boiled down to solve(f -> MagicallyNoProblemAnymore) we wouldn’t have this discussion.
This is why we have to have programs that duplicate code by doing anything like adding two numbers together or complex logic that is easy to create bugs when someone wrote it 40 years ago better. Because code reuse is mostly done on a very small scale.
Given that's the case, when you start on a new React project, as an example, you are not reusing application code; you are duplicating the React framework so you can duplicate every other web app in every sense except maybe the visual.
There is no such thing as full reuse and until we get to a universal network invocable function tree that can be extended only when its truly unique we never will. Maybe AI will do this. People cannot.
At the end of the day code duplication needs to exist to optimize for local correctness (or incorrectness) and speed, and an abstraction's goal is not to provide pure reuse. It's to provide a place to "put your logic" that may be similar and has access to typical state that some kind of widget might typically need.
But then I came across more cases: sprites with no directionality (an explosion), and corpse sprites (which were only 4 directions, 2 mirrors, and most except the first four were shared by both orcs and humans).
I agonized for a little bit on what the hell the common abstraction is for all this. In the end, I factored out some of the loading code, and made a UnitLoader, CorpseLoader, EffectLoader and moved on. Now, there's probably a better abstraction in there because all 3 loaders have to reason about the same things a little bit. But I will discover that abstraction later on and it's easier to just de-duplicate the code then, rather than try to identify the abstraction now and make some complicated EverythingLoader that handles all those cases.
I think the natural instinct with programming is to try and simplify the code by means of generalization. But we often over-simplify, and reality is messy. Or as TFA mentions, time passes and new requirements arise, so it turns out that we have simplified prematurely!
Sounds like this should be an aphorism. Premature abstraction is the root of much suck!
Personally I prefer what you’re doing over trying to come up with a non-obvious abstraction or trying to make an imperfect abstraction fit. Waiting til the abstraction is totally obvious and the need is crystal clear is a good thing.
The flipside (antidote?) of DRY is WET - write everything twice/thrice. More important, IMO, is to abstract only over things I have an actual, demonstrated use case for, usually demonstrated first via duplication, and not speculate about possible future uses I might want. Code written for future use cases we don’t have is so often the code that gets in the way of abstracting the things we do have, and it cracks me up when that happens.
I discovered this after a few early years of my career being a bit of a “best practices” zealot. The thing I say often at work is, “let’s get this shipped to prod so we can start learning all the things we don’t yet know about it.”
Besides, sometimes your duplication creates "bugs" which may turn out to be fun features that players enjoy.
So code duplication because of abstraction issues is rare. Code duplication because of siloed developers is so much more common.
At least that's my interpretation
using projection you can "call a function in two parts"
this is a useful pattern that you can use to first 'fix' data or behaviour to produce another function: https://en.wikipedia.org/wiki/Partial_application
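For anyone who hasn't used partial application, a tiny Python sketch with the standard library's `functools.partial`. The `apply_rate` function and the two derived names are invented for the example.

```python
from functools import partial


def apply_rate(rate: float, amount: float) -> float:
    """Generic two-argument function: scale an amount by (1 + rate)."""
    return amount * (1.0 + rate)


# "Fix" the first argument now and supply the second one later.
add_vat = partial(apply_rate, 0.25)
add_surcharge = partial(apply_rate, 0.5)
```

`add_vat(100.0)` then behaves like `apply_rate(0.25, 100.0)`, so the shared logic stays in one place while each call site gets a function shaped for its own use.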
Basically since moving to a functional approach in typescript I find I do not fight abstractions as I used to when I used classes and inheritance.
Mike's talk argues that code solutions need not be modelled on the real world, and that different data creates different problems, which need different solutions. I can't do the talk justice, but it's had a big impact on me.
Brian's talk is about abstraction generally, and how it's difficult to find the "right" abstraction.
1. https://www.youtube.com/watch?v=rX0ItVEVjHc
2. https://www.youtube.com/watch?v=Cum5uN2634o
I've always found it odd when even fairly smart engineers sometimes prioritize real-world metaphors over the actual needs of the codebase. Years ago when I was only a few years out of school, I was implementing a connection pool in Rust, and the most reasonable way to implement it was to have the connection hold a weak reference to the pool so that it could get checked back in automatically when dropped. My manager (an extremely experienced engineer) didn't like this idea because "a library holds library books, not the other way around". I didn't feel like this was a compelling reason to design things differently, but he refused to engage with the issue in any way other than through the lens of that metaphor. Eventually the impasse was resolved when one of the other managers in my department suggested that while library books don't contain libraries, they do have the name of the library stamped in the back as a reference to where they should be returned, and I guess my manager found this to be a reasonable extension of the analogy. If I were more experienced, maybe I would have recognized that I could find a way to engage with the analogy like the other manager did without ceding the point, but even today I still feel that it was completely bizarre to insist on that as the canonical way to frame things rather than just considering the ramifications of the abstraction in the code and the experience of using the library based on it.
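The original was Rust, where check-in would happen on drop. As a rough analogue only, here is a Python sketch of the same shape: the connection keeps a weak reference to its pool and returns itself on release. The class names and the string stand-ins for real connections are assumptions, not the code described above.

```python
import weakref


class Pool:
    def __init__(self, size: int) -> None:
        # Strings stand in for real sockets or database handles.
        self._idle = [f"conn-{i}" for i in range(size)]

    def checkout(self) -> "PooledConnection":
        return PooledConnection(self._idle.pop(), self)

    def checkin(self, raw: str) -> None:
        self._idle.append(raw)

    @property
    def idle_count(self) -> int:
        return len(self._idle)


class PooledConnection:
    def __init__(self, raw: str, pool: Pool) -> None:
        self.raw = raw
        # The "library stamp": a weak reference, so the connection
        # does not keep the pool alive.
        self._pool_ref = weakref.ref(pool)

    def release(self) -> None:
        pool = self._pool_ref()  # None if the pool has been collected
        if pool is not None:
            pool.checkin(self.raw)

    def __enter__(self) -> "PooledConnection":
        return self

    def __exit__(self, *exc_info) -> None:
        self.release()
```

With this, `with pool.checkout() as conn: ...` hands the connection back at the end of the block, and a connection that outlives its pool simply has nowhere to return to.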
I mention this a lot, but in researching Data-Oriented Design (what Mike was talking about), I came across Richard Fabian's DoD book [1] which talks a lot about database normalization and the like. I found that odd, because the low-level high-performance game code he was talking about certainly wasn't going to marshal data into a DB to run SQL queries on it.
It turns out the relational model has a lot of advantages though. Programmers use trees all the time, in OO, in structs containing structs, in objects pointing to other objects. It's easy to forget that trees are just a special case of graphs (ie. networks), and that there are many ways to represent networks that don't rely on encoding a tree structure directly.
So, I've been doing what Richard Fabian suggested and I lay out my data (on paper) into tables, then attempt to normalize it and see the connections. I really like this way of designing things.
My big issue is that doing DB-like operations is hellish in most programming languages, and if you really want to try and marshal your data into a real DB (say, SQLite or DuckDB via a library), then you have a big messy translation layer where you're trying to match things to SQL types and you have giant SQL strings everywhere.
I see C# has LINQ, which is a query language embedded in the language. I wonder if that approach is best, and why hasn't it been adopted more broadly? It seems like there's a lot for programming language designers to explore in this dimension, though I wonder if it even matters now with the superintelligence tidal wave.
1. https://www.dataorienteddesign.com/dodmain/
I prefer having that translation layer, especially when it's domain oriented. All the SQL strings are collected in one isolated module, and the only exported symbols are a set of functions.
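A minimal sketch of what such a module might look like, using Python's built-in `sqlite3`. The `units` table, its columns, and the function names are hypothetical; the point is only that callers never see a SQL string.

```python
import sqlite3
from typing import Optional

# All SQL lives in this module; names starting with "_" are private by convention.
_CREATE = (
    "CREATE TABLE IF NOT EXISTS units ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, hp INTEGER NOT NULL)"
)
_INSERT = "INSERT INTO units (name, hp) VALUES (?, ?)"
_WEAKEST = "SELECT name FROM units ORDER BY hp ASC, id ASC LIMIT 1"


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and make sure the schema exists."""
    conn = sqlite3.connect(path)
    conn.execute(_CREATE)
    return conn


def add_unit(conn: sqlite3.Connection, name: str, hp: int) -> int:
    """Insert a unit and return its row id."""
    cur = conn.execute(_INSERT, (name, hp))
    conn.commit()
    return cur.lastrowid


def weakest_unit(conn: sqlite3.Connection) -> Optional[str]:
    """Return the name of the unit with the lowest hp, or None if empty."""
    row = conn.execute(_WEAKEST).fetchone()
    return row[0] if row else None
```

The rest of the program works in domain terms (`add_unit`, `weakest_unit`), and the mapping to SQL types stays contained in one file.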
From Domain-Driven Design, what I learned is to be comfortable having different representation of the same data in different layers/subdomains. Something may be a fat object from the API, but I prefer having a collection of functions that each use a different part and have a caching layer to not actually do the expensive network call. That network call and the caching layer will be encapsulated in one module and the collection of functions will be the only thing visible.
It's a reasonable take that changing the entire way that the database modeled everything under the hood is an overkill solution to the specific problem you mention compared to something like LINQ that can work on top of existing databases, but I can't help but wonder if there's a bit of inertia in how willing people are to challenge their usual ways of thinking about how data modeling might be possible to improve because a lot of people don't get exposed very much to anything other than the raw, string-like handling that you mention (which is annoying but at least SQL injections are a well-known thing nowadays and tend to be possible to avoid) or a full-blown ORM (which quite often ends up either being wildly inefficient or needing to drop back down into the raw SQL in some places to avoid the performance bottlenecks, which kinda defeats the entire point). A startup I worked at a few years ago actually had what I thought was a pretty clever solution to this problem, with their product generating OpenAPI/GraphQL APIs for a given database by inspecting the schema (with optional parameters to get back EXPLAIN data in the responses to verify that the query was what you wanted, and the ability to define custom routes with raw queries that were checked into shared version control with the schema migrations if you weren't happy with the query it generated as a way to properly separate concerns as an improvement over the traditional ORM workflow), but despite the idea seeming quite enticing to me from a technical standpoint, I guess it didn't show enough traction to be able to survive.
Do not put regex at the top of the file either! Put it where you use it. Languages are smart, they’ll probably be able to tell that it’s constant anyway.
Also for tiny functions just use a lambda. Please don’t make a one line function a million miles away that you use once or twice.
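A small Python sketch of both suggestions; the function names and the version-tag format are made up. In Python specifically, the `re` module caches recently compiled patterns, so an inline pattern is not recompiled on every call.

```python
import re
from typing import Optional


def parse_version(tag: str) -> Optional[tuple[int, ...]]:
    # The pattern sits next to its only use instead of at the top of the file.
    match = re.fullmatch(r"v(\d+)\.(\d+)\.(\d+)", tag)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def newest_first(tags: list[str]) -> list[str]:
    valid = [t for t in tags if parse_version(t) is not None]
    # A one-line key function as a lambda, rather than a named helper far away.
    return sorted(valid, key=lambda t: parse_version(t), reverse=True)
```

If the same pattern later shows up in several places, that is the moment to consider naming it.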
Anyway, maybe there are other reasons too, so see Chesterton’s Fence. In any case, it’s never a good idea to assume cargo culting. Someone could easily say the same thing about using inline literals. If it looks weird, ask around and maybe you’ll find out there are good reasons, or maybe you’ll find out nobody cared and that people will like it if you refactor and embed the constants.
If you pull it out into a constant, you're back to opening up projects one-by-one to 'find usages'
Being selfish is the core principle of microservice architecture.
For $19.95, you can replace your single single point of failure with multiple single points of failure!
So it just happens, right? There is no remedy to this? You know the answer :)
BTW I'm all for monolith.
Part of being a good engineer is finding the right balance.
I know engineers who would gladly duplicate code all over the code base to avoid creating a new abstraction.
I know engineers who create polymorphic abstractions for a single caller with a very obvious set of parameters.
So much of wisdom is in finding balance and not being dogmatic about rules.
Automated re-factoring means you can refactor duplicated code only as long as it is exactly duplicate.
Whereas the whole problem arises when somebody changes 3 of the 10 duplicate cases in a simple way so that they are no longer exact duplicates, and then somebody else fixes a bug in one of the other 7: they can apply the fix across the 7 "duplicate" cases, but they'll miss the 3 that aren't.
The problem with duplicate code is always when some of the instances get changed/fixed but not all of them. And that when somebody edits one instance, they often aren't even aware of all the other instances.
Abstractions are low-risk, because you know where the code is. If it's the wrong abstraction, you can fix that and know what you're fixing. Whereas with duplicated-yet-modified code, you've now lost the connections between them.
Duplications can often be cleaned up over time; bad abstractions can quickly become a bottleneck that severely slows down everyone working on the project.
Overengineering, abstractions and premature optimisation are the 3 worst plagues of engineering.
At the same time I’m happy they exist because it means we’ll always have a job.
I've learned to tolerate a small amount of duplicate code for this reason. If the duplication remains small, it's not that harmful, and if it starts to grow, one has a better shot at finding a good abstraction for it. Bad abstraction is premature abstraction.
One thing I'm not sure this thread has mentioned yet is how LLMs alter the cost-benefit curve of this. They are much better at managing duplication than humans are, and much better at noticing inconsistencies - the sort of small bugs which duplication traditionally leads to. I don't know if this is enough to count as a different kind of good abstraction; I doubt it. It reminds me of a petroleum economist I once knew who had 200 duplicate spreadsheets analyzing different projects and who hired a junior analyst to keep them all consistent. An LLM would be like the junior analyst.
it wasn't received well and senior developer told me that 'good developers know exactly what patterns to use all the time before writing any piece of code and that he will clean up my mess'
long story short his refactoring turned what was otherwise a stable system into a complete mess and it reminded me of Nassim Taleb's book
Yeah that totally happened
At the very least it is not once you're working at the wrong kind of scale.
Once you have an awkward number of customers (more than five and less than a hundred), maintaining duplicated code that should have been abstracted and modularised will only seem cheap if you don't mind that you burn through even junior employees at a pace.
And in the LLM era the wrong kind of scale appears in different ways; code generated and duplicated without proper abstraction and then maintained by an LLM that cannot be trusted to do the same modification each time it encounters a pattern or to have enough of an overview to slowly rescue duplicated code through good abstractions.
I would go as far as to say that any abstraction you can maintain (that is in active maintenance, I mean) is better than code duplication once you are past a de minimis threshold.
Code that is coincidentally similar very often diverges in either the short or long term, and DRYing it up aggressively tends to result in functions that have many boolean parameters that each trigger disjoint sets of behavior - which is a bit of a nightmare to maintain due to the high cognitive overhead of remembering how all the interleaved-but-actually-unrelated behaviors should work.
This outcome is low-cohesion code.
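A contrived Python sketch of that failure mode; the function names and flags are invented. The first function is what aggressive DRYing tends to produce, the last two are the unmerged alternatives.

```python
def render(rows: list[int], *, as_csv: bool = False,
           with_total: bool = False, redact: bool = False) -> str:
    """One function serving unrelated callers through boolean flags."""
    values = ["***" if redact else str(r) for r in rows]
    # Flags start to interact: a total would leak redacted data,
    # so with_total is silently ignored when redact is set.
    if with_total and not redact:
        values.append(f"total={sum(rows)}")
    return ("," if as_csv else " ").join(values)


def render_invoice_csv(rows: list[int]) -> str:
    """Only what the invoice export needs."""
    return ",".join([*map(str, rows), f"total={sum(rows)}"])


def render_public_summary(rows: list[int]) -> str:
    """Only what the public page needs."""
    return " ".join("***" for _ in rows)
```

Three flags already mean eight combinations to reason about, most of which no caller uses; the two small functions each have exactly one behavior.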
It's a useful concept to be aware of - worth clicking through to the actual content of the talk rather than just the headline.
I've seen this article and AFAIR the video before, and FWIW having been a Rails developer from the very early days and fitfully until maybe even 2014, I now interpret the phrase "my Railsconf talk…" quite negatively.
ETA: nice to be back to disagreeing with people on HN about coding principles again though. Hopefully this is a sign.
It would be iconoclastic if the common sense basic approach would be to start with abstraction. It's not, the common sense default is to write possibly duplicate behavior until you actually discover several cases to abstract away, until you develop a sensible idea of which functionality unites them and which doesn't carry over all of them.
>Once you have an awkward number of customers (more than five and less than a hundred), maintaining duplicated code that should have been abstracted and modularised will only seem cheap if you don't mind that you burn through even junior employees at a pace
Maintaining the wrong abstraction, or, god help, abstractions, would be even worse.
Hard disagree. When you've had to chase through a change in untold and actually unknown numbers of duplications of code in different permutations and fix them because they are all on fire simultaneously, you'd disagree too. A bad abstraction would at least have had one fire in one place.
The other end of this spectrum is dealing with the architecture astronaut's up-front abstraction. Totally overengineered for solving the initial requirements, but then constantly needing new hacks to make it cope with new requirements as they come up in the normal course of work.
That's why there's a balance in there, it's somewhere between "always duplicate code even when you know a lot about the problem" and "always write abstractions even when you know very little about the problem."
Wouldn't most large codebases with poor abstractions just have engineers engineer around them with their own solutions? In a large enough codebase you'd have both the bad abstractions and all the not-quite-duplicate implementations ignoring the bad abstraction?
I'm using bad here loosely, it could be buggy, incorrect, incomplete, insufficient and more; while being owned by someone or some team that's a challenge to work with for various reasons (overloaded, under-resourced, overbearing, etc., etc.).
Obviously, yes. But it is my experience that this happens more slowly and that API invocations that break when the abstraction is changed are much easier to identify than broader duplicated patterns of code that span many lines and subtly diverge.
And even then those divergences are better because each wrapper around the abstraction is documenting the problem with it. But the abstraction can generally be replaced by one with the same API surface.
(Even if you take into account the fact that any API behaviour ultimately gets relied upon even if undocumented. Which is true.)
To be fair my experience is that of a freelancer and contractor who arrives trying to fix things that have been through many such hands. And I think if these developers had it drummed into their head that any attempt at abstraction would be better than copy and paste, these situations would be more knowable.
When that happens there's a major engineering leadership failure currently in progress, even if engineering leadership isn't aware of it.
EDIT: LLM or not, this is still true. If you have LLMs pumping out tons of duplicate code you're wasting tokens, and probably more importantly wasting engineer hours reviewing duplicate code.
In some cases it might be a fair trade, in moderation. In general it's certainly wrong.
Doesn't it mean that you are in a good place to start DRYing code? I mean, code was written in a way to avoid bad abstractions. You can't generalize on 1-2 samples, but now you have "unknown numbers" (more than two?), so you can start looking at it and see patterns. It means you can create a perfect abstraction. It is the basis of the WET (Write Everything Twice) principle.
It would be frustrating, and I mean really frustrating. People easily generalize over two things but they struggle to generalize over three. Pick two random words and think of a common category they fall into. It is an easy task for a 5-year-old. Pick three random words and try to generalize them, and you have a very hard cognitive task.
This frustration stems from the inherent complexity of the task. It is not because people before you wrote duplicating code, it is because it is hard to generalize. People before you didn't do it being afraid of missing things and creating a bad abstraction, but you have hard data, you can create an abstraction without missing a thing.
That's true only for "good" abstractions. Bad abstractions will often require you to change code in all the places using it, requiring you to understand how all of them work and what are their requirements, _all at the same time_.
A uses the abstraction, but finds the API doesn't work. Fixes that.
That causes B to have to make a tracking change which induces a bug. B realizes that the API isn't quite right. Fixes it.
That causes A and C to make tracking changes. These induce more bugs. C fixes the abstraction to avoid these cases.
This breaks A and B so they decline to update.
And so on. This is what a bad abstraction looks like. API "fixes" bouncing around the code as they reflect off of the bad abstraction.
The security bugs were all in features I never wanted.
A bit of simple duplication would have been golden.
On the contrary: that's precisely what a bad abstraction would not offer.
Instead it would spread its assumptions to different parts of the system, as every caller, sub-service, etc. would have to change shape to fit in that abstraction's box, however unnatural it is (and we know it would be unnatural, because we already said it's a bad abstraction).
Abstraction is not the same as encapsulation.
But so does duplication, in practice, and it diverges more as it does.
Oh, it happens all the time.
But any abstraction ends up with a signature and a name that can quickly be found in code.
The risk of a long-lived duplication losing its shape and being hard to find is much greater. Especially if the code is going through multiple hands.
I once had to pick up a project — a working, fully functional website. I could see, pretty clearly, the work of several people. All but one of them terrible.
The one was a diligent developer who was fully wrong in their abstraction (in fact significantly) but was consistent in how they used it.
The rest had simply worked around that code, copied and re-copied their own modified duplications and let things lose any shape. The result was error-prone stuff.
Clearly either the budget (or the client's capriciousness — a separate issue and arguably the bigger one) scared away the one guy, who I actually wanted to talk to but could not track down. He possibly had the origin story, and I wanted to know why his particular abstraction, which was at odds with the framework, was there. It was good code in the wrong shape, and it clearly used to do more, and that is interesting.
All the expedient people who had decided to avoid his code and just patch in duplicated pieces around it were the problem. There was no form to their solution at all. And that had clearly happened over some time (because you could see several different code styles)
Abstractions are a form of coupling, and coupling can be good, if the components are truly interdependent, and have a well defined domain. The problem with most abstractions, and I’ve seen this time and time again, is that they become brittle, are over used, and the cost of maintaining them grows exponentially with the size of the code base. With the reason for the cost ballooning being the system has disparate components that look interrelated but are absolutely not. Once you give someone a hammer they tend to assume everything is a nail.
The biggest problem, IMHO, is that abstractions are often used where a pattern would be more effective, easier to maintain, and easier to iterate on. And the primary difference between a pattern and an abstraction really comes down to coupling. Patterns remain decoupled, abstractions are tightly coupled.
And to be clear, I will and do use abstractions, when and where they make sense. But only after clear patterns emerge, and it’s been proven that components are truly coupled.
I will gladly die on the hill, that abstractions are measurably worse than duplication an overwhelming amount of the time. They’re often nothing more than a form of premature optimization.
It all depends on the amount of duplication and the complexity of the abstraction. Like you said, no generic advice is possible that clearly separates it into "abstract here" and "duplicate here".
In your example it sounds like we aren't talking about 2-3 places where duplicate code existed that just needed to be refactored into separate units. It sounds more like a complete disregard for abstraction to move on quickly.
If you see duplicate code and have a good understanding of how to solve that then it's totally a good thing. The real problem comes in if you add abstractions without knowing whether they will hold up. And this is where the blogpost comes in. In my opinion 2 duplicates are fine, at 3 you should start thinking about or implementing an abstraction if you have a good understanding of the code and use cases.
Exactly. The abstraction purists are not working in the messy, deadline-driven real world.
Write everything twice quickly becomes write everything 4 times once a new change appears, just as quickly as it becomes write everything 8 times, and so on.
I'm afraid there's no sensible soundbite developers can follow blindly.
That's a good problem to have. Getting to 4 or 8 or 12, and then pruning it to 1 or maybe 2 or 3 clearly different cases, is better than shoehorning multiple cases into the wrong abstraction, having everything that speaks with them coupled to that and dancing around their assumptions, and then having to untangle that.
Duplicated code is by definition LESS coupled.
Having a lot of if/else in your code is definitely a cost. My weakness isn’t so much the libraries and APIs, but the actual binary - once I have a service that does A very well, and I run into needing A’ I mostly just add in a config line “op_mode = A|A’” and have the else/if chains in the server driving code. Moreso for CLIs that I use myself than production services, but I have added tunables for consistency and replication to datastores to allow new use cases and expand my footprint in the data center.
If you haven't figured out a good abstraction at 5-100 customers, God help you.
Half of your abstractions are wrong. The hard part is knowing which half.
This is tautological though, it's like saying “starving is much better than eating the wrong food” (for instance: eating quick lime).
Of course you'll always find a way to do things wrong in a way that is costlier than not doing anything.
But also it's very possible to not realise you needed an abstraction until it catches fire in multiple places.
And quite often it's not you that got the codebase to a hundred customers, is it? Sometimes it is a sequence of fresh-faced young developers who didn't have the authority to say "this duplication is bullshit" and were instead compelled to repeat it.
I think a lot of these discussions happen in nice little blog-post vacuums of progressive thinking, where people can go "mmm, object oriented coding obscures intent and clarity, mmm", blog posts with "an X is a Y", "the unreasonable effectiveness of foobar" etc.
In the real world, every duplication that works sticks for good; there is rarely budget to electively replace code that isn't broken. Until one day it doesn't work. And then… how many times is it actually duplicated? How many of the duplicates diverged? How many of these do we no longer need?
So... the wrong abstraction, no matter how bad, is better than code duplication?
> I would go as far as to say that any abstraction you can maintain (that is in active maintenance, I mean) is better than code duplication once you are past a de minimis threshold.
I appear to be in a solid minority thinking this. But I'm OK with it. I'm probably not going to write a blog post.
This blend of opinion is very naive. Every single project is a business requirement away from having the wrong abstraction in place.
https://xkcd.com/1425/
https://www.youtube.com/watch?v=VG0btgXY_D0
Of course it's a truism if you just say any abstraction that works is a good abstraction.
That is not what I am saying at all. Bullshit abstractions at least let you control the problem. Duplication doesn't.
I agree with you that it’s a truism, but it’s useful advice for people who have a habit of trying too hard to DRY their code. IIRC the author comes from the Ruby world, where DRY was a big thing, and this talk was part of the pendulum swinging back away from this DRY obsession that sometimes just resulted in convoluted code.
I agree that LLMs are naturally anti-abstraction machines. I'm often trying to find a way to reverse that.
I am a bit of an LLM cynic but I am trying to learn it all, and I have to say I have spent most time trying to work out: how do you explain how a brown-field codebase actually works, in such a way that the LLM won't pervert it through misunderstanding.
It does encourage you towards the "conventional" coding standard for any new project, because you want to use a pattern that it will have seen in its training set.
But for example there are differences of opinion in how WordPress plugins (which have a very complex control flow) should be structured. LLMs are incredible at knowing how WP works, actually, but what is difficult is explaining how your methodology for a large plugin is going to work.
It is a battle — but a useful one because it can be used for, er, studying the comparative belief systems of the LLMs.
But if I tell it "read these files that use the same conventions" first, there's no misunderstanding, and the agent also picks up the general "tone" of the code. I have very little to tweak if I've defined the problem well.
Oh that is a bloomin' great idea, and I can fully see how it might work better.
Can't tell you how valuable this comment has been to me and now I feel so much better about evidently kicking a hornet's nest ;-) Thank you so much.
If you're using a coding agent like codex or claude code, I've also seen marked improvement by telling the agent to keep a journal of decision points, and every file read or written. And then, here's the important part, read the last five journals before starting. It primes the context with whatever you were working on and keeps a new session more focused than if it has to go searching for keywords through the whole codebase. It can also be an interesting read.
A story I like is that in the now lost era of handwriting recognition on PDAs, Jef Raskin concluded that the easiest way to solve the problem was to change handwriting so as to meet the algorithm in the middle.
That is, to find a noticeable simplification of handwriting that people could learn quickly and that eliminated hard-to-process quirks.
I feel I am there with the LLM at the moment, trying to work out what the common ground is.
It really depends on the exact type of code we're working with, and what our objectives are.
In my case, I often use object inheritance. It's a damn cheap way to DRY. However, when people hear "inheritance," they often think "polymorphism." There's a really big difference between the two, but popular culture has jammed them into one ball, and it's not worth the agita, to try to explain the difference.
But if you are doing optimization, long stacks can be your enemy, and inheritance tends to have long, windy stacks.
In these cases, the copy/pasta method may well be the best approach.
Like I said, "It Depends."
I agree that we should think of inheritance and polymorphism separately. If we want to express this intent in object-oriented code, how can we use inheritance to deduplicate code, while preventing misuse of the resulting object hierarchy i.e. the use of base classes in a polymorphic context?
In C++, IIRC private inheritance would do the trick (you cannot static_cast DerivedWidget * to BaseWidget * if DerivedWidget : private BaseWidget), but most OO languages don't support private inheritance. It's also not possible, as far as I know, to "lock down" BaseWidget * so it cannot be used as a base class pointer from any derived class: instead, you have to apply the private inheritance to every derived class to enforce this rule.
Another approach is to use has-a instead of is-a: i.e. instead store a BaseWidget object as a member of DerivedWidget. This allows for re-use without supporting polymorphism.
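A minimal sketch of the has-a approach, written in Python rather than C++ purely for brevity. BaseWidget and DerivedWidget are the names from the comments above; the `describe` method and the `colour` field are hypothetical, added only for illustration.

```python
class BaseWidget:
    """Shared behaviour we want to reuse (illustrative only)."""

    def __init__(self, name: str) -> None:
        self.name = name

    def describe(self) -> str:
        return f"widget:{self.name}"


class DerivedWidget:
    """Reuses BaseWidget via has-a; deliberately not a subclass."""

    def __init__(self, name: str, colour: str) -> None:
        self._base = BaseWidget(name)  # composition instead of inheritance
        self.colour = colour

    def describe(self) -> str:
        # Delegate the shared part, then add the local variation.
        return f"{self._base.describe()} ({self.colour})"


widget = DerivedWidget("button", "red")
print(widget.describe())
# DerivedWidget cannot be passed where a BaseWidget is type-checked for.
print(isinstance(widget, BaseWidget))
```

The cost is that every reused method has to be forwarded by hand, which is roughly the price of not exposing the base type.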
This is especially true, with languages like C++. Someone (I have heard it attributed to Bjarne, but I don't think he said it) said "With C, you can shoot yourself in the foot. With C++, you can blow your whole leg off."
But there's stuff that can basically, only be done in C++. It's a very powerful, mature, and storied tool; meant to be used by competent grownups.
In tech, we have folks that seem to be absolutely convinced that we can have tools, so marvelous, that we can hire total incompetents, and that they will magically write good code. I know of no other engineering discipline, or craft, where people think like this. They usually have rigorous career ladders, with lots of gates.
Maybe Finance sometimes lets knuckleheads behind the wheel, but then, you get things like the Barings Bank disaster.
"What's Barings Bank?" you ask. "It doesn't exist! Is it a hallucination?"
No, it is not. Unfortunately, they let a rather junior trader, named Nick Leeson, behind the wheel...
It's possible that LLMs may finally give us something like what people want, but I suspect that we'll be seeing folks stumping around on one leg...
Everyone always thinks duplication is fine when you can bill the modifications by the hour. But they never think to understand that the reason they've had so many employees is that they've turned their change process into firefighting all the different versions of the same code and all these young developers burn out from the sheer anxiety of not knowing where all the little fires are.
I once had to rescue a site that had become a victim of its own popularity, that was written by subcontractors who clearly believed that duplication is better than the wrong abstraction.
Until one day, along came a change — MySQL 4 to MySQL 5 — and a significant duplicated query no longer worked due to its new, proper strictness.
The problem was compounded; not only was the broken pattern in hundreds of places where it had sat, stable and predictable, but the pattern was broken because it, itself, was avoidance of another abstraction that would solve it.
They quit: they said they couldn't and wouldn't fix it. It had always worked how they had done it, and it would have to stay on MySQL 4 (which the hosting provider refused to accommodate).
I don't think it helped that they were severely misguided in their understanding of SQL, but the code had become beholden to duplication and then crippled by a new problem in the duplicated pattern.
I had to first find all the contexts in which that pattern appeared (which required me to spend half a day on a bespoke script) and then work out a new pattern and as few variations of it as possible to fix the duplicated code in each place, because there was no proper budget to rewrite the whole thing. And then I sat at my desk, for days, working through each one, figuring out how to change it to fit the slightly different expression of the pattern.
Even a total bullshit abstraction would have saved that client both time and money. And this is only one of dozens of times I've seen small firms simply duplicate and change code that would later become unmaintainable because of a straw breaking a camel's back.
I would be curious if the previous coders you're talking about actually cited duplication as a good thing. You seem to be implying they are. But almost every instance I've seen of massive code duplication was just from bad programmers shooting from the hip, not from some ideological stance.
Right. But this is a hypothetical, in-a-vacuum situation.
In the real world, your two, three duplicates are in production.
"We really should now de-duplicate this"
"There is not the time or budget, just copy it again; we'll replace all this one day".
ideal case: support libraries and then very simple duplicated code that is easy to read and modify. critically the core control flow should remain duplicated, but simplified by the support libraries.
Pretty much everyone arguing for duplication has argued what you are saying, which is wait to see a few instances of it before committing to an abstraction. No one is saying duplicate everything 100 times. So I don't think this discussion was ever iconoclastic.
In the real world, duplication happens in an emergent way, there isn't the time each time to judge whether it's really time to just quietly abstract that code, you may not get the permission, budget or window to do it, and if you don't stop the rot really early you are locked into the pattern.
The context in which a decision is evaluated is particularly important for "rules of thumb" like this. There's the rule of 3 (which many senior engineers imparted to me earlier on in my career) - don't refactor until you've actually duplicated it thrice, but even so, what they speak of is a catch-22 that's pretty important to reason about carefully.
On one hand, if you overcorrected on the fear of abstraction, you could easily end up with 500 duplicates that are slightly different and need to be maintained 500 different ways, slowly causing slightly wrong behavior some of the time, data corruption, combinatoric explosion. Surely, once there is such a situation, some degree of abstraction is the only right decision.
On the other hand, if you overcorrected on the fear of duplication early on, you could easily end up with a premature optimization and complexity -- complexity which, most importantly, could be rooted in a gap of understanding of how the code will be used and what direction it may go in over time (often based on which direction the business will go over time).
The only answer that actually works, of course, is "somewhere in the middle." Obviously, that's pretty vague and not very useful. Where, exactly, in the middle IS the right place?
As the years have gone by, I've become more and more steadfast that the answer to that question is and must be an art and not a science. Of course, it must always be rooted in practicality, the actual context of the code around it and where the code/business was in the past and where it will be in the future.
But just as importantly, some of it must be based around beliefs in the face of imperfect information about what you want to invest in for the sake of the technology, the team that develops it, and the business that relies on it. It could be that for your team, your values make it make sense to go a little bit further than "good enough" on normalizing your data modeling, because the way you like to run your business requires that normal form to do the analytics and make decisions productively. It could be that for your team, your values make it make sense to go a little bit further than "good enough" on splitting service boundaries and ensuring clean queues and message passing infrastructure, because you have seasonal spikes where you need to scale up to a ton of load and then scale down after without constantly doing a song and dance or pre-provisioning fragile infrastructure.
But the most common thread there is - art, not a science. Every single decision depends on YOUR team, YOUR business, YOUR needs - and like any art, there is no universal rule or discovery or best practice in the industry that will magically work for your needs without working through the details of whether it appropriately fits your situation or not.
So with that said - I can't really agree with you. At any place I've ever worked with a competent team, maintaining duplicate code is just not that hard and follows the same process for being dealt with. Build a robust test suite that encodes the actual differences and the shared structure. Pull out the pieces that have a good reason to be abstracted and redesign the pieces that encode the true differential structure in a way that is intuitive. Lather rinse repeat. It's always straightforward because it's known - by the time you are doing this process, you've had tons of repetitions and data on what is driving you to develop the abstraction, so when you make the decision, you are making it empirically.
Conversely, I have seen many otherwise competent teams slowed to a halt with premature abstraction. Frameworks that were well intended and reduced duplication, but encoded coupling between components that, at a certain point in the business's progression, fought with reality rather than aided it, and all because they were frozen into place before anyone empirically had really clear data about whether the abstraction would be worth it long term. Well intended "clean code" refactors that were meant to solve the old "bad duplication" but instead created a far more difficult to reason about "abstracted base" of code that didn't really solve any of the domain modeling problems and was just as difficult to maintain without introducing buggy behaviors (if not more so) than before.
The biggest problem is that premature abstraction is sexy and fun. There are incentives and dopamine hits from doing it extraneously. But fixing legacy duplication is not fun. And so when it gets done, it tends to get done in a pragmatic way to relieve pain rather than to elicit pleasure. That, I believe is one of the biggest confounding sociological aspects of this whole discussion.
Starting with abstraction when you are only beginning something rarely works well and leads to code bases littered with interfaces having only one implementation.
Abstracting the code when you have two copies does not always pay off, especially when you end up not needing more than just two copies anyway.
But once you have three copies, it's indeed time to start generalizing.
But in one of the scenarios I mention earlier, I earned a chunk of money once fixing an issue that emerged in a subcontractor's four or five line duplication that had ended up rippled through a long-lived codebase. A ground truth (MySQL version) changed, and the pattern broke everywhere, including places where it had evolved.
So I tend towards thinking, yes, any three-line pattern that is likely to appear everywhere should, perhaps, be centralised.
It's certainly worthy of serious consideration. Usually pretty easy to maintain the surface of such an abstraction.
I read “Practical Object-Oriented Design (POODR)” ages ago at this point, but it reshaped how I approached OOP
Granted… OOP as a default paradigm has fallen out of favor (at least for me), but it’s still everywhere & won’t be going away. She gives a great framework for making it sane
To help with that, I think the simple model of two callers depending on common code needs to be scrutinized. If the common code needs to change because only one of the callers needs it, then it doesn’t belong in the common code.
The wrong goal for DRY is attempting to do it with encapsulation. Encapsulation shifts the refactoring work from the caller to the common code. However this is not what you want because there’s a lot more consequence in updating the common code than the caller.
It is better to avoid encapsulation and still be DRY by having multiple thin abstractions that the caller needs to be aware of. In OOP you are taught SRP and IoC for this. In procedural programming, this just comes naturally as code calling a series of helper functions.
There has been growth since but it's been concentrated into fewer channels and somewhat industrialized.
To me, the question is: can you look at this abstraction and understand why it exists, without knowing who’s calling it? If so, it’s probably fine.
If an abstraction only makes sense because of the particular weird details of these 3 callers that have to pass mutually exclusive arguments to it to get their desired behavior, it’s probably wrong. An abstraction needs “a place to live” in your architecture. It needs to be self-evident in justifying its existence.
If you find yourself repeating code, but de-duping it would create these sort of weird non-self-justifying abstractions, your architecture is probably a bad fit for the problem you’re trying to solve. Maybe that’s because the problem changed since the software started (which is a bit of a pickle: do you re-architect, or do you continue writing weird inscrutable code?) or maybe it’s because you just picked the wrong abstraction in the first place. But you should recognize it: duplicating vs wrong-abstraction is about choosing the lesser evil. If the abstraction was a natural fit for the problem, you wouldn’t need to answer this question in the first place.
The Wrong Abstraction (2016) - https://news.ycombinator.com/item?id=35927149 - May 2023 (69 comments)
The Wrong Abstraction (2016) - https://news.ycombinator.com/item?id=27095503 - May 2021 (17 comments)
The Wrong Abstraction (2016) - https://news.ycombinator.com/item?id=23739596 - July 2020 (240 comments)
The Wrong Abstraction (2016) - https://news.ycombinator.com/item?id=17578714 - July 2018 (207 comments)
Prefer duplication over the wrong abstraction - https://news.ycombinator.com/item?id=12061453 - July 2016 (96 comments)
The Wrong Abstraction - https://news.ycombinator.com/item?id=11032296 - Feb 2016 (119 comments)
The Node ecosystem is full of wrong abstractions.
Even better is an old school MPA with progressive enhancement.
Not that I'm immune from choosing the wrong abstraction sometimes. More than once the "other people" was me. We all make mistakes.
Yes, okay. But with both you will have a bad time cleaning up.
There is a third option: good abstractions.
I did see this pattern described in the blog in practice a lot (and fell victim to it myself) and I think that in general this comes down to inexperienced programmers. Object oriented programming makes it worse.
Teaching these programmers that they should not abstract is not the solution. It is blocking their growth.
Teach them how to make better interfaces instead.
Code duplication and 'wrong' abstractions both count themselves amongst the other foibles of programming. But they don't directly produce a cost which can be cheap or expensive.
They produce some other high dimensional intermediate value which can then produce highly variable cost dependent on the domain, goals, and scenario.
As ever, it depends.
The depends is quantifiable, but it doesn't fit in a blog post. Think more along the lines of war and peace.
Copy and paste once is fine, twice, not so much.
Often I've seen two totally different things exist in one bit of code, no overlap!
Premature generification is bad, and leads the developer to believe that two things are the same, making it harder to see they are not.
Also, can make it much harder to see that a different abstraction would give a cleaner outcome....
If you have code that is reusable, you'll want it to have a nice interface. But, you don't need an abstraction on top of a nice interface. Just use it.
For abstraction, what you need to focus on is "What is most likely to change in the future?" You want to put in abstractions that will make those changes low-cost.
Ex: At work there was a small debate about which C++ JSON parser to use with no stand-out winner for our framework's needs. So, we picked one and I put a thin layer over it for everyone to use. We have since then swapped out the parser and swapped it back over the years of a hundred devs using it in our framework and no one noticed the swaps.
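A rough sketch of what such a thin layer might look like, in Python rather than C++ and using the standard `json` module as the stand-in backend. The names `parse_json`, `dump_json` and `JsonParseError` are hypothetical; the point is only that callers never import the underlying parser directly.

```python
import json
from typing import Any


class JsonParseError(ValueError):
    """Our own error type, so callers never catch a backend-specific one."""


def parse_json(text: str) -> Any:
    """The only parsing entry point the rest of the codebase uses."""
    try:
        return json.loads(text)  # swap this line to change the backend
    except json.JSONDecodeError as exc:
        raise JsonParseError(str(exc)) from exc


def dump_json(value: Any) -> str:
    """The only serialisation entry point; key order is made deterministic."""
    return json.dumps(value, sort_keys=True)


config = parse_json('{"retries": 3, "hosts": ["a", "b"]}')
print(config["retries"])
print(dump_json({"b": 1, "a": 2}))
```

Translating the backend's exception is what keeps the swap invisible: if callers caught `json.JSONDecodeError` themselves, changing parsers would touch every call site.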
Unsurprisingly, that goes for just about any idea in software development. I worked in one code base that heard small functions are best, so every function was less than three lines long. You don’t gain anything by replacing `lst.get(0)` with `get_first_item_in_list(lst)` (in fact, understanding becomes much more difficult), but breaking down functions into the smallest units that make sense independently within the business domain can be very helpful, both for understanding and testing.
It's of course possible to functional-ize segments of logic, but then the question of state mutation must be brought up. How isolated are these changes from other parts of the code / system state. Can this be run in parallel or is it something that must be serial? What potential race conditions exist?
Things that should be tightly coupled but are not is preferable to things that should not be tightly coupled but are.
I agree especially because coupling things is easier than uncoupling things.
I think DRY is the first heuristic for “good code” that most junior devs actually grasp, and so they become very dogmatic about it for a while
When I started writing code 40 years ago, I used to over estimate my understanding and abstraction skills a lot. As a result I created overly complicated and difficult to maintain/evolve solutions.
Turns out I need to see more examples of patterns before making good choices, which means becoming comfortable with seeing and tracking duplication over time.
On the other hand it is pretty difficult and error prone to consolidate duplicated code which has drifted apart over time.
If in doubt, choose the approach which is simplest and least risky to revert if you discover in the future you made the wrong choice.
I do agree a bad abstraction can cause huge problems. But it’s usually not the kind of abstractions introduced to eliminate code duplication, but the kind of top-down “architecture astronaut” abstractions, where a model is chosen which does not fit the complexity of the problem.
Having a wrong abstraction means you end up with a class/function/module with a huge amount of configurations through boolean/enum parameters. It's not even clear that all combinations of configurations are even valid. This situation may be simplified by duplicating, and then eliminating code, thus creating more streamlined code for each use case. This may require fixing similar or cross-cutting bugs in multiple places (eg: JSON serialization is stupid, need to hack a workaround), but keeps the business logic changes simple. Maybe a bit more numerous, but the code is able to raise all the scenarios to consider.
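To make the "duplicate, then eliminate" step concrete, here is a small sketch under made-up assumptions: a hypothetical `render` function driven by three boolean flags, and the two flag-free functions you might end up with after copying it per use case and deleting the dead branches. None of these names come from the article.

```python
from typing import List, Tuple

Row = Tuple[str, int]


def render(rows: List[Row], *, csv: bool = False, header: bool = False,
           totals: bool = False) -> str:
    """Flag-driven version: eight combinations, not all of them meaningful."""
    sep = "," if csv else "\t"
    lines = []
    if header:
        lines.append(sep.join(["name", "amount"]))
    for name, amount in rows:
        lines.append(sep.join([name, str(amount)]))
    if totals and not csv:  # hidden rule: totals are silently ignored for CSV
        lines.append(sep.join(["total", str(sum(a for _, a in rows))]))
    return "\n".join(lines)


def render_csv_export(rows: List[Row]) -> str:
    """After duplicating and deleting branches: CSV always has a header."""
    lines = ["name,amount"]
    lines += [f"{name},{amount}" for name, amount in rows]
    return "\n".join(lines)


def render_screen_summary(rows: List[Row]) -> str:
    """After duplicating and deleting branches: tab-separated with a total."""
    lines = [f"{name}\t{amount}" for name, amount in rows]
    lines.append(f"total\t{sum(amount for _, amount in rows)}")
    return "\n".join(lines)


rows = [("a", 1), ("b", 2)]
print(render_csv_export(rows))
print(render_screen_summary(rows))
```

The row-formatting loop now exists twice, but each function states its own rules outright instead of hiding them in flag interactions.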
Having no abstraction means you may have to change business logic consistently in multiple places, or you have to fix exactly the same misconception (aka a bug) in multiple cases. e.g. tax rate management in a multi-national context. This is also terrible, because you may fix an important problem in one place and forget other places with the same issue. Now you missed 12 potential bugs by fixing one. This can however allow you to discover a true abstraction. Maybe these 12 places should call just one place?
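As a sketch of "these 12 places should call just one place", assuming a deliberately fake rate table (the country codes `XX` and `YY` and their rates are invented, not real tax data) and hypothetical function names:

```python
from decimal import Decimal

# Invented rates for illustration only; not real tax data.
VAT_RATES = {"XX": Decimal("0.10"), "YY": Decimal("0.20")}


def vat_for(country: str, net: Decimal) -> Decimal:
    """The single place that knows how VAT is looked up and rounded."""
    try:
        rate = VAT_RATES[country]
    except KeyError:
        raise ValueError(f"no VAT rate configured for {country!r}") from None
    return (net * rate).quantize(Decimal("0.01"))


def invoice_total(country: str, net: Decimal) -> Decimal:
    # Caller 1: previously carried its own copy of the rate logic.
    return net + vat_for(country, net)


def cart_preview(country: str, net: Decimal) -> str:
    # Caller 2: same rule, different presentation.
    return f"{net + vat_for(country, net)} incl. VAT"


print(invoice_total("XX", Decimal("50.00")))
print(cart_preview("YY", Decimal("10.00")))
```

A misconception about rounding or a missing country now gets fixed once, which is exactly the case where the shared code really is a single source of truth.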
But for code evolving across a team understanding this tension, a bit of duplication while waiting for confirmation that these pieces of code break together and change together is better than just shoving the same 3 if-statements into a function to avoid "line duplication". Concept duplication is more important.
You hardly ever change the thing and if you do, changing it in two or three places 'manually' is really not a big deal.
Now changing something fairly often, that affects logic in 50+ places? Then it makes sense to automate with an abstraction so it all flows through the same lines of code.
I know I've personally spent way more time over the years debugging bad abstractions than changing things in a few places.
This step should also be parameterized by how many times the duplication has occurred. Refactoring preemptively may lead to poor abstractions, but not refactoring after seeing the exact same thing tens of times would also be weird. See also:
https://wiki.c2.com/?DuplicationRefactoringThreshold
https://wiki.c2.com/?ThreeStrikesAndYouRefactor
I've worked in too many projects where every new feature needs to be built on top of existing abstractions, which often leads to severe restrictions if something slightly different is required. I always try to create reusable units/components that can either be used as intended or replaced by something that behaves slightly differently if needed.
Components are not necessarily frontend components, this extends also to backend logic.
Refactoring code to reduce the number of lines is _compression_, akin to RLE coding.
Refactoring the code to lift conceptually coherent parts is _abstraction_.
Less compression, more abstraction. Then you're fine.
IMO it's easier to inline a bad abstraction than it is to consolidate a bunch of subtly different things that should have been abstracted from the beginning.
But I expect people's opinions on this differ wildly based on their personal experiences. Just my anecdotal take.
Some of the biggest rabbit holes come from naming conventions not aligning across the business and technology silos. If everyone agrees that Customer has exactly 34 attributes, then it is possible to move to the next step of sharing libraries of types across the team. Getting your POCOs/DTOs 1:1 across the board is when the duplication really starts to melt away.
[1]: https://max.engineer/cms-trap
Of course, the worst abstractions are the ones you don't need at all.
Also I’ve seen the kind of codebase that seems to be LZW packed due to the sheer desire to DRY everything out. Not a pleasant thing: by the time you go 10 layers deep into some “helper” function, you’ve forgotten why you are in there.
OP is right that code duplication is far cheaper than the wrong abstraction, but the opposite is also true - the right abstraction is far cheaper than code duplication.
Perhaps "recompilation" - rewrite by replaying all prompts in strict code quality context (linters, complexity & dedup checks) would make better abstractions.
The only problem now is that LLMs are non-deterministic.
Compiled assembly code is not an input to the next compilation; source code is an input to the LLM's next inference.
Sure, maybe "prompts are the code," but you must realize that code is also the prompt.
Do you want to iterate using for loop or using .iter().step(2).map()?
I would rather have consistency than a mixed bag of levels of abstractions.
This isn't really a good example, assuming both can be used to represent the same thing.
The problem with the wrong abstraction is when your abstraction doesn't let you represent something. Then, because you've already invested so heavily into it, you start contorting the problem to fit your abstraction and it becomes a shit show.
I don’t think it matters, especially for short loop scopes
Otherwise what is better is better and we don't know what we don't know
But beyond that, any stable abstraction is better than duplicated code.
You cannot find the edges of a system whose structure you don't understand, because once the abstractions are set in place, solutions often take the same shape as the frameworks, which ultimately leads to really bad systems.
The best way is often not the obvious way. Once you reach the edges then you can think how to program the abstraction but that is many versions down the line from the original.
With AI, we really need to rethink the clean code principles.
Some previous discussions:
2023 https://news.ycombinator.com/item?id=35927149
2021 https://news.ycombinator.com/item?id=27095503
2020 https://news.ycombinator.com/item?id=23739596
2018 https://news.ycombinator.com/item?id=17578714
2016 https://news.ycombinator.com/item?id=11032296
Generalizing this in the abstract is a wrong abstraction.
Usually, some moron decided to copy paste things a few levels up and then the top half of the system metastasized into two parallel universes of broken garbage.
For instance, one might decide to perform auth later in the flow so unauthorized handlers can run and set a “this requires auth” bit that defaults to false, and the other flow could add a forged auth header before the auth step.
Now, the auth handler needs a “allow forged header” flag and a “already authenticated” flag.
I’ve seen that grow to a half dozen cases until massive production dataloss occurred. A buggy client tried to delete something local to their account without specifying a userid as a parameter (this codebase was garbage!) and deleted the something for all users instead.
I can’t remember how the dataloss was “fixed”, but it definitely wasn’t “all requests go through a simple auth check, and all handlers declare/implement their auth requirements in the same way”.
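For contrast, a toy sketch of what "one simple auth check, with every handler declaring its requirements the same way" could look like. This is not that codebase's design; the `route` decorator, the paths and the handlers are all hypothetical, and a real system would verify credentials rather than just test for a user id.

```python
from typing import Callable, Dict, Optional, Tuple

Handler = Callable[[Optional[str]], str]
_ROUTES: Dict[str, Tuple[Handler, bool]] = {}


def route(path: str, *, requires_auth: bool = True):
    """Register a handler; auth is required unless explicitly switched off."""
    def register(fn: Handler) -> Handler:
        _ROUTES[path] = (fn, requires_auth)
        return fn
    return register


@route("/health", requires_auth=False)
def health(user_id: Optional[str]) -> str:
    return "ok"


@route("/items/delete")
def delete_items(user_id: Optional[str]) -> str:
    return f"deleted items for {user_id}"


def dispatch(path: str, user_id: Optional[str]) -> str:
    """Single choke point: the check runs before any handler code does."""
    handler, requires_auth = _ROUTES[path]
    if requires_auth and user_id is None:
        raise PermissionError(f"{path} requires an authenticated user")
    return handler(user_id)


print(dispatch("/health", None))
print(dispatch("/items/delete", "u1"))
```

Because the default is "auth required", a handler that forgets to declare anything fails closed rather than open.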
Getting a design approved to require a user id be specified exactly once for account-level operations was fantasy land for that team. (Most hires with any sort of engineering talent bounced in under a year.)
Anyway the “abstractions are hard so copy paste” approach did provide job security for the lifers on that product. I can’t imagine them holding a job elsewhere, but they were completely immune to layoffs (hostage style).
This is a pretty valid approach if you’re an agent hired to perform industrial sabotage, or if you keep replacing keyboards after you gnaw through the corner.
Very true in some sense, but I continue to encourage DRY-bias because I've literally never seen teams duplicate code responsibly and later dedupe it when it's the right time. 95% of the time this sentiment is quoted to justify shipping quick slop and stable reusable bits are never extracted into a shared lib later.
Maybe this is an area where AI can help identify duplicate code though to show opportunities for de-duping
https://caseymuratori.com/blog_0015
Previously discussed:
https://news.ycombinator.com/item?id=17090319
https://news.ycombinator.com/item?id=36455794
https://news.ycombinator.com/item?id=46183091
But if I have an interface and three subclasses with duplicated or almost duplicated code, this is quite easy to find.
That’s much nicer than an abstract base class where only some children override the methods, because now I need to check which ones actually do.
Combined with another interesting idea "whatever I dislike is wrong", it makes me _always be right_, which is awesome. I can never lose a discussion about abstraction with this powerful combo.
The urge to split the code up from the beginning is generally a bad idea; it forces early abstraction, which is more likely to be wrong.
Pro tip - which is the least bad abstraction? Answer: it depends!
Abstraction is a vague term when used here. Is a shared function an “abstraction”? It’s more like implementation hiding, maybe some data hiding. But you definitely have a dependency on it now.
Acronyms like DRY are for beginners. Once you get good you know when to break the “rules” (and when not to).