Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Looks horrible to me. Why not a spreadsheet?


Without bothering to try to enumerate minutiae, here's the One Thing that would control such a decision in any such "why not just" argument:

Complicated formats can't be meaningfully version controlled.


OTOH

Excel guy: Boom got it done in 5 minutes sips coffee what's next?

10x engineer: I have devised a new programming paradigm to tell me I need to spend less on coffee, wrote a blog post about it and missed a call from a client and a recruiter.


I once made over $1 million developing a program to rescue a small financial services company from its unmanageable mess of Excel sheets.

While I was working on that, the company was busy trying to figure out which other companies the millions of unaccounted excess dollars in their fiduciary account belonged to.

Turns out the boom-got-it-done-in-5-minutes types don't always have a plan for making the big picture work, beyond their current 5-minute problem.


Sounds like business as usual to me.

Non computer people try to solve a problem internally with existing tooling, but the scale gets out of control and they hire a specialist, whether accounting or computer scientist. Most of the jobs I have consist of a company already hacking together a solution from canned software or having a prototype of what they wanted, don't think this is weird at all.

Are you suggesting they should have built a well architected Java program from the start? I'm not so sure.

1 - It's easy to say with monday's newspaper, companies can move fast and then worry about the consequences later if they grow and are succesful. Internal proyects or the whole company can be unsuccessful so you can tag on tech debt and worry about it if it needs to survive, happens all the time. As a curious note, the legal system supports this, companies have limited liability so courts are ok with letting companies that go too deep into the red (tech debt or cash) collapse into 0.

2- Overengineering is also a risk, although usually the failure mode is it never takes off the ground, but what's to say the couldn't have hired a -10x engineer to build their accounts system and then needed to hire an actual 10x engineer to fix the mess?

3- This seems like an accounting problem domain, I think if a company has classically trained accountants you avoid this problem, and if you don't then you get this problem regardless of whether using excel or a programming language. Double entry bookkeeping has been around for centuries. (I'm guessing an accountant helped build the specs for whatever you built for 1M$). Good accountants can handle huge (+1M) companies with pen and paper, and regularly do so with Excel, so I'm not buying that excel was the culprit here.

That said 1M$ is a lot, congrats!


This isn't some spaghetti ERP on Excel or some other blatant overuse of the application. This is tagging some lines on a CSV file to categorize expenses. A few filters, a few formulas, create a summary sheet with SUMIFS or just create a pivot table, done.

Evaluating the problem tells you how complex the solution needs to be. This problem of categorizing basic financial transactions is pretty basic and I'd take the basic approach every time.


Well in this particular case - yeah, excel 100% of the time.

But I think the point here was exploring different paradigms and their trade-offs. I appreciate that a lot, as I think we're stuck in various local optima as an industry - and spreadsheets seem to be one of them (once they pass a certain complexity threshold).

I'm not sold on the "everything in clojure" model here, I think you could accomplish all this does with a script written in $LANG and a little DuckDB (and probably a half dozen other similarly suitable approaches), but again - I appreciate the goal of exploring the solution space, especially when it's using approaches that I'm not that familiar with.


Fair point, and I've certainly been on both sides of this equation!

The Excel approach is absolutely more efficient for one-off tasks or where the goal is simply "get numbers, make decision, move on." No argument there.

Where the programmatic approach shines is when:

1. The same task repeats annually/monthly (like the author's accounting example) 2. The rules or categorizations evolve over time 3. You need an audit trail of how you arrived at conclusions 4. The analysis grows more complex over time

I've seen plenty of Excel wizards whose spreadsheets eventually become their own form of programming - complete with complex macros, VBA, and data models that only they understand. At that point, they've just created an ad-hoc program with a different syntax.

There's a sweet spot for each approach. Sometimes the 5-minute Excel solution is exactly right. Other times, spending an hour on a reusable script saves you 10 hours next year. The real 10x move is knowing which tool fits which situation.

And yes, sometimes we programmers do overthink simple problems because playing with new approaches is fun. I'll cop to that!


But then, who'd invent excel? Maybe we need the compromise and let some people be the 10xcel engineer...


For sure, I just don't see it coming out of a place of "I need to analyze my credit card statement"

Excel wasn't built out of a small personal project, it was a huge multi engineer project built on microsoft (built upon decades of spreadsheet software tradition and arguably centuries of accounting tradition), by a company that had a huge userbase that acted as stakeholders for managing the data and accounts of multi million dollar companies and small businesses alike.

I think that complex contraptions arise more from complex necessities than overengineering a solution from a simple problem


The VisiCalc/Lotus erasure is very strong in this comment.


Wouldn't they fit within the "decades of spreadsheet software tradition"?


> Excel wasn't built out of a small personal project,

But another program with a similar impact -- dBase, the precursor to Microsoft Access, was. Cecil Ratliff wrote it to manage football statistics so he could win his office football pool.


> Complicated formats can't be meaningfully version controlled.

Well, at the risk of simply saying "why not" - why not? Preferably with an example.


How can you diff the formulas in two spreadsheet templates?


XLSX files are zipped XML.

Unzip them, strip the formulae out of the XML, and diff them?


How do you "strip the formulae out of the XML" then? You sort of hide a lot of complexity there. The Excel file format isn't complicated because it's stored as a zip file, it's complicated because it's complicated.

Again, the use case is straightforward. You have an already-solved accounting problem. But, say, last years accounting solution is giving different answers than the one you just ran. So what changed? With the script in the linked article, the answer is trivially findable via the git history (and in larger software searchable via tools like "git bisect", etc...). And the reason for this is that source code is intended to be read by human beings, often including things like "comments" and "style guidelines" and "literate programming" to help the process. None of that exists for ad hoc GUI tools, and the result is that you can't meaningfully develop them as software.

That kind of tasks maps very poorly to "strip the formulae out of the XML, and diff them".


That's a mixed bag of claims all jumbled together.

> How do you "strip the formulae out of the XML" then?

Use any XML parser, or any pre-existing Excel library which can do that, e.g. the Import-Excel PowerShell module mentioned earlier:

    PS C:\test> $x = Open-ExcelPackage book1.xlsx
    PS C:\test> $x.Workbook.Worksheets[1].Cells["B3"].Formula
    SUM(B1:B2)
> "trivially findable via the git history (and in larger software searchable via tools like "git bisect", etc...)".

So put a pre-commit hook which dumps the spreadsheet as some kind of text file instead of a binary blob.

> "And the reason for this is that source code is intended to be read by human beings"

That's not "the reason" that's an unrelated popular saying. If it were true, reverse engineering, maintenance, refactoring, and porting between different languages would be easy. It isn't. Instead source code appears to be intended to be read by the compiler/interpreter the programmer is using, and if anyone else can make anything out of it, good luck to them.

> "often including things like "comments" and "style guidelines" and "literate programming" to help the process".

If you allow helpers external to the source code as part of the development process, that's good because it cuts off your incoming reply saying that a pre-commit hook writing the formulae from spreadsheet to text is too hard/too much work/unreasonable.

> "None of that exists for ad hoc GUI tools"

Ad-hoc gui tools aren't programmable. Notepad isn't programmable. Calculator isn't programmable. Excel isn't an ad-hoc tool, Excel is one of the most famous, most used, GUI tools on the planet with some of the largest ecosystem and community around it, and one of the most pluggable, scriptable, documented, standardised, systems going.

> "and the result is that you can't meaningfully develop them as software".

Humans wrote Microsoft Excel itself, wrote Windows search indexers for searching inside Excel documents, wrote SharePoint which can index and work with Excel content, wrote the Microsoft Graph API and M365 cloud which can integrate with Excel spreadsheets, wrote the OpenOffice/LibreOffice Excel importers/exporters, wrote the ImportExcel module and the DLLs it's based on, rewrote Excel in TypeScript for Office365. Claiming that humans can't meaningfully write code to work with ... weird formats? Excel? spreadsheets? Files that were once touched by a GUI? ... is such a throwing-hands-up-and-giving-up-without-trying take on things. People could if they wanted to. People running $16Bn departments with Excel sheets could if they took it seriously and invested an appropriate amount of money in doing so.

It just occurred to me to google "diff excel spreadsheet" and look, there's one built-in: https://support.microsoft.com/en-us/office/compare-two-versi...

And there's a SaaS product which does it: https://www.diffchecker.com/excel-compare/

And another SaaS product: https://xlcompare.com/

And another SaaS product: https://www.textcompare.org/excel/

and more I didn't click on.


And what happens if the author moved the forumla that used to be in B3? How does your diff utility detect that?

I genuinely can't believe you're staking your argument on the idea that you can somehow track code deltas between versions of a Microsoft Excel spreadsheet when literally no one in the world does this.


B3 would be empty. This is the same as someone renaming any function or method and you asking "but how would the diff utility detect the rename?". The diff would show that the method used to exist and now doesn't. And it would show that the new method used not to exist and now does. The Excel-diff would show that B3 used to have a value and now doesn't. That B4 used to be blank and now has a value. It could show it in a rendering of the spreadsheet, even.

Your argument is that it's not possible. My argument is that it is possible. Nobody is arguing the strawmen positions that you claim I am arguing [it's a good idea, it's a way to build reliable software, everyone does it, etc.]


I version control them by duplicating sheets


Nor can a single file.

If you're keeping records as files, I think you should just use a million files instead.

1. Easy to version control

2. You can sync on the cloud with less conflicts if each record is isolated from the rest

3. Sounds unique and cool


Single files can easily be versioned. Consider git, which uses blob based tracking, not file based tracking. It doesn't matter if you split file A into A and B, git tracks it just fine.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: