Hacker News: icsa's comments

What is the quality of software that gets shipped? What is the rate of defects and security issues?

What are the support costs once the software is shipped?


Sample output for a significant GitHub repository?


Here's Express.js (the npm package, 141 files). Scanned in 551ms on a laptop:

  - 1,953 components extracted
  - 17,505 typed dependencies mapped (not just "A calls B" — ownership, injection, weak ref, circular, etc.)
  - 25 architectural blocks detected automatically
  - 498 architectural smells found
  - 116 dead code detections
  - 100% classification consensus (zero ambiguous)

  Component distribution:
    Core logic:    425 (21.8%) — app, router, route objects
    Terminals:     744 (38.1%) — constants, test assertions
    Helpers:       346 (17.7%) — utility functions
    State stores:  313 (16.0%) — express, request, Router, factories
    Features:      110 (5.6%)  — test-specific app instances
    Middleware:     10 (0.5%)  — andRestrictTo, sendfile
    Entry points:    5 (0.3%)  — users, restrict, getCookie

  Architectural problems detected:
    CRITICAL: 532 components in circular dependency chains
    ERROR: God Class — `app` has 67 outbound dependencies
    ERROR: God Class — `router` has 83 outbound dependencies
    Stateful services at 0% health (critical coupling)

  Cross-cutting concerns found automatically:
    trust proxy logic in request.js (8 components)
    response callback chain: onend, onaborted, onerror, onfinish
    pure functions — 71 components, 93% health
    boundary validators — 98% health

  All of this is known to be true by anyone who's worked on Express.
  The God Object pattern in `app` is a documented community concern.
  The circular deps between app↔router↔request↔response are well-known.

  551ms. No LLM. No cloud. Deterministic — same input, same output, every time.

  Happy to run it against any public repo if you want to suggest one.


> With this design, it’s possible to run native SQL selects on tables with hundreds of thousands to millions of columns, with predictable (sub-second) latency when accessing a subset of columns.

What is the design?


In a few words: table data is spread across hundreds of MariaDB servers. Each table has user-designed hash key columns (1 to 32) that drive automatic partitioning. Wide tables are split into chunks; one chunk = the hash key columns plus a subset of the columns, stored on one MariaDB server. The data dictionary is stored on mirrored, dedicated MariaDB servers. The engine itself uses a massive fork policy. In my lab, the k1000 table is stored across 500 chunks. I used a small trick: where I say one MariaDB server, you can substitute one database within a MariaDB server. So I have only 20 VMware Linux servers, each hosting 25 databases.
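A loose sketch of the routing idea described above, in Python: hash the user-designed key columns to pick one of the 500 chunks (databases). All names and counts here are illustrative, and this simplifies the real design, which also splits a wide table's columns across chunks:

```python
# Hypothetical sketch: map a row's hash-key column values to one chunk.
# 20 servers x 25 databases each = 500 chunks, as in the lab setup above.
import hashlib

SERVERS = 20          # physical servers (illustrative)
DBS_PER_SERVER = 25   # databases per server
CHUNKS = SERVERS * DBS_PER_SERVER

def chunk_for(key_columns):
    """Return (server, database) for a tuple of hash-key column values."""
    digest = hashlib.sha1("|".join(map(str, key_columns)).encode()).hexdigest()
    chunk = int(digest, 16) % CHUNKS
    return divmod(chunk, DBS_PER_SERVER)

server, db = chunk_for(("customer-42", "2024-01"))
```

Because the hash is deterministic, the same key always routes to the same database, so a SELECT touching a subset of columns only has to contact the chunks holding those columns.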


How well does this approach work with C++ source code - which is notoriously difficult to parse, given context-dependent semantics?


Turned this into a science experiment and designed a test/workflow to rename the symbol MatrixXd -> MatrixPd in Eigen, and the results are promising at first glance. See https://github.com/rhobimd-oss/shebe/blob/main/WHY_SHEBE.md#...


shebe asks the simple question: "where does this symbol appear as text?". For C++ codebases that heavily use templates and macros, shebe will struggle. But I'm curious how it would actually perform, so I'm currently performing a search on https://gitlab.com/libeigen/eigen. Will report the results shortly.
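The "where does this symbol appear as text?" approach can be sketched in a few lines of Python (this is my own illustrative sketch, not shebe's actual implementation): a word-boundary text search with no parsing at all, which is exactly why macro- or template-generated identifiers would escape it.

```python
# Illustrative sketch of a purely textual symbol search: find word-boundary
# occurrences of an identifier, with no understanding of C++ semantics.
import re
from pathlib import Path

def find_symbol(root, symbol):
    """Return (path, line number, line text) for each textual hit in .h files."""
    pattern = re.compile(r"\b" + re.escape(symbol) + r"\b")
    hits = []
    for path in Path(root).rglob("*.h"):
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if pattern.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

A token spelled out by a macro expansion (e.g. `CONCAT(Matrix, Xd)`) never appears as the literal text `MatrixXd`, so a search like this cannot see it.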


Tieredsort seems like a good balance between performance and complexity. Enough complexity (yet still relatively simple) to get very good performance.


yup exactly.


Anecdote:

I consulted for a large manufacturing firm building an application to track the logical design of a very complex product.

They modeled the parts as objects. No problem.

I was stunned to see the following pattern throughout the code base:

  Class of the object

  Instance #1 of the class

  Instances 2..n of the class

I politely asked why this pattern existed. The answer was "it's always been that way."

I tracked down the Mechanical Engineer (PhD) who designed the logical parts model. His desk was, in fact, 100 feet away from mine.

I asked him what he intended, regarding the model. He responded "Blueprint, casting mold, and manufactured parts." - which I understood immediately, having studied engineering myself.

After telling him about the misunderstanding of his model by the software team, I asked him what he was going to do about it. He responded "Nothing."

I went back to the software team to explain the misunderstanding and the solution (i.e. blueprint => metaclass, casting mold => class, and manufactured parts => instances). The uniform response was "It is too late to change it now."
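That mapping falls out naturally in a language with first-class metaclasses. A hypothetical Python sketch (the real system wasn't Python, and all names here are invented):

```python
# Hypothetical sketch of the corrected model: blueprint -> metaclass,
# casting mold -> class, manufactured part -> instance.
class Blueprint(type):
    """Blueprint: constrains what any mold in this part family must define."""
    def __new__(mcls, name, bases, ns):
        ns.setdefault("tolerance_mm", 0.1)   # every mold gets a tolerance
        return super().__new__(mcls, name, bases, ns)

class GearMold(metaclass=Blueprint):
    """Casting mold: one concrete design stamped from the blueprint."""
    teeth = 24

part_1 = GearMold()   # manufactured part #1 -- just another instance
part_2 = GearMold()   # no special-casing of "instance #1" required

print(part_1.teeth, GearMold.tolerance_mm)  # 24 0.1
```

With this shape, instance #1 carries no special role, so the 2..n delineation the team was maintaining simply disappears.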

The result is a broken model that was wrong for more than a decade and may still be deployed. The cost of the associated technical debt is a function of 50+ team members having to delineate instance #1 from instances 2..n for over a decade.

N.B. Most of the software team has a BS (or higher) in computer science.

P.S. Years later, I won't go anywhere near the manufactured product.


Seems like a pretty easy thing to clean up. I am confused by these devs who just seem to give up. Just fix it!


No one had the motivation to fix it, including management. Many of the developers saw the problem as job security.


Come on man, give us a clue. Tell me at least it won't kill anyone.


In the United States, we all used to take a required course called Civics.

We learned how government and justice worked.


45% slower to run everywhere from a single binary...

I'll take that deal any day!


That which is old is new again. The wheel keeps turning…

“Wait we can use Java to run anywhere? It’s slow but that’s ok! Let’s ride!”


There's a reason Java applets got deprecated in every browser. The runtime was inherently insecure. It just doesn't work for the web.

Also, targeting the JVM forces you to accept garbage collection, class-based OO and lots of pointer chasing. It's not a good target for most languages.

Java's pretty good, but wasm is actually a game changer.


The Java runtime isn't any more inherently insecure than the JavaScript runtime, and JavaScript seems to work just fine for the web.

The key reason why applet security failed was because it gave you the entire JDK by default, and so every method in the JDK needed to have explicit security checking code in place to restrict access. The model was backwards -- full control by default with selective disabling meant that every new feature in the JDK is a new vulnerability.
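The contrast between the two models can be sketched abstractly (this is my own illustration in Python, not actual JDK code; the capability names are invented):

```python
# Deny-by-default: every capability is blocked unless explicitly granted.
# A newly added API is therefore safe until someone deliberately opts it in --
# the opposite of the applet model, where each new JDK feature needed its own
# explicit SecurityManager check or it became a new hole.
ALLOWED = {"read_clipboard"}   # hypothetical capabilities granted to this code

def call(capability):
    if capability not in ALLOWED:
        raise PermissionError(capability)
    return f"{capability}: ok"

print(call("read_clipboard"))
# call("open_socket") raises PermissionError -- as would any capability
# added to the platform later, with no new checking code required.
```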


Just look up "Java applet sandbox escape". There were tons of ways to do it. Here are some [0]. Then there's the coarse-grained permissions that were essentially useless to begin with.

[0]: https://phrack.org/issues/70/7


Yes, I'm familiar with these. Many of the earliest problems were due to bugs in the verifier, and there were several different vendors with their own set of bugs. The bulk of these problems were identified and resolved over 25 years ago.

Most of the later problems are due to the fact that the API attack surface was too large, because of the backwards SecurityManager design. And because it existed, it seems there was little incentive to do something better.

Once the instrumentation API was introduced (Java 5), it made it easier to write agents which could limit access to APIs using an "allow" approach rather than the awful rules imposed by the SecurityManager. Java 9 introduced modules, further hardening the boundaries between trusted and untrusted code. It was at this point the SecurityManager should have been officially deprecated, instead of waiting four more years.

Going back to the earlier comment, the problem isn't due to the runtime being somehow inherently insecure, but instead due to the defective design of the SecurityManager. It hasn't been necessary for providing security for many years.


How does .Net stack up?


I'm not too sure, but the main reason MS developed it was because they just wanted Java without licensing it from Oracle, so I imagine they made a lot of similar design decisions.

Anyway, it's great if you compile it to Wasm.


I am a huge, huge fan of wasm. The first time I was able to compile a Qt app to Linux, Windows, Mac, and wasm targets, I was so tickled pink it was embarrassing. Felt like I was truly standing on the shoulders of giants and really appreciated the entirety of the whole "stack" if you will.

Running code in a browser isn't novel. It's very circular. I actually met someone the other day who thought JavaScript was a subset of Java. Same person was also fluent in PHP.

Wasm is really neat, I really love it. My cynical take on it is that, at the end of the day, it’ll just somehow help ad revenue to find another margin.


Fair. Running in the browser isn't novel, but JS/TS are some of the most popular languages in history and that almost certainly never would have happened without monopolizing the browser.

Expanding margins are fine by me. Anticompetitive markets are not. My hope is that wasm helps to break a couple strangleholds over platforms (cough cough iOS cough Android)


I really don’t think Apple is going to let anyone get away with too much browser appifying of iOS.


It's not a question of Apple letting anyone do anything. It's just a question of governments forcing it to do so.


45% slower to run everywhere from a single binary... with fewer security holes, without undefined behavior, and trivial to completely sandbox.

It's definitely a good deal!


> without undefined behavior

Undefined behaviour is defined with respect to the source language, not the execution engine. It means that the language specification does not assign meaning to certain source programs. Machine code (generally) doesn't have undefined behaviour, while a C program could, regardless of what it runs on.


Native code generally doesn't have undefined behaviour. C has undefined behaviour and that's a problem regardless of whether you're compiling to native or wasm.


Is compiling so hard?


I think that there are a few critical issues that are not being considered:

* LLMs don't understand the syntax of q (or any other programming language).

* LLMs don't understand the semantics of q (or any other programming language).

* Limited training data, as compared to languages like Python or JavaScript.

All of the above contribute to the failure modes when applying LLMs to the generation or "understanding" of source code in any programming language.


> Limited training data, as compared to languages like Python or JavaScript.

I use my own APL to build neural networks. This is probably the correct answer, and in line with my experience as well.

I changed the semantics and definition of a bunch of functions and none of the coding LLMs out there can even approach writing semidecent APL.


"English as a programming language" has neither well-defined syntax nor well-defined semantics.

There should be no expectation of a "correct" translation to any programming language.

N.B. Formal languages for specifying requirements and specifications have been in existence for decades and are rarely used.

From what I've observed, people creating software are reluctant to or incapable of producing [natural language] requirements and specifications that are rigorous & precise enough to be translated into correctly working software.


In the theoretical world where a subset of English could be formalized and proven and compiled, the complexity of the language would reduce my willingness to use it. I find that the draw of AI comes from its "simplicity," and removing that (in favor of correct programs) would be pointless, because such a compiler would surely take forever to compile "English" code, and would not be too different from current high level languages, imo.

