Looking at the file with changes https://github.com/postgres/postgres/blob/master/src/backend... , I have to say this source code repository is so well documented/commented and structured that it really gives you huge trust in Postgres being part of your stack.
Large C codebases _have_ to be exceptionally nice, or they immediately collapse under their own weight. The language teaches a dev team this the hard way. I've never seen a terrible huge C codebase (but I have seen many in other languages).
Can confirm they do exist. I worked at a company that produced control systems for electric engines. Great environment and a fun job, but the code was beyond redemption: 15k-line files with 2k+ line #ifdef blocks that ran different code for different customers, variable names that were just curses against pushy clients, and not a single abstraction in sight.
Not only do they exist, they power massive machines that could crush a person in the blink of an eye.
Quite the opposite: I hate it when projects have dozens upon dozens of modules with one function each. A handful of large files is the sweet spot. (Only crazy, exceptional things like putting everything into a single file damage readability, imho.)
Source files with very few lines each mean more cognitive load to remember which file (or package) some functionality lives in, and likely more effort to maintain the code.
Whereas very large source files mean more cognitive effort to remember where in the file some function is. In many languages, variables can be scoped to a file, and in a huge file such variables are riskier to use, etc.
It may be desirable to combine smaller files into a more coherent whole, or to split up an overly complicated large file into several smaller files.
Sure; without seeing the code, I don't think you can come up with a concrete rule that applies in all cases. Rules of thumb can still be useful as an indication of maintenance effort.
I get the impression that long files are culturally acceptable in systems-level C code. E.g., just a cherry-picked file from Linux: kernel/sched/core.c is over 11k lines.
I feel the long-file issue is mostly beside the point: the problem isn't file length but spaghetti code. If it makes sense for code to be in one file, it should be in one file. Breaking it up simply to reduce file length is counterproductive.
There are 18609 .c files in my checked-out copy of the FreeBSD src tree. The median length is 258 lines; 90% are 1373 lines or shorter; 99% are 5241 lines or shorter.
The statistics for the 6071 .c files in the FreeBSD kernel are somewhat higher -- the median is 460 lines; the 90th percentile is 2070 lines; the 99th percentile is 7678 lines -- but your example of an 11133-line file is definitely at the extreme high end.
I didn't run any stats when I found that file. I just clicked around on GitHub a handful of times looking for something that seemed like it would be complex.
Depends on the language. C# somewhat replaces paths with namespaces, then you navigate classes and methods with editor tooling. I remember back when I was on Visual Studio writing C++ that they did something similar.
Actually, I don’t mind big files. It is simpler to scan through one file or do a quick search than to hunt through a bunch of smaller files. And 5000 lines is not awkward for most editors, especially as many editors can collapse functions.
This feels like a good area for tooling (editors, source hosts, SCM extensions) to improve the experience. I don’t always mind large source files (and sometimes may prefer them over large file-system hierarchies), but they can be a pain to navigate in some circumstances.
As an example, making several related changes in very different parts of a file, where you need to cross-reference between them. The changes themselves might be small, but it’s a huge cognitive burden to alternate/iterate through them. I’d love to have a view which temporarily projects those targets as if they’re isolated files without changing the actual structure on disk. I’d love it so much I actually do this manually for a lot of tasks, creating temp files to prepare edits for related areas of code. But then I lose a lot of the benefits of tools which understand what’s being referenced. It would be great to just type a quick command (or click or whatever) to say “don’t refactor this function to another file, but let’s pretend you did, for a while”.
This is how older editors like emacs work. You interact with views/windows/tabs called buffers and those buffers can have files loaded into them. Multiple buffers can reference the same code file but view different sections simultaneously. So you can investigate or edit different parts of one huge file the same way you would smaller ones.
I use JOE and JOE also supports multiple views into the same file. In fact, to open another file is two commands: the open view/window (^KO) command followed by edit file command (^KE). I've always used this facility for as long as I can remember and it never really occurred to me until now that people using more modern editors and especially GUI editors may not enjoy this same convenience--either not possible or no simple chain of command inputs to get there. And it's not like I don't use GUI editors, just not in situations where I would realize this feature was missing.
I just opened IntelliJ and it has this feature too, where you can "Split right" or "Split down" and have multiple views of the same file. Thanks for letting me know this is something editors might support.
For some reason I have not noticed that feature before. It's not like it is hidden either, as it is in the "right click" context menu that I use daily. I guess I need to learn the tool, so that I don't miss useful features like this.
> The changes themselves might be small, but it’s a huge cognitive burden to alternate/iterate through them. I’d love to have a view which temporarily projects those targets as if they’re isolated files without changing the actual structure on disk.
This is exactly how vim buffers (for instance in a split) work.
While I'm not sure that was a consideration here, sometimes C compilers produce better machine code when they have access to more function definitions. E.g., SQLite recommends that embedders use the single ~10 MB sqlite3.c amalgamation file[1] for both ease of use and performance reasons.
It is slow. Our (C++) project's MSVC release build ends with a glorious two-minute run of link.exe with LTO (/LTCG /O2) and aggressive inlining (/Ob3) enabled.
Limiting translation-unit size in C++ also helps with faster edit/compile/run cycles, which doesn't seem to be a concern for C codebases in the 21st century.
Of those 5000 lines it looks like half are comments or whitespace, and none of the functions look longer than a couple hundred lines or more than a few levels of control-flow depth. Pretty harsh nitpick.
But calling it a nitpick still implies it is a nit, and I don't really think it is one at all. And you're saying it in response to someone who said the code is very clean, which is quite petty.
There is Zuul Gating[0] CI, which is actually the perfect solution for this. It works with GitHub or any Git-based repository system.
It automatically tests the changes with a simulated merge on master together.
So it orders PR1 -> PR2 -> PR3 -> ... -> PR100 by order of approval.
If the sequence runs PR1 -> PR2 (fails) -> PR3 -> ... -> PR100,
it removes PR2 and restarts testing from PR3 onward. This behavior is even customizable.
I don't really know much about optimizing storage costs, but you could learn from the storage giants.
An example is the Backblaze Storage Pod 6.0: according to them it holds 0.5 PB at a cost of $10K, so you would need about 20 * $10K = $200K plus maintenance (they also publish failure rates). The schematics and everything else are on their website, and according to them they already have a supplier who builds such devices, which you could probably buy from. Note: this was published in 2016; they probably have a Pod 7.0 by now, so costs may be better.
Is there any privacy benefit to using containers now, with the new isolation built into Firefox? I'm using Cookie AutoDelete + Containers, so now it's either isolate and keep cookies, or isolate and delete them. I quite like this.
Containers aren't going to give you additional protection against third-party cookies with this feature. But you still get other useful benefits, like having different sessions open on the same websites, or just grouping websites by forcing them into specific containers (Work/Personal/Random, etc.).
I'm not the one who asked the question, but I'm in the same position. All third-party content is off; I'm using a long-term container for anything where I need to stay logged in, and temporary containers with no first-party cookies for everything else. I do have some bugs with the interaction of the two, so I'd be happy if I could get the same thing with stock Firefox.
Note: I have "Enable HTTPS-Only Mode in all windows" on by default.