One of the biggest benefits, imo, of using Postgres as your application queue is that any async work you schedule benefits from transactionality.
That is, say you have a relatively complex backend mutation that needs to schedule some async work (e.g. sending an email after signup). With a Postgres queue, if you insert the job to send the email and then, in a later part of the transaction, something fails and the transaction rolls back, the email is never queued to be sent.
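A minimal sketch of the pattern in Python with psycopg2 (the users/jobs schema here is invented for illustration):

    # Minimal sketch: the job row is created inside the same transaction
    # as the rest of the mutation, so a rollback discards it too.
    # (The users/jobs schema is invented for illustration.)
    import json
    import psycopg2

    conn = psycopg2.connect("dbname=app")
    with conn:  # psycopg2: commits on success, rolls back on exception
        with conn.cursor() as cur:
            cur.execute(
                "INSERT INTO users (email) VALUES (%s) RETURNING id",
                ("alice@example.com",),
            )
            user_id = cur.fetchone()[0]
            cur.execute(
                "INSERT INTO jobs (kind, payload) VALUES (%s, %s)",
                ("send_signup_email", json.dumps({"user_id": user_id})),
            )
            # ... more work in the same transaction; if anything here
            # raises, the job insert above is rolled back with it and
            # the email is never queued.
    conn.close()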
Worth being clear that bridging to another non-idempotent system necessarily requires you to pick at-least-once or at-most-once semantics. So for emails: if you fail while awaiting confirmation that the email was sent, you still need to pick between failing your transaction (and potentially duplicating the email) or continuing (and potentially dropping it).
The big advantage is for code paths which asynchronously modify your DB; these can be done fully transactionally with exactly-once semantics, since the job consumption and the DB update are in the same transaction.
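As a sketch (assuming a jobs table like the one above; the audit-log insert stands in for whatever DB effect the job has):

    # Sketch: claim a job and apply its DB effect in one transaction.
    # SKIP LOCKED lets multiple workers poll without blocking each other.
    import psycopg2

    conn = psycopg2.connect("dbname=app")
    with conn:
        with conn.cursor() as cur:
            cur.execute(
                """
                DELETE FROM jobs
                WHERE id = (
                    SELECT id FROM jobs
                    ORDER BY id
                    LIMIT 1
                    FOR UPDATE SKIP LOCKED
                )
                RETURNING kind, payload
                """
            )
            job = cur.fetchone()
            if job:
                kind, payload = job
                # The job's DB side effect goes here; it commits (or
                # rolls back) together with the DELETE above, giving
                # exactly-once semantics for DB-only work.
                cur.execute(
                    "INSERT INTO audit_log (entry) VALUES (%s)",
                    (f"processed {kind}",),
                )
    conn.close()

If the worker crashes before COMMIT, the DELETE is undone and the job simply becomes visible to the next worker.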
That's kind of missing the parent's point. If you wanted to ensure emails arrive, that sounds like another queue that could be backed by a different table that is also written to as part of the original transaction.
> One of the biggest benefits, imo, of using Postgres as your application queue is that any async work you schedule benefits from transactionality.
This is a really important point. I often end up using a combination of Postgres and SQS since SQS makes it easy to autoscale the job processing cluster.
In Postgres I have a transaction log table that includes columns for triggered events and the pg_current_xact_id() for the transaction. (You can also use the built-in xmin of the row, but then you have to worry about transaction wraparound.) Inserting into this table triggers a NOTIFY.
A background process runs in a loop: it selects all rows in the transaction table with a transaction id between the last run's xmin and the current pg_snapshot_xmin(pg_current_snapshot()), maps those events to jobs and submits them to SQS, records the new xmin, then LISTENs to await the next NOTIFY.
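Roughly, that loop might look like this (a sketch: the tx_log table, queue URL, and the two checkpoint helpers are invented, and the xid is assumed to be stored as a bigint):

    # Rough sketch of the relay loop described above.
    import json
    import select

    import boto3
    import psycopg2

    sqs = boto3.client("sqs")
    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/jobs"  # hypothetical

    conn = psycopg2.connect("dbname=app")
    conn.autocommit = True  # required for LISTEN/NOTIFY
    cur = conn.cursor()
    cur.execute("LISTEN tx_log")

    last_xmin = load_checkpoint()  # hypothetical: xmin recorded by the last run

    while True:
        # Rows with xid below the snapshot's xmin can no longer be
        # written by any in-flight transaction, so this range is safe
        # to hand off without missing late commits.
        cur.execute("SELECT pg_snapshot_xmin(pg_current_snapshot())::text::bigint")
        horizon = cur.fetchone()[0]
        cur.execute(
            "SELECT event, payload FROM tx_log"
            " WHERE xid >= %s AND xid < %s ORDER BY xid",
            (last_xmin, horizon),
        )
        for event, payload in cur.fetchall():
            sqs.send_message(
                QueueUrl=QUEUE_URL,
                MessageBody=json.dumps({"event": event, "payload": payload}),
            )
        save_checkpoint(horizon)  # hypothetical: persist for the next run
        last_xmin = horizon
        select.select([conn], [], [])  # block until the next NOTIFY
        conn.poll()
        del conn.notifies[:]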
Good point. We alleviate that a bit by scheduling our queue adds to not run until after commit. But then we still have some unsafety, and if the connection to Rabbit is down, we're in trouble.
I agree - having to tell a database that something was processed, and fire off a message into RabbitMQ, say, is never 100% transactional. This would be my top reason to use this approach.
> With a Postgres queue, if you insert the job to send the email and then, in a later part of the transaction, something fails and the transaction rolls back, the email is never queued to be sent.
This is true - definitely worth isolating what should be totally separate database code into different transactions. On the other hand, if your user is not created in the DB, you might not want your signup email. Just depends on the situation.
Another benefit of this is that you're guaranteed that the transaction is completed before the job is picked up. With redis-backed queues (or really anything else), you very quickly run into the situation where your queue executes a job depending on a database record existing prior to the transaction being committed (and the fix for this is usually awkward / complex code).
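A sketch of the race, with redis-py and invented table/queue names:

    # BUG sketch: enqueueing to Redis mid-transaction exposes the job
    # before the row it depends on is committed.
    import psycopg2
    import redis

    r = redis.Redis()
    conn = psycopg2.connect("dbname=app")
    cur = conn.cursor()

    cur.execute(
        "INSERT INTO users (email) VALUES (%s) RETURNING id",
        ("a@example.com",),
    )
    user_id = cur.fetchone()[0]

    r.lpush("jobs", str(user_id))  # a worker can pop this immediately...

    conn.commit()  # ...but the user row only becomes visible here, so a
                   # fast worker looks up user_id and finds nothing.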
I'm not sure this is really an issue with transactionality, since a single request can obviously be split up into multiple transactions. Rather, even if you correctly flag the email as pending/errored, you either need to process those manually or have some other kind of background task that looks for them, at which point why not just process them asynchronously?
> With a Postgres queue, if you insert the job to send the email and then, in a later part of the transaction, something fails and the transaction rolls back, the email is never queued to be sent.
An option could be to use a second connection and a separate transaction to insert data into the queue table.
We’re currently running two machines (master and standby) at M5 Hosting. All of HN runs on a single box, nothing exotic:
CPU: Intel(R) Xeon(R) CPU E5-2637 v4 @ 3.50GHz (3500.07-MHz K8-class CPU)
FreeBSD/SMP: 2 package(s) x 4 core(s) x 2 hardware threads
Mirrored SSDs for data, mirrored magnetic for logs (UFS)
HN is a very simple application. Handling a high volume of traffic for a simple application is a very different problem from scaling a highly complex application.
HN is simple, yes. But it could be made more complicated. Personalized feed and data analytics are two complicated things that come to mind. Staying simple is often a choice, and it’s a choice not many companies make.
HN is a straightforward forum. Reddit is one level above that: generalized forums as a service.
Anything HN has had to implement, Reddit has to implement at a generalized, user-facing level, like mod tools.
Frankly, we underestimate how hard forums are, even simple ones. I learned this the hard way rebuilding a popular vBulletin forum into a bespoke forum system.
Every feature people expect from a forum turns into a fractal of smaller moving parts, considerations, and infinite polish. Letting users create and manage their own forums is an explosion of even more things that used to be simple private /admin tools.
Mod tools are not accessed and used by all users. So the load of mod-tools on the servers is probably negligible.
I agree, most software is deceptively simple from the outside. Once you start building it, you become more humble about the effort required to build anything moderately complex.
Mod tools aren't used by the majority of users, correct. But the existence of mod tools does make the logic and assumptions of the application different. Now you've got a whole set of permissions and permissions checks, additional interfaces, more schema, etc.
It's not that the mod tools are constantly being used, it's that there's now potentially far more code complexity for those tools to even exist.
Is Reddit really a complex application (regardless of how they build, scale, or deploy it)? Although that makes me wonder: what makes an application complex?
Hacker News changes more often than people think, just not the layout because people here are weirdly fetishistic about it.
Since I've been here they've added vouching for banned users (and actually warning people beforehand), thread folding, Show HN, making the second-chance pool public, thread hiding, the past page, various navigation links, and the API. They've also been trying to get a mobile stylesheet to work, and they've mentioned making various changes for spam detection and performance. And the URL now automatically loads a canonical version if it finds one, and the title is now automatically edited for brevity. And I've probably missed a few things.
And HN isn't a simple application by any means. Go look at the Arc Forum code - it isn't optimized for readability, or scalability or reliability, but joy - for the vibe of experimental academic hacking on a Lisp. It's made of brain farts. Hacker News is probably significantly more complex than that for being attached to a SV startup company and running 'business code' and whatnot.
I mean, that’s not really that much is it. And that’s the point, HN really doesn’t change much. Whereas Reddit, for better or for worse, has a much higher output of new user facing features.
When a VC gives you a giant boatload of money, they insist you “scale up” the company overnight. So you go on a massive hiring spree and get a triple-digit team of engineers before having any market traction.
And they're tasked with building a product that can handle Google-levels of demand, though they currently only have two customers, neither of them paying.
It indeed is imperative, but not for technical reasons.
I would take the money and then do none of that. Now I've got a 5-year runway, enough time to build a product people like and use, and by then the investors won't be angry anymore.
HN is perhaps the most user friendly site I go to with regularity.
The idea that a website needs to be “rich” to be usable is one of the dumbest things the industry has convinced itself of in the last 20 years (following only ‘yaml is a smart way to encode infrastructure’).
To be fair, it's not as much user-friendly as it is simple, and simple tends to be easier to understand.
For example, if it was more user-friendly, it could have links to jump between root comments, because right now very popular top comments tend to accumulate most interactions, and scrolling down several pages to find the next root thread requires effort.
The people who push in the other direction also bring few or no metrics. I.e., there is often no reason to add <bag of features>, except that a customer (who hasn't bought the product yet) mentioned them as nice-to-haves during initial sales talks.
IMO the solution to YAML-as-config is a strict subset of YAML.
JSON is one strict subset, but one that makes its trade-offs in favor of strictness and machines: error detection and types encoded in the syntax.
We decided on a different subset of YAML for our users who were modifying config by hand (even more strict than StrictYAML). Some of the biggest features of YAML are that there is no syntax typing, and that collection syntax is simple (the latter is also true of JSON, but not of TOML).
For example, a string and a number look the same. This seems bad to us developers at first, but the user doesn't have to waste 20 minutes chasing down an unmatched quote when modifying config in a <textarea>. Beyond that, it's the same amount of work as making sure the JSON is `"age": 20` instead of `"age": "20"`; one just has noisier syntax.
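To make that concrete, here's the idea with the StrictYAML library mentioned above, where the schema rather than the syntax decides types:

    # With StrictYAML, scalars parse as strings by default; the schema,
    # not quoting, decides types, so `age: 20` and `age: "20"` can't
    # silently diverge.
    from strictyaml import load, Map, Int

    print(load("age: 20").data)                        # {'age': '20'}  plain string
    print(load("age: 20", Map({"age": Int()})).data)   # {'age': 20}    int via schema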
>Stack Exchange which is way more rich and runs on small (relative) infra.
Yes, I've heard that SO runs on relatively simple and modest infra. And agree that would be a good example.
>HN is not user friendly
How so? I find the HN UX a refreshingly simple and effective experience. It might not have all the bells and whistles of newer discussion fora, but it doesn't obviously need them. I'd say it's a good example of form/function well suited to need. Not perfect perhaps, but very effective.
Try loading it on a 2G (2 bars = 128kbits per second — those are bits not bytes) connection. It loads almost instantly with no fuss. Now try loading virtually any site on the same, if it ever loads at all without timing out, you’ll be waiting over 10 minutes.
There was a YT preso from several years back where the StackExchange founder explained how it ran off just ~10 servers, and could run on half that many if needed. He stressed the simplicity of their architecture, and that their problem space was massively cacheable, so the servers just had a few hundred GB of RAM and mostly served pages from cache, only doing work to re-render them when needed. It was a C#/.NET app.
So, I think there is a lot more in common than you think between HN and SO.
My pet peeves: no dark mode, which is sorely lacking for reading in the dark; no indication at all that you've got replies (at least a tiny number next to threads, perhaps?); and the up/downvote buttons are too small to reliably tap on mobile. Oh, and enumeration support would be fantastic; the workarounds tend to be hard to read.
Other than that, I think it's delightfully ugly and lightweight.
I can't seem to find Harmonic in the iOS App Store, is it Android-only?
Also, HN apps tend to make it harder to send interesting things to Roam or the laptop or Safari's reading list, the website makes that really convenient.
I wouldn’t say it’s not user friendly but I understand where you are coming from. I also missed some more modern features/looks and decided to build my own open source client [0]. Feel free to give it a go to see if it’s more your taste!
I wonder if they use something like CARP[^1] for redundancy. Also, it strikes me as odd that they didn't go with ZFS for storage; it makes FS management _way_ easier for engineers who don't spend all their time on these kinds of operations.
You might ask what sort of filesystem maintenance they ever need to do. Replacing a disk is covered by the mirror. Backup is straightforward. The second system covers a lot more. If they need to increase hardware capacity, they can build new systems, copy in the background, and swap over with a few minutes of downtime.
(beginner question) How do they store the data? Is an SQL db overkill for such a use case? What would be the alternative, an ad-hoc filesystem-based solution? Then how do the two servers share the db? And is there redundancy at the db level? Is it replicated somehow?
"ad-hoc filesystem based solution" is the closest of your definitions, I think. Last time I saw/heard, HN was built in Arc, a Lisp dialect, and use(s/d) a variant of this (mirrored) code: https://github.com/wting/hackernews
Check out around this area of the code to see how simple it is. All just files and directories: https://github.com/wting/hackernews/blob/master/news.arc#L16... .. the beauty of this simple approach is a lack of moving parts, and it's easy to slap Redis on top if you need caching or something.
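In that spirit, a toy version of the file-per-item idea (not HN's actual on-disk format, just the shape of it):

    # Toy file-per-item store: one file per item, named by id.
    import json
    from pathlib import Path

    DATA_DIR = Path("data/items")

    def save_item(item_id: int, item: dict) -> None:
        DATA_DIR.mkdir(parents=True, exist_ok=True)
        (DATA_DIR / str(item_id)).write_text(json.dumps(item))

    def load_item(item_id: int) -> dict:
        return json.loads((DATA_DIR / str(item_id)).read_text())

    save_item(1, {"by": "pg", "title": "Hello HN", "kids": [2, 3]})
    print(load_item(1))  # {'by': 'pg', 'title': 'Hello HN', 'kids': [2, 3]}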
File syncing between machines is pretty much an easily solved problem. I don't know how they do it, but it could be something like https://syncthing.net/ or even some scripting with `rsync`. Heck, a cronned `tar | gzip | scp` might even be enough for an app whose data isn't exactly mission critical.
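E.g. a sketch of the rsync variant (host and paths invented):

    # Mirror the data directory to a standby box with rsync.
    import subprocess

    subprocess.run(
        ["rsync", "-az", "--delete", "data/", "standby.example.com:/srv/hn/data/"],
        check=True,
    )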
Wow, I had no idea HN was built like that - I'm impressed. I really wish I could read the Arc code better though since I'd love to know more about the details of how data is represented on disk and when things move in and out of memory, etc.
Does anyone know of other open source applications with similar architectures like this?
It's sad to think that, with these laws being passed, regardless of what position you take, we still don't have any Fair Use provisions in Australia. There was even a discussion paper [http://www.alrc.gov.au/publications/4-case-fair-use-australi...] put forward by our Law Reform Commission suggesting this. I would have thought the productivity benefits associated with education and innovation alone would make this a no-brainer.
Go to cloud.digitalocean.com/support and create a new ticket, giving them your promo code and asking nicely, and they'll put it through promptly, in my experience.