Hacker News: 5id's comments

One of the biggest benefits imo of using Postgres as your application queue, is that any async work you schedule benefits from transactionality.

That is, say you have a relatively complex backend mutation that needs to schedule some async work (e.g. sending an email after signup). With a Postgres queue, if you insert the job to send the email and then, in a later part of the transaction, something fails and the transaction rolls back, the email is never queued to be sent.
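To make the rollback behaviour concrete, here is a minimal sketch. It assumes in-memory SQLite as a stand-in for Postgres (so it runs without a server), and the `users` and `jobs` tables and the `sign_up` helper are hypothetical names, not anything from the comment above.

```python
import sqlite3

# In-memory SQLite stands in for Postgres; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL)")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, kind TEXT NOT NULL, payload TEXT NOT NULL)")
conn.commit()


def sign_up(conn, email):
    """Enqueue the welcome email and create the user in one transaction."""
    with conn:  # commits on success, rolls back if the block raises
        conn.execute("INSERT INTO jobs (kind, payload) VALUES ('send_email', ?)", (email,))
        # A later step in the same transaction; it fails on a duplicate email.
        conn.execute("INSERT INTO users (email) VALUES (?)", (email,))


def pending_jobs(conn):
    """Payloads of all queued jobs, oldest first."""
    return [row[0] for row in conn.execute("SELECT payload FROM jobs ORDER BY id")]


sign_up(conn, "a@example.com")
try:
    sign_up(conn, "a@example.com")  # violates UNIQUE, so the job insert is rolled back too
except sqlite3.IntegrityError:
    pass

print(pending_jobs(conn))  # only the job from the successful signup remains
```

The second signup inserts its job first and fails afterwards, yet no second job is left behind, because the job row and the user row share one transaction.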


Worth being clear that bridging to another non-idempotent system necessarily requires you to pick at-least-once or at-most-once semantics. So for emails, if you fail awaiting confirmation of your email you still need to pick between failing your transaction and potentially duplicating the email, or continuing and potentially dropping it.

The big advantage is for code paths which async modify your DB; these can be done fully transactionally with exactly-once semantics since the Job consumption and DB update are in the same transaction.
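A rough sketch of that second case, again assuming in-memory SQLite in place of Postgres and hypothetical `jobs` and `balances` tables: the worker applies the job's effect and deletes the job in the same transaction, so a crash in between undoes both. With several concurrent workers on Postgres you would typically claim the row with `SELECT ... FOR UPDATE SKIP LOCKED`, which this single-connection sketch leaves out.

```python
import sqlite3

# In-memory SQLite stands in for Postgres; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, account TEXT NOT NULL, amount INTEGER NOT NULL)")
conn.execute("CREATE TABLE balances (account TEXT PRIMARY KEY, total INTEGER NOT NULL)")
conn.execute("INSERT INTO balances VALUES ('alice', 0)")
conn.execute("INSERT INTO jobs (account, amount) VALUES ('alice', 5)")
conn.commit()


def process_one(conn, fail_midway=False):
    """Apply one job and remove it from the queue atomically. Returns True if a job ran."""
    with conn:  # one transaction covers both the DB update and the job deletion
        row = conn.execute("SELECT id, account, amount FROM jobs ORDER BY id LIMIT 1").fetchone()
        if row is None:
            return False
        job_id, account, amount = row
        conn.execute("UPDATE balances SET total = total + ? WHERE account = ?", (amount, account))
        if fail_midway:
            raise RuntimeError("simulated crash after the update")
        conn.execute("DELETE FROM jobs WHERE id = ?", (job_id,))
        return True


def balance(conn, account="alice"):
    return conn.execute("SELECT total FROM balances WHERE account = ?", (account,)).fetchone()[0]


def queued(conn):
    return conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]


try:
    process_one(conn, fail_midway=True)
except RuntimeError:
    pass
after_crash = (balance(conn), queued(conn))  # update rolled back, job still queued

process_one(conn)                            # the retry succeeds
after_retry = (balance(conn), queued(conn))
ran_again = process_one(conn)                # queue is empty, nothing is applied twice
```

The failed attempt leaves the balance untouched and the job in place, and the retry applies it exactly once.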


Email might never arrive, though. The only way to know they got it is to have them follow a link to confirm.


That's kind of missing the parent's point. If you wanted to ensure emails arrive, that sounds like another queue that could be backed by a different table that is also produced into as part of the original transaction.


> One of the biggest benefits imo of using Postgres as your application queue, is that any async work you schedule benefits from transactionality.

This is a really important point. I often end up using a combination of Postgres and SQS since SQS makes it easy to autoscale the job processing cluster.

In Postgres I have a transaction log table that includes columns for triggered events and the pg_current_xact_id() for the transaction. (You can also use the built-in xmin of the row, but then you have to worry about transaction ID wraparound.) Inserting into this table triggers a NOTIFY.

A background process runs in a loop. It selects all rows in the transaction table with a transaction id between the last run's xmin and the current pg_snapshot_xmin(pg_current_snapshot()), maps those events to jobs and submits them to SQS, records the xmin, and then LISTENs to await the next NOTIFY.
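The watermark logic of that loop can be sketched in plain Python. This is only a simulation: `LogRow` and `poll_once` are hypothetical names, the rows live in a list rather than a table, the SQS submission is left out, and the half-open range `last_xmin <= xid < snapshot_xmin` is my reading of "between", not something the comment spells out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRow:
    xid: int    # transaction id stored on the row when it was inserted
    event: str  # name of the triggered event


def poll_once(rows, last_xmin, snapshot_xmin):
    """Return (jobs, new_watermark) for one pass of the background loop.

    Only rows with last_xmin <= xid < snapshot_xmin are picked up. Every
    transaction below snapshot_xmin has already finished, so no new row can
    show up in that range after this pass.
    """
    ready = [r for r in rows if last_xmin <= r.xid < snapshot_xmin]
    jobs = [{"job": r.event, "xid": r.xid} for r in sorted(ready, key=lambda r: r.xid)]
    return jobs, snapshot_xmin


rows = [LogRow(100, "user_created"), LogRow(101, "order_paid"), LogRow(103, "user_deleted")]

# Transaction 102 is still open, so the snapshot xmin is 102 and row 103 has to wait.
jobs, watermark = poll_once(rows, last_xmin=100, snapshot_xmin=102)

# Later, 102 commits (adding its row) and the snapshot xmin advances to 104.
rows.append(LogRow(102, "email_changed"))
jobs2, watermark2 = poll_once(rows, last_xmin=watermark, snapshot_xmin=104)
```

Row 103 is held back in the first pass even though it is already committed; picking it up early and moving the watermark past 102 would skip the row that transaction 102 adds later.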


Good point. We alleviate that a bit by scheduling our queue adds to not run until after commit. But then we still have some unsafety, and if connection to rabbit is down we're in trouble.
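A toy sketch of the "run after commit" approach and the gap it leaves. The `Transaction` class and `on_commit` method are hypothetical, written only for this illustration; they are not the API of any particular framework.

```python
class Transaction:
    """Toy transaction that defers side effects until commit (hypothetical API)."""

    def __init__(self):
        self._after_commit = []

    def on_commit(self, callback):
        """Register a callback to run only if the transaction commits."""
        self._after_commit.append(callback)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            self._after_commit.clear()  # rolled back: drop the deferred publishes
            return False                # let the original exception propagate
        for callback in self._after_commit:
            # The data is already committed here. If the broker is down and this
            # raises, the message is lost unless something else retries it.
            callback()
        return False


published = []

with Transaction() as tx:
    tx.on_commit(lambda: published.append("welcome_email"))

try:
    with Transaction() as tx:
        tx.on_commit(lambda: published.append("never_sent"))
        raise ValueError("something failed later in the transaction")
except ValueError:
    pass
```

Rolled-back work never publishes, but a publish that fails after the commit has no transaction left to undo, which is the remaining unsafety mentioned above.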


I agree - having to tell a database that something was processed, and fire off a message into RabbitMQ, say, is never 100% transactional. This would be my top reason to use this approach.

> With a Postgres queue, if you insert the job to send the email and then in a later part of the transaction, something fails and the transaction rolls back, the email is never queued to be sent.

This is true - definitely worth isolating what should be totally separate database code into different transactions. On the other hand, if your user is not created in the DB, you might not want your signup email. Just depends on the situation.


Another benefit of this is that you're guaranteed that the transaction is completed before the job is picked up. With redis-backed queues (or really anything else), you very quickly run into the situation where your queue executes a job depending on a database record existing prior to the transaction being committed (and the fix for this is usually awkward / complex code).


I'm not sure this is really an issue with transactionality as a single request can obviously be split up into multiple transactions, but rather that even if you correctly flag the email as pending/errored, you either need to process these manually, or have some other kind of background task that looks for them, at which point why not just process them asynchronously.


> With a Postgres queue, if you insert the job to send the email and then in a later part of the transaction, something fails and the transaction rolls back, the email is never queued to be sent.

An option could be to use a second connection and a separate transaction to insert data in the queue table.


According to @dang (https://news.ycombinator.com/item?id=28479595) via @sctb (https://news.ycombinator.com/item?id=16076041)

  We’re recently running two machines (master and standby) at M5 Hosting. All of HN runs on a single box, nothing exotic:
  CPU: Intel(R) Xeon(R) CPU E5-2637 v4 @ 3.50GHz (3500.07-MHz K8-class CPU)
  FreeBSD/SMP: 2 package(s) x 4 core(s) x 2 hardware threads
  Mirrored SSDs for data, mirrored magnetic for logs (UFS)


Weird, I thought you needed 1024 Kubernetes nodes, a 70mb React bundle and 200 engineers to host 50M monthly sessions.


How do you host 50k monthly sessions per node?! That’s like 0.02 sessions per second.
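For anyone checking the numbers, the arithmetic works out as follows, taking the 50M sessions and 1024 nodes from the parent comment and assuming a 30-day month:

```python
monthly_sessions = 50_000_000
nodes = 1024

per_node_per_month = monthly_sessions / nodes   # a bit under 50k sessions per node
seconds_per_month = 30 * 24 * 60 * 60           # assuming a 30-day month
per_node_per_second = per_node_per_month / seconds_per_month

print(round(per_node_per_month), round(per_node_per_second, 2))
```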


I see what you did there!


Hire 200 engineers and they'll find a way to justify 1024 k8s nodes.


HN is a very simple application. Handling a high volume of traffic for a simple application is a very different problem from scaling a highly complex application.


HN is simple, yes. But it could be made more complicated. Personalized feed and data analytics are two complicated things that come to mind. Staying simple is often a choice, and it’s a choice not many companies make.


YCombinator doesn't need to run a Google Analytics script to have all the analytics they want.


What would make hn a simple application and reddit a highly complex application?


HN is a straightforward forum. Reddit is one level above that: generalized forums as a service.

Anything HN has had to implement, Reddit has to implement at a generalized, user-facing level, like mod tools.

Frankly, we underestimate how hard forums are, even simple ones. I learned this the hard way rebuilding a popular vBulletin forum into a bespoke forum system.

Every feature people expect from a forum turns into a fractal of smaller moving parts, consideration, and infinite polish. Letting users create and manage forums is an explosion of even more things that used to be simple private /admin tools.


Mod tools are not accessed and used by all users. So the load of mod-tools on the servers is probably negligible.

I agree, most software is deceptively simple from the outside. Once you start building it, you become more humble about the effort required to build anything moderately complex.


Mod tools aren't used by the majority of users, correct. But the existence of mod tools does make the logic and assumptions of the application different. Now you've got a whole set of permissions and permissions checks, additional interfaces, more schema, etc.

It's not that the mod tools are constantly being used, it's that there's now potentially far more code complexity for those tools to even exist.


User interaction, moderation, embedded media, way more subreddits with their own differing opinions, and so on.


is reddit really a complex application (regardless of how they build, scale, or deploy it)? Although that makes me wonder, what makes an application complex?


I'll start with a crude metric: Number of bubbles in the use-case diagram


Hiring 200 engineers


Because HN hasn't changed in forever, and behind the scenes the Reddit codebase is constantly evolving and growing.


Hacker News changes more often than people think, just not the layout because people here are weirdly fetishistic about it.

Since I've been here they've added vouching for banned users (and actually warning people beforehand), thread folding, Show HN, making the second chance pool public, thread hiding, the past page, various navigation links and the API. They've also been trying to get a mobile stylesheet to work. They've also mentioned making various changes for spam detection and performance. And the url now automatically loads a canonical version if it finds one, and the title is now automatically edited for brevity. And I've probably missed a few things.

And HN isn't a simple application by any means. Go look at the Arc Forum code - it isn't optimized for readability, or scalability or reliability, but joy - for the vibe of experimental academic hacking on a Lisp. It's made of brain farts. Hacker News is probably significantly more complex than that for being attached to a SV startup company and running 'business code' and whatnot.


I mean, that’s not really that much is it. And that’s the point, HN really doesn’t change much. Whereas Reddit, for better or for worse, has a much higher output of new user facing features.


> What would make hn a simple application and reddit a highly complex application?

Engineers.


Running ads, for one


Is the configuration stupid, or, is it somehow imperative that work is distributed over 200 local engineers + over 70MB of externalities?


When a VC gives you a giant boatload of money, they insist you "scale up" the company overnight. So you go on a massive hiring spree, and get a triple-digit team of engineers before having any market traction.

And they're tasked with building a product that can handle Google-levels of demand, though they currently only have two customers, neither of them paying.

It indeed is imperative, but not for technical reasons.


And then when the stock market drops by .1%, you lay off 30% of the workforce because that's what's needed for its survival.


I would take the money then do none of that. And now I got a 5 years runway, enough time to build a product people like and use, and by then the investors won't be angry anymore.


You would never get the money with this plan.


If the money comes with a stipulation that I have to spend it all in such a way that I screw myself and my company over, then I don’t want the money.


And is it because those strawman VCs are all stupid, or, is it somehow ...


They do that so you're screwed later on without them when that first bit of money starts to run out and boom they own your company


HN is not user friendly. Better comparison is Stack Exchange which is way more rich and runs on small (relative) infra.


HN is perhaps the most user friendly site I go to with regularity.

The idea that a website needs to be “rich” to be usable is one of the dumbest things the industry has convinced itself of in the last 20 years (following only ‘yaml is a smart way to encode infrastructure’).


To be fair, it's not as much user-friendly as it is simple, and simple tends to be easier to understand.

For example, if it was more user-friendly, it could have links to jump between root comments, because right now very popular top comments tend to accumulate most interactions, and scrolling down several pages to find the next root thread requires effort.


Is that not what the "prev"/"next" links do?


User on this account since 2010, 12K karma, and I just learned what next does.

TY!


Duck me sideways, they were always there, but I was blind.


Compulsive over-engineering is by no means an IT-centric problem.

It takes substantial wisdom to arrive at an 80% solution and cease fannying about.


And the people who bring this wisdom also bring few or no metrics that are appreciated by management.


The people who push the other direction also bring few or no metrics. I.e. there is often no reason to add <bag of features>, except a customer (who didn't buy the product yet) mentioned them as nice to have during initial sales talks.


I prefer YAML to JSON for our infra. I know some people do not like the whitespace.

What do you prefer?


JSON:

* doesn't encode Norway to false;

* most formatters for JSON are deterministic.

* doesn't deserialize into arbitrary objects;

YAML, in contrast...

* YAML is insecure by default and will deserialize into arbitrary objects;

* YAML knows that there's no such thing as wall clock time, there's only number of seconds since midnight;

* YAML has 22 ways of writing true or false, and the parser will silently replace your "strings" with false.

* There are 63 ways of writing multi-line strings;

* A truncated YAML file is still a "valid" YAML file.

https://noyaml.com/
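The JSON half of that list is easy to check with Python's standard `json` module. This sketch only exercises JSON; the YAML behaviour depends on which parser and YAML version you use, so it is not reproduced here.

```python
import json

# "NO" stays a string; JSON has exactly one spelling each for true and false.
doc = json.loads('{"country": "NO", "enabled": true}')

# A truncated document is rejected instead of being read as a shorter valid one.
try:
    json.loads('{"country": "NO", "enabl')
    truncated_ok = True
except json.JSONDecodeError:
    truncated_ok = False

# With default settings, only plain built-in types come out of the parser.
kinds = sorted({type(v).__name__ for v in json.loads('[1, 1.5, "a", true, null, {}, []]')})
```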


IMO the solution to YAML-as-config is a strict subset of YAML.

JSON is one strict subset, but one that makes smart trade-offs for strictness and machines like error detection and syntax-typed types.

We decided on a different subset of YAML for our users that were modifying config by hand (even more strict than StrictYAML). Some of the biggest features of YAML are that there is no syntax typing, and collection syntax is simple (e.g. also true for JSON, false for TOML).

For example, a string and a number look the same. This seems bad to us developers at first, but the user doesn't have to waste 20 min chasing down an unmatched quote when modifying config in a <textarea>. Beyond that, it's the same amount of work as making sure the JSON is `"age": 20` instead of `"age": "20"`, one just has noisier syntax.
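A small sketch of the difference, assuming a toy parser for flat `key: value` lines. `SCHEMA` and `load_flat_config` are hypothetical and are not StrictYAML's API; they only illustrate the idea that the schema, rather than the quoting, decides the type.

```python
import json

# Syntax typing: the quotes decide the type.
syntax_typed_number = json.loads('{"age": 20}')["age"]    # int
syntax_typed_string = json.loads('{"age": "20"}')["age"]  # str

# Schema typing: every value arrives as text and the schema decides what it becomes.
SCHEMA = {"name": str, "age": int}


def load_flat_config(text, schema):
    """Parse flat 'key: value' lines, converting each value with the schema's type."""
    config = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, raw = line.partition(":")
        key = key.strip()
        config[key] = schema[key](raw.strip())
    return config


# The user writes 20 the same way in both places; no quotes to match or forget.
config = load_flat_config("name: 20\nage: 20\n", SCHEMA)
```

Either way someone has to state that `age` is a number; the schema approach just moves that statement out of the file the user edits by hand.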

I think the StrictYAML docs have a great breakdown of the advantages: https://hitchdev.com/strictyaml/why-not/

We decided against TOML because nesting is too confusing. https://github.com/toml-lang/toml/issues/846


Declarative code. CSS is better than YAML for describing a desired state.


Not on mobile.


Thanks for the downvotes everyone, else wouldn't have even known there were so many replies to my comment.


>Stack Exchange which is way more rich and runs on small (relative) infra.

Yes, I've heard that SO runs on relatively simple and modest infra. And agree that would be a good example.

>HN is not user friendly

How so? I find the HN UX a refreshingly simple and effective experience. It might not have all the bells and whistles of newer discussion fora, but it doesn't obviously need them. I'd say it's a good example of form/function well suited to need. Not perfect perhaps, but very effective.

YMMV of course.


Try loading it on a 2G (2 bars = 128kbits per second — those are bits not bytes) connection. It loads almost instantly with no fuss. Now try loading virtually any site on the same, if it ever loads at all without timing out, you’ll be waiting over 10 minutes.


There was a YT preso from several years back where the StackExchange founder explained how it ran off just ~10 servers, and could run on half that many if needed. He stressed the simplicity of their arch, and that their problem space was massively cacheable, so the servers just had a few hundred GB of RAM, and only had to do work to rerender pages, but could store them in cache most of the time. It was a C#/.NET app.
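The "render rarely, serve from memory" idea can be sketched in a few lines. This is a generic illustration with a hypothetical `PageCache` class, not a description of how Stack Exchange actually implements its caching.

```python
class PageCache:
    """Keep rendered pages in memory and re-render only after invalidation."""

    def __init__(self, render):
        self._render = render  # function: page_id -> rendered HTML
        self._pages = {}
        self.renders = 0       # counts how often real rendering work happened

    def get(self, page_id):
        if page_id not in self._pages:
            self._pages[page_id] = self._render(page_id)
            self.renders += 1
        return self._pages[page_id]

    def invalidate(self, page_id):
        """Call when the underlying content changes, e.g. a new answer is posted."""
        self._pages.pop(page_id, None)


cache = PageCache(lambda page_id: f"<html>question {page_id}</html>")

for _ in range(1000):
    cache.get(42)            # 1000 page views, a single render
renders_before_edit = cache.renders

cache.invalidate(42)         # the content changed
cache.get(42)                # one more render, then cached again
renders_after_edit = cache.renders
```

When reads vastly outnumber writes, as on a Q&A site, almost every request is a dictionary lookup, which is why a handful of machines with plenty of RAM can go a long way.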

So, I think there is a lot more in common than you think between HN and SO.


What about HN is not user friendly? I think it's a breath of fresh (stale?) air.


My pet peeves: No dark mode, sorely lacking for me for reading in the dark, then there is no indication at all that you've got replies (at least a tiny number next to threads perhaps?) and the up/downvote buttons are too small to reliably tap on mobile. Oh, and enumeration support would be fantastic, the workarounds tend to be hard to read.

Other than that, I think it's delightfully ugly and lightweight.


I use the Dark Reader extension for Firefox; HN looks fine under that.

Having to separately configure individual sites or web apps for dark mode is a nonstarter anyway; if you could do that, would you really want to?

Ideally, you should be able to set your device to dark mode, and everything would follow: every app, every site in the browser.

Some combination of setting your OS to dark mode and using a dark mode extension in the browser sort of approximates that, imperfectly.


No need to set it per individual page. There are (arguably easy to use) ways for a web page to know the user's OS-level color scheme preference [0].

We still need the workaround via extensions or Userstyles for the ones that don't implement that, sadly.

[0] https://developer.mozilla.org/en-US/docs/Web/CSS/@media/pref...


You may not be down for an app/mobile experience, but Harmonic is beautiful and has a dark mode


I can't seem to find Harmonic in the iOS App Store, is it Android-only?

Also, HN apps tend to make it harder to send interesting things to Roam or the laptop or Safari's reading list, the website makes that really convenient.


Thanks for the recommendation, just switched to it!


I believe the internet term is dank air.


Yeah! I really miss all those ads (not!)


I agree HN could be improved with small CSS changes, but no backend change would be required.


Well the good thing with CSS is that you can override it with your own stuff locally if you wish to


Tough to do when the entire layout is built with nested tables, like it's still 1999.


Tougher to do on mobile though


How does anyone use anything aside from Materialistic?!


I wouldn’t say it’s not user friendly but I understand where you are coming from. I also missed some more modern features/looks and decided to build my own open source client [0]. Feel free to give it a go to see if it’s more your taste!

0. https://modernorange.io



I love Hacker News. It is very friendly to my phone.


By rich do you mean popping up a captcha every time I search for something?


No microservices on top of kubernetes? no SPA with SSR? You are doing it wrong.

I'm gonna write an alternative which will be WebScale.

j/k of course.


Well it is SSR tbh :)


With or without caching?


I wonder if they use something like CARP[^1] for redundancy. Also, it strikes me as odd that they didn't go with ZFS for storage; it makes FS management _way_ easier for engineers who don't spend all their time on these kinds of operations.

[^1]: https://www.freebsd.org/cgi/man.cgi?query=carp&sektion=4


You might ask what sort of filesystem maintenance they ever need to do. Replacing a disk is covered by the mirror. Backup is straightforward. The second system covers a lot more. If they need to increase hardware capacity, they can build new systems, copy in the background, and swap over with a few minutes of downtime.


Curious how much memory usage it sits at on average.


Love to see FreeBSD getting some love.


(beginner question) How do they store the data? Is an SQL db overkill for such a use case? What would be the alternative? An ad-hoc filesystem-based solution? Then how do the two servers share the db? And is there redundancy at the db level? Is it replicated somehow?


"ad-hoc filesystem based solution" is the closest of your definitions, I think. Last time I saw/heard, HN was built in Arc, a Lisp dialect, and use(s/d) a variant of this (mirrored) code: https://github.com/wting/hackernews

Check out around this area of the code to see how simple it is. All just files and directories: https://github.com/wting/hackernews/blob/master/news.arc#L16... .. the beauty of this simple approach is a lack of moving parts, and it's easy to slap Redis on top if you need caching or something.
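To give a feel for the "just files and directories" approach without reading Arc, here is a rough Python sketch. It is not a translation of news.arc: the `ItemStore` class is hypothetical, and it stores JSON where the real code stores Arc tables.

```python
import json
import tempfile
from pathlib import Path


class ItemStore:
    """One file per item in a directory, plus an in-memory cache of loaded items."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self._cache = {}

    def save(self, item):
        path = self.root / str(item["id"])
        tmp = path.with_suffix(".tmp")
        tmp.write_text(json.dumps(item))
        tmp.replace(path)  # rename into place so readers never see a half-written file
        self._cache[item["id"]] = item

    def load(self, item_id):
        if item_id not in self._cache:
            self._cache[item_id] = json.loads((self.root / str(item_id)).read_text())
        return self._cache[item_id]


with tempfile.TemporaryDirectory() as root:
    ItemStore(root).save({"id": 1, "title": "Show HN", "kids": []})
    reloaded = ItemStore(root).load(1)  # a fresh instance reads it back from disk
```

Because each item is an ordinary file, backup and replication reduce to copying a directory.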

There is a modern maintained variant at https://github.com/arclanguage/anarki/tree/master/apps/news as well if you want to spin up your own HN-a-like and have the patience.

File syncing between machines is pretty much an easily solved problem. I don't know how they do it, but it could be something like https://syncthing.net/ or even some scripting with `rsync`. Heck, a cronned `tar | gzip | scp` might even be enough for an app whose data isn't exactly mission critical.


Wow, I had no idea HN was built like that - I'm impressed. I really wish I could read the Arc code better though since I'd love to know more about the details of how data is represented on disk and when things move in and out of memory, etc.

Does anyone know of other open source applications with similar architectures like this?


>Does anyone know of other open source applications with similar architectures like this?

There's a good reason everyone else just uses a relational database, and it isn't because everyone else is addicted to unnecessary complexity.


> and it's easy to slap Redis on top if you need caching

With filesystem as the storage you don't even need Redis, OS would cache the most recent files anyway.


Data is stored in flat text files containing Arc Lisp tables, or in RAM. There is no 'database' per se, unless they've added one and not mentioned it.

You can get the software and language HN is based on here: http://arclanguage.org


I think the link is broken, it's not HTTPS


Force of habit, I fixed it.


I love the design similarity to HN.


That’s because HN is just about the only thing written in Arc, and everything else you see is a fork of an earlier version of HN.


It's sad to think that with these laws being passed, regardless of what position you take, we still don't have any Fair Use provisions in Australia. There was even a discussion paper [http://www.alrc.gov.au/publications/4-case-fair-use-australi...] put forward by our Law Reform Commission suggesting this. I would have thought the productivity benefits associated with education and innovation alone would make this a no-brainer.


I believe one of the only amendments to this bill that got through was a requirement that Tony Abbott provide a formal response to that paper.


Go to cloud.digitalocean.com/support and create a new ticket, giving them your promo code and asking nicely, and they'll put it through promptly in my experience

