I always thought there should be a two-arg overload of new, so you could write new(bool, true) or new(int, 20). Would solve the problem without any trickery.
> You might think adding a UNIQUE index would cause the "losing" xact to get constraint errors, but instead both xacts succeed and no longer have a race condition.
This is not true. What happens is that the (sub)transaction that loses the race to the index is aborted:
=# INSERT INTO foo (bar) (SELECT max(bar) + 1 FROM foo);
ERROR: duplicate key value violates unique constraint "foo_bar_idx"
DETAIL: Key (bar)=(2) already exists.
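For anyone who wants to reproduce it, a minimal setup looks something like this (table and index names are just illustrative). The second session blocks until the first commits, then fails with the duplicate-key error above:

CREATE TABLE foo (bar int);
CREATE UNIQUE INDEX foo_bar_idx ON foo (bar);
INSERT INTO foo (bar) VALUES (1);

-- session A
BEGIN;
INSERT INTO foo (bar) (SELECT max(bar) + 1 FROM foo);

-- session B, concurrently: blocks on A's uncommitted row, then errors once A commits
BEGIN;
INSERT INTO foo (bar) (SELECT max(bar) + 1 FROM foo);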
> Imagine booting your computer and having to give an email to login offline.
As much as I hate what modern Windows has become, this is not actually true. If you know the correct sequence of clicks you can avoid this. Not defending this bullshit, though.
They've made it much more difficult than it was even a couple of years ago, when you could just install without the computer connected to the internet. Now you have to start with an internet connection, then disconnect partway through and jump through command-line hoops to make the installer let you keep installing.
While that's kind of convenient in a lot of ways, it also makes querying the database really annoying, since you have to remember to add the filtering to every single query or you're screwed. Personally, I wish there was a database-implementation-acknowledged "deleted" flag, which could even expose a "history table"-like interface.
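The closest workaround I know of is something like this (names are made up), but then every client has to remember to use the view instead of the base table:

ALTER TABLE orders ADD COLUMN deleted_at timestamptz;

CREATE VIEW live_orders AS
SELECT * FROM orders WHERE deleted_at IS NULL;

-- "delete" by stamping the row; the base table keeps the full history
UPDATE orders SET deleted_at = now() WHERE id = 42;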
How does using a timestamp column make querying the database more annoying than using a boolean column? Aren't the places you have to use filters exactly the same either way?
Sequences kind of have the same issue, because you don't know whether a gap is due to a rollback or an uncommitted transaction. With some extra logic you can do a pretty good job of it with sequences. And then you're not in the realm of "simple" anymore, at all.
I've had pretty much the exact same problem and what I went for in my low-volume case was to simply add advisory locks such that I can guarantee the transaction start times provide correct ordering.
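Roughly like this (the lock key and table are made up). Because the lock is taken before BEGIN and released after COMMIT, only one of these transactions is in flight at a time, so the now()/transaction-start timestamps come out in the same order as the commits:

SELECT pg_advisory_lock(42);
BEGIN;
INSERT INTO events (created_at, payload) VALUES (now(), '...');
COMMIT;
SELECT pg_advisory_unlock(42);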
Woah, that's news to me. Is that true even if triggers are used to update a column?
CREATE OR REPLACE FUNCTION update_updated_at_function()
RETURNS TRIGGER AS $$
BEGIN
    -- now() is the start time of the current transaction
    NEW.updated_at = now();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER update_updated_at_trigger
BEFORE INSERT OR UPDATE ON "my_schema"."my_table"
FOR EACH ROW EXECUTE PROCEDURE update_updated_at_function();
Is it possible for two rows to have `updated_at` timestamps that are different from the transaction commit order even if the above function and trigger are used? It's alright if `updated_at` and the commit timestamp are not the same, but the `updated_at` must represent commit order accurate to the millisecond/microsecond.
To confirm your fear: you can't use the updated_at timestamp as a proxy for commit order. Commits can happen in a different order than the updated_at values, and the two can be arbitrarily far apart, like hours or days, depending on how long your transactions can last.
now() is the timestamp the transaction began at. There is no way to write the commit timestamp into the row, because the value has to be written before you commit. (pg_xact_commit_timestamp() can report it after the fact if track_commit_timestamp is enabled, but that doesn't help a trigger.)
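Easy to see in psql, if anyone wants to check:

BEGIN;
SELECT now(), clock_timestamp();
SELECT pg_sleep(2);
SELECT now(), clock_timestamp();  -- now() hasn't moved, clock_timestamp() has
COMMIT;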
For polling, instead of updated_at, I use a _txid column that gets set by a trigger to txid_current(). Then, when polling, compare against txid_current_snapshot() to see which transactions have finished and which are still in flight. It's a little dicey and super easy to hit fencepost errors, but it's been running smoothly in production for a few years.
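A rough sketch of the shape of it, with made-up names; the real thing has more bookkeeping around the watermark:

ALTER TABLE jobs ADD COLUMN _txid bigint;

CREATE FUNCTION set_txid() RETURNS trigger AS $$
BEGIN
    NEW._txid = txid_current();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER jobs_set_txid
BEFORE INSERT OR UPDATE ON jobs
FOR EACH ROW EXECUTE PROCEDURE set_txid();

-- poller: every transaction below the snapshot's xmin has finished, so rows
-- between the previous watermark and this one are final and safe to process
SELECT txid_snapshot_xmin(txid_current_snapshot()) AS new_watermark;
SELECT * FROM jobs WHERE _txid >= :last_watermark AND _txid < :new_watermark;  -- watermarks held by the app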
> Adding a column, changing column's nullability and adding/changing constraints is already zero-downtime in PG.
Making a previously nullable column NOT NULL is not zero-downtime, and neither is adding constraints -- except in some cases with NOT VALID, which isn't quite the same thing.
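For reference, the NOT VALID dance looks like this (constraint name made up); the ADD takes only a brief lock, and the VALIDATE scans the table without blocking normal reads and writes:

ALTER TABLE accounts
    ADD CONSTRAINT accounts_balance_nonneg CHECK (balance >= 0) NOT VALID;
ALTER TABLE accounts VALIDATE CONSTRAINT accounts_balance_nonneg;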
> Dropping a column and changing the data type are not zero-downtime.
I don't really like solutions that force everything into a single schema just to do migrations. They shouldn't be that difficult.
In this particular case, rather than making everything a view all the time, you could just use views during a migration window to get the same effect. Replace the table with a view which has all the columns plus the new name as an alias for the old one, migrate all the clients, replace the view with the table with the column renamed. No special permanent crap necessary.
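Sketched out for a column rename, it's roughly this (names illustrative; if clients also write through the view you may additionally need INSTEAD OF triggers on it):

-- 1. hide the table behind a view that exposes both names
ALTER TABLE users RENAME TO users_tbl;
CREATE VIEW users AS
SELECT *, full_name AS display_name FROM users_tbl;

-- 2. migrate clients from full_name to display_name at their own pace

-- 3. once everyone has switched, drop the shim and do the real rename
DROP VIEW users;
ALTER TABLE users_tbl RENAME COLUMN full_name TO display_name;
ALTER TABLE users_tbl RENAME TO users;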
The idea is not for all the different versions to be permanent, just temporary while the clients are being updated. Check out the top-level comment I wrote for a more fleshed-out explanation!
It sounds like the 640K-of-memory quote. Yes, it did not age well, but you know what, sometimes people can't predict the future, and at the time the quote was said, it made some sense. The internet is full of "smarter than everyone else" types who have never misspoken about anything, ever.
Bill Gates did not misspeak, because he never said it in the first place. Not to rain on your parade, but people make mistakes all the time, and everyone has a duty not to escalate mistakes.
" Gates himself has strenuously denied making the comment. In a newspaper column that he wrote in the mid-1990s, Gates responded to a student's question about the quote: "I've said some stupid things and some wrong things, but not that. No one involved in computers would ever say that a certain amount of memory is enough for all time." Later in the column, he added, "I keep bumping into that silly quotation attributed to me that says 640K of memory is enough. There's never a citation; the quotation just floats like a rumor, repeated again and again." "
I specifically left the 640K quote unattributed for this reason. However, it is an (in)famous quote on the internet, and that's all I was referencing. So it's not my parade you're raining on; I hope you don't have a big jump from your soapbox. You kind of proved my point that the internet is full of "people smarter than everyone else".
When a client connects to Postgres, Postgres creates a new process just for that client. Separate processes are great for isolation, but it means Postgres connections are a bit expensive (this is an active area of improvement). Postgres really starts to struggle once you have a few thousand connections, even if those connections aren't doing anything.
The common workaround is to use a connection pooler like PgBouncer so that clients reuse connections. This approach doesn't work for LISTEN, because typically a client will listen for its entire lifecycle, so you can't share connections in a pool.
You need one connection per worker thread, and realistically you would only have one worker per CPU core. So how many LISTENing connections do you really need?
You generally have more than one database connection per thread. As an example, consider Node.js, which acts like a single-threaded process. You'd probably want to be able to handle more than one database query concurrently, since network latency tends to dominate OLTP requests.
How you set up LISTEN and NOTIFY is app-dependent. In a multi-tenant database, you could have 1 NOTIFY channel per tenant.
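Something like this, if it helps (the channel naming is just an example):

-- worker for tenant 42
LISTEN tenant_42;

-- writer side, e.g. right after making the change
NOTIFY tenant_42, 'order:1234';
-- or, when the channel name is built dynamically:
SELECT pg_notify('tenant_' || 42, 'order:1234');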
As you scale, you probably do something that listens in a smarter way, maybe 1 Go channel per client with a single LISTEN instead of 1 database connection per client. The downside is that now the app code is responsible for tenant isolation instead of the database.
> [...] since network latency tends to dominate OLTP requests.
You are absolutely right about that. However, in this case we are talking about background workers. I would argue that background workers, such as an image resizing worker, are typically CPU-bound and do not perform high-volume OLTP queries. Therefore, a background worker does not require multiple database connections per thread as a web server would.