On Tue, Feb 27, 2024 at 7:26 PM Alvaro Herrera <alvherre@xxxxxxxxxxxxxx> wrote:
On 2024-Feb-27, Dominique Devienne wrote:
> Hi. I just tried adding the xmin pseudo-column to my COPY,
> and I get an error.
I suggest you find some other way to achieve whatever it is you think
you'll achieve with this. It's not reliable and you'll regret it
eventually. Maybe take a step back and let us know why you want to copy
xmin in the first place?
OK, fair enough. I've read about xmin many times, in the doc and some articles,
and I've aware at a high-level about PostgreSQL's MVCC and the possibility of
wrap-around of xmin, but I'm definitely not an expert. So here goes, you asked for it.
I'm replicating custom backups from our old architecture / backend, in our new
PostgreSQL-based "backend". Our "backups" are a mix of disk files and DB rows.
Sometimes we need whole schema tables/rows; other times we need subset of tables
*and* subset of rows in those tables. I've looked at pg_dump and pg_dumpbin, but for various
reasons I won't get into, I prefer to roll my own using COPY (to each his own, right).
So far, so good. I'm getting good perf. It's flexible. My backups are semi-structured,
semi-opaque, which allows to have easy visibility inside the backups w/o restoring them.
One feature the old backups had, was incremental backups, after a base full one.
There's where xmin comes in. I already have a hash of each row, as returned by COPY,
so I can always do a content-based, exact / accurate comparison of the old and new rows,
to know whether it needs to be part of the incremental backup or not. But that's expensive,
because I still need to COPY TO the whole table, and compare rows client-side. So I was
thinking of using the xmin of the old (as recorded in the backup) and new (DB side) rows,
as an alternative way to determine what needs to be transferred across to the client.
The idea being that xmin will change on UPDATEs.
I'd also need logic for INSERTs and DELETEs, of course.
If I can somehow record a global XID Epoch at backup time, I can detect wrap-arounds.
But instead of testing inequality of the old and new xmin, I was thinking of using strict equality,
accepting the false positive (and thus excess transfers), and relying of the hash-based comparison
to avoid only the writing-to-disk part. (the chances of same xmin after a wrap-around is just so low...)
So there you go. This is why I want to capture xmin of each row I backup.
If this is a bad idea, as you wrote, please explain why in details.
I'm very willing to learn where/how that scheme would go wrong.
But I'd need to understand the technical details. TIA.
> Do I need to switch to a `COPY (select xmin, cols... from tab) ...` to
> access xmin?
If your ultimate objective is to shoot your feet, then yes, this will work.
Yes, I see that now. It does indeed work.
But I'd still like more details on the differences between the two forms of COPY, please.
There are people here who know implementations details of COPY I'm sure.
And I'd very interesting to know whether it matters or not.
Yes, I can test and bench myself too. Still, I'd like insights into the impl(s) please.
Thanks, --DD