Hi Scott
On 19-Jul-2024 15:34, Scott Ribe wrote:
On Jul 19, 2024, at 7:46 AM, Thomas Simpson <ts@xxxxxxxxxxxxxx> wrote:
I realize some of the background was snipped on what I sent to the hacker list, I'll try to fill in the details.
I was gone from my computer for a day and lost track of the thread.
Perhaps logical replication could help you out here?
I'm not sure - perhaps, but at this point, I've got that
dump/reload running and provided it completes ok (in about 20 days
time at current rate), I'll be fine with this.
The database itself is essentially an archive of data so is no
longer being added to at this point, so it's an annoyance for the
rebuild time rather than a disaster.
[But incidentally, I am working on an even larger project which
is likely to make this one seem small, so improvement around large
databases is important to me.]
However, my thought is around how to avoid this issue in the
future and to improve the experience for others faced with the
dump-reload which is always the fall-back upgrade suggestion
between versions.
Getting parallelism should be possible and the current pg_dump
does that for directory mode from what I can see - making multiple
threads etc. according to parallel.c in pg_dump, it even looks
like most of where my thought process was going is actually
already there.
The extension should be adding synchronization/checkpointing
between the generating dump and the receiving reload to ensure
objects are not processed until all their requirements are already
present in the new database. This is all based around routing via
network streams instead of the filesystem as currently happens.
Perhaps this is already in place since the restore can be done in
parallel, so must need to implement that ordering already? If
someone with a good understanding of dump is able to comment or
even give suggestions, I'm not against making an attempt to
implement something as a first attempt.
I see Tom Lane from git blame did a bunch of work around the
parallel dump back in 2020 - perhaps he could make suggestions
either via private direct email or the list ?
Thanks
Tom