Re: Enhance pg_dump multi-threaded streaming (WAS: Re: filesystem full during vacuum - space recovery issues)

Hi Doug

On 19-Jul-2024 17:21, Doug Reynolds wrote:
Thomas—

Why are you using logical backups for a database this large rather than a solution like PgBackRest?  Obviously a dump makes sense if you are going to upgrade, but for operational use it seems a slow choice.

In normal operation the server runs as a primary/replica pair and pgbackrest handles backups.  Unfortunately, pgbackrest took a backup during the failed vacuum, right as disk space was exhausted, so restoring from it (or anything earlier) would roll the WALs forward to the present and put me right back where I am now: out of space part way through.

It's a pragmatic decision: trying the various options short of a dump-reload would take several days, with a high likelihood of ending up at a dump-reload anyway.  I had already tried a few file matching/moving exercises, but they all prevented the database from starting, so I cut my losses and started the dump-reload this week rather than next, since there is a limited window before this becomes a larger problem.

My thoughts on improving pg_dump are about making it a better tool for worst-case scenarios like this in the future, and for those who use dump-reload as part of upgrades but have sizeable databases.

Thanks

Tom



Doug

On Jul 19, 2024, at 4:26 PM, Thomas Simpson <ts@xxxxxxxxxxxxxx> wrote:



Hi Scott

On 19-Jul-2024 15:34, Scott Ribe wrote:
On Jul 19, 2024, at 7:46 AM, Thomas Simpson <ts@xxxxxxxxxxxxxx> wrote:

I realize some of the background was snipped from what I sent to the hackers list; I'll try to fill in the details.
I was gone from my computer for a day and lost track of the thread.

Perhaps logical replication could help you out here?

I'm not sure - perhaps, but at this point I've got the dump/reload running, and provided it completes OK (in about 20 days' time at the current rate), I'll be fine with this.

The database itself is essentially an archive and is no longer being added to at this point, so the rebuild time is an annoyance rather than a disaster.

[But incidentally, I am working on an even larger project which is likely to make this one seem small, so improvement around large databases is important to me.]

However, my thought is around how to avoid this issue in the future and to improve the experience for others faced with a dump-reload, which is always the fall-back upgrade suggestion between versions.

Getting parallelism should be possible - the current pg_dump already does it for directory mode from what I can see, spawning multiple workers.  According to parallel.c in pg_dump, it even looks like most of what I had in mind is already there.
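For context, a minimal sketch of the round trip that already exists (directory format on both sides; the database names, job count and staging path below are placeholders I've invented for illustration):

    # Sketch only: database names, job count and staging directory are
    # made-up placeholders.
    import subprocess

    DUMP_DIR = "/backup/bigdb.dir"

    # Parallel dump: directory format (-Fd) is required for -j; pg_dump
    # spawns workers that dump table data concurrently.
    subprocess.run(["pg_dump", "-Fd", "-j", "8", "-f", DUMP_DIR, "bigdb"],
                   check=True)

    # Parallel restore: pg_restore uses the dependency information in the
    # archive's TOC to decide which objects can be loaded concurrently.
    subprocess.run(["pg_restore", "-j", "8", "-d", "bigdb_new", DUMP_DIR],
                   check=True)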

The extension would be to add synchronization/checkpointing between the generating dump and the receiving reload, so that an object is not processed until everything it depends on is already present in the new database.  This is all based around routing the data over network streams instead of through the filesystem as happens currently.
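To illustrate the gap, here is a hedged sketch of the single stream that works today (connection strings are placeholders): the dump is piped straight into pg_restore with no intermediate file, but as far as I know pg_restore cannot combine -j with reading an archive from stdin, which is exactly the limitation that multiple synchronized network streams would remove.

    # Sketch only: the connection strings are invented placeholders.
    import subprocess

    # pg_dump writes a custom-format archive to stdout...
    dump = subprocess.Popen(
        ["pg_dump", "-Fc", "--dbname=postgresql://old-host/bigdb"],
        stdout=subprocess.PIPE,
    )

    # ...and pg_restore reads it from stdin on the receiving side, so no
    # intermediate dump file is written.  Note: no -j here - parallel
    # restore needs a seekable archive, not a pipe.
    subprocess.run(
        ["pg_restore", "--dbname=postgresql://new-host/bigdb"],
        stdin=dump.stdout,
        check=True,
    )
    dump.stdout.close()
    dump.wait()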

Perhaps this ordering is already in place, since restore can be done in parallel and must therefore implement it already?  If someone with a good understanding of pg_dump is able to comment or offer suggestions, I'm not against making a first attempt at implementing something.
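As a quick sanity check that the ordering metadata is already captured, the archive's table of contents (which parallel restore consults) can be listed with pg_restore -l; a small sketch, with the archive path again a placeholder:

    # Sketch only: the archive path is an invented placeholder.
    import subprocess

    # -l prints the archive's table of contents; each non-comment line is
    # one TOC entry (dump ID, catalog OIDs, object type, schema, name, owner).
    toc = subprocess.run(
        ["pg_restore", "-l", "/backup/bigdb.dir"],
        capture_output=True, text=True, check=True,
    ).stdout

    for line in toc.splitlines():
        if line and not line.startswith(";"):
            print(line)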

I see from git blame that Tom Lane did a bunch of work around parallel dump back in 2020 - perhaps he could make suggestions, either by direct email or on the list?

Thanks

Tom


