Moving data from huge table slow, min() query on indexed column taking 38s

I have a huge (10G rows) table, called "tbl_legacy" here (its real name is
test_runs), that I want to PARTITION BY RANGE on an indexed integer column.

I have created the new partitioned table "tbl" with 1000 partitions, enough
to hold the existing data and to cover the next year of new data (so most of
the new partitions will be empty for now).
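
For reference, the new table is set up roughly along these lines (a
simplified sketch: the real table has more columns, and the partition names
and bounds below are only illustrative):

    CREATE TABLE tbl (
        partition_key_column integer NOT NULL
        -- ... plus the rest of the columns, same as in tbl_legacy
    ) PARTITION BY RANGE (partition_key_column);

    -- one partition per key range, ~1000 of them in total
    CREATE TABLE tbl_part_0 PARTITION OF tbl
        FOR VALUES FROM (0) TO (1000000);
    CREATE TABLE tbl_part_1 PARTITION OF tbl
        FOR VALUES FROM (1000000) TO (2000000);
    -- ...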

On the topic of bulk-transferring the data from tbl_legacy to tbl:

+ First I tried the method of DELETE...INSERT everything in a single statement.
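
  Roughly, the statement is the same as the chunked version further below,
  just without any WHERE clause, so everything moves in one statement and
  one transaction (a simplified sketch):

      WITH rows AS (
            DELETE FROM tbl_legacy AS t
                RETURNING t.*
        ) INSERT INTO tbl SELECT * FROM rows;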

  I noticed that the postgres process was growing without bound, to the
  point where it consumed almost all of the swap space and I had to kill
  the command. Any idea why that happens?

+ Then I broke the command above into many small chunks:

      WITH rows AS (
            DELETE FROM tbl_legacy AS t
                WHERE (partition_key_column < $1)
                RETURNING t.*
        ) INSERT INTO tbl SELECT * FROM rows;
      COMMIT;
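
  The loop that drives this is roughly equivalent to the following (a
  plpgsql sketch of the idea only; in reality I run the parametrized
  statement above from a script, and the chunk size is illustrative):

      DO $$
      DECLARE
          bound   bigint := 0;
          step    bigint := 1000000;  -- chunk size, illustrative only
          max_key bigint;
      BEGIN
          -- assumes no rows above the current maximum arrive while it runs
          SELECT max(partition_key_column) INTO max_key FROM tbl_legacy;
          WHILE bound <= max_key LOOP
              bound := bound + step;
              WITH rows AS (
                    DELETE FROM tbl_legacy AS t
                        WHERE (t.partition_key_column < bound)
                        RETURNING t.*
                ) INSERT INTO tbl SELECT * FROM rows;
              -- COMMIT inside a DO block needs PostgreSQL 11+ and the
              -- block must not itself run inside an explicit transaction
              COMMIT;
          END LOOP;
      END
      $$;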

  I increase the parameter $1 on each iteration and keep going in a loop.
  At first this goes fine, but after a day I notice that it has slowed
  down significantly. My investigation shows that the simple query

    SELECT min(partition_key_column) from tbl_legacy;

  takes 38s, despite there being an index on the column! A VACUUM fixes
  that, so I suspect the index is full of dead tuples, and that autovacuum
  does not keep up because the table is constantly busy.
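
  Something along these lines should confirm that (dead-tuple counts,
  last (auto)vacuum times, and the plan of the slow query):

    SELECT n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
        FROM pg_stat_user_tables
        WHERE relname = 'tbl_legacy';

    EXPLAIN (ANALYZE, BUFFERS)
        SELECT min(partition_key_column) FROM tbl_legacy;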

  Unfortunately VACUUM takes a long time (several hours) on this huge
  table, so I can't add it to the loop after the DELETE command.

  Is there a better way to avoid the bloat in the loop above?  Why can't
  the DELETE command update the index by pointing the minimum element
  beyond the dead tuples?

+ Finally I resorted to just copying the data:

    INSERT INTO tbl SELECT * FROM tbl_legacy;

  This took half a day or so, but it worked without blowing up the memory
  usage the way the first approach did. Why is that?

  By the way, I don't consider this last solution ideal, since the data is
  duplicated between the two tables and it's harder to work around that
  without taking downtime. But it worked for me this time. Any other ideas
  for partitioning an existing huge table?


Thanks in advance,
Dimitris





