Re: Suggestion: provide a "TRUNCATE PARTITION" command

Michael Lewis <mlewis@xxxxxxxxxxx> · Fri, 8 Jan 2021 09:47:19 -0700

On Fri, Jan 8, 2021 at 9:38 AM Thomas Kellerer <shammat@xxxxxxx> wrote:
Michael Lewis schrieb am 08.01.2021 um 16:32:

> On Fri, Jan 8, 2021 at 2:36 AM Thomas Kellerer <shammat@xxxxxxx <mailto:shammat@xxxxxxx>> wrote:

>

>     Hello,

>

>     I wonder if it made sense to add a "TRUNCATE PARTITION" command to Postgres?

>

>     Especially during bulk loads it's more efficient to TRUNCATE a partition if I know I want to replace all rows, rather than doing a DELETE.

>

>     Currently this requires dynamic SQL which isn't always feasible (and might get complicated quickly).

>

>     So I was thinking that a new command to allow truncating partitions by identifying the partitions by "value" rather by name might be helpful in that case.

>

>     Something along the lines of:

>

>           truncate partitions of base_table

>           for values in (...);

>

>     If the IN part allowed for sub-queries then this could be used to gather the partition keys from e.g. a staging table.

>

>

> For me, it seems too easily error prone such that a single typo in

> the IN clause may result in an entire partition being removed that

> wasn't supposed to be targeted.

I don't see how this is more dangerous then:

     delete from base_table

     where partition_key in (...);

which would serve the same purpose, albeit less efficient.

Delete has a rollback option, and you can dry-run to see impacted rows effectively. Truncate does not.

With delete, you are being more explicit about which rows match and need removal. By looking at the command, you know exactly what is expected to happen. With the request to find partitions based on values, you may be impacting MUCH more data than you meant to. If you think you have monthly range partitions and actually have year partitions, the truncate could be a disaster with removing more data than you intended.

It just seems like a foot gun to me, and not one that is particularly needed since the same result can be achieved easily in two steps. One to generate to explicit commands that will be run, and one to run them.