
Re: CLUSTER vs. VACUUM FULL


 



On Mon, Apr 22, 2024 at 3:14 PM Adrian Klaver <adrian.klaver@xxxxxxxxxxx> wrote:


On 4/22/24 11:45 AM, Ron Johnson wrote:
> On Mon, Apr 22, 2024 at 12:29 PM David G. Johnston
> <david.g.johnston@xxxxxxxxx> wrote:
>
>
>
>     On Mon, Apr 22, 2024, 08:37 Ron Johnson <ronljohnsonjr@xxxxxxxxx>
>     wrote:
>
>         On Mon, Apr 22, 2024 at 10:25 AM Tom Lane <tgl@xxxxxxxxxxxxx>
>         wrote:
>
>             Marcos Pegoraro <marcos@xxxxxxxxxx> writes:
>             > But wouldn't it be good if VACUUM FULL used the index
>             > defined by CLUSTER, if one exists?
>
>             No ... what would be the difference then?
>
>         What the VACUUM docs "should" do, it seems, is suggest CLUSTER
>         on the PK, if the PK is a sequence (whether that be an actual
>         sequence, or a timestamp or something else that grows
>         monotonically).
>
>         That's because the data is already roughly in PK order.
>
>
>     If things are bad enough to require a VACUUM FULL, that doesn't
>     seem like a good assumption.
>
>
> Sure it does.
>
> For example, I just deleted the oldest half of the records in 30
> tables whose CREATED_ON timestamp values strongly correlate with
> the synthetic PK sequence values.
>
> Thus, the remaining records were still mostly in PK order.  CLUSTERs on
> the PK values would have taken just about as much time as the VACUUM
> FULL statements which I /did/ run.
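The scenario above can be illustrated with a toy model. This is a minimal sketch under loud assumptions: the "heap" is a hypothetical Python list of `(pk, created_on)` rows in physical order, with `None` standing in for a dead tuple, and `vacuum_full_like` / `cluster_like` are hypothetical names that only mimic the ordering behaviour of the two commands (VACUUM FULL keeping existing physical order, CLUSTER sorting by the index key). It is not PostgreSQL's storage format or implementation.

```python
# Toy model (hypothetical, not PostgreSQL's storage format): a heap is a list
# of (pk, created_on) rows in physical order; None marks a dead tuple.

def build_heap(n):
    # Rows appended in INSERT order: pk comes from a sequence and
    # created_on grows monotonically alongside it.
    return [(pk, pk * 10) for pk in range(1, n + 1)]

def delete_oldest_half(heap):
    # Delete by created_on; dead rows stay in place until a rewrite.
    cutoff = sorted(row[1] for row in heap)[len(heap) // 2]
    return [row if row[1] >= cutoff else None for row in heap]

def vacuum_full_like(heap):
    # Rewrite live rows, keeping their existing physical order.
    return [row for row in heap if row is not None]

def cluster_like(heap):
    # Rewrite live rows sorted by pk.
    return sorted((row for row in heap if row is not None), key=lambda row: row[0])

def out_of_order_pairs(rows):
    # Adjacent pairs whose pk decreases: 0 means already in pk order.
    return sum(1 for a, b in zip(rows, rows[1:]) if a[0] > b[0])

heap = delete_oldest_half(build_heap(1000))
print(out_of_order_pairs(vacuum_full_like(heap)))    # 0
print(vacuum_full_like(heap) == cluster_like(heap))  # True
```

When rows were appended in PK order, both rewrites produce the same physical order, so the sort step buys nothing. Had the live rows been physically scattered, `vacuum_full_like` would preserve that disorder, which is the case where CLUSTER's extra sort actually changes the result.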

1) If they are already close enough to PK order that the CLUSTER time vs.
VACUUM FULL time difference would not be material, because there is little
or no sorting to do, then what does the CLUSTER gain you?

Not much.  Now they're just "slightly more ordered" instead of "slightly less ordered" for little if any extra effort.
 
2) What evidence is there that the records were still in PK order just
because you deleted based on CREATED_ON? I understand the correlation
between CREATED_ON and the PK; I'm just not sure why that would necessarily
translate to an on-disk order by PK.

1. Records are appended to tables in INSERT order, and INSERT order is highly correlated with the synthetic PK, by the nature of sequences.
2. My original email showed that CLUSTER took just as long as VACUUM FULL.  That means not many records had to be sorted, because... the on-disk order was strongly correlated with PK and CREATED_ON.

Will that happen every time in every circumstance in every database?  No, and I never said it would.  But it does in my database in this application.

