Search Postgresql Archives

Re: Looking for settings/configuration for FASTEST reindex on idle system.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jan 9, 2014 at 2:03 PM, Jeff Amiel <becauseimjeff@xxxxxxxxx> wrote:
I have a maintenance window coming up and using pg_upgrade to upgrade from 9.2.X to 9.3.X.
As part of the window, I’d like to ‘cluster’ each table by its primary key.  After doing so, I see amazing performance improvements (probably mostly because of index bloat - but possibly due to table fragmentation)

Since you seem to have a test environment where you can so such things, you can try first reindexing, and then clustering, to so which step is the important one to get the performance improvement.  If it is the reindexing that is really the key, you could get that benefit outside of the maintenance window by building a new index with the same columns concurrently, then dropping the old one, doing it one index at a time.  If the cluster is the key part, that is hard to move outside a maintenance window.
 


That being said, I have a single table that is blowing my window -
at 140 million rows (28 gig in size with 75 gig worth of indexes), this bad boy is my white whale. There are 10 indexes (not including the primary key).  Yes - 10 is a lot - but I’ve been monitoring their use (most are single column or partial indexes) and all are used.

That being said, I’ve been reading and experimenting in trying to get a cluster of this table (which re-indexes all 10/11 indexes) to complete in a reasonable amount of time.

There are lots of settings and ranges to chose from and while my experiments continue, I was looking to get some input.  Lowest I have gotten for clustering this table is just under 6 hours. 

I am familiar with pg_reorg and it’s sibling pg_repack - but they call the base postgresql reindex functions underneath - and I have learned by using ‘verbose’ that the actual clustering of the table is quick - it’s the reindexing that is slow (It’s doing each reindex sequentially instead of concurently)

PostgreSQL 9.3.2 on x86_64-pc-solaris2.11, compiled by gcc (GCC) 4.5.2, 64-bit
500 gig of ram
2.7gig processors (48 cores)
Shared buffers set to 120gig
Maintenance work men set to 1gig
work men set to 500 meg

Things I have read/seen/been told to tweak…

fsync (set to off)

I see little point in that.  You are putting your data at serious risk, and bulk index creation shouldn't be doing much fsyncing anyway.
 
setting wal_level to minimal (to avoid wal logging of cluster activity)

That can be a big win in some cases, if it is compatible with your backup policy.  But I don't know if cluster is actually one of those cases.
 
bumping up maintenance work men (but I’ve also seen/read that uber high values cause disk based sorts which ultimately slow things down)

I don't understand that advice.  If it is big enough, you can *avoid* disk bases sorts by doing them in RAM, which should be faster, provided you actually have enough RAM so that you don't swap.  However, there are a variety of things which limit how much memory a sort could actually use (most of which will be removed in 9.4) so you might not get in-RAM sorts no matter how much you crank up maintenance_work_mem.  You can turn on trace_sort to see whether your sorts are in RAM or on disk, and how much memory they are using.

 
Tweaking checkpoint settings (although with wal_level set to minimal - I don’t think it comes into play)

Measure it--how often does it cycle through checkpoints during your test CLUSTER?
 

any good suggestions for lighting a fire under this process?

If worse comes to worse, I can vacuum full the table and reindex each index concurrently -   but it won’t give me the benefit of having the tuples ordered by their oft-grouped primary key.

If you vacuum full, it will rebuild the indexes for you automatically, so after that there is no point in doing a manual reindex. You could drop them manually before hand, and then build them manually afterward, but you can do that whether the central activity is a CLUSTER or a VACUUM FULL.  If you want to do this in parallel, then you need to keep in mind that maintenance_work_mem needs to be small enough to fit all of the concurrent builds, and your IO system needs to accomodate all of that traffic.

Cheers,

Jeff

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux