Search Postgresql Archives

Rapid disk usage spikes when updating large tables with GIN indexes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello!

We have a mid-sized database on RDS running 10.1 (32 cores, 240 GB RAM, 5TB total disk space, 20k PIOPS) with several large (100GB+, tens of millions of rows) tables that use GIN indexes for full-text search. We at times need to index very large (hundreds of pages) documents and as a result our tables have a mix of small (tens of tokens) to very large (hundreds of thousands of tokens near to the tsvector 1MB limit). All our GIN indexes have fastupdate turned off — we found that turning fastupdate on led to significant blocking and that we get better average performance with it turned off. We’ve put a lot of effort into tuning our database over the last several years to the point where we have acceptable read and write performance for these tables. 

One recurring, and predictable, issue that we have experienced regularly for multiple years is that inserting or updating rows in any table with GIN indexes results in extremely large drops in free disk space — i.e. inserting 10k rows with a total size of 10GB can result in the temporary loss of several hundred gigabytes of free disk space over 2-3 hours (and it could be more — we try to keep a 10-15% buffer of free disk space so that often represents almost all available disk space). Once we stop the operation, free disk space rapidly recovers, which makes us believe that this occurs due to logs, or some kind of temporary table. Our work_mem and maintenance_work_mem settings are pretty large (12GB and 62GB, respectively). The database’s size on disk scarcely budges during this process.

Unfortunately, we’re on RDS, so we’re unable to ssh directly into the instance to see what files are so large, and none of the logs we can see (nor the wal logs) are large enough to explain this process. Any suggestions about where to look to see the cause of this problem (or about any settings we can tune or changes we could make to stop it) would be greatly appreciated.

Thank you!




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux