Hi,

I'm trying to understand how to estimate and minimize the memory consumption of ANALYZE operations on "big" tsvector columns.

Context: the PostgreSQL server is 9.1.19 (Ubuntu package 9.1.19-0ubuntu0.12.04). I have a database on which I noticed that autovacuum operations could consume up to 2 GB of resident memory (observed with top's RSS column). This is sometimes problematic because with 3 autovacuum processes (the default value on Ubuntu), it leads to a peak usage of 6 GB and sends the server into swap-in/swap-out madness. This typically happens during restoration of dumps or massive updates in the database, which trigger the autovacuum processes and slow down the server for the duration of these operations because of the swapping. Until now we have addressed this behavior by either disabling autovacuum or temporarily bumping the VM's memory limit for the duration of the operation.

Now, I think I have managed to replicate and isolate the problem.

My analysis: I have a table with ~20k tuples and 350 columns of type int, text and tsvector. I created a copy of this table and iteratively dropped columns to see whether a specific column was the cause of the spike in memory usage. I arrived at the simple case of a table with a single tsvector column that causes ANALYZE to consume up to 2 GB of memory.

So, this table has a single column of type tsvector, and this column is quite big because it is originally the concatenation of all the other tsvector columns of the table (this tsvector column also has a GiST index). Here are the top 10 lengths for this column:

--8<--
# SELECT length(fulltext) FROM test ORDER BY length DESC LIMIT 10;
 length
--------
  87449
  87449
  87401
  87272
  87261
  87259
  87259
  87259
  87257
  87257
(10 rows)
-->8--

I tried playing with default_statistics_target (which is set to 100): if I reduce it to 5, the ANALYZE is almost immediate and consumes less than ~200 MB. At 10, the process starts to consume up to ~440 MB. I see no difference in PostgreSQL's plan selection between default_statistics_target 1 and 100: EXPLAIN ANALYZE shows the same plan being executed, using the GiST index (for a simple "SELECT count(ctid) FROM test WHERE fulltext @@ 'Hello'").

So:

- Is there a way to estimate or reduce ANALYZE's peak memory usage on this kind of table?
- Is it "safe" to set STATISTICS = 1 on this particular "big" tsvector column? Or could it have an adverse effect on query plan selection?

I'm currently in the process of upgrading to PostgreSQL 9.5, so I'll see whether the behavior changes on that version.

Thanks,
Jérôme
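
P.S. For what it's worth, my back-of-envelope model (please correct me if it is wrong): if I read the source correctly, ANALYZE samples up to 300 * statistics_target rows per column, so with the default target of 100 that is 30,000 rows, i.e. more than the ~20k tuples in my table, and every sampled tsvector has to be detoasted in memory. If the average detoasted value is on the order of 100 kB, that alone would account for roughly 20,000 * 100 kB ~= 2 GB, which matches what I observe. As a lower bound on that footprint, I'm measuring the stored size of the column with pg_column_size(); since that reports the possibly TOAST-compressed size, the detoasted in-memory size can only be larger:

--8<--
-- Stored (possibly TOAST-compressed) footprint of the column; the
-- detoasted size ANALYZE works with in memory can be noticeably larger.
SELECT count(*) AS rows,
       pg_size_pretty(sum(pg_column_size(fulltext))) AS stored_size,
       pg_size_pretty(avg(pg_column_size(fulltext))::bigint) AS avg_per_row
FROM test;
-->8--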
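
And if lowering the target only for this column turns out to be safe, I would rather use the per-column knob than change default_statistics_target globally. Something like this (with 5 just a value to be tuned, and -1 reverting the column to the default):

--8<--
-- Lower the statistics target for the big tsvector column only;
-- SET STATISTICS -1 would revert it to default_statistics_target.
ALTER TABLE test ALTER COLUMN fulltext SET STATISTICS 5;
ANALYZE VERBOSE test;

-- Re-check that the plan still uses the GiST index afterwards.
EXPLAIN ANALYZE SELECT count(ctid) FROM test WHERE fulltext @@ 'Hello';
-->8--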