On 2/18/16, Victor Blomqvist <vb@xxxxxxxx> wrote: > Hello! > > We just had a major issue on our databases, after a index was replaced a > user defined function didnt change its query plan to use the new index. At > least this is our theory, since the function in question became much slower > and as a result brought our system to a halt. > > Basically it went: > 1. create new index (a simple btree on a bigint column index) > 2. drop old index > 3. rename new index to old index name > 4. analyze table > > After these steps normally our functions will update their plans and use > the new index just fine. However this time the function (only one function > use this particular index) seemed to take forever to complete. This is a > 40GB table so querying for something not indexed would take a long time. > Therefore my suspicion is that the function didnt start to use the new > index. My guess is that backends somehow cached generic plan[1] and didn't recalculate it. > Adding to the strangeness is that if I ran the function manually it was > fast, It is because _backends_ (processes) cache plans, not DBMS (i.e. they are not shared). So you connected to DB (making a new backend process), run the function; backend could not find cached plan and create it for itself. Plan for _your_ connection includes the new index. > only when called from our application through pg_bouncer it was slow. > I should also say that the function is only used on our 3 read slaves setup > to our database. > > Things we tried to fix this: > 1. Analyze table > 2. Restart our application > 3. Recreate the function > 4. Kill the slow running queries with pg_cancel_backend() > > These things did not help. Since pgbouncer reuses connections then backends processes still have cached plan. If you tried pg_terminate_backend() it could help. > Instead what helped in the end was to replace the function with an extra > useless where clause (in the hope that it would force it to create a new > plan) > > So, the function only have a single SELECT inside: > RETURN QUERY > SELECT * FROM table > WHERE bigint_column = X > LIMIT 100 OFFSET 0; > > And this is my modification that made it work again: > RETURN QUERY > SELECT * FROM table > WHERE bigint_column = X AND 1=1 > LIMIT 100 OFFSET 0; Yes, it is a new query for PG, and therefore it requires a new plan because it is not in a cache. > Obviously we are now worried why this happened Also my guess you did CREATE INDEX CONCURRENTLY and there is several cases[2] when it can not be used ("invalid" state or waiting for unfinished transactions). When the old index is dropped but the new index is not accessible while a query/function is running a generated plan does not include that index. And in case of caching such plan is caching and uses later without index too. > and how we can avoid it in > the future? We run Postgres 9.3 on CentOS 6. Firstly you can drop the old index not immediately but a little later depending on yours queries time. Also after creating the new index (and possible waiting a little) you can drop index in a _transaction_ and see whether the new index is used in an EXPLAIN of any query that use it or not. In the first case do COMMIT, in the second case just do ROLLBACK and leave old index for using. > Thanks! > Victor [1]http://www.postgresql.org/docs/devel/static/plpgsql-implementation.html#PLPGSQL-PLAN-CACHING [2]http://www.postgresql.org/docs/devel/static/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY -- Best regards, Vitaly Burovoy -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general