On Fri, 9 Oct 2015 14:32:44 +0800 Victor Blomqvist <vb@xxxxxxxx> wrote: > I have a heavily used PostgreSQL 9.3.5 database on CentOS 6. Sometimes I > need to add/remove columns, preferably without any service interruptions, > but I get temporary errors. > > I follow the safe operations list from > https://www.braintreepayments.com/blog/safe-operations-for-high-volume-postgresql > but many operations cause troubles anyway when the more busy tables are > updated. I seriously doubt if Paul did enough research to be sure that "safe" is an absolute term for that list. > Typically I have user defined functions for all operations, and my table > and functions follow this pattern: > > CREATE TABLE users ( > id integer PRIMARY KEY, > name varchar NOT NULL, > to_be_removed integer NOT NULL > ); > > CREATE FUNCTION select_users(id_ integer) RETURNS SETOF users AS > $$ > BEGIN > RETURN QUERY SELECT * FROM users WHERE id = id_; > END; > $$ LANGUAGE plpgsql; > > Then the actual queries are run by our application as > > SELECT id, name FROM select_users(18); > > As you can see the column to_be_removed is not selected. Sure it is ... the function does SELECT *, which absolutely includes the to_be_removed column. The fact that you ignore that column in a subsequent superselect doesn't mean that the query in the function knows to do so. > Then to remove the > column I use: > > ALTER TABLE users DROP COLUMN to_be_removed; > > However, while the system is under load sometimes (more frequently and > persistent the more load the system is experiencing) I get errors like > these: > > ERROR #42804 structure of query does not match function result type: > Number of returned columns (2) does not match expected column count (3). This is a bit surprising to me. I would expect Postgres to have one or the other definition of that row within a single transaction, but what seems to be happening is that the ALTER causes the row definition to be changed in the middle of the transaction, thus the the function may return 3 columns, but when the outer query checks the type, it sees that it should only have 2. > The same error can happen when columns are added. Can this be avoided > somehow, or do I need to take the system offline during these kind of > changes? > > For reference, there was a similar but not same issue posted to psql-bugs a > long time ago: > http://www.postgresql.org/message-id/8254631e-61a5-4c03-899f-22fdcf369a46@xxxxxxxxxxxxxxxxxxxxxxxxxxxx > > I posted this same question at dba.stackexchange and got the advice to > repost here: > http://dba.stackexchange.com/questions/117511/postgresql-drop-column-under-load-give-wrong-number-of-columns-errors This has a lot to do with internals. You should wait a bit to see if you get a good answer, but if not you might need to post to the hackers list so the developers can chime in. My opinion is that this is a bug, but it's an obscure enough bug that it's not surprising that it's gone unfixed for a while. -- Bill Moran -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general