I have a heavily used PostgreSQL 9.3.5 database on CentOS 6. Sometimes I need to add/remove columns, preferably without any service interruptions, but I get temporary errors.
I follow the safe operations list from https://www.braintreepayments.com/blog/safe-operations-for-high-volume-postgresql but many operations cause troubles anyway when the more busy tables are updated.
Typically I have user defined functions for all operations, and my table and functions follow this pattern:
CREATE TABLE users (
id integer PRIMARY KEY,
name varchar NOT NULL,
to_be_removed integer NOT NULL
);
CREATE FUNCTION select_users(id_ integer) RETURNS SETOF users AS
$$
BEGIN
RETURN QUERY SELECT * FROM users WHERE id = id_;
END;
$$ LANGUAGE plpgsql;
Then the actual queries are run by our application as
SELECT id, name FROM select_users(18);
As you can see the column to_be_removed is not selected. Then to remove the column I use:
ALTER TABLE users DROP COLUMN to_be_removed;
However, while the system is under load sometimes (more frequently and persistent the more load the system is experiencing) I get errors like these:
ERROR #42804 structure of query does not match function result type: Number of returned columns (2) does not match expected column count (3).
The same error can happen when columns are added. Can this be avoided somehow, or do I need to take the system offline during these kind of changes?
I follow the safe operations list from https://www.braintreepayments.com/blog/safe-operations-for-high-volume-postgresql but many operations cause troubles anyway when the more busy tables are updated.
Typically I have user defined functions for all operations, and my table and functions follow this pattern:
CREATE TABLE users (
id integer PRIMARY KEY,
name varchar NOT NULL,
to_be_removed integer NOT NULL
);
CREATE FUNCTION select_users(id_ integer) RETURNS SETOF users AS
$$
BEGIN
RETURN QUERY SELECT * FROM users WHERE id = id_;
END;
$$ LANGUAGE plpgsql;
Then the actual queries are run by our application as
SELECT id, name FROM select_users(18);
As you can see the column to_be_removed is not selected. Then to remove the column I use:
ALTER TABLE users DROP COLUMN to_be_removed;
However, while the system is under load sometimes (more frequently and persistent the more load the system is experiencing) I get errors like these:
ERROR #42804 structure of query does not match function result type: Number of returned columns (2) does not match expected column count (3).
The same error can happen when columns are added. Can this be avoided somehow, or do I need to take the system offline during these kind of changes?
For reference, there was a similar but not same issue posted to psql-bugs a long time ago:
http://www.postgresql.org/message-id/8254631e-61a5-4c03-899f-22fdcf369a46@xxxxxxxxxxxxxxxxxxxxxxxxxxxx
I posted this same question at dba.stackexchange and got the advice to repost here: http://dba.stackexchange.com/questions/117511/postgresql-drop-column-under-load-give-wrong-number-of-columns-errors