Search Postgresql Archives

Re: Validating check constraints without a table scan?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Nov 15, 2024 at 9:38 AM Philip Couling <couling@xxxxxxxxx> wrote:
Is there a solid reason why adding a check constraint does not use existing indexes for validation.

We are currently looking at partitioning a multi TB table leaving all existing data in place and simply attaching it as a partition to a new table. To prevent locking, we are trying to add an INVALID check constraint first and then validate it.

I can trivially prove the invalid constraint is valid with a simple SELECT which will use an existing index and return instantaneously. But AFAIK Theres no way to mark a constraint as valid without scanning all the rows.

Most likely your query is not exactly the same as the check constraint. Think about NULL and similar.
 
This operation is really problematic on a production database with heavy IO load.

Is there a solid ready why validating check constraints cannot use existing indexes? If I can prove the constraint is valid so trivially with a SELECT, then why can Postgres not do the same (or similar)?

Here is what has worked for me many times:

1. create the check constraint as NOT VALID. From now on no new or updated row is allowed to violate it.
2. check if the constraint holds with a query on a binary replica. Make sure the query starts only when the constraint is visible on the replica.
3. on the master: UPDATE pg_constraint SET conisvalidated=true WHERE conname='your_constraint_name' AND conrelid='schema.table'::regclass

How you perform step 2 is irrelevant. Checking it on a replica would simply avoid the load on the master. You just need to make sure there is no conflicting data in the table.

WARNING, you need to be damn sure of your data if you do that. But if you are, it works.

Here is the procedure how I solved the same problem for some of our multi-TB tables (PG14):

The table has a column called transaction_time. We wanted to partition by that column. For some historical reason the column did not have a NOT NULL constraint. However, there was no way our processes could insert NULL in that column and there was no row with NULL in that field. So, first was to add the NOT NULL constraint:

BEGIN;

ALTER TABLE my.table
ADD CONSTRAINT transaction_time_not_null CHECK(transaction_time IS NOT NULL) NOT VALID;

UPDATE pg_constraint
   SET convalidated=true
 WHERE conname = 'transaction_time_not_null'
   AND conrelid = 'my.table'::REGCLASS
RETURNING conname, conrelid::REGCLASS, convalidated;

COMMIT;

Now for cosmetic purposes we first turn the check constraint above into a normal NOT NULL constraint:

BEGIN;

SET LOCAL client_min_messages = 'debug4';
-- expecting this message
-- DEBUG:  existing constraints on column "table.transaction_time" are sufficient to prove that it does not contain nulls
ALTER TABLE my.table
      ALTER COLUMN transaction_time SET NOT NULL;
RESET client_min_messages;

ALTER TABLE my.table
      DROP CONSTRAINT transaction_time_not_null;

COMMIT;

If you set client_min_messages to something like debug4, then the database tells you if it wants to scan the table or if existing constraints are sufficient to prove the condition.

transaction_time in our case is never in the future. Also database transactions are short-lived. Nothing exceeds 30 seconds. So, I can trust the effect of any action performed more than 30 seconds ago in the database is visible.

So, I set the time after which new rows go to the new partition at least 10 minutes from now at the next hour boundary. 30 seconds would be good enough. I chose 10 minutes just for extra safety.

SELECT date_trunc('hour', now()+'1h 10min'::interval)::timestamp AS switch_time\gset

Next comes the actual change:

BEGIN;

-- rename the existing table
ALTER TABLE my.table RENAME TO table_old;

-- drop triggers. We will recreate them later.
DROP TRIGGER ... ON my.table_old;
DROP TRIGGER ...;

-- create partitioned table
CREATE TABLE my.table (
    LIKE my.table_old
    INCLUDING DEFAULTS
    INCLUDING CONSTRAINTS
)
PARTITION BY RANGE (transaction_time);

-- recreate triggers
CREATE TRIGGER ...
BEFORE DELETE ON my.table
   FOR EACH ROW EXECUTE FUNCTION my,trigger_fun();
CREATE TRIGGER ...;

-- create the partition for future rows
CREATE TABLE my.table_current PARTITION OF my.table
  FOR VALUES FROM (:'switch_time') TO ('infinity');

-- and some indexes and FK
ALTER TABLE my.table_current ADD PRIMARY KEY (id);
CREATE INDEX ON my.table_current (...);
ALTER TABLE my.table_current
  ADD CONSTRAINT fk_name
      FOREIGN KEY (...)
      REFERENCES ...(...) ON UPDATE ... ON DELETE ...;

-- now attach the old table as partition
ALTER TABLE my.table_old
ADD CONSTRAINT partition_boundaries
CHECK('-infinity'<=transaction_time AND transaction_time<:'switch_time')
  NOT VALID;

-- for procedural reasons we know the constraint is valid. Let's make PG believe it too.
UPDATE pg_constraint
   SET convalidated=true
 WHERE conname='partition_boundaries'
   AND conrelid='my.table_old'::REGCLASS
RETURNING conname, conrelid::REGCLASS, convalidated;

-- now attach it. We use again debug4 to check if the table scan is avoided.
SET LOCAL client_min_messages = 'debug4';
-- expecting
-- DEBUG:  partition constraint for table "table_old" is implied by existing constraints
ALTER TABLE my.table ATTACH PARTITION my.table_old
  FOR VALUES FROM ('-infinity') TO (:'switch_time');
RESET client_min_messages;

-- drop the now unnecessary constraint
ALTER TABLE my.table_old
 DROP CONSTRAINT partition_boundaries;

COMMIT;

Once the new partition gets too full, we will use a similar procedure to adjust the partition boundary of the new partition and then create the next partition.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux