Search Postgresql Archives

Visibility Map Issues

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello all,
     We have a data warehouse (postgres 11.5, on centos 7) that contains many instances of the following structure:

 - table_a
 - table_b
 - a view that selects everything from either table_a or table_b

All external queries select from the view, so we can rebuild the table that isn't exposed by the view and then switch the view to point at the most recent table.

We have a procedure that will drop indexes from the non-visible table, truncate it, commit, repopulate it with data (using oracle_fdw), commit, build the indexes, commit, run VACUUM ANALYZE (using pg_background), and then re-write the view to point at the new table (and a final commit).

The issue I am dealing with is that on occasion the VACUUM step does not update the visibility map (as evidenced by pg_class.relallvisible = 0), which means the planner won't use index-only scans.  I can't figure out the reason for this.  I have tried the following:
 - change VACUUM ANALYZE table_name to VACUUM (ANALYZE, DISABLE_PAGE_SKIPPING) table_name
 - Get xmin of a row in the table (all rows should be the same) and compare to txid_snapshot_xmin(txid_current_snapshot()) before vacuuming.  I used raise notice to log these values and I never saw a conflict.  For example, before running a vacuum I logged xmin = 207781 and the txid_snapshot_xmin was 207785.  After running VACUUM ANALYZE, however, relallvisible was set to 0.
 - Running oracle_close_connections() after loading data (before indexing and vacuuming) just in case there was an issue with open Oracle connections
 - Setting old_snapshot_threshold to 60s and waiting (pg_sleep) for 60s, 65s, 120s, or 125s...

My questions:
 - Is there some way of knowing ahead of time that a VACUUM will actually set the visibility of all pages?  I would expect the visibility map to not be updated if there are older transactions open, but shouldn't I be able to see that by examining the snapshot?
- Is there an issue running all of this in a procedure?  Since I'm repeating the same thing with small variations for many tables, I was hoping to keep this in a procedure, but if there are issues with how the transactions might interact with pg_background/VACUUM, I may have to abandon this approach.

There are no standby servers and vacuum_defer_cleanup_age is set to 0,.


I tried to create a minimum reproducible example:
CREATE TABLE test_a (a int);
CREATE index ix_test_a on test_a(a);

CREATE OR REPLACE PROCEDURE test_switch() AS $$
  DECLARE
   visible_pages int;
  BEGIN
    DROP INDEX IF EXISTS ix_test_a;
    TRUNCATE TABLE test_a;
    COMMIT;

    INSERT INTO test_a SELECT generate_series(1,10000);
    COMMIT;

    CREATE INDEX ix_test_a ON test_a (a);
    COMMIT;

    RAISE NOTICE 'xmin:%', (SELECT xmin from test_a limit 1);
    RAISE NOTICE 'snapshot min:%', (txid_snapshot_xmin(txid_current_snapshot()));
    PERFORM * FROM pg_background_result(pg_background_launch('VACUUM (ANALYZE,DISABLE_PAGE_SKIPPING,VERBOSE) test_a')) as result(a text);
    SELECT relallvisible FROM pg_class WHERE relname = 'test_a' INTO visible_pages;
    IF visible_pages = 0 THEN
      RAISE EXCEPTION 'NO VISIBLE PAGES';
    ELSE
      RAISE NOTICE 'relallvisible:%', visible_pages;
    END IF;
  END;

$$
LANGUAGE PLPGSQL;

I ran this repeatedly, using \watch 1 in psql.  At the same time, I ran three other sessions running BEGIN; SELECT 1; COMMIT; every 1s.  I got a failure in test_switch after about one minute:

NOTICE:  xmin:1949
NOTICE:  snapshot min:1951
INFO:  aggressively vacuuming "public.test_a"
INFO:  index "ix_test_a" now contains 10000 row versions in 30 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  "test_a": found 0 removable, 10000 nonremovable row versions in 45 out of 45 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 1948
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  analyzing "public.test_a"
INFO:  "test_a": scanned 45 of 45 pages, containing 10000 live rows and 0 dead rows; 10000 rows in sample, 10000 estimated total rows
ERROR:  NO VISIBLE PAGES

I don't understand why the snapshot min says 1951, but the VACUUM output says oldest xmin: 1948.  

Thanks,
Jeremy

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux