On Mon, May 8, 2023 at 7:55 AM Michael Paquier <michael@xxxxxxxxxxx> wrote:
>
> On Sun, May 07, 2023 at 10:30:52PM +1200, Thomas Munro wrote:
> > Bug-in-PostgreSQL explanations could include that we forgot it was
> > dirty, or some backend wrote it out to the wrong file; but if we were
> > forgetting something like permanent or dirty, would there be a more
> > systematic failure?  Oh, it could require special rare timing if it is
> > similar to 8a8661828's confusion about permanence level or otherwise
> > somehow not setting BM_PERMANENT, but in the target blocks, so I think
> > that'd require a checkpoint AND a crash.  It doesn't reproduce for me,
> > but perhaps more unlucky ingredients are needed.
> >
> > Bug-in-OS/FS explanations could include that a whole lot of writes
> > were mysteriously lost in some time window, so all those files still
> > contain the zeroes we write first in smgrextend().  I guess this
> > previously rare (previously limited to hash indexes?) use of sparse
> > file hole-punching could be a factor in an it's-all-ZFS's-fault
> > explanation:
>
> Yes, you would need a bit of all that.
>
> I can reproduce the same backtrace here.  That's just my usual laptop
> with ext4, so this would be a Postgres bug.
> First, here are the four
> things running in parallel so as I can get a failure in loading a
> critical index when connecting:
> 1) Create and drop a database with WAL_LOG as strategy and the
> regression database as template:
> while true; do
>   createdb --template=regression --strategy=wal_log testdb;
>   dropdb testdb;
> done
> 2) Feeding more data to pg_class in the middle, while testing the
> connection to the database created:
> while true;
>   do psql -c 'create table popo as select 1 as a;' regression > /dev/null 2>&1 ;
>   psql testdb -c "select 1" > /dev/null 2>&1 ;
>   psql -c 'drop table popo' regression > /dev/null 2>&1 ;
>   psql testdb -c "select 1" > /dev/null 2>&1 ;
> done;
> 3) Force some checkpoints:
> while true; do psql -c 'checkpoint' > /dev/null 2>&1; sleep 4; done
> 4) Force a few crashes and recoveries:
> while true ; do pg_ctl stop -m immediate ; pg_ctl start ; sleep 4 ; done
>

I am able to reproduce this using the steps given above, and I am
analyzing it further.  I will send an update once I have a clue.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com