Peter,
Thanks for taking the time to respond and hope you're having a good weekend. My
apologies for not noting the Postgres version. I thought about it after I sent
the email but I didn't want to spam the thread.
$ psql --version
psql (PostgreSQL) 11.6
One interesting thing you mention is that the system probably changed libc
versions. This is correct. We recently moved from an Alpine based docker
container running `musl libc 1.1.22` to a RHEL based one running `glibc 2.17`.
This was done a month or so ago. The `Go` extension row would've existed in the
database before this switch.
I dumped and restored our database in prod and all seems to be fine now. I'm
now testing using the backup from prod on my local machine with the same RHEL
image running `glibc 2.17`.
coder=# select ext_name from extensions where ext_name = 'Go';
ext_name
----------
(0 rows)
coder=# SELECT bt_index_check('extensions_pkey', true);
ERROR: item order invariant violated for index "extensions_pkey"
DETAIL: Lower index tid=(1,2) (points to heap tid=(0,2)) higher index tid=(1,3) (points to heap tid=(2,1)) page lsn=1/FF9C13B8.
Looks like it found something. I checked out the contrib/pageinspect docs but
wasn't too sure what to run. Are incompatible libc versions causing btree
corruption something you consider a bug? If it's something you'd like to look
into further I can gladly send over the database files.
Thanks,
Colin
Thanks for taking the time to respond and hope you're having a good weekend. My
apologies for not noting the Postgres version. I thought about it after I sent
the email but I didn't want to spam the thread.
$ psql --version
psql (PostgreSQL) 11.6
One interesting thing you mention is that the system probably changed libc
versions. This is correct. We recently moved from an Alpine based docker
container running `musl libc 1.1.22` to a RHEL based one running `glibc 2.17`.
This was done a month or so ago. The `Go` extension row would've existed in the
database before this switch.
I dumped and restored our database in prod and all seems to be fine now. I'm
now testing using the backup from prod on my local machine with the same RHEL
image running `glibc 2.17`.
coder=# select ext_name from extensions where ext_name = 'Go';
ext_name
----------
(0 rows)
coder=# SELECT bt_index_check('extensions_pkey', true);
ERROR: item order invariant violated for index "extensions_pkey"
DETAIL: Lower index tid=(1,2) (points to heap tid=(0,2)) higher index tid=(1,3) (points to heap tid=(2,1)) page lsn=1/FF9C13B8.
Looks like it found something. I checked out the contrib/pageinspect docs but
wasn't too sure what to run. Are incompatible libc versions causing btree
corruption something you consider a bug? If it's something you'd like to look
into further I can gladly send over the database files.
Thanks,
Colin
On Fri, Feb 7, 2020 at 7:08 PM Peter Geoghegan <pg@xxxxxxx> wrote:
On Fri, Feb 7, 2020 at 3:52 PM Colin Adler <colin@xxxxxxxxx> wrote:
> Seems to work now. My question is, is this something I should report to the
> maintainers?
I am one of the people that maintains the B-Tree code.
You didn't mention what version of Postgres you're using here. That
could be important. Please let us know. Mention the minor component of
the release version, too (i.e. say 12.2, not just 12).
> I took a snapshot of the data folder before the reindex in case it
> would be helpful. Is index corruption something that should be actively looked
> out for?
Yes -- look for corruption. If I had to guess, I'd say that this has
something to do with upgrading the operating system to use a
different, incompatible glibc. Or perhaps it has something to do with
streaming replication between machines with different glibc version.
You should try running contrib/amcheck, which should be able to isolate
index corruption, and give you a specific complaint. You may then be
able to inspect the exact index page with the problem using
contrib/pageinspect. Something like this ought to do it on Postgres
11 or 12:
CREATE EXTENSION IF NOT EXISTS amcheck
SELECT bt_index_check('my_index', true);
If that doesn't show any errors, then perhaps try this:
SELECT bt_index_parent_check('my_index', true);
If you're on Postgres 10, then you should leave out the second
argument, "true", since that version doesn't have the extra
heapallindexed check.
Let us know what you see.
--
Peter Geoghegan