On Fri, May 5, 2023 at 6:11 AM Evgeny Morozov <postgresql3@xxxxxxxxxxxxxxxxx> wrote:
> Meanwhile, what do I do with the existing server, though? Just try to
> drop the problematic DBs again manually?

That earlier link to a FreeBSD thread is surely about bleeding-edge new ZFS code that was briefly broken and then fixed, discovered by people running code imported from the OpenZFS master branch into the FreeBSD main branch (i.e. not exactly released; I'm not following the details, but I think it might soon become 2.2). You're talking about an LTS Ubuntu release from 2018, which shipped "ZFS on Linux" version 0.7.5, unless you installed a newer version somehow. So it doesn't sound like it could be related.

That doesn't mean it couldn't be a different ZFS bug, though. While looking into file system corruption with similar symptoms on another file system (which turned out to be a bug in btrfs), I did bump into a claim that ZFS could produce unexpected zeroes in some mmap coherency scenario: OpenZFS issue #14548. I don't immediately see how PostgreSQL could get tangled up with that problem, though, as we aren't doing that...

It seems quite interesting that it's always pg_class_oid_index block 0 (the btree meta-page), which feels more like a PostgreSQL bug, unless the access pattern of that particular file/block is somehow highly unusual compared to every other block and is tickling bugs elsewhere in the stack. How does that file look, in terms of size, and how many of its pages are zero? I think it should be called base/5/2662.

Oooh, but this is a relation that goes through RelationMapOidToFilenumber. What does

  select pg_relation_filepath('pg_class_oid_index')

show in the corrupted database: base/5/2662, or something else? Now *that* is a piece of logic that changed in PostgreSQL 15. It changed from sector-based atomicity assumptions to a directory-entry swizzling trick, in commit d8cd0c6c95c0120168df93aae095df4e0682a08a. Hmm.
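To answer the size/zero-pages question, something like the sketch below could be run against the suspect file. This is just one way to do it, assuming the default 8 kB PostgreSQL block size, GNU coreutils (`stat -c`, as on Ubuntu), and that the path really is base/5/2662 under the data directory:

```shell
#!/bin/sh
# Report how many 8 kB blocks a file contains, and how many are entirely zero.
# Usage: count_zero_pages "$PGDATA/base/5/2662"
count_zero_pages() {
    f=$1
    # Assumes GNU stat (Linux); file size in bytes divided by the 8 kB block size.
    blocks=$(( $(stat -c %s "$f") / 8192 ))
    zeros=0
    i=0
    while [ "$i" -lt "$blocks" ]; do
        # Read one block; if deleting NUL bytes leaves nothing, the block is all zeroes.
        if ! dd if="$f" bs=8192 skip="$i" count=1 2>/dev/null | tr -d '\0' | grep -q .; then
            zeros=$((zeros + 1))
        fi
        i=$((i + 1))
    done
    echo "$blocks blocks, $zeros all-zero"
}
```

If block 0 is the only zeroed page out of an otherwise sane-looking index file, that would point more at the meta-page write path than at random file system damage.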