On Wed, Jul 17, 2019 at 11:43 AM Alvaro Herrera <alvherre@xxxxxxxxxxxxxxx> wrote: > This might make things worse operationally, though. If searches aren't > failing but vacuum is, we'd break a production system that currently > works. If searches aren't failing and VACUUM works, then that's probably down to dumb luck. The user's luck could change at any time (actually, it's quite possible that the index is already giving wrong answers without anybody realizing). That's not always true, of course -- you could have an OOM condition in VACUUM, where it really does make sense to retry. But it should be true for the category of errors where we behave more aggressively than just giving up, such as "failed to re-find parent key" error Aaron noticed. > Well, vacuum knows what index is being processed. Maybe you're thinking > that autovac can get an out-of-memory condition or something like that; > perhaps we can limit the above only when an ERRCODE_DATA_CORRUPTED > condition is reported (and make sure all such conditions do that. As > far as I remember we have a patch for this particular error to be > reported as such.) I don't think that it would be that hard to identify errors that nbtree VACUUM could throw that clearly indicate corruption, without any hope of the problem self-correcting without the DBA running a REINDEX. There will be a small amount of gray area, perhaps, but probably not enough to matter. > > (c) automatically disabling constraint indexes seems less than desirable. > > Disabling them for writes, yeah. I think that it's fair to say that all bets are off once you see the "failed to re-find parent key" error, or any other such error that indicates corruption. Admittedly it isn't 100% clear that disabling constraint enforcement to unblock autovacuum for the whole cluster is better than any available alternative; it's really hard to reason about things when we already know that the database has corruption. I think that it's okay that almost anything can break when somebody creates an index on a non-immutable expression (including VACUUM), provided that all the problems only affect the table with the broken index. OTOH, taking down the entire Postgres cluster as an indirect consequence of one person's ill-considered CREATE INDEX really sucks. That distinction seems important to me. -- Peter Geoghegan