On 2019-Jul-17, Tom Lane wrote: > Alvaro Herrera <alvherre@xxxxxxxxxxxxxxx> writes: > > On 2019-Jul-17, Peter Geoghegan wrote: > >> Maybe nbtree VACUUM should do something more aggressive than give up > >> when there is a "failed to re-find parent key" or similar condition. > >> Perhaps it would make more sense to make the index inactive (for some > >> value of "inactive") instead of just complaining. That might be the > >> least worst option, all things considered. > > > Maybe we can mark an index as unvacuumable in some way? As far as I > > understand, all queries using that index work, as do index updates; it's > > just vacuuming that fails. If we mark the index as unvacuumable, then > > vacuum just skips it (and does not run phase 3 for that table), and > > things can proceed; the table's age can still be advanced. Obviously > > it'll result in more bloat than in normal condition, but it shouldn't > > cause the whole cluster to go down. > > If an index is corrupt enough to break vacuum, I think it takes a rather > large leap of faith to believe that it's not going to cause problems for > inserts or searches. Maybe, but it's what happened in the reported case. (Note Aaron was careful to do the index replacement concurrently -- he wouldn't have done that if the table wasn't in active use.) > I'd go with just marking the index broken and > insisting that it be REINDEX'd before we touch it again. This might make things worse operationally, though. If searches aren't failing but vacuum is, we'd break a production system that currently works. > (a) once the transaction's failed, you can't go making catalog updates; Maybe we can defer the actual update to some other transaction -- say register an autovacuum work-item, which can be executed separately. > (b) even when you know the transaction's failed, blaming it on a > particular index seems a bit chancy; Well, vacuum knows what index is being processed. Maybe you're thinking that autovac can get an out-of-memory condition or something like that; perhaps we can limit the above only when an ERRCODE_DATA_CORRUPTED condition is reported (and make sure all such conditions do that. As far as I remember we have a patch for this particular error to be reported as such.) > (c) automatically disabling constraint indexes seems less than desirable. Disabling them for writes, yeah. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services