Re: FSM corruption and standby servers

On Mon, Oct 31, 2016 at 9:55 AM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
> "Hunley, Douglas" <douglas.hunley@xxxxxxxxxxx> writes:
> > On Mon, Oct 31, 2016 at 10:38 AM, Tim Goodaire <tgoodaire@xxxxxxx> wrote:
> >> I have a question regarding the FSM corruption bug that is fixed in
> >> postgresql 9.5.5 (https://wiki.postgresql.org/wiki/Free_Space_Map_Problems).
> >> If I don't find any corruption on a master database, is it still possible
> >> that there is corruption on the standbys?
>
> > It shouldn't be, iirc. FSMs are only ever created/updated by vacuum, which
> > doesn't run on a slave until it is promoted to a master.
>
> The problem is that the WAL data can be wrong in these cases, and since
> the standbys only know what they were told in the WAL stream, their images
> will be wrong even if the master is valid.
>
> I would have thought that the referenced page is clear enough about
> needing to check the standbys; do you think it isn't?

I can see how the following is a bit loose for someone not super-familiar with WAL.

"A database crash-and-restart shortly after such an event can lead to corrupted FSMs. Also, standby servers will receive incorrect WAL data causing them to create corrupted FSMs locally."

I believe the "shortly" here is present because the crash must occur before the next checkpoint in order for the problem to appear on the master. Given this constraint, the secondary emphasis that standby servers receive seems misplaced. The most probable scenario - given the bug has manifested and one is running a standby - is a broken standby and a functioning master.

A possible rewording:

"Standby servers are directly impacted by this bug and must be checked for corruption even if their master appears clean. The master will only exhibit a problem if a crash-and-restart cycle occurs shortly after the problematic statement (before the next checkpoint), which causes the master to replay the just-generated WAL."
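
To make the checkpoint argument concrete, here is a toy Python model. It is only a sketch under stated assumptions: ToyServer, truncate_on_master and recover are hypothetical names, not PostgreSQL internals. It assumes, purely for illustration, that the bad WAL record is a truncation that fails to carry the FSM change, and it pretends that the on-disk image changes only at checkpoints, which a real server does not guarantee.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ToyServer:
    nblocks: int                              # relation length in blocks
    fsm: dict = field(default_factory=dict)   # block number -> free bytes

    def replay(self, record):
        """Apply one WAL record exactly as written, right or wrong."""
        self.nblocks = record["nblocks"]
        if record["truncate_fsm"]:
            self.fsm = {b: f for b, f in self.fsm.items() if b < self.nblocks}

    def bogus_fsm_blocks(self):
        """FSM entries that point past the end of the relation."""
        return sorted(b for b in self.fsm if b >= self.nblocks)


def truncate_on_master(master, nblocks):
    """The master updates its own state correctly but emits a bad record."""
    master.nblocks = nblocks
    master.fsm = {b: f for b, f in master.fsm.items() if b < nblocks}
    # Assumed bug: the record does not tell replay to truncate the FSM.
    return {"nblocks": nblocks, "truncate_fsm": False}


def recover(checkpoint_image, wal_since_checkpoint):
    """Start from the last checkpoint image and replay the WAL after it."""
    server = copy.deepcopy(checkpoint_image)
    for record in wal_since_checkpoint:
        server.replay(record)
    return server


base = ToyServer(nblocks=4, fsm={0: 100, 1: 0, 2: 500, 3: 800})
master = copy.deepcopy(base)
record = truncate_on_master(master, 2)

standby = recover(base, [record])        # a standby always replays the WAL
crashed_early = recover(base, [record])  # master crash before the checkpoint
crashed_late = recover(master, [])       # master crash after the checkpoint

print(master.bogus_fsm_blocks())         # []
print(standby.bogus_fsm_blocks())        # [2, 3]
print(crashed_early.bogus_fsm_blocks())  # [2, 3]
print(crashed_late.bogus_fsm_blocks())   # []
```

In this model the standby is always wrong, while the master is wrong only in the narrow window where it has to replay its own record, which is the asymmetry described above.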

It is not clear to what extent traditional backups (such as those taken with pg_basebackup) are affected...

David J.


