As the mail seems to have been trashed somewhere, I'll retry :)
On 24/01/2020 21:37, Theodore Y. Ts'o wrote:
On Fri, Jan 24, 2020 at 11:57:10AM +0100, Jean-Louis Dupond wrote:
There was a short disruption of the SAN, which caused it to be
unavailable for 20-25 minutes for the ESXi.
20-25 minutes is "short"? I guess it depends on your definition /
POV. :-)
Well, recovering (due to the manual fsck) caused more downtime than the
time the storage was down :)
What worries me is that almost all of the VMs (out of 500) were
showing the same error.
So that's a bit surprising...
Indeed, that's where I thought something went wrong here!
I've tried to simulate it, and was able to reproduce the same error
when we let the SAN recover BEFORE the VM is shut down.
If I power off the VM and then recover the SAN, it does an automatic
fsck without problems.
So it really seems it breaks when the VM can write to the SAN again.
And even some (+-10) were completely corrupt.
What do you mean by "completely corrupt"? Can you send an e2fsck
transcript of file systems that were "completely corrupt"?
Well, it was moving tons of files to lost+found etc. So that was
really broken.
I'll see if I can recover a backup of one in the broken state.
Anyway, this was only a very small percentage, so it worries me less than
the rest :)
Is there, for example, a chance that the filesystem gets corrupted the
moment the SAN storage becomes accessible again?
Hmm... the one possibility I can think of off the top of my head is
that in order to mark the file system as containing an error, we need
to write to the superblock. The head of the linked list of orphan
inodes is also in the superblock. If that had gotten modified in the
intervening 20-25 minutes, it's possible that this would result in
orphaned inodes not on the linked list, causing that error.
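Both pieces of state mentioned above (the error flag and the head of the orphan list) live in the superblock, so they can be inspected from a saved dumpe2fs header. The following is a minimal sketch, not part of the original mail: `orphan_summary` is a hypothetical helper name, and it assumes the `Filesystem state:` and `First orphan inode:` field labels as printed by `dumpe2fs -h`, which is worth checking against your e2fsprogs version.

```shell
# Hypothetical helper (not from the thread): summarize the superblock fields
# relevant to the orphan list. Reads `dumpe2fs -h` output on stdin, so it can
# also be run against a saved dump instead of a live device.
orphan_summary() {
    awk -F': *' '
        /^Filesystem state:/   { state = $2 }
        /^First orphan inode:/ { orphan = $2 }
        END {
            # Assumption: the orphan line is absent when the list is empty.
            printf "state=%s\n", (state == "" ? "unknown" : state)
            printf "first_orphan=%s\n", (orphan == "" ? "none" : orphan)
        }'
}

# Usage (device name taken from the transcript in this thread):
#   dumpe2fs -h /dev/mapper/vg01-root 2>/dev/null | orphan_summary
```

Comparing this output between a snapshot taken before the SAN returned and one taken after could show whether the orphan list head changed in between.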
It doesn't explain the more severe cases of corruption, though.
If fixing that would have left us with only 10 corrupt disks instead
of 500, that would be a big win :)
I also have some snapshot available of a corrupted disk if some
debugging info is required.
Before e2fsck was run? Can you send me a copy of the output of
dumpe2fs run on that disk, and then transcript of e2fsck -fy run on a
copy of that snapshot?
dumpe2fs -> see attachment
# e2fsck -fy /dev/mapper/vg01-root
e2fsck 1.44.5 (15-Dec-2018)
Pass 1: Checking inodes, blocks, and sizes
Inodes that were part of a corrupted orphan linked list found. Fix? yes
Inode 165708 was part of the orphaned inode list. FIXED.
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences: -(863328--863355)
Fix? yes
Free blocks count wrong for group #26 (3485, counted=3513).
Fix? yes
Free blocks count wrong (1151169, counted=1151144).
Fix? yes
Inode bitmap differences: -4401 -165708
Fix? yes
Free inodes count wrong for group #0 (2489, counted=2490).
Fix? yes
Free inodes count wrong for group #20 (1298, counted=1299).
Fix? yes
Free inodes count wrong (395115, counted=395098).
Fix? yes
/dev/mapper/vg01-root: ***** FILE SYSTEM WAS MODIFIED *****
/dev/mapper/vg01-root: 113942/509040 files (0.2% non-contiguous),
882520/2033664 blocks
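When the same check has to be run across hundreds of VMs, the e2fsck exit status is easier to triage than the full transcript. A minimal sketch, assuming the bitmask exit codes documented in the e2fsck man page; `describe_e2fsck_status` and its labels are hypothetical names chosen for this illustration.

```shell
# Hypothetical helper: decode the e2fsck exit status bitmask into labels.
# Bit values are the ones listed in the e2fsck(8) man page.
describe_e2fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    out=""
    for pair in 1:errors-corrected 2:reboot-recommended 4:errors-uncorrected \
                8:operational-error 16:usage-error 32:cancelled 128:library-error; do
        bit=${pair%%:*}      # numeric bit value before the colon
        label=${pair#*:}     # label after the colon
        if [ $((status & bit)) -ne 0 ]; then
            out="$out $label"
        fi
    done
    echo "${out# }"          # drop the leading space
}

# Usage (run against a COPY of the snapshot, never the only image):
#   e2fsck -fy /dev/mapper/vg01-root; describe_e2fsck_status $?
```

A run like the transcript above, where everything was fixed, would normally be expected to report corrected errors rather than uncorrected ones.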
It would be great to gather some feedback on how to improve the
situation (next to, of course, having no SAN outage :)).
Something that you could consider is setting up your system to trigger
a panic/reboot on a hung task timeout, or when ext4 detects an error
(see the man pages of tune2fs and mke2fs and the -e option for those
programs).
There are tradeoffs with this, but if you've lost the SAN for 15-30
minutes, the file systems are going to need to be checked anyway, and
the machine will certainly not be serving. So forcing a reboot might
be the best thing to do.
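The suggestion above could look roughly like the following. This is a sketch under assumptions, not a tested recommendation: `fail_fast_plan` is a hypothetical helper that only prints the commands for review, the 10 second reboot delay is an arbitrary example value, and `kernel.hung_task_panic` is only available when the kernel was built with hung task detection.

```shell
# Hypothetical helper: print (do not apply) the commands for a "fail fast"
# setup, so they can be reviewed before being run as root.
#   $1 = ext4 block device, $2 = seconds to wait before rebooting after a panic
fail_fast_plan() {
    dev=$1
    reboot_delay=${2:-10}   # example default, not a value from the thread
    # Panic when ext4 detects an error (the tune2fs -e option mentioned above).
    printf 'tune2fs -e panic %s\n' "$dev"
    # Panic when a task stays blocked longer than the hung task timeout.
    printf 'sysctl -w kernel.hung_task_panic=1\n'
    # Reboot automatically this many seconds after a panic.
    printf 'sysctl -w kernel.panic=%s\n' "$reboot_delay"
}

# Usage:
#   fail_fast_plan /dev/mapper/vg01-root 10
```

Settings applied with `sysctl -w` do not survive a reboot on their own; they would also need to go into the system's sysctl configuration.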
Going to look into that! Thanks for the info.
On KVM, for example, there is an unlimited timeout (afaik) until the
storage is back, and the VM just continues running after storage recovery.
Well, you can adjust the SCSI timeout, if you want to give that a
try.
Does it have any other disadvantages? Or is it quite safe to increase the
SCSI timeout?
- Ted
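For reference, on Linux guests the per-device SCSI command timeout is exposed in sysfs as `/sys/block/<disk>/device/timeout` (in seconds). A minimal sketch for inspecting and changing it: the function names are hypothetical, the sysfs root is a parameter only so the functions can be tried on a scratch directory, and the value 180 in the usage note is an example rather than a recommendation from this thread.

```shell
# Hypothetical helper: list the SCSI command timeout (seconds) of every disk
# found under a sysfs root (defaults to /sys/block).
show_scsi_timeouts() {
    root=${1:-/sys/block}
    for f in "$root"/*/device/timeout; do
        [ -r "$f" ] || continue          # skip devices without a timeout file
        disk=${f#"$root"/}               # strip the root prefix
        disk=${disk%%/*}                 # keep only the disk name
        printf '%s %s\n' "$disk" "$(cat "$f")"
    done
}

# Hypothetical helper: set_scsi_timeout <disk> <seconds> [sysfs-root]
# Needs root on a real system; the change is lost on reboot.
set_scsi_timeout() {
    printf '%s\n' "$2" > "${3:-/sys/block}/$1/device/timeout"
}

# Usage:
#   show_scsi_timeouts
#   set_scsi_timeout sda 180
```

Because the sysfs value resets on reboot, a persistent change would usually be made through a udev rule. A longer timeout only makes the guest wait longer before it reports I/O errors; it does not change what happens once the timeout is exceeded.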