FYI,
Just did same test with e2fsprogs 1.45.5 (from buster backports) and
kernel 5.4.13-1~bpo10+1.
And I'm having exactly the same issue.
The VM needs a manual fsck after a storage outage.
Don't know if it's useful to test with 5.5 or 5.6?
But it seems like the issue still exists.
Thanks
Jean-Louis
On 20/02/2020 17:14, Jean-Louis Dupond wrote:
On 20/02/2020 16:50, Theodore Y. Ts'o wrote:
On Thu, Feb 20, 2020 at 10:08:44AM +0100, Jean-Louis Dupond wrote:
dumpe2fs -> see attachment
Looking at the dumpe2fs output, it's interesting that it was "clean
with errors", without any error information being logged in the
superblock. What version of the kernel are you using? I'm guessing
it's a fairly old one?
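(For reference, the recorded error state can be read straight from the superblock; a minimal sketch, assuming the same root LV shown in the fsck output below:)

# dumpe2fs -h /dev/mapper/vg01-root | grep -iE 'state|error'

On a kernel that records the failure, fields like "Filesystem state", "FS Error count" and "First error time" would normally show up here.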
Debian 10 (Buster), with kernel 4.19.67-2+deb10u1
Fsck:
# e2fsck -fy /dev/mapper/vg01-root
e2fsck 1.44.5 (15-Dec-2018)
And that's an old version of e2fsck as well. Is this some kind of
stable/enterprise linux distro?
Debian 10 indeed.
Pass 1: Checking inodes, blocks, and sizes
Inodes that were part of a corrupted orphan linked list found. Fix?
yes
Inode 165708 was part of the orphaned inode list. FIXED.
OK, this and the rest looks like it's relating to a file truncation or
deletion at the time of the disconnection.
On KVM for example there is an unlimited timeout (afaik) until the
storage is back, and the VM just continues running after storage recovery.
Well, you can adjust the SCSI timeout, if you want to give that a
try....
Does that have any other disadvantages? Or is it quite safe to increase
the SCSI timeout?
It should be pretty safe.
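(For reference, the per-device SCSI timeout is exposed via sysfs on a typical Linux guest; a minimal sketch, assuming the affected disk is sda and a 180-second timeout, both of which are just example values:)

# cat /sys/block/sda/device/timeout
30
# echo 180 > /sys/block/sda/device/timeout

The default is 30 seconds, and the sysfs setting does not survive a reboot, so it would normally be applied from a udev rule or an init script.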
Can you reliably reproduce the problem by disconnecting the machine
from the SAN?
Yep, it can be reproduced by killing the connection to the SAN while the
VM is running, and then, after the SCSI timeout has passed, re-enabling
the SAN connection.
Then reset the machine, and you need to run an fsck to get it
back online.
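(A rough sketch of that procedure, assuming an iSCSI-backed SAN on the standard TCP port 3260 and a libvirt/KVM guest; the port, rule and domain name are placeholders and would differ for FC or other transports:)

# iptables -A OUTPUT -p tcp --dport 3260 -j DROP    <- cut the path to the SAN
  ... wait until the SCSI timeout expires in the guest and I/O errors appear ...
# iptables -D OUTPUT -p tcp --dport 3260 -j DROP    <- restore the path
# virsh reset <vm-name>                             <- reset the guest; it then needs a manual fsck to come back up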
- Ted