Hi,

We recently started observing issues like the following in our cluster environment:

  Warning  FailedMount  31s (x8 over 97s)  kubelet, ${NODEIP}
  MountVolume.SetUp failed for volume "${PVCNAME}" : mount command failed,
  status: Failure, reason: failed to mount volume /dev/rbd2 [ext4] to
  /var/lib/kubelet/plugins/rook.io/rook-ceph/mounts/${PVCNAME},
  error 'fsck' found errors on device /dev/rbd2 but could not correct them:
  fsck from util-linux 2.23.2
  /dev/rbd2 contains a file system with errors, check forced.
  /dev/rbd2: Inode 2884174 has an invalid extent node (blk 11567229, lblk 0)
  /dev/rbd2: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.

The exact error (here, "inode has an invalid extent node") varies; two others I've seen are "Multiply-claimed block(s) in inode" and "Unattached inode".

This is a private cloud environment running Kubernetes (1.13) and Ceph. As far as I know, the worker nodes haven't been rebooted in the past six months, although I did see some OOM killer messages in the logs.

Has anyone seen similar errors before, and does anyone have ideas about what the root cause might be? I applied a workaround that seemed to resolve the issue temporarily (map the image to a new rbd device and run fsck on it; rough steps in the P.S. below), but I'd much rather prevent these errors from happening in the first place.

Thank you,
Fatih Ertinaz
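
P.S. In case it helps anyone hitting the same thing, here is roughly what the workaround looked like. The pool/image names below are placeholders for our actual ones, and this assumes the volume is not mounted by any pod while you run it:

  # Find the RBD image backing the PV (the image name is in the PV spec).
  kubectl get pv ${PVCNAME} -o yaml | grep -i image

  # Map the image to a fresh rbd device from an admin node, not via kubelet
  # ("replicapool/pvc-xxxx" is a placeholder for the actual pool/image).
  sudo rbd map replicapool/pvc-xxxx
  # prints the new device, e.g. /dev/rbd3

  # Force a full check of the ext4 filesystem and repair what it can.
  sudo e2fsck -f -y /dev/rbd3

  # Unmap so kubelet can map and mount the image again on the next pod start.
  sudo rbd unmap /dev/rbd3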