Re: Failing to mount PVCs

Hi,

I'm not entirely sure this really is the same issue. One of our customers also runs k8s on OpenStack, and I saw similar messages there. We never investigated it ourselves, and I don't know if the customer did, but one thing they ran into was that k8s didn't properly clean up detached/deleted volumes before reattaching them or attaching new ones. This sometimes resulted in the same volume apparently being attached multiple times to the same VM. According to the customer this was caused by the Cinder driver in k8s, but as I said, I'm not sure. Still, the message "Multiply-claimed block(s) in inode" reminded me of that, and the oom killers are also familiar in that environment.

The current workaround there is to properly clean up stale attachments before reattaching, and to use flavors with more resources to prevent the oom killer from triggering.
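To make that a bit more concrete, the manual cleanup I have in mind looks roughly like this with the plain OpenStack CLI (sketch only; the volume and server IDs are placeholders, and your environment may use different tooling):

  # See what Cinder thinks the volume's state is and where it is attached
  openstack volume show <volume-id> -c status -c attachments

  # If it is still shown as attached to a node the pod has already left,
  # detach it manually before k8s tries to attach it again
  openstack server remove volume <server-id> <volume-id>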

This doesn't help much, but at least you know that you're not alone. ;-)


Quoting Fatih Ertinaz <fertinaz@xxxxxxxxx>:

Hi,

We recently started to observe issues similar to the following in our
cluster environment:

Warning FailedMount 31s (x8 over 97s) kubelet, ${NODEIP}  MountVolume.SetUp
failed for volume "${PVCNAME}" : mount command failed, status: Failure,
reason: failed to mount volume /dev/rbd2 [ext4] to /var/lib/kubelet/plugins/
rook.io/rook-ceph/mounts/${PVCNAME}, error 'fsck' found errors on device
/dev/rbd2 but could not correct them: fsck from util-linux 2.23.2
/dev/rbd2 contains a file system with errors, check forced.
/dev/rbd2: Inode 2884174 has an invalid extent node (blk 11567229, lblk 0)
/dev/rbd2: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.

The exact error (inode has an invalid extent node) may differ; two other
ones I've seen are "Multiply-claimed block(s) in inode" and "Unattached
inode".

This is a private cloud environment with Kubernetes (1.13) and Ceph. As far
as I know, the worker nodes haven't been rebooted in the past 6 months.
However, I did see some oom killer messages in the logs.
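For reference, checks along these lines on the worker node are what turn them up (sketch only):

  # Look for oom-killer events in the kernel log
  dmesg -T | grep -iE 'oom|out of memory'
  journalctl -k | grep -i oom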

In general, has anyone else seen similar errors before, and does anyone have
an idea what the root cause might be? There is a workaround I applied that
seemed to resolve the issue temporarily (map the rbd image to a new device
and run fsck on it), but I'd very much like to prevent these errors from
happening in the first place.
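In case it helps, the workaround was roughly the following (the pool and image names are placeholders for whatever Rook created for the PVC):

  # Map the image to a fresh device outside of kubelet
  rbd map <pool>/<image-name>     # prints e.g. /dev/rbd3

  # Repair the filesystem, then unmap so kubelet can mount it cleanly again
  fsck.ext4 -fy /dev/rbd3
  rbd unmap /dev/rbd3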

Thank you,

Fatih Ertinaz


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


