Hi,

I've had a similar issue, but outside of ceph-csi. Running a CRUD test (create, map, write, read, unmap, and delete) against an RBD image in a short amount of time can result in it having a stuck watcher. I assume it comes from mapping and unmapping very quickly (under 30 seconds). What I have found is that if you restart the primary OSD for the header object, the watcher will go away, assuming nothing is really watching it.

> rbd info -p pool-name rbd-name
> # get the id from the output, e.g. 1234
> ceph osd map pool-name rbd_header.1234
> # get the primary OSD from the acting set (the pNNN part), e.g. 43
> ceph osd down 43

This is the tracker I'm watching: <https://tracker.ceph.com/issues/58120>. The backport says it should be fixed in 18.2.5.
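In case it saves someone some typing, here are the same steps roughly scripted. This is an untested sketch: it assumes jq is available, a cephx user that is allowed to run "ceph osd down", and the same pool-name/rbd-name placeholders as above.

#!/bin/sh
# Untested sketch of the steps above. Set POOL/IMAGE to the real names first.
set -eu
POOL=pool-name
IMAGE=rbd-name

# block_name_prefix is "rbd_data.<id>"; the header object is "rbd_header.<id>"
ID=$(rbd info --format json -p "$POOL" "$IMAGE" | jq -r '.block_name_prefix' | sed 's/^rbd_data\.//')

# Primary OSD currently serving the header object (the pNNN in the plain "ceph osd map" output)
PRIMARY=$(ceph osd map "$POOL" "rbd_header.${ID}" --format json | jq -r '.acting_primary')
echo "rbd_header.${ID} is primary on osd.${PRIMARY}"

# Marking the OSD down forces clients to re-establish their watches; a watch
# left behind by a dead client will not come back. Only do this if "rbd status"
# shows a watcher that should not be there.
ceph osd down "$PRIMARY"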
Hope this helps,
Reid

On Wed, Jan 22, 2025 at 4:14 PM Devender Singh <devender@xxxxxxxxxx> wrote:

> Hello Frederic
>
> Thanks for your email.
> We already verified those, tried killing them, and upgraded k8s and the csi-plugin, but nothing helps.
> Below is the output; it did not report any volume.
>
> # for pod in $(kubectl -n $namespace get pods | grep -E 'rbdplugin|nodeplugin' | grep -v provisioner | awk '{print $1}'); do echo $pod; kubectl exec -it -n $namespace $pod -c csi-rbdplugin -- rbd device list | grep $image ; done
> ceph-csi-rbd-nodeplugin-48vs2
> ceph-csi-rbd-nodeplugin-6zmjj
> ceph-csi-rbd-nodeplugin-7g6r5
> ceph-csi-rbd-nodeplugin-bp84x
> ceph-csi-rbd-nodeplugin-bt6hh
> ceph-csi-rbd-nodeplugin-d4tww
> ceph-csi-rbd-nodeplugin-rtb68
> ceph-csi-rbd-nodeplugin-t87db
>
> But the error persists:
> # date; kubectl -n elastic describe pod/es-es-default-3 | grep -i warning
> Wed 22 Jan 2025 01:12:09 PM PST
>   Warning  FailedMount  2s (x13 over 21m)  kubelet  MountVolume.MountDevice failed for volume "pvc-3a2048f1" : rpc error: code = Internal desc = rbd image k8s-rgnl-disks/csi-vol-945c6a66 is still being used
>
> Regards
> Dev
>
> > On Jan 21, 2025, at 11:50 PM, Frédéric Nass <frederic.nass@xxxxxxxxxxxxxxxx> wrote:
> >
> > Hi Dev,
> >
> > Can you run the command below to check whether this image is still considered mapped by any ceph-csi nodeplugin?
> >
> > $ namespace=ceph-csi-rbd
> > $ image=csi-vol-945c6a66-9129
> > $ for pod in $(kubectl -n $namespace get pods | grep -E 'rbdplugin|nodeplugin' | grep -v provisioner | awk '{print $1}'); do echo $pod; kubectl exec -it -n $namespace $pod -c csi-rbdplugin -- rbd device list | grep $image ; done
> >
> > If it shows up in the output, get into the csi-rbdplugin container of the nodeplugin pod that listed the image and unmount/unmap it:
> >
> > $ kubectl -n $namespace exec -ti ceph-csi-rbd-nodeplugin-xxxxx -c csi-rbdplugin -- sh    <---- please adjust the nodeplugin pod name here
> > sh-4.4#
> > sh-4.4# rbd device list
> > id  pool            namespace  image                  snap  device
> > 0   k8s-rgnl-disks             csi-vol-945c6a66-9129  -     /dev/rbd0
> > sh-4.4# umount /dev/rbd/k8s-rgnl-disks/csi-vol-945c6a66-9129
> > sh-4.4# rbd unmap /dev/rbd/k8s-rgnl-disks/csi-vol-945c6a66-9129
> > sh-4.4# rbd device list
> > sh-4.4#
> >
> > Hope there's no typo.
> >
> > Regards,
> > Frédéric.
> >
> > ----- Le 21 Jan 25, à 23:33, Devender Singh <devender@xxxxxxxxxx> a écrit :
> >
> >> Hello Eugen
> >>
> >> Thanks for your reply.
> >> I have the image available and it is not in the trash.
> >>
> >> When scaling a pod to a different node using a StatefulSet, the pod gives a mount error.
> >>
> >> I was looking for a command to kill the client.id from Ceph. Ceph must have a command to kill its clients.
> >> I don't understand why the pod complains that the same volume is still in use by a k8s host when the volume is mapped nowhere. Not sure what to do in this situation.
> >> We tried upgrading csi and the k8s cluster, renamed the image and blocklisted the host, then renamed the image back to its original name, but rbd status still shows the same client host.
> >>
> >> Regards
> >> Dev
> >>
> >>> On Jan 21, 2025, at 12:16 PM, Eugen Block <eblock@xxxxxx> wrote:
> >>>
> >>> Hi,
> >>>
> >>> have you checked if the image is in the trash?
> >>>
> >>> rbd -p {pool} trash ls
> >>>
> >>> You can try to restore the image if there is one, then blocklist the client to release the watcher, then delete the image again.
> >>>
> >>> I have to do that from time to time on a customer's OpenStack cluster.
> >>>
> >>> Zitat von Devender Singh <devender@xxxxxxxxxx>:
> >>>
> >>>> Hello
> >>>>
> >>>> Seeking some help: can I clean up the client mounting my volume?
> >>>>
> >>>> rbd status pool/image
> >>>>
> >>>> Watchers:
> >>>>   watcher=10.160.0.245:0/2076588905 client.12541259 cookie=140446370329088
> >>>>
> >>>> Issue: the pod is failing in Init state.
> >>>> Events:
> >>>>   Type     Reason       Age                  From     Message
> >>>>   ----     ------       ----                 ----     -------
> >>>>   Warning  FailedMount  96s (x508 over 24h)  kubelet  MountVolume.MountDevice failed for volume "pvc-3a2048f1" : rpc error: code = Internal desc = rbd image k8s-rgnl-disks/csi-vol-945c6a66-9129 is still being used
> >>>>
> >>>> It shows the above client, but there is no such volume…
> >>>>
> >>>> Another similar issue, on the dashboard:
> >>>>
> >>>> CephNodeDiskspaceWarning
> >>>> Mountpoint /mnt/dst-volume on sea-prod-host01 will be full in less than 5 days based on the 48 hour trailing fill rate.
> >>>>
> >>>> Whereas nothing is mounted. I mapped one image yesterday using rbd map, then unmapped and unmounted everything, but it has been more than 12 hours now and the warning is still showing.
> >>>>
> >>>> Ceph version: 18.2.4
> >>>>
> >>>> Regards
> >>>> Dev

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
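Regarding the "command to kill the client" question earlier in the thread: the blocklist Eugen mentions can be pointed directly at the watcher address that rbd status prints. A minimal sketch, reusing the address from the original report, assuming the watcher really is stale (nothing is legitimately using the image) and the cephx user is allowed to manage the blocklist:

# Evict the specific client instance holding the stale watch
ceph osd blocklist add 10.160.0.245:0/2076588905

# Entries expire after an hour by default; list or remove them with
ceph osd blocklist ls
ceph osd blocklist rm 10.160.0.245:0/2076588905

# Once the OSD drops the watch, this should report no watchers
rbd status k8s-rgnl-disks/csi-vol-945c6a66-9129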