Hi Reid,

Yep, it should definitely help if the client node (kernel) is not accessing the image anymore.
Thanks for sharing the tracker. It's good to know that a fix is on the way.

Cheers,
Frédéric.

----- On 23 Jan 25, at 15:02, Reid Guyett reid.guyett@xxxxxxxxx wrote:

> Hi,
>
> I've had a similar issue, but outside of ceph-csi. Running a CRUD test
> (create, map, write, read, unmap, and delete) against an RBD in a short
> amount of time can result in it having a stuck watcher. I assume it comes
> from mapping and unmapping very quickly (under 30 sec).
> What I have found is that if you restart the primary OSD for the header
> object, the watcher will go away, assuming nothing is really watching it.
>
>> rbd info -p pool-name rbd-name
>> # get the id from the output, e.g. 1234
>> ceph osd map pool-name rbd_header.1234
>> # get the primary from the acting set ("pNNN"), e.g. 43
>> ceph osd down 43
>>
>
> This is the tracker <https://tracker.ceph.com/issues/58120> I'm watching,
> and the backport says it should be fixed in 18.2.5.
>
> Hope this helps,
> Reid
>
>
> On Wed, Jan 22, 2025 at 4:14 PM Devender Singh <devender@xxxxxxxxxx> wrote:
>
>> Hello Frederic,
>>
>> Thanks for your email.
>> We already verified those and tried killing them, and upgraded the k8s
>> cluster and the csi-plugin, but nothing helped.
>> Below is the output; it did not report any volume:
>>
>> # for pod in $(kubectl -n $namespace get pods | grep -E 'rbdplugin|nodeplugin' | grep -v provisioner | awk '{print $1}'); do echo $pod; kubectl exec -it -n $namespace $pod -c csi-rbdplugin -- rbd device list | grep $image ; done
>> ceph-csi-rbd-nodeplugin-48vs2
>> ceph-csi-rbd-nodeplugin-6zmjj
>> ceph-csi-rbd-nodeplugin-7g6r5
>> ceph-csi-rbd-nodeplugin-bp84x
>> ceph-csi-rbd-nodeplugin-bt6hh
>> ceph-csi-rbd-nodeplugin-d4tww
>> ceph-csi-rbd-nodeplugin-rtb68
>> ceph-csi-rbd-nodeplugin-t87db
>>
>> But the error is still there:
>>
>> # date; kubectl -n elastic describe pod/es-es-default-3 | grep -i warning
>> Wed 22 Jan 2025 01:12:09 PM PST
>> Warning  FailedMount  2s (x13 over 21m)  kubelet  MountVolume.MountDevice
>> failed for volume "pvc-3a2048f1" : rpc error: code = Internal desc = rbd
>> image k8s-rgnl-disks/csi-vol-945c6a66 is still being used
>>
>> Regards
>> Dev
>>
>> > On Jan 21, 2025, at 11:50 PM, Frédéric Nass <frederic.nass@xxxxxxxxxxxxxxxx> wrote:
>> >
>> > Hi Dev,
>> >
>> > Can you run the command below to check whether this image is still
>> > considered mapped by any of the ceph-csi nodeplugins?
>> >
>> > $ namespace=ceph-csi-rbd
>> > $ image=csi-vol-945c6a66-9129
>> > $ for pod in $(kubectl -n $namespace get pods | grep -E 'rbdplugin|nodeplugin' | grep -v provisioner | awk '{print $1}'); do echo $pod; kubectl exec -it -n $namespace $pod -c csi-rbdplugin -- rbd device list | grep $image ; done
>> >
>> > If it shows up in the output, get into the csi-rbdplugin container of the
>> > nodeplugin pod that listed the image and unmount/unmap it:
>> >
>> > $ kubectl -n $namespace exec -ti ceph-csi-rbd-nodeplugin-xxxxx -c csi-rbdplugin -- sh   <---- please adjust the nodeplugin pod name here
>> > sh-4.4#
>> > sh-4.4# rbd device list
>> > id  pool            namespace  image                  snap  device
>> > 0   k8s-rgnl-disks             csi-vol-945c6a66-9129  -     /dev/rbd0
>> > sh-4.4# umount /dev/rbd/k8s-rgnl-disks/csi-vol-945c6a66-9129
>> > sh-4.4# rbd unmap /dev/rbd/k8s-rgnl-disks/csi-vol-945c6a66-9129
>> > sh-4.4# rbd device list
>> > sh-4.4#
>> >
>> > Hope there's no typo.
>> >
>> > Regards,
>> > Frédéric.
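
By the way, for anyone who finds this thread later: here is the check/unmap
sequence from my quoted mail just above as a single rough script. It is only a
sketch (untested as posted); the namespace, image name and container name are
the ones from this thread, so adjust them to your environment:

    namespace=ceph-csi-rbd
    image=csi-vol-945c6a66-9129
    for pod in $(kubectl -n "$namespace" get pods | grep -E 'rbdplugin|nodeplugin' | grep -v provisioner | awk '{print $1}'); do
      # the device is the last column of 'rbd device list' for the matching image
      dev=$(kubectl -n "$namespace" exec "$pod" -c csi-rbdplugin -- rbd device list | grep -w "$image" | awk '{print $NF}')
      if [ -n "$dev" ]; then
        echo "$image is still mapped as $dev on $pod"
        # unmount first (may fail harmlessly if nothing is mounted), then unmap
        kubectl -n "$namespace" exec "$pod" -c csi-rbdplugin -- umount "$dev" || true
        kubectl -n "$namespace" exec "$pod" -c csi-rbdplugin -- rbd unmap "$dev"
      fi
    done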
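
And the same for Reid's header-object / primary-OSD trick from his mail above,
spelled out a bit more (again just a sketch; pool-name and rbd-name are
placeholders, and please double-check which OSD is the acting primary before
marking anything down):

    pool=pool-name
    image=rbd-name
    # the header object is named rbd_header.<id>; the id is in the 'rbd info' output
    id=$(rbd info -p "$pool" "$image" | awk '$1 == "id:" {print $2}')
    # this prints the PG mapping; the acting primary is the "pNNN" in "acting ([...], pNNN)"
    ceph osd map "$pool" "rbd_header.$id"
    # then, once nothing legitimately uses the image anymore:
    # ceph osd down NNN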
>> >
>> > ----- On 21 Jan 25, at 23:33, Devender Singh devender@xxxxxxxxxx wrote:
>> >
>> >> Hello Eugen,
>> >>
>> >> Thanks for your reply.
>> >> I have the image available and it's not in the trash.
>> >>
>> >> When scaling a pod to a different node using a StatefulSet, the pod hits a
>> >> mount issue.
>> >>
>> >> I was looking for a command to kill the client.id from Ceph, if there is one.
>> >> Ceph must have a command to kill its clients, etc…
>> >> I don't understand why the pod complains that the same volume is still in use
>> >> by a k8s host, when it is mapped nowhere. Not sure what to do in this situation.
>> >> We tried upgrading the csi and the k8s cluster, renamed the image and
>> >> blocklisted the host, then renamed the image back to its original name, but
>> >> 'rbd status' still shows the same client host.
>> >>
>> >> Regards
>> >> Dev
>> >>
>> >>> On Jan 21, 2025, at 12:16 PM, Eugen Block <eblock@xxxxxx> wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> have you checked if the image is in the trash?
>> >>>
>> >>> rbd -p {pool} trash ls
>> >>>
>> >>> You can try to restore the image if there is one, then blocklist the client
>> >>> to release the watcher, then delete the image again.
>> >>>
>> >>> I have to do that from time to time on a customer's OpenStack cluster.
>> >>>
>> >>> Quoting Devender Singh <devender@xxxxxxxxxx>:
>> >>>
>> >>>> Hello,
>> >>>>
>> >>>> Seeking some help: can I clean up the client mounting my volume?
>> >>>>
>> >>>> rbd status pool/image
>> >>>>
>> >>>> Watchers:
>> >>>>     watcher=10.160.0.245:0/2076588905 client.12541259 cookie=140446370329088
>> >>>>
>> >>>> Issue: the pod is failing in Init state.
>> >>>> Events:
>> >>>>   Type     Reason       Age                  From     Message
>> >>>>   ----     ------       ----                 ----     -------
>> >>>>   Warning  FailedMount  96s (x508 over 24h)  kubelet  MountVolume.MountDevice
>> >>>>   failed for volume "pvc-3a2048f1" : rpc error: code = Internal desc = rbd
>> >>>>   image k8s-rgnl-disks/csi-vol-945c6a66-9129 is still being used
>> >>>>
>> >>>> It shows the above client, but there is no such volume in use…
>> >>>>
>> >>>> Another similar issue… on the dashboard…
>> >>>>
>> >>>> CephNodeDiskspaceWarning
>> >>>> Mountpoint /mnt/dst-volume on sea-prod-host01 will be full in less than 5
>> >>>> days based on the 48 hour trailing fill rate.
>> >>>>
>> >>>> Whereas nothing is mounted there: I mapped one image yesterday using
>> >>>> 'rbd map', then unmapped and unmounted everything, but it has been more than
>> >>>> 12 hours now and it is still showing the message.
>> >>>>
>> >>>> Ceph version: 18.2.4
>> >>>>
>> >>>> Regards
>> >>>> Dev
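
Regarding the question above about a command to kill the client from the Ceph
side: blocklisting the watcher's address, as Eugen suggested, is usually what
releases a stale watch. Roughly, using the pool/image from the error message
and the watcher address shown by 'rbd status' above (double-check the address
on your own cluster first; the 3600-second expiry is just an example):

    # see who is watching the image
    rbd status k8s-rgnl-disks/csi-vol-945c6a66-9129

    # blocklist the watcher's address for an hour
    ceph osd blocklist add 10.160.0.245:0/2076588905 3600

    # check the watcher is gone, then remove the entry once the node is cleaned up
    rbd status k8s-rgnl-disks/csi-vol-945c6a66-9129
    ceph osd blocklist ls
    ceph osd blocklist rm 10.160.0.245:0/2076588905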

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx