Re: Watcher Issue

Hi Dev, 

Is the image name 'csi-vol-945c6a66-9129', as in your first message, or 'csi-vol-945c6a66', as in your last message? 
The command I sent you grepped for 'csi-vol-945c6a66-9129', not 'csi-vol-945c6a66'... 

No clones or snapshots made by Velero on this image, right? 
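If you want to double-check on the Ceph side, something along these lines should list any snapshots or clone children (pool/image names copied from your error message, adjust as needed): 

$ rbd snap ls k8s-rgnl-disks/csi-vol-945c6a66-9129 
$ rbd children k8s-rgnl-disks/csi-vol-945c6a66-9129 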

If the image is still accessed by node 10.160.0.245, there should be an entry under /sys/kernel/debug/ceph/$(ceph fsid).client12541259/... <----- client12541259 being the watcher session you get from the 'rbd status' command, without the '.' (dot) 
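For example, on the node itself (assuming debugfs is mounted there and the ceph CLI can reach the cluster; otherwise substitute the fsid by hand), something like: 

$ ls /sys/kernel/debug/ceph/$(ceph fsid).client12541259/ 

should show the session directory if the kernel client still holds it. 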
If so, then you could try to drain/cordon/reboot the K8s node and see if it clears up the watchers. 
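For instance (node name below is a placeholder for the node behind 10.160.0.245): 

$ kubectl cordon <node-name> 
$ kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data 

and reboot the node if the watcher still doesn't go away. 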

Regards, 
Frédéric. 

----- On 22 Jan 25, at 22:13, Devender Singh <devender@xxxxxxxxxx> wrote: 


Hello Frederic, 
Thanks for your email. 
We already verified those and tried killing them, and upgrading the k8s and csi-plugin, but nothing helps. 
Below is the output; none of the nodeplugins reported the volume: 

# for pod in $(kubectl -n $namespace get pods | grep -E 'rbdplugin|nodeplugin' | grep -v provisioner | awk '{print $1}'); do echo $pod; kubectl exec -it -n $namespace $pod -c csi-rbdplugin -- rbd device list | grep $image ; done 
ceph-csi-rbd-nodeplugin-48vs2 
ceph-csi-rbd-nodeplugin-6zmjj 
ceph-csi-rbd-nodeplugin-7g6r5 
ceph-csi-rbd-nodeplugin-bp84x 
ceph-csi-rbd-nodeplugin-bt6hh 
ceph-csi-rbd-nodeplugin-d4tww 
ceph-csi-rbd-nodeplugin-rtb68 
ceph-csi-rbd-nodeplugin-t87db 

But the error persists: 
# date;kubectl -n elastic describe pod/es-es-default-3 |grep -i warning 
Wed 22 Jan 2025 01:12:09 PM PST 
Warning FailedMount 2s (x13 over 21m) kubelet MountVolume.MountDevice failed for volume "pvc-3a2048f1" : rpc error: code = Internal desc = rbd image k8s-rgnl-disks/csi-vol-945c6a66 is still being used 


Regards 
Dev 



On Jan 21, 2025, at 11:50 PM, Frédéric Nass <frederic.nass@xxxxxxxxxxxxxxxx> wrote: 

Hi Dev, 

Can you run the command below to check whether this image is still considered mapped by any of the ceph-csi nodeplugins? 

$ namespace=ceph-csi-rbd 
$ image=csi-vol-945c6a66-9129 
$ for pod in $(kubectl -n $namespace get pods | grep -E 'rbdplugin|nodeplugin' | grep -v provisioner | awk '{print $1}'); do echo $pod; kubectl exec -it -n $namespace $pod -c csi-rbdplugin -- rbd device list | grep $image ; done 

If it pops out in the output, get into the csi-rbdplugin container of the nodeplugin pod that listed the image and unmount/unmap it: 

$ kubectl -n $namespace exec -ti ceph-csi-rbd-nodeplugin-xxxxx -c csi-rbdplugin -- sh <---- please adjust the nodeplugin pod name here 
sh-4.4# 
sh-4.4# rbd device list 
id pool namespace image snap device 
0 k8s-rgnl-disks csi-vol-945c6a66-9129 - /dev/rbd0 
sh-4.4# umount /dev/rbd/k8s-rgnl-disks/csi-vol-945c6a66-9129 
sh-4.4# rbd unmap /dev/rbd/k8s-rgnl-disks/csi-vol-945c6a66-9129 
sh-4.4# rbd device list 
sh-4.4# 

Hope there's no typo. 

Regards, 
Frédéric. 

----- On 21 Jan 25, at 23:33, Devender Singh <devender@xxxxxxxxxx> wrote: 


Hello Eugen 

Thanks for your reply. 
I have the image available and it’s not in the trash. 

When a pod is rescheduled to a different node by the StatefulSet, it hits a mount issue. 

I was looking for a command to kill the client.id from Ceph. Ceph must have a command to kill its clients etc… 
I don’t understand why the pod complains that a k8s host is still using the same volume, when it is mounted nowhere.. Not sure what to do in this situation.. 
We tried upgrading the CSI plugin and the k8s cluster, renamed the image and blocklisted the host, then renamed the image back to its original name, but 'rbd status' still shows the same client host. 


Regards 
Dev 


On Jan 21, 2025, at 12:16 PM, Eugen Block <eblock@xxxxxx> wrote: 

Hi, 

have you checked if the image is in the trash? 

rbd -p {pool} trash ls 

You can try to restore the image if there is one, then blocklist the client to 
release the watcher, then delete the image again. 
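Roughly, the sequence would be something like this (pool, image id and the client address are placeholders; take the address from the watcher line in 'rbd status'): 

rbd -p {pool} trash restore {image-id} 
ceph osd blocklist add {client-address} 
rbd rm {pool}/{image} 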

I have to do that from time to time on a customer’s OpenStack cluster. 

Quoting Devender Singh <devender@xxxxxxxxxx>: 


Hello 

Seeking some help: can I clean up the client that is mounting my volume? 

rbd status pool/image 

Watchers: 
watcher=10.160.0.245:0/2076588905 client.12541259 cookie=140446370329088 

Issue: the pod is failing in the Init state. 
Events: 
Type Reason Age From Message 
---- ------ ---- ---- ------- 
Warning FailedMount 96s (x508 over 24h) kubelet MountVolume.MountDevice 
failed for volume "pvc-3a2048f1" : rpc error: code = Internal desc = rbd image 
k8s-rgnl-disks/csi-vol-945c6a66-9129 is still being used 

It shows the client above, but no such volume is in use anywhere… 

Another, similar issue on the dashboard… 

CephNodeDiskspaceWarning 
Mountpoint /mnt/dst-volume on sea-prod-host01 will be full in less than 5 days 
based on the 48 hour trailing fill rate. 

Whereas nothing is mounted there. I mapped one image yesterday using 'rbd map', then unmapped and unmounted everything, but it has been more than 12 hours now and the message is still showing.. 
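(Checks like the following on sea-prod-host01 should show whether anything is still mapped or mounted there; the path is taken from the alert:) 

# rbd device list 
# findmnt /mnt/dst-volume 
# df -h /mnt/dst-volume 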


Ceph version: 18.2.4 

Regards 
Dev 




_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



