Re: Watcher Issue

Hello Frederic

Thanks for your email. 
We already verified those and tried killing them, and upgrading k8s and the csi-plugin, but nothing helps. 
Below is the output; it did not report any volume. 

# for pod in $(kubectl -n $namespace get pods | grep -E 'rbdplugin|nodeplugin' | grep -v provisioner | awk '{print $1}'); do echo $pod; kubectl exec -it -n $namespace $pod -c csi-rbdplugin -- rbd device list | grep $image ; done
ceph-csi-rbd-nodeplugin-48vs2
ceph-csi-rbd-nodeplugin-6zmjj
ceph-csi-rbd-nodeplugin-7g6r5
ceph-csi-rbd-nodeplugin-bp84x
ceph-csi-rbd-nodeplugin-bt6hh
ceph-csi-rbd-nodeplugin-d4tww
ceph-csi-rbd-nodeplugin-rtb68
ceph-csi-rbd-nodeplugin-t87db

But the error is still there: 
# date;kubectl -n elastic describe pod/es-es-default-3 |grep -i warning
Wed 22 Jan 2025 01:12:09 PM PST
  Warning  FailedMount  2s (x13 over 21m)  kubelet            MountVolume.MountDevice failed for volume "pvc-3a2048f1" : rpc error: code = Internal desc = rbd image k8s-rgnl-disks/csi-vol-945c6a66 is still being used
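
Since none of the nodeplugins report the image as mapped, is evicting the stale watcher from the Ceph side the right next step? Something along these lines, using the watcher address that rbd status reported (please correct me if blocklisting is not the way to go):

# rbd status k8s-rgnl-disks/csi-vol-945c6a66-9129
# ceph osd blocklist add 10.160.0.245:0/2076588905
# ceph osd blocklist ls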


Regards
Dev

> On Jan 21, 2025, at 11:50 PM, Frédéric Nass <frederic.nass@xxxxxxxxxxxxxxxx> wrote:
> 
> Hi Dev,
> 
> Can you run the below command to check if this image is still considered mapped by any of the ceph-csi nodeplugins? 
> 
> $ namespace=ceph-csi-rbd
> $ image=csi-vol-945c6a66-9129
> $ for pod in $(kubectl -n $namespace get pods | grep -E 'rbdplugin|nodeplugin' | grep -v provisioner | awk '{print $1}'); do echo $pod; kubectl exec -it -n $namespace $pod -c csi-rbdplugin -- rbd device list | grep $image ; done
> 
> If it pops out in the output, get into the csi-rbdplugin container of the nodeplugin pod that listed the image and unmount/unmap it:
> 
> $ kubectl -n $namespace exec -ti ceph-csi-rbd-nodeplugin-xxxxx -c csi-rbdplugin -- sh           <---- please adjust nodeplugin pod name here
> sh-4.4#
> sh-4.4# rbd device list
> id  pool           namespace  image                  snap  device
> 0   k8s-rgnl-disks            csi-vol-945c6a66-9129  -     /dev/rbd0
> sh-4.4# umount /dev/rbd/k8s-rgnl-disks/csi-vol-945c6a66-9129
> sh-4.4# rbd unmap /dev/rbd/k8s-rgnl-disks/csi-vol-945c6a66-9129
> sh-4.4# rbd device list
> sh-4.4#
> 
> Hope there's no typo.
> 
> Regards,
> Frédéric.
> 
> ----- On 21 Jan 25, at 23:33, Devender Singh devender@xxxxxxxxxx <mailto:devender@xxxxxxxxxx> wrote:
> 
>> Hello Eugen
>> 
>> Thanks for your reply.
>> I have the image available and it’s not under trash.
>> 
>> When a statefulset pod is rescheduled to a different node, it runs into this mount issue.
>> 
>> I was looking for a command to kill the client.id from Ceph; Ceph must have a command to evict its clients.
>> I don't understand why the pod complains that the volume is still in use by a k8s host when it is not in use anywhere. Not sure what to do in this situation.
>> We tried upgrading the CSI plugin and the k8s cluster, renamed the image and blocklisted the host, then renamed the image back to its original name, but rbd status still shows the same client host.
>> 
>> 
>> Regards
>> Dev
>> 
>>> On Jan 21, 2025, at 12:16 PM, Eugen Block <eblock@xxxxxx> wrote:
>>> 
>>> Hi,
>>> 
>>> have you checked if the image is in the trash?
>>> 
>>> rbd -p {pool} trash ls
>>> 
>>> You can try to restore the image if there is one, then blocklist the client to
>>> release the watcher, then delete the image again.
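>>> 
>>> Roughly along these lines, substituting your pool, the image id shown by trash ls, and the watcher address from rbd status:
>>> 
>>> rbd -p {pool} trash restore {image-id}
>>> ceph osd blocklist add {watcher-address}
>>> rbd rm {pool}/{image}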
>>> 
>>> I have to do that from time to time on a customer’s openstack cluster.
>>> 
>>> Zitat von Devender Singh <devender@xxxxxxxxxx>:
>>> 
>>>> Hello
>>>> 
>>>> Seeking some help: can I clean up the client that is mounting my volume?
>>>> 
>>>> rbd status pool/image
>>>> 
>>>> Watchers:
>>>> 	watcher=10.160.0.245:0/2076588905 client.12541259 cookie=140446370329088
>>>> 
>>>> Issue: the pod is failing in the Init state.
>>>> Events:
>>>> Type     Reason       Age                  From     Message
>>>> ----     ------       ----                 ----     -------
>>>> Warning  FailedMount  96s (x508 over 24h)  kubelet  MountVolume.MountDevice
>>>> failed for volume "pvc-3a2048f1" : rpc error: code = Internal desc = rbd image
>>>> k8s-rgnl-disks/csi-vol-945c6a66-9129 is still being used
>>>> 
>>>> It shows the above client, but no such volume is in use anywhere…
>>>> 
>>>> Another similar issue… on dashboard…
>>>> 
>>>> CephNodeDiskspaceWarning
>>>> Mountpoint /mnt/dst-volume on sea-prod-host01 will be full in less than 5 days
>>>> based on the 48 hour trailing fill rate.
>>>> 
>>>> Nothing is mounted there, though. I mapped one image yesterday using rbd map, then
>>>> unmapped and unmounted everything, but it has been more than 12 hours now and the
>>>> warning is still showing.
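>>>> 
>>>> On the host itself, is there anything else worth checking besides, for example:
>>>> 
>>>> rbd showmapped
>>>> findmnt /mnt/dst-volume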
>>>> 
>>>> 
>>>> CEPH version: 18.2.4
>>>> 
>>>> Regards
>>>> Dev
>>>> 
>>>> 
>>>> 
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



