Re: Watcher Issue

Hi,

I've had a similar issue, but outside of ceph-csi. Running a CRUD test
(create, map, write, read, unmap, and delete an RBD) in a short amount of
time can result in a stuck watcher. I assume it comes from mapping and
unmapping very quickly (under 30 seconds).
What I have found is that if you restart the primary OSD for the header
object, the watcher will go away, assuming nothing is really watching it:

> rbd info -p pool-name rbd-name
> # get the image id from the output, e.g. 1234
> ceph osd map pool-name rbd_header.1234
> # note the primary OSD, shown as pNNN under "acting", e.g. p43 -> 43
> ceph osd down 43
> # the OSD rejoins on its own and the stale watch is dropped
>
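Once the primary OSD comes back up, you can confirm the stale watch is gone
(same pool/image names as above):

> rbd status -p pool-name rbd-name
> # "Watchers: none" means the watch has been released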

This is the tracker I'm watching: <https://tracker.ceph.com/issues/58120>.
The backport says it should be fixed in 18.2.5.

Hope this helps,
Reid


On Wed, Jan 22, 2025 at 4:14 PM Devender Singh <devender@xxxxxxxxxx> wrote:

> Hello Frederic
>
> Thanks for your email.
> We already verified those, tried killing them, and upgraded k8s and the
> csi-plugin, but nothing helps.
> Below is the output; it did not report the volume on any node.
>
> # for pod in $(kubectl -n $namespace get pods | grep -E
> 'rbdplugin|nodeplugin' | grep -v provisioner | awk '{print $1}'); do echo
> $pod; kubectl exec -it -n $namespace $pod -c csi-rbdplugin -- rbd device
> list | grep $image ; done
> ceph-csi-rbd-nodeplugin-48vs2
> ceph-csi-rbd-nodeplugin-6zmjj
> ceph-csi-rbd-nodeplugin-7g6r5
> ceph-csi-rbd-nodeplugin-bp84x
> ceph-csi-rbd-nodeplugin-bt6hh
> ceph-csi-rbd-nodeplugin-d4tww
> ceph-csi-rbd-nodeplugin-rtb68
> ceph-csi-rbd-nodeplugin-t87db
>
> But we still get the error:
> # date;kubectl -n elastic describe pod/es-es-default-3 |grep -i warning
> Wed 22 Jan 2025 01:12:09 PM PST
>   Warning  FailedMount  2s (x13 over 21m)  kubelet
> MountVolume.MountDevice failed for volume "pvc-3a2048f1" : rpc error: code
> = Internal desc = rbd image k8s-rgnl-disks/csi-vol-945c6a66 is still being
> used
>
>
> Regards
> Dev
>
> > On Jan 21, 2025, at 11:50 PM, Frédéric Nass <
> frederic.nass@xxxxxxxxxxxxxxxx> wrote:
> >
> > Hi Dev,
> >
> > Can you run the command below to check if this image is still considered
> mapped by any of the ceph-csi nodeplugins?
> >
> > $ namespace=ceph-csi-rbd
> > $ image=csi-vol-945c6a66-9129
> > $ for pod in $(kubectl -n $namespace get pods | grep -E
> 'rbdplugin|nodeplugin' | grep -v provisioner | awk '{print $1}'); do echo
> $pod; kubectl exec -it -n $namespace $pod -c csi-rbdplugin -- rbd device
> list | grep $image ; done
> >
> > If it shows up in the output, get into the csi-rbdplugin container of
> the nodeplugin pod that listed the image and unmount/unmap it:
> >
> > $ kubectl -n $namespace exec -ti ceph-csi-rbd-nodeplugin-xxxxx -c
> csi-rbdplugin -- sh           <---- please adjust the nodeplugin pod name here
> > sh-4.4#
> > sh-4.4# rbd device list
> > id  pool           namespace  image                  snap  device
> > 0   k8s-rgnl-disks            csi-vol-945c6a66-9129  -     /dev/rbd0
> > sh-4.4# umount /dev/rbd/k8s-rgnl-disks/csi-vol-945c6a66-9129
> > sh-4.4# rbd unmap /dev/rbd/k8s-rgnl-disks/csi-vol-945c6a66-9129
> > sh-4.4# rbd device list
> > sh-4.4#
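> >
> > If the unmap complains that the device is still busy, a forced unmap can be
> > tried as a last resort (just a sketch, same device path as listed above):
> >
> > sh-4.4# rbd unmap -o force /dev/rbd/k8s-rgnl-disks/csi-vol-945c6a66-9129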
> >
> > Hope there's no typo.
> >
> > Regards,
> > Frédéric.
> >
> > ----- On Jan 21, 2025, at 23:33, Devender Singh <devender@xxxxxxxxxx> wrote:
> >
> >> Hello Eugen
> >>
> >> Thanks for your reply.
> >> I have the image available and it’s not under trash.
> >>
> >> When scaling a pod to a different node using a StatefulSet, the pod hits a
> mount issue.
> >>
> >> I was looking for a command to kill that client.id from Ceph. Ceph must
> >> have a command to kill its clients, etc.
> >> I don't understand why the pod complains that the same volume is still in
> >> use by a k8s host when it is mounted nowhere. Not sure what to do in this
> >> situation.
> >> We tried upgrading the CSI plugin and the k8s cluster, renamed the image
> >> and blocklisted the host, and then renamed the image back to its original
> >> name, but rbd status still shows the same client host.
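> >>
> >> For reference, a quick way to see which addresses are currently blocklisted
> >> (to confirm the entry is still in place) is:
> >>
> >> ceph osd blocklist ls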
> >>
> >>
> >> Regards
> >> Dev
> >>
> >>> On Jan 21, 2025, at 12:16 PM, Eugen Block <eblock@xxxxxx> wrote:
> >>>
> >>> Hi,
> >>>
> >>> have you checked if the image is in the trash?
> >>>
> >>> rbd -p {pool} trash ls
> >>>
> >>> You can try to restore the image if there is one, then blocklist the
> client to
> >>> release the watcher, then delete the image again.
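> >>>
> >>> Roughly, as a sketch only (substitute your pool, the image id from trash,
> >>> and the watcher address reported by 'rbd status'):
> >>>
> >>> rbd trash ls -p {pool}
> >>> rbd trash restore {pool}/{image-id}
> >>> ceph osd blocklist add {watcher-address}
> >>> rbd rm {pool}/{image}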
> >>>
> >>> I have to do that from time to time on a customer's OpenStack cluster.
> >>>
> >>> Quoting Devender Singh <devender@xxxxxxxxxx>:
> >>>
> >>>> Hello
> >>>>
> >>>> Seeking some help: can I clean up the client mounting my volume?
> >>>>
> >>>> rbd status pool/image
> >>>>
> >>>> Watchers:
> >>>>    watcher=10.160.0.245:0/2076588905 client.12541259
> cookie=140446370329088
> >>>>
> >>>> Issue: the pod is failing in the Init state.
> >>>> Events:
> >>>> Type     Reason       Age                  From     Message
> >>>> ----     ------       ----                 ----     -------
> >>>> Warning  FailedMount  96s (x508 over 24h)  kubelet
> MountVolume.MountDevice
> >>>> failed for volume "pvc-3a2048f1" : rpc error: code = Internal desc =
> rbd image
> >>>> k8s-rgnl-disks/csi-vol-945c6a66-9129 is still being used
> >>>>
> >>>> It shows the client above, but there is no such volume mounted anywhere…
> >>>>
> >>>> Another similar issue… on dashboard…
> >>>>
> >>>> CephNodeDiskspaceWarning
> >>>> Mountpoint /mnt/dst-volume on sea-prod-host01 will be full in less
> than 5 days
> >>>> based on the 48 hour trailing fill rate.
> >>>>
> >>>> Whereas nothing is mounted there: I mapped one image yesterday using rbd
> >>>> map and then unmapped and unmounted everything, but it has been more than
> >>>> 12 hours now and it is still showing the message.
> >>>>
> >>>>
> >>>> Ceph version: 18.2.4
> >>>>
> >>>> Regards
> >>>> Dev
> >>>>
> >>>>
> >>>>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



