Re: Not timing out watcher

"Serguei Bezverkhi (sbezverk)" <sbezverk@xxxxxxxxx> · Thu, 21 Dec 2017 14:04:58 +0000

Hi Ilya,

Here you go, no k8s services running this time:

sbezverk@kube-4:~$ sudo rbd map raw-volume --pool kubernetes --id admin -m 192.168.80.233  --key=AQCeHO1ZILPPDRAA7zw3d76bplkvTwzoosybvA==
/dev/rbd0
sbezverk@kube-4:~$ sudo rbd status raw-volume --pool kubernetes --id admin -m 192.168.80.233  --key=AQCeHO1ZILPPDRAA7zw3d76bplkvTwzoosybvA==
Watchers:
        watcher=192.168.80.235:0/3465920438 client.65327 cookie=1
sbezverk@kube-4:~$ sudo rbd info raw-volume --pool kubernetes --id admin -m 192.168.80.233  --key=AQCeHO1ZILPPDRAA7zw3d76bplkvTwzoosybvA==
rbd image 'raw-volume':
        size 10240 MB in 2560 objects
        order 22 (4096 kB objects)
        block_name_prefix: rb.0.fafa.625558ec
        format: 1
sbezverk@kube-4:~$ sudo reboot

sbezverk@kube-4:~$ sudo rbd status raw-volume --pool kubernetes --id admin -m 192.168.80.233  --key=AQCeHO1ZILPPDRAA7zw3d76bplkvTwzoosybvA==
Watchers: none

It seems that when the image is mapped manually, the issue is not reproducible.

K8s does not just map the image; it also creates a loopback device linked to /dev/rbd0. Maybe this somehow causes the rbd client to re-establish a watcher after reboot. I will try to reproduce manually the exact steps k8s follows, to see what forces an active watcher after reboot.
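
A minimal sketch of what that manual test might look like. It assumes k8s does the equivalent of "losetup --find --show" on the mapped device, which is not confirmed anywhere in this thread. The function names count_watchers and map_with_loop are hypothetical helpers, and the connection options are simply passed through to rbd.

```shell
#!/bin/sh
# Count watcher entries in `rbd status` output read from stdin.
# Prints 0 when the output is "Watchers: none".
count_watchers() {
    # grep -c exits non-zero when the count is 0, so mask the exit status.
    grep -c 'watcher=' || true
}

# ASSUMPTION: approximates the k8s sequence as "map, then attach a loop device".
# Usage: map_with_loop IMAGE [rbd options...]
# Prints the rbd device and the loop device on one line.
map_with_loop() {
    image=$1
    shift
    dev=$(sudo rbd map "$image" "$@") || return 1
    loopdev=$(sudo losetup --find --show "$dev") || return 1
    echo "$dev $loopdev"
}

# Example (options as used earlier in this message, key omitted):
#   map_with_loop raw-volume --pool kubernetes --id admin -m 192.168.80.233
#   sudo reboot
#   sudo rbd status raw-volume --pool kubernetes --id admin -m 192.168.80.233 | count_watchers
```

If the count is still non-zero after the reboot with the loop device attached, but zero with a plain "rbd map", that would point at the loopback step rather than at k8s itself.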

Thank you
Serguei

On 2017-12-21, 5:49 AM, "Ilya Dryomov" <idryomov@xxxxxxxxx> wrote:

    On Wed, Dec 20, 2017 at 6:20 PM, Serguei Bezverkhi (sbezverk)
    <sbezverk@xxxxxxxxx> wrote:
    > It took 30 minutes for the watcher to time out after an ungraceful restart. Is there a way to limit it to something more reasonable, like 1-3 minutes?
    >
    > On 2017-12-20, 12:01 PM, "Serguei Bezverkhi (sbezverk)" <sbezverk@xxxxxxxxx> wrote:
    >
    >     OK, here is what I found out. If I kill a pod gracefully, the watcher is properly cleared. But if it is killed ungracefully, without "rbd unmap", the watcher stays up for a long time even after a node reboot: it has been more than 20 minutes and it is still active (no Kubernetes services are running).

    Hi Serguei,

    Can you try taking k8s out of the equation -- set up a fresh VM with
    the same kernel, do "rbd map" in it and kill it?

    Thanks,

                    Ilya

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com