Hi,

> Is this an upgraded or a fresh cluster?

It's a fresh cluster.

> Does client.acapp1 have the permission to blacklist other clients?
> You can check with "ceph auth get client.acapp1".

No, it's our first Ceph cluster with a basic setup for testing, without
any blacklist permission configured.

--------------- cut here -----------
# ceph auth get client.acapp1
exported keyring for client.acapp1
[client.acapp1]
        key = <key here>
        caps mds = "allow r"
        caps mgr = "allow r"
        caps mon = "allow r"
        caps osd = "allow rwx pool=2copy, allow rwx pool=4copy"
--------------- cut here -----------

Thanks a lot.
/st

-----Original Message-----
From: Ilya Dryomov <idryomov@xxxxxxxxx>
Sent: Monday, January 21, 2019 7:33 PM
To: ST Wong (ITSC) <ST@xxxxxxxxxxxxxxxx>
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: RBD client hangs

On Mon, Jan 21, 2019 at 11:43 AM ST Wong (ITSC) <ST@xxxxxxxxxxxxxxxx> wrote:
>
> Hi, we're trying mimic on a VM farm. It consists of 4 OSD hosts (8 OSDs)
> and 3 MONs. We tried mounting as RBD and CephFS (fuse and kernel mount)
> on different clients without problems.

Is this an upgraded or a fresh cluster?

> Then one day we performed a failover test and stopped one of the OSDs.
> Not sure if it's related, but after that test the RBD client freezes
> when trying to mount the rbd device.
>
> Steps to reproduce:
>
> # modprobe rbd
>
> (dmesg)
> [  309.997587] Key type dns_resolver registered
> [  310.043647] Key type ceph registered
> [  310.044325] libceph: loaded (mon/osd proto 15/24)
> [  310.054548] rbd: loaded
>
> # rbd -n client.acapp1 map 4copy/foo
> /dev/rbd0
>
> # rbd showmapped
> id pool  image snap device
> 0  4copy foo   -    /dev/rbd0
>
> Then it hangs if I try to mount or reboot the server after rbd map.
> There are a lot of errors in dmesg, e.g.
>
> Jan 20 03:43:32 acapp1 kernel: rbd: rbd0: blacklist of client74700 failed: -13
> Jan 20 03:43:32 acapp1 kernel: rbd: rbd0: failed to acquire lock: -13
> Jan 20 03:43:32 acapp1 kernel: rbd: rbd0: no lock owners detected
> Jan 20 03:43:32 acapp1 kernel: rbd: rbd0: client74700 seems dead, breaking lock
> Jan 20 03:43:32 acapp1 kernel: rbd: rbd0: blacklist of client74700 failed: -13
> Jan 20 03:43:32 acapp1 kernel: rbd: rbd0: failed to acquire lock: -13
> Jan 20 03:43:32 acapp1 kernel: rbd: rbd0: no lock owners detected

Does client.acapp1 have the permission to blacklist other clients?  You
can check with "ceph auth get client.acapp1".  If not, follow step 6 of
http://docs.ceph.com/docs/master/releases/luminous/#upgrade-from-jewel-or-kraken.

Thanks,

                Ilya
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
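The keyring above shows the client has only `mon "allow r"`, which is why the kernel client's blacklist attempt fails with -13 (EACCES). The fix Ilya points to (step 6 of the Luminous upgrade notes) boils down to adding the "osd blacklist" mon command to the client's caps; a sketch for client.acapp1, assuming the caps shown earlier in this thread:

```shell
# Grant client.acapp1 permission to blacklist dead lock holders.
# NOTE: "ceph auth caps" replaces ALL existing caps for the client,
# so the mds/mgr/osd caps from the keyring above must be restated.
ceph auth caps client.acapp1 \
    mds 'allow r' \
    mgr 'allow r' \
    mon 'allow r, allow command "osd blacklist"' \
    osd 'allow rwx pool=2copy, allow rwx pool=4copy'

# Verify the updated caps took effect:
ceph auth get client.acapp1
```

After updating the caps, re-run the `rbd map` / mount sequence; the client should now be able to break the stale exclusive lock instead of hanging.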