Re: RBD client hangs

> That doesn't appear to be an error -- that's just stating that it found a dead client that was holding the exclusive-lock, so it broke the dead client's lock on the image (by blacklisting the client).

As there is only one RBD client in this test, does this mean the RBD client process keeps failing?
On a freshly booted RBD client, doing some basic operations also gives the warning:

---------------- cut here ----------------
# rbd -n client.acapp1 map 4copy/foo
/dev/rbd0
# mount /dev/rbd0 /4copy
# cd /4copy; ls


# tail /var/log/messages
Jan 28 14:23:39 acapp1 kernel: Key type ceph registered
Jan 28 14:23:39 acapp1 kernel: libceph: loaded (mon/osd proto 15/24)
Jan 28 14:23:39 acapp1 kernel: rbd: loaded (major 252)
Jan 28 14:23:39 acapp1 kernel: libceph: mon2 192.168.1.156:6789 session established
Jan 28 14:23:39 acapp1 kernel: libceph: client80624 fsid cc795498-5d16-4b84-9584-1788d0458be9
Jan 28 14:23:39 acapp1 kernel: rbd: rbd0: capacity 10737418240 features 0x5
Jan 28 14:23:44 acapp1 kernel: XFS (rbd0): Mounting V5 Filesystem
Jan 28 14:23:44 acapp1 kernel: rbd: rbd0: client80621 seems dead, breaking lock		<--
Jan 28 14:23:45 acapp1 kernel: XFS (rbd0): Starting recovery (logdev: internal)
Jan 28 14:23:45 acapp1 kernel: XFS (rbd0): Ending recovery (logdev: internal)

---------------- cut here ----------------

Is this normal?
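
(For reference, would the following be the right way to check which client currently holds the exclusive lock and whether the old instance really got blacklisted? Assuming client.acapp1 is allowed to read that information:)

---------------- cut here ----------------
# show current lockers and watchers of the image
rbd -n client.acapp1 lock ls 4copy/foo
rbd -n client.acapp1 status 4copy/foo

# list current osd blacklist entries
ceph -n client.acapp1 osd blacklist ls
---------------- cut here ----------------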



Besides, I repeated the testing:
* Map and mount the rbd device: read/write OK.
* Unmount all rbd devices, then reboot: no problem.
* Reboot hangs if the rbd devices are not unmounted before rebooting:

---------------- cut here ----------------
Jan 28 14:13:12 acapp1 kernel: rbd: rbd0: client80531 seems dead, breaking lock
Jan 28 14:13:13 acapp1 kernel: XFS (rbd0): Ending clean mount				<-- Reboot hangs here
Jan 28 14:14:06 acapp1 systemd: Stopping Session 1 of user root.			<-- pressing power reset 
Jan 28 14:14:06 acapp1 systemd: Stopped target Multi-User System.
---------------- cut here ----------------

Is it necessary to unmount all RBD devices before rebooting the client host?
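
(If it is necessary, would ceph-common's rbdmap unit be the recommended way to handle this at shutdown? A rough sketch of what I have in mind -- the keyring path and fstab entry below are only assumptions for this test host:)

---------------- cut here ----------------
# /etc/ceph/rbdmap -- images for rbdmap.service to map at boot and unmap at shutdown
4copy/foo id=acapp1,keyring=/etc/ceph/ceph.client.acapp1.keyring

# /etc/fstab -- noauto,_netdev so the mount is treated as network-dependent
# and is not attempted before the image is mapped
/dev/rbd/4copy/foo  /4copy  xfs  noauto,_netdev  0 0

# enable the unit
systemctl enable rbdmap.service
---------------- cut here ----------------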

Thanks a lot.
/st

-----Original Message-----
From: Jason Dillaman <jdillama@xxxxxxxxxx> 
Sent: Friday, January 25, 2019 10:04 PM
To: ST Wong (ITSC) <ST@xxxxxxxxxxxxxxxx>
Cc: dillaman@xxxxxxxxxx; ceph-users@xxxxxxxxxxxxxx
Subject: Re:  RBD client hangs

That doesn't appear to be an error -- that's just stating that it found a dead client that was holding the exclusive-lock, so it broke the dead client's lock on the image (by blacklisting the client).

On Fri, Jan 25, 2019 at 5:09 AM ST Wong (ITSC) <ST@xxxxxxxxxxxxxxxx> wrote:
>
> Oops, while I can map and mount the filesystem, I still see the error below, and rebooting the client machine freezes so I have to power reset it.
>
> Jan 25 17:57:30 acapp1 kernel: XFS (rbd0): Mounting V5 Filesystem
> Jan 25 17:57:30 acapp1 kernel: rbd: rbd0: client74700 seems dead, breaking lock		<--
> Jan 25 17:57:30 acapp1 kernel: XFS (rbd0): Starting recovery (logdev: internal)
> Jan 25 17:57:30 acapp1 kernel: XFS (rbd0): Ending recovery (logdev: internal)
> Jan 25 17:58:07 acapp1 kernel: rbd: rbd1: capacity 10737418240 features 0x5
> Jan 25 17:58:14 acapp1 kernel: XFS (rbd1): Mounting V5 Filesystem
> Jan 25 17:58:14 acapp1 kernel: rbd: rbd1: client74700 seems dead, breaking lock		<--
> Jan 25 17:58:15 acapp1 kernel: XFS (rbd1): Starting recovery (logdev: internal)
> Jan 25 17:58:15 acapp1 kernel: XFS (rbd1): Ending recovery (logdev: internal)
>
> Would you help ?   Thanks.
> /st
>
> -----Original Message-----
> From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> On Behalf Of ST 
> Wong (ITSC)
> Sent: Friday, January 25, 2019 5:58 PM
> To: dillaman@xxxxxxxxxx
> Cc: ceph-users@xxxxxxxxxxxxxx
> Subject: Re:  RBD client hangs
>
> Hi,  It works.  Thanks a lot.
>
> /st
>
> -----Original Message-----
> From: Jason Dillaman <jdillama@xxxxxxxxxx>
> Sent: Tuesday, January 22, 2019 9:29 PM
> To: ST Wong (ITSC) <ST@xxxxxxxxxxxxxxxx>
> Cc: Ilya Dryomov <idryomov@xxxxxxxxx>; ceph-users@xxxxxxxxxxxxxx
> Subject: Re:  RBD client hangs
>
> Your "mon" cap should be "profile rbd" instead of "allow r" [1].
>
> [1] http://docs.ceph.com/docs/master/rbd/rados-rbd-cmds/#create-a-block-device-user
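>
> (For reference, a minimal sketch of applying that with "ceph auth caps" -- note the command replaces the user's whole cap set, so the existing mds/mgr/osd caps from the keyring are repeated here unchanged:)
>
> --------------- cut here -----------
> ceph auth caps client.acapp1 \
>         mon 'profile rbd' \
>         mgr 'allow r' \
>         mds 'allow r' \
>         osd 'allow rwx pool=2copy, allow rwx pool=4copy'
> --------------- cut here -----------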
>
> On Mon, Jan 21, 2019 at 9:05 PM ST Wong (ITSC) <ST@xxxxxxxxxxxxxxxx> wrote:
> >
> > Hi,
> >
> > > Is this an upgraded or a fresh cluster?
> > It's a fresh cluster.
> >
> > > Does client.acapp1 have the permission to blacklist other clients?  You can check with "ceph auth get client.acapp1".
> >
> > No, it's our first Ceph cluster with a basic setup for testing, without any blacklisting configured.
> >
> > --------------- cut here -----------
> > # ceph auth get client.acapp1
> > exported keyring for client.acapp1
> > [client.acapp1]
> >         key = <key here>
> >         caps mds = "allow r"
> >         caps mgr = "allow r"
> >         caps mon = "allow r"
> >         caps osd = "allow rwx pool=2copy, allow rwx pool=4copy"
> > --------------- cut here -----------
> >
> > Thanks a lot.
> > /st
> >
> >
> >
> > -----Original Message-----
> > From: Ilya Dryomov <idryomov@xxxxxxxxx>
> > Sent: Monday, January 21, 2019 7:33 PM
> > To: ST Wong (ITSC) <ST@xxxxxxxxxxxxxxxx>
> > Cc: ceph-users@xxxxxxxxxxxxxx
> > Subject: Re:  RBD client hangs
> >
> > On Mon, Jan 21, 2019 at 11:43 AM ST Wong (ITSC) <ST@xxxxxxxxxxxxxxxx> wrote:
> > >
> > > Hi, we’re trying mimic on a VM farm.  It consists of 4 OSD hosts (8 OSDs) and 3 MONs.  We tried mounting as RBD and CephFS (fuse and kernel mount) on different clients without problem.
> >
> > Is this an upgraded or a fresh cluster?
> >
> > >
> > > Then one day we performed a failover test and stopped one of the OSDs.  Not sure if it’s related, but after that testing the RBD client freezes when trying to mount the rbd device.
> > >
> > >
> > >
> > > Steps to reproduce:
> > >
> > >
> > >
> > > # modprobe rbd
> > >
> > >
> > >
> > > (dmesg)
> > >
> > > [  309.997587] Key type dns_resolver registered
> > >
> > > [  310.043647] Key type ceph registered
> > >
> > > [  310.044325] libceph: loaded (mon/osd proto 15/24)
> > >
> > > [  310.054548] rbd: loaded
> > >
> > >
> > >
> > > # rbd -n client.acapp1 map 4copy/foo
> > >
> > > /dev/rbd0
> > >
> > >
> > >
> > > # rbd showmapped
> > >
> > > id pool  image snap device
> > >
> > > 0  4copy foo   -    /dev/rbd0
> > >
> > >
> > >
> > >
> > >
> > > Then it hangs if I try to mount or reboot the server after rbd map.  There are lots of errors in dmesg, e.g.
> > >
> > >
> > >
> > > Jan 20 03:43:32 acapp1 kernel: rbd: rbd0: blacklist of client74700 failed: -13
> > > Jan 20 03:43:32 acapp1 kernel: rbd: rbd0: failed to acquire lock: -13
> > > Jan 20 03:43:32 acapp1 kernel: rbd: rbd0: no lock owners detected
> > > Jan 20 03:43:32 acapp1 kernel: rbd: rbd0: client74700 seems dead, breaking lock
> > > Jan 20 03:43:32 acapp1 kernel: rbd: rbd0: blacklist of client74700 failed: -13
> > > Jan 20 03:43:32 acapp1 kernel: rbd: rbd0: failed to acquire lock: -13
> > > Jan 20 03:43:32 acapp1 kernel: rbd: rbd0: no lock owners detected
> >
> > Does client.acapp1 have the permission to blacklist other clients?  You can check with "ceph auth get client.acapp1".  If not, follow step 6 of http://docs.ceph.com/docs/master/releases/luminous/#upgrade-from-jewel-or-kraken.
> >
> > Thanks,
> >
> >                 Ilya
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@xxxxxxxxxxxxxx
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> --
> Jason
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



--
Jason
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



