Hi Jeff,

I believe these are normal; they are just idle connections to the OSDs being timed out because no traffic has flowed over them recently. They are probably a symptom rather than a cause.

Nick

> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Jeff Epstein
> Sent: 23 April 2015 15:19
> To: Lionel Bouton; Christian Balzer
> Cc: ceph-users@xxxxxxxxxxxxxx
> Subject: Re: long blocking with writes on rbds
>
> The appearance of these socket closed messages seems to coincide with the slowdown symptoms. What is the cause?
>
> 2015-04-23T14:08:47.111838+00:00 i-65062482 kernel: [ 4229.485489] libceph: osd1 192.168.160.4:6800 socket closed (con state OPEN)
> 2015-04-23T14:09:06.961823+00:00 i-65062482 kernel: [ 4249.332547] libceph: osd2 192.168.96.4:6800 socket closed (con state OPEN)
> 2015-04-23T14:09:09.701819+00:00 i-65062482 kernel: [ 4252.070594] libceph: osd4 192.168.64.4:6800 socket closed (con state OPEN)
> 2015-04-23T14:09:10.381817+00:00 i-65062482 kernel: [ 4252.755400] libceph: osd5 192.168.128.4:6800 socket closed (con state OPEN)
> 2015-04-23T14:09:14.831817+00:00 i-65062482 kernel: [ 4257.200257] libceph: osd5 192.168.128.4:6800 socket closed (con state OPEN)
> 2015-04-23T14:13:57.061877+00:00 i-65062482 kernel: [ 4539.431624] libceph: osd4 192.168.64.4:6800 socket closed (con state OPEN)
> 2015-04-23T14:13:57.541842+00:00 i-65062482 kernel: [ 4539.913284] libceph: osd5 192.168.128.4:6800 socket closed (con state OPEN)
> 2015-04-23T14:13:59.801822+00:00 i-65062482 kernel: [ 4542.177187] libceph: osd3 192.168.0.4:6800 socket closed (con state OPEN)
> 2015-04-23T14:14:11.361819+00:00 i-65062482 kernel: [ 4553.733566] libceph: osd4 192.168.64.4:6800 socket closed (con state OPEN)
> 2015-04-23T14:14:47.871829+00:00 i-65062482 kernel: [ 4590.242136] libceph: osd5 192.168.128.4:6800 socket closed (con state OPEN)
> 2015-04-23T14:14:47.991826+00:00 i-65062482 kernel: [ 4590.364078] libceph: osd2 192.168.96.4:6800 socket closed (con state OPEN)
> 2015-04-23T14:15:00.081817+00:00 i-65062482 kernel: [ 4602.452980] libceph: osd5 192.168.128.4:6800 socket closed (con state OPEN)
> 2015-04-23T14:16:21.301820+00:00 i-65062482 kernel: [ 4683.671614] libceph: osd5 192.168.128.4:6800 socket closed (con state OPEN)
>
> Jeff
>
> On 04/23/2015 12:26 AM, Jeff Epstein wrote:
> >
> >>>> Do you have some idea how I can diagnose this problem?
> >>>
> >>> I'd look at the ceph -s output while you have these stuck processes, to see if there's any unusual activity (scrub/deep scrub/recovery/backfills/...). Is it correlated in any way with rbd removal (i.e. write blocking doesn't appear unless you removed at least one rbd within, say, the hour before the write performance problems)?
> >>
> >> I'm not familiar with Amazon VMs. If you map the rbds using the kernel driver to local block devices, do you have control over the kernel you run? (I've seen reports of various problems with older kernels, and you probably want the latest possible.)
> >
> > ceph status shows nothing unusual. However, on the problematic node, we typically see entries in ps like this:
> >
> > 1468 12329 root  D  0.0 mkfs.ext4 wait_on_page_bit
> > 1468 12332 root  D  0.0 mkfs.ext4 wait_on_buffer
> >
> > Notice the "D" blocking state. Here, mkfs is stuck in these wait functions for long periods of time.
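To see exactly where those mkfs processes are blocked, it may be worth dumping their kernel stacks rather than going by the single symbol ps shows. A rough sketch along these lines (only an illustration, assuming a standard Linux /proc layout; reading /proc/<pid>/stack needs root and a kernel built with CONFIG_STACKTRACE) lists every task in the D state together with its kernel stack:

#!/usr/bin/env python
# Sketch: find tasks in uninterruptible sleep (state "D") and, where
# possible, print their kernel stacks from /proc/<pid>/stack.
# Assumes a standard Linux /proc; run as root to read the stack files.
import os

def task_state(pid):
    # The third field of /proc/<pid>/stat is the state; the comm field
    # is wrapped in parentheses and may contain spaces, so split after ')'.
    with open("/proc/%s/stat" % pid) as f:
        data = f.read()
    comm = data[data.index("(") + 1:data.rindex(")")]
    state = data[data.rindex(")") + 2:].split()[0]
    return comm, state

def main():
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        try:
            comm, state = task_state(pid)
        except (IOError, OSError):
            continue  # process exited while we were looking
        if state != "D":
            continue
        print("%s (%s) is in uninterruptible sleep" % (pid, comm))
        try:
            with open("/proc/%s/stack" % pid) as f:
                print(f.read())
        except (IOError, OSError):
            print("  (kernel stack unavailable; need root and CONFIG_STACKTRACE)")

if __name__ == "__main__":
    main()

If the stacks consistently end in wait_on_page_bit / wait_on_buffer, the tasks are simply waiting for I/O against the rbd device to complete, which again points at the writes stalling below the filesystem rather than at mkfs itself.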
> > (Also, we are formatting the RBDs as ext4 even though the OSDs are xfs; I assume this shouldn't be a problem?)
> >
> > We're on kernel 3.18.4pl2, which is pretty recent. Still, an outdated kernel driver isn't out of the question; if anyone has any concrete information, I'd be grateful.
> >
> > Jeff

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com