Re: Lot of blocked operations

Hello,

On Fri, 18 Sep 2015 02:43:49 +0200 Olivier Bonvalet wrote:

The items below help, but be as specific as possible, from OS and kernel
version to Ceph version, "ceph -s" output, and any other specific details
(pool type, replica size).
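
A minimal sketch of gathering those details in one go, assuming Python 3
and the standard "ceph" CLI on the node. The command list and the
run_all() helper are illustrative choices, not something from this thread;
adjust them to whatever is installed.

```python
import subprocess

# Commands covering the details asked for above. Which of them exist on a
# given node is an assumption; missing ones are reported, not fatal.
COMMANDS = [
    ["uname", "-a"],          # OS / kernel version
    ["ceph", "--version"],    # Ceph version
    ["ceph", "-s"],           # cluster status
    ["ceph", "osd", "tree"],  # OSD layout and weights
    ["ceph", "osd", "dump"],  # includes pool type and replica size
]


def run_all(commands=COMMANDS, timeout=30):
    """Run each command and return {command string: output or error note}."""
    results = {}
    for cmd in commands:
        key = " ".join(cmd)
        try:
            proc = subprocess.run(
                cmd, capture_output=True, text=True, timeout=timeout
            )
            # Keep stderr when the command fails so the failure is visible.
            results[key] = proc.stdout if proc.returncode == 0 else proc.stderr
        except FileNotFoundError:
            results[key] = "(command not found)"
        except subprocess.TimeoutExpired:
            results[key] = "(timed out)"
    return results
```

Printing each key and value of run_all() gives a block of text that can be
pasted into a reply. The timeout matters here because "ceph" commands can
themselves hang when the cluster is stuck.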

> Some additional information:
> - I have 4 SSD per node.
Which type? If nothing else, for anecdotal reasons.
 
> - the CPU usage is near 0
> - IO wait is near 0 too
Including on the troubled OSD(s)?
Measured how, with iostat or atop?

> - bandwidth usage is also near 0
>
Yeah, none of the above is surprising if everything is stuck waiting
for some ops to finish.

How many nodes are we talking about?

> The whole cluster seems waiting for something... but I don't see what.
>
Is it just one specific OSD (or a set of them) or is that all over the
place?

Does restarting the OSD fix things?
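
One way to answer the "one OSD or all over the place" question is to
summarize the slow request warnings in each OSD log. Below is a minimal
sketch, assuming each warning sits on a single line in the format quoted
further down in this message. The regular expression and the function
names are illustrative and were checked only against that sample; other
Ceph versions may log differently.

```python
import re
from collections import Counter

# Matches the "slow request ... osd_op(...) ... currently <state>" lines.
SLOW_RE = re.compile(
    r"slow request (?P<age>\d+\.\d+) seconds old, "
    r"received at (?P<received>\S+ \S+): "
    r"osd_op\((?P<client>\S+) (?P<object>\S+) "
    r"\[(?P<op>[^\]]*)\] (?P<pg>\S+) .*\) "
    r"v\d+ currently (?P<state>.+)$"
)


def parse_slow_request(line):
    """Return a dict of fields for a slow request line, else None."""
    match = SLOW_RE.search(line.rstrip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["age"] = float(fields["age"])
    return fields


def summarize(lines):
    """Return (count, oldest age in seconds, Counter of states)."""
    parsed = [p for p in map(parse_slow_request, lines) if p is not None]
    oldest = max((p["age"] for p in parsed), default=0.0)
    return len(parsed), oldest, Counter(p["state"] for p in parsed)
```

Running summarize() over each OSD's log (by default under /var/log/ceph/,
which is an assumption about this setup) and comparing the counts shows
whether the warnings cluster on one OSD. The "state" field is the part
after "currently", "reached pg" in the sample, and is the log's own
statement of where the op is sitting.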
 
Christian
> 
> On Friday, 18 September 2015 at 02:35 +0200, Olivier Bonvalet wrote:
> > Hi,
> > 
> > I have a cluster with a lot of blocked operations each time I try to
> > move data (by slightly reweighting an OSD).
> > 
> > It's a full SSD cluster, with 10GbE network.
> > 
> > In the logs, when an OSD is blocked, I can see this on the main OSD:
> > 2015-09-18 01:55:16.981396 7f89e8cb8700  0 log [WRN] : 2 slow
> > requests, 1 included below; oldest blocked for > 33.976680 secs
> > 2015-09-18 01:55:16.981402 7f89e8cb8700  0 log [WRN] : slow request
> > 30.125556 seconds old, received at 2015-09-18 01:54:46.855821:
> > osd_op(client.29760717.1:18680817544
> > rb.0.1c16005.238e1f29.00000000027f [write 180224~16384] 6.c11916a4
> > snapc 11065=[11065,10fe7,10f69] ondisk+write e845819) v4 currently
> > reached pg
> > 2015-09-18 01:55:46.986319 7f89e8cb8700  0 log [WRN] : 2 slow
> > requests, 1 included below; oldest blocked for > 63.981596 secs
> > 2015-09-18 01:55:46.986324 7f89e8cb8700  0 log [WRN] : slow request
> > 60.130472 seconds old, received at 2015-09-18 01:54:46.855821:
> > osd_op(client.29760717.1:18680817544
> > rb.0.1c16005.238e1f29.00000000027f [write 180224~16384] 6.c11916a4
> > snapc 11065=[11065,10fe7,10f69] ondisk+write e845819) v4 currently
> > reached pg
> > 
> > How should I read that? What is this OSD waiting for?
> > 
> > Thanks for any help,
> > 
> > Olivier
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@xxxxxxxxxxxxxx
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > 


-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Fusion Communications
http://www.gol.com/