Re: Lot of blocked operations

Mmm, good point.

I don't see a CPU or IO problem on the mons, but in the logs I have this:

2015-09-18 01:55:16.921027 7fb951175700  0 log [INF] : pgmap v86359128: 6632 pgs: 77 inactive, 1 remapped, 10 active+remapped+wait_backfill, 25 peering, 5 active+remapped, 6 active+remapped+backfilling, 6499 active+clean, 9 remapped+peering; 18974 GB data, 69004 GB used, 58578 GB / 124 TB avail; 915 kB/s rd, 26383 kB/s wr, 1671 op/s; 8417/15680513 objects degraded (0.054%); 1062 MB/s, 274 objects/s recovering


So... it could be a peering problem. I hadn't seen that, thanks.
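To pin it down next time, I'll watch the stuck PGs directly while the data moves; as far as I know these Firefly-era commands should list them:

    ceph health detail            # slow requests, and which OSDs are involved
    ceph pg dump_stuck inactive   # PGs stuck inactive (e.g. peering)
    ceph pg dump_stuck unclean    # PGs stuck unclean (remapped, backfilling)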



On Friday, 18 September 2015 at 09:52 +0200, Jan Schermer wrote:
> Could this be caused by the monitors? In my case lagging monitors
> can also cause slow requests (because of slow peering). Not sure if
> that's expected or not, but of course it doesn't show up on the OSDs
> as any kind of bottleneck when you try to investigate...
> 
> Jan
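(To check whether a mon is lagging, I suppose the quorum state and the per-mon status are the first things to look at, e.g.:

    ceph quorum_status
    ceph --admin-daemon /var/run/ceph/ceph-mon.$(hostname -s).asok mon_status

the .asok path here assumes the default Debian layout, with the mon id equal to the short hostname.)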
> 
> > On 18 Sep 2015, at 09:37, Olivier Bonvalet <ceph.list@xxxxxxxxx>
> > wrote:
> > 
> > Hi,
> > 
> > sorry for the missing information. I was trying to avoid including
> > too much irrelevant detail ;)
> > 
> > 
> > 
> > On Friday, 18 September 2015 at 12:30 +0900, Christian Balzer
> > wrote:
> > > Hello,
> > > 
> > > On Fri, 18 Sep 2015 02:43:49 +0200 Olivier Bonvalet wrote:
> > > 
> > > The items below help, but be as specific as possible: from OS and
> > > kernel version to Ceph version, "ceph -s" output, and any other
> > > specific details (pool type, replica size).
> > > 
> > 
> > So, all nodes run Debian Wheezy with a vanilla 3.14.x kernel and
> > Ceph 0.80.10.
> > I don't have a ceph status output right now, but I have data to
> > move again tonight, so I'll capture it then.
> > 
> > The affected pool is a standard replicated one (no erasure
> > coding), with only 2 replicas (size=2).
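(For the record, I believe the pool settings can be double-checked with:

    ceph osd pool get <poolname> size    # replica count
    ceph osd dump | grep '^pool'         # type, size, pg_num per pool

where <poolname> is a placeholder.)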
> > 
> > 
> > 
> > 
> > > > Some additional information:
> > > > - I have 4 SSDs per node.
> > > Type, if nothing else for anecdotal reasons.
> > 
> > I have 7 storage nodes here:
> > - 3 nodes with 12 OSDs of 300 GB (SSD) each
> > - 4 nodes with 4 OSDs of 800 GB (SSD) each
> > 
> > And I'm trying to replace the 12x300 GB nodes with the 4x800 GB
> > nodes.
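(To limit the impact I'm raising the weights in small steps, with something like:

    ceph osd crush reweight osd.42 0.4   # then 0.5, 0.6, ... up to the target

where osd.42 and the weight values are only an example.)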
> > 
> > 
> > 
> > > > - the CPU usage is near 0
> > > > - IO wait is near 0 too
> > > Including the trouble OSD(s)?
> > 
> > Yes
> > 
> > 
> > > Measured how, iostat or atop?
> > 
> > iostat and htop, confirmed by our Zabbix monitoring.
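(Mostly extended device stats, e.g.:

    iostat -x 2    # per-device await and %util, refreshed every 2 seconds

which stay near zero even while requests are blocked.)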
> > 
> > 
> > 
> > 
> > > > - bandwidth usage is also near 0
> > > > 
> > > Yeah, none of the above is surprising if everything is stuck
> > > waiting on some ops to finish.
> > > 
> > > How many nodes are we talking about?
> > 
> > 
> > 7 nodes, 52 OSDs.
> > 
> > 
> > 
> > > > The whole cluster seems to be waiting for something... but I
> > > > don't see what.
> > > > 
> > > Is it just one specific OSD (or a set of them) or is that all
> > > over the place?
> > 
> > A set of them. When I increase the weight of all 4 OSDs on a
> > node, I frequently get blocked IO from one OSD on that node.
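Next time it happens I'll also dump the blocked ops on the suspect OSD through its admin socket, e.g.:

    ceph --admin-daemon /var/run/ceph/ceph-osd.12.asok dump_ops_in_flight
    ceph --admin-daemon /var/run/ceph/ceph-osd.12.asok dump_historic_ops

(osd.12 is a placeholder; the output should show at which stage each op is stuck.)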
> > 
> > 
> > 
> > > Does restarting the OSD fix things?
> > 
> > Yes, but only for several minutes.
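(On Wheezy with sysvinit, that restart is just, e.g.:

    /etc/init.d/ceph restart osd.12

again with osd.12 as a placeholder id.)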
> > 
> > 
> > > Christian
> > > > 
> > > > On Friday, 18 September 2015 at 02:35 +0200, Olivier Bonvalet
> > > > wrote:
> > > > > Hi,
> > > > > 
> > > > > I have a cluster with a lot of blocked operations each time
> > > > > I try to move data (by slightly reweighting an OSD).
> > > > > 
> > > > > It's a full SSD cluster, with a 10GbE network.
> > > > > 
> > > > > In the logs, when I have a blocked OSD, I can see this on
> > > > > the primary OSD:
> > > > > 2015-09-18 01:55:16.981396 7f89e8cb8700  0 log [WRN] : 2 slow requests, 1 included below; oldest blocked for > 33.976680 secs
> > > > > 2015-09-18 01:55:16.981402 7f89e8cb8700  0 log [WRN] : slow request 30.125556 seconds old, received at 2015-09-18 01:54:46.855821: osd_op(client.29760717.1:18680817544 rb.0.1c16005.238e1f29.00000000027f [write 180224~16384] 6.c11916a4 snapc 11065=[11065,10fe7,10f69] ondisk+write e845819) v4 currently reached pg
> > > > > 2015-09-18 01:55:46.986319 7f89e8cb8700  0 log [WRN] : 2 slow requests, 1 included below; oldest blocked for > 63.981596 secs
> > > > > 2015-09-18 01:55:46.986324 7f89e8cb8700  0 log [WRN] : slow request 60.130472 seconds old, received at 2015-09-18 01:54:46.855821: osd_op(client.29760717.1:18680817544 rb.0.1c16005.238e1f29.00000000027f [write 180224~16384] 6.c11916a4 snapc 11065=[11065,10fe7,10f69] ondisk+write e845819) v4 currently reached pg
> > > > > 
> > > > > How should I read that? What is this OSD waiting for?
> > > > > 
> > > > > Thanks for any help,
> > > > > 
> > > > > Olivier
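By the way, the object named in those slow requests can be mapped to its PG and acting OSDs, which I believe would look like:

    ceph osd map <poolname> rb.0.1c16005.238e1f29.00000000027f
    ceph pg <pgid> query    # then inspect the PG it reports

with <poolname> and <pgid> as placeholders (the log only shows pool id 6).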
> 
> 
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com