On Fri, 18 Sep 2015 11:07:49 +0200 Olivier Bonvalet wrote:

> On Friday, 18 September 2015 at 10:59 +0200, Jan Schermer wrote:
> > In that case it can either be slow monitors (slow network, slow disks(!!!)) or a CPU or memory problem.
> > But it can also still be on the OSD side, in the form of either CPU usage or memory pressure - in my case there was a lot of memory used for pagecache (so for all intents and purposes considered "free"), but when peering the OSD had trouble allocating any memory from it, which caused lots of slow ops and left peering hanging there for a while.
> > This also doesn't show up as high CPU usage; only kswapd spins up a bit (don't be fooled by its name, it has nothing to do with swap in this case).
> 
> My nodes have 256GB of RAM (for the 12x300GB ones) or 128GB of RAM (for the 4x800GB ones), so I will try to track this too. Thanks!
> 
I haven't seen this (known problem) with 64GB or 128GB nodes, probably because I set /proc/sys/vm/min_free_kbytes to 512MB or 1GB respectively.

Christian.

> > echo 1 >/proc/sys/vm/drop_caches before I touch anything has become a routine now, and that problem is gone.
> > 
> > Jan
> > 
> > > On 18 Sep 2015, at 10:53, Olivier Bonvalet <ceph.list@xxxxxxxxx> wrote:
> > > 
> > > mmm, good point.
> > > 
> > > I don't see a CPU or IO problem on the mons, but in the logs I have this:
> > > 
> > > 2015-09-18 01:55:16.921027 7fb951175700 0 log [INF] : pgmap v86359128: 6632 pgs: 77 inactive, 1 remapped, 10 active+remapped+wait_backfill, 25 peering, 5 active+remapped, 6 active+remapped+backfilling, 6499 active+clean, 9 remapped+peering; 18974 GB data, 69004 GB used, 58578 GB / 124 TB avail; 915 kB/s rd, 26383 kB/s wr, 1671 op/s; 8417/15680513 objects degraded (0.054%); 1062 MB/s, 274 objects/s recovering
> > > 
> > > So... it can be a peering problem. Didn't see that, thanks.
> > > 
> > > 
> > > On Friday, 18 September 2015 at 09:52 +0200, Jan Schermer wrote:
> > > > Could this be caused by the monitors? In my case lagging monitors can also cause slow requests (because of slow peering). Not sure whether that's expected or not, but of course it doesn't show up on the OSDs as any kind of bottleneck when you try to investigate...
> > > > 
> > > > Jan
> > > > 
> > > > > On 18 Sep 2015, at 09:37, Olivier Bonvalet <ceph.list@xxxxxxxxx> wrote:
> > > > > 
> > > > > Hi,
> > > > > 
> > > > > sorry for the missing information. I was trying to avoid putting in too much inappropriate info ;)
> > > > > 
> > > > > 
> > > > > On Friday, 18 September 2015 at 12:30 +0900, Christian Balzer wrote:
> > > > > > Hello,
> > > > > > 
> > > > > > On Fri, 18 Sep 2015 02:43:49 +0200 Olivier Bonvalet wrote:
> > > > > > 
> > > > > > The items below help, but be as specific as possible, from OS and kernel version to Ceph version, "ceph -s" output, and any other specific details (pool type, replica size).
> > > > > > 
> > > > > So, all nodes use Debian Wheezy, running a vanilla 3.14.x kernel, and Ceph 0.80.10.
> > > > > I don't have the ceph status any more right now, but I have data to move again tonight, so I'll track that.
> > > > > 
> > > > > The affected pool is a standard one (no erasure coding), with only 2 replicas (size=2).
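
Two things I would do before tonight's move, as a rough sketch only - the sysctl value, the OSD id and the admin socket path below are just examples, adjust them to your nodes:

  # headroom so the OSDs can still allocate during peering even when
  # pagecache makes the box look "full" (1GB here, i.e. 1048576 kB)
  sysctl -w vm.min_free_kbytes=1048576
  echo 'vm.min_free_kbytes = 1048576' >> /etc/sysctl.conf   # persist across reboots

  # when requests start blocking, capture the cluster view...
  ceph -s
  ceph health detail
  ceph pg dump_stuck inactive
  ceph pg dump_stuck unclean

  # ...and on the node hosting the OSD named in the slow-request warnings,
  # dump the ops it is currently sitting on (osd.42 is a placeholder)
  ceph --admin-daemon /var/run/ceph/ceph-osd.42.asok dump_ops_in_flight

dump_ops_in_flight should show which step each op is stuck at (e.g. "reached pg" versus waiting for subops), which narrows things down to peering, the local disk, or a replica.
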
> > > > > > > Some additional information:
> > > > > > > - I have 4 SSDs per node.
> > > > > > Type, if nothing else for anecdotal reasons.
> > > > > 
> > > > > I have 7 storage nodes here:
> > > > > - 3 nodes which each have 12 OSDs on 300GB SSDs
> > > > > - 4 nodes which each have 4 OSDs on 800GB SSDs
> > > > > 
> > > > > And I'm trying to replace the 12x300GB nodes with the 4x800GB nodes.
> > > > > 
> > > > > > > - the CPU usage is near 0
> > > > > > > - IO wait is near 0 too
> > > > > > Including the trouble OSD(s)?
> > > > > 
> > > > > Yes
> > > > > 
> > > > > > Measured how, iostat or atop?
> > > > > 
> > > > > iostat, htop, and confirmed with our Zabbix monitoring.
> > > > > 
> > > > > > > - bandwidth usage is also near 0
> > > > > > 
> > > > > > Yeah, none of that is surprising if everything is stuck waiting on some ops to finish.
> > > > > > 
> > > > > > How many nodes are we talking about?
> > > > > 
> > > > > 7 nodes, 52 OSDs.
> > > > > 
> > > > > > > The whole cluster seems to be waiting for something... but I don't see what.
> > > > > > 
> > > > > > Is it just one specific OSD (or a set of them) or is it all over the place?
> > > > > 
> > > > > A set of them. When I increase the weight of all 4 OSDs of a node, I frequently get blocked IO from 1 OSD of that node.
> > > > > 
> > > > > > Does restarting the OSD fix things?
> > > > > 
> > > > > Yes. For several minutes.
> > > > > 
> > > > > > Christian
> > > > > > 
> > > > > > > On Friday, 18 September 2015 at 02:35 +0200, Olivier Bonvalet wrote:
> > > > > > > > Hi,
> > > > > > > > 
> > > > > > > > I have a cluster with a lot of blocked operations each time I try to move data (by reweighting an OSD a little).
> > > > > > > > 
> > > > > > > > It's a full SSD cluster, with a 10GbE network.
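
On the reweighting itself: while data is moving I would throttle backfill/recovery and raise the weight in small steps rather than in one jump. A rough sketch - osd.12 and the 0.2 weight are placeholders, and injectargs changes are runtime-only (put the options in ceph.conf to make them permanent):

  # limit concurrent backfill/recovery per OSD for the duration of the move
  ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'

  # raise the CRUSH weight of the new OSD a little at a time; this sets an
  # absolute weight, so repeat with gradually larger values up to the final one
  ceph osd crush reweight osd.12 0.2

That won't explain the stalls by itself, but it limits how much peering and backfill each OSD has to cope with at once.
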
> > > > > > > > In the logs, when I have a blocked OSD, on the main OSD I can see this:
> > > > > > > > 2015-09-18 01:55:16.981396 7f89e8cb8700 0 log [WRN] : 2 slow requests, 1 included below; oldest blocked for > 33.976680 secs
> > > > > > > > 2015-09-18 01:55:16.981402 7f89e8cb8700 0 log [WRN] : slow request 30.125556 seconds old, received at 2015-09-18 01:54:46.855821: osd_op(client.29760717.1:18680817544 rb.0.1c16005.238e1f29.00000000027f [write 180224~16384] 6.c11916a4 snapc 11065=[11065,10fe7,10f69] ondisk+write e845819) v4 currently reached pg
> > > > > > > > 2015-09-18 01:55:46.986319 7f89e8cb8700 0 log [WRN] : 2 slow requests, 1 included below; oldest blocked for > 63.981596 secs
> > > > > > > > 2015-09-18 01:55:46.986324 7f89e8cb8700 0 log [WRN] : slow request 60.130472 seconds old, received at 2015-09-18 01:54:46.855821: osd_op(client.29760717.1:18680817544 rb.0.1c16005.238e1f29.00000000027f [write 180224~16384] 6.c11916a4 snapc 11065=[11065,10fe7,10f69] ondisk+write e845819) v4 currently reached pg
> > > > > > > > 
> > > > > > > > How should I read that? What is this OSD waiting for?
> > > > > > > > 
> > > > > > > > Thanks for any help,
> > > > > > > > 
> > > > > > > > Olivier


-- 
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
http://www.gol.com/

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com