Re: Blocked ops, OSD consuming memory, hammer

On Tue, May 24, 2016 at 11:19 PM, Heath Albritton <halbritt@xxxxxxxx> wrote:
> Not going to attempt threading and apologies for the two messages on
> the same topic.  Christian is right, though.  3 nodes per tier, 8 SSDs
> per node in the cache tier, 12 spinning disks in the cold tier.  10GE
> client network with a separate 10GE back side network.  Each node in
> the cold tier has two Intel P3700 SSDs as a journal.  This setup has
> yielded excellent performance over the past year.
>
> The memory exhaustion comes purely from one errant OSD process.  All
> the remaining processes look fairly normal in terms of memory
> consumption.
>
> These nodes aren't particularly busy.  A random sampling shows a few
> hundred kilobytes of data being written and very few reads.
>
> Thus far, I've done quite a bit of juggling of OSDs.  Setting the
> cluster to noup.  Restarting the failed ones, letting them get to the
> current map and then clearing the noup flag and letting them rejoin.
> Eventually, they'll fail again and then a fairly intense recovery
> happens.
>
> here's ceph -s:
>
> https://dl.dropboxusercontent.com/u/90634073/ceph/ceph_dash_ess.txt
>
> Cluster has been in this state for a while.  There are 3 PGs that seem
> to be problematic:
>
> [root@t2-node01 ~]# pg dump | grep recovering
> -bash: pg: command not found
> [root@t2-node01 ~]# ceph pg dump | grep recovering
> dumped all in format plain
> 9.2f1 1353 1075 4578 1353 1075 9114357760 2611 2611 active+recovering+degraded+remapped 2016-05-24 21:49:26.766924 8577'2611 8642:84 [15,31] 15 [15,31,0] 15 5123'2483 2016-05-23 23:52:54.360710 5123'2483 2016-05-23 23:52:54.360710
> 12.258 878 875 2628 0 0 4414509568 1534 1534 active+recovering+undersized+degraded 2016-05-24 21:47:48.085476 4261'1534 8587:17712 [4,20] 4 [4,20] 4 4261'1534 2016-05-23 07:22:44.819208 4261'1534 2016-05-23 07:22:44.819208
> 11.58 376 0 1 2223 0 1593129984 4909 4909 active+recovering+degraded+remapped 2016-05-24 05:49:07.531198 8642'409248 8642:406269 [56,49,41] 56 [40,48,62] 40 4261'406995 2016-05-22 21:40:40.205540 4261'406450 2016-05-21 21:37:35.497307
>
> pg 9.2f1 query:
> https://dl.dropboxusercontent.com/u/90634073/ceph/pg_9.21f.txt
>
> When I query 12.258 it just hangs
>
> pg 11.58 query:
> https://dl.dropboxusercontent.com/u/90634073/ceph/pg_11.58.txt
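
(For reference, the OSD "juggling" described above boils down to roughly the following sequence; this is only a sketch, assuming hammer-era service scripts, and osd.15 stands in for whichever OSD has failed:)

    ceph osd set noup                 # keep restarted OSDs from being marked up immediately
    service ceph restart osd.15       # restart the errant OSD (id is an example)
    ceph daemon osd.15 status         # wait until newest_map matches the cluster's current epoch
    ceph osd unset noup               # clear the flag so the OSD can come up and peer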

Well, you've clearly had some things go very wrong. That "undersized"
means the PG doesn't have enough copies to be allowed to process
writes, and I'm a little confused that it's also marked active, but I
don't quite remember the PG state diagrams involved. You should
consider it down; it should be trying to recover itself, though. I'm
not quite certain whether the query is considered an operation the PG
isn't allowed to service (which the RADOS team will need to fix, if
that's not already done in later releases), or whether the query
hanging is indicative of yet another problem.
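
Either way, a few read-only checks can help narrow it down (a rough
sketch; <pool> is a placeholder for the pool behind 12.258):

    ceph health detail | grep 12.258     # what the cluster itself reports for the stuck PG
    ceph pg dump_stuck unclean           # list PGs stuck in a not-clean state
    ceph osd pool get <pool> size        # replica count the pool is supposed to keep
    ceph osd pool get <pool> min_size    # writes block once the acting set drops below this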

The memory growth is probably from operations incoming on some of
those missing objects, or on the PG which can't take writes (but is
trying to recover itself to a state where it *can*). In general that
shouldn't be enough to exhaust the memory in the system, but you might
have mis-tuned things so that clients are allowed to use a lot more
memory than is appropriate, or there might be a bug in v0.94.5.
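
A rough way to see where that memory is actually going on the errant
OSD (a sketch only; osd.15 is an example id, and the config names
assumed here are the stock hammer ones):

    ceph tell osd.15 heap stats                           # tcmalloc heap usage for the suspect OSD
    ceph daemon osd.15 dump_ops_in_flight                 # ops currently queued or blocked on it
    ceph daemon osd.15 perf dump | grep -i throttle       # message/client throttle counters
    ceph daemon osd.15 config show | grep client_message  # osd_client_message_cap / _size_cap limits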
-Greg
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


