On Mon, Aug 28, 2017 at 4:05 AM, Yuri Gorshkov <ygorshkov@xxxxxxxxxxxx> wrote:
> Hi.
>
> When trying to take down a host for maintenance purposes, I encountered an
> I/O stall along with some PGs unexpectedly marked 'peered'.
>
> Cluster stats: 96/96 OSDs, healthy prior to the incident, 5120 PGs, 4 hosts
> consisting of 24 OSDs each. Ceph version 11.2.0, using standard filestore
> (with LVM journals on SSD) and the default CRUSH map. All pools are size 3,
> min_size 2.
>
> Steps to reproduce the problem:
> 0. Cluster is healthy, HEALTH_OK.
> 1. Set the noout flag to prepare for host removal.
> 2. Begin taking the OSDs on one of the hosts down: systemctl stop ceph-osd@$osd.
> 3. Notice that I/O has stalled unexpectedly and about 100 PGs total are in
>    the degraded+undersized+peered state while the host is down.
>
> AFAIK the 'peered' state means that the PG has not been replicated to
> min_size yet, so there is something strange going on. Since we have 4 hosts
> and are using the default CRUSH map, how is it possible that after taking
> one host (or even just some OSDs on that host) down, some PGs in the cluster
> are left with fewer than 2 copies?
>
> Here's a snippet of 'ceph pg dump_stuck' from when this happened. Sadly I don't
> have any more information yet...
>
> # ceph pg dump | grep peered
> dumped all in format plain
> 3.c80   173    0  346    692    0  715341824    10041  10041  undersized+degraded+remapped+backfill_wait+peered  2017-08-02 19:12:39.319222  12124'104727  12409:62777   [62,76,44]  62  [2]   2   1642'32485  2017-07-18 22:57:06.263727  1008'135   2017-07-09 22:34:40.893182
> 3.204   184    0  368    649    0  769544192    10065  10065  undersized+degraded+remapped+backfill_wait+peered  2017-08-02 19:12:39.334905  12124'13665   12409:37345   [75,52,1]   75  [2]   2   1375'4316   2017-07-18 00:10:27.601548  1371'2740  2017-07-12 07:48:34.953831
> 11.19   25525  0  51050  78652  0  14829768529  10059  10059  undersized+degraded+remapped+backfill_wait+peered  2017-08-02 19:12:39.311612  12124'156267  12409:137128  [56,26,14]  56  [18]  18  1375'28148  2017-07-17 20:27:04.916079  0'0        2017-07-10 16:12:49.270606

Well, are those listed OSDs all on different hosts, or are they on the
same host? It kind of sounds (and looks) like your CRUSH map is separating
copies across hard drives rather than across hosts. (This could happen if
you initially created your cluster with only one host or something.)
-Greg

> --
> Sincerely,
> Yuri Gorshkov
> Systems Engineer
> SmartLabs LLC
> +7 (495) 645-44-46 ext. 6926
> ygorshkov@xxxxxxxxxxxx

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
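
[One way to check Greg's theory, sketched against the PG and OSD ids from the
dump above; exact output format varies by release, but the commands are
standard ceph CLI. The point is to confirm whether the replicas of a stuck PG
land on distinct hosts and whether the rule's chooseleaf step uses type
"host" or type "osd".]

    ceph osd tree                # which host bucket does each OSD sit under?
    ceph osd find 62             # crush location (including host) of osd.62 from PG 3.c80's up set
    ceph pg map 3.c80            # current up and acting sets for that PG
    ceph osd crush rule dump     # does the chooseleaf step say "type": "host" or "type": "osd"?

If the rule shows chooseleaf over "osd" (or all three OSDs in a PG's up set
resolve to the same host), that would explain PGs dropping below min_size when
a single host goes down.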