On Mon, Aug 28, 2017 at 4:05 AM, Yuri Gorshkov <ygorshkov@xxxxxxxxxxxx> wrote:
> Hi.
>
> When trying to take down a host for maintenance purposes, I encountered an
> I/O stall along with some PGs unexpectedly marked 'peered'.
>
> Cluster stats: 96/96 OSDs, healthy prior to the incident, 5120 PGs, 4 hosts
> consisting of 24 OSDs each. Ceph version 11.2.0, using standard filestore
> (with LVM journals on SSD) and the default CRUSH map. All pools are size 3,
> min_size 2.
>
> Steps to reproduce the problem:
> 0. Cluster is healthy, HEALTH_OK.
> 1. Set the noout flag to prepare for host removal.
> 2. Begin taking the OSDs on one of the hosts down: systemctl stop ceph-osd@$osd.
> 3. Notice that I/O has stalled unexpectedly and about 100 PGs total are in
>    the degraded+undersized+peered state while the host is down.
>
> AFAIK the 'peered' state means that the PG has not been replicated to
> min_size yet, so there is something strange going on. Since we have 4 hosts
> and are using the default CRUSH map, how is it possible that after taking
> one host (or even just some OSDs on that host) down, some PGs in the cluster
> are left with fewer than 2 copies?
>
> Here's a snippet of 'ceph pg dump_stuck' from when this happened. Sadly I don't
> have any more information yet...
>
> # ceph pg dump | grep peered
> dumped all in format plain
> 3.c80   173    0  346    692    0  715341824    10041  10041  undersized+degraded+remapped+backfill_wait+peered  2017-08-02 19:12:39.319222  12124'104727  12409:62777   [62,76,44]  62  [2]   2   1642'32485  2017-07-18 22:57:06.263727  1008'135   2017-07-09 22:34:40.893182
> 3.204   184    0  368    649    0  769544192    10065  10065  undersized+degraded+remapped+backfill_wait+peered  2017-08-02 19:12:39.334905  12124'13665   12409:37345   [75,52,1]   75  [2]   2   1375'4316   2017-07-18 00:10:27.601548  1371'2740  2017-07-12 07:48:34.953831
> 11.19   25525  0  51050  78652  0  14829768529  10059  10059  undersized+degraded+remapped+backfill_wait+peered  2017-08-02 19:12:39.311612  12124'156267  12409:137128  [56,26,14]  56  [18]  18  1375'28148  2017-07-17 20:27:04.916079  0'0        2017-07-10 16:12:49.270606

Well, are those listed OSDs all on different hosts, or are they on the
same host? It kind of sounds (and looks) like your CRUSH map is separating
copies across hard drives rather than across hosts. (This could happen if
you initially created your cluster with only one host or something.)
-Greg

> --
> Sincerely,
> Yuri Gorshkov
> Systems Engineer
> SmartLabs LLC
> +7 (495) 645-44-46 ext. 6926
> ygorshkov@xxxxxxxxxxxx

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
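
[One way to check Greg's theory, sketched against the PG and OSD ids from the
dump above; exact output format varies by release, but the commands are
standard ceph CLI. The point is to confirm whether the replicas of a stuck PG
land on distinct hosts and whether the rule's chooseleaf step uses type
"host" or type "osd".]

    ceph osd tree                # which host bucket does each OSD sit under?
    ceph osd find 62             # crush location (including host) of osd.62 from PG 3.c80's up set
    ceph pg map 3.c80            # current up and acting sets for that PG
    ceph osd crush rule dump     # does the chooseleaf step say "type": "host" or "type": "osd"?

If the rule shows chooseleaf over "osd" (or all three OSDs in a PG's up set
resolve to the same host), that would explain PGs dropping below min_size when
a single host goes down.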