On Mon, 26 Nov 2018 at 09:39, Stefan Kooman <stefan@xxxxxx> wrote:
> > It is a slight mistake to report it in the same way as an error, even
> > if, to the cluster, it looks just as if it were in error and needed
> > fixing. This gives new ceph admins a sense of urgency or danger,
> > whereas it should be perfectly normal to add space to a cluster.
> > Also, it could have chosen to add a fourth copy to a repl=3 PG, fill
> > the new empty one from the replica that is going out, and somehow
> > keep three working replicas the whole time, but ceph chooses to first
> > discard one replica and then backfill into the empty one, leading to
> > this kind of "error" report.
>
> Thanks for the explanation. I agree with you that it would be safer to
> first backfill to the new PG instead of just assuming the new OSD will
> be fine and discarding a perfectly healthy copy. We do have max_size 3
> in the CRUSH ruleset ... I wonder if Ceph would behave differently if
> we had max_size 4 ... to actually allow a fourth copy in the first
> place ...

I don't think the replication number is important; it's more of a choice
which PERHAPS is meant to let you move PGs to a new drive when the
cluster is near full. Discarding one now-unneeded replica and starting to
write to the new drive clears out space a lot faster, whereas keeping all
the old replicas until the data is 100% ok on the new replica means new
space does not appear until a large amount of data has moved, which for
large drives and large PGs might take a very long time (the small toy
example below illustrates the difference).

--
May the most significant bit of your life be positive.
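
To make the free-space point concrete, here is a tiny toy model (plain
Python, not Ceph code; the PG size and backfill rate are made-up numbers
purely for illustration) comparing when space becomes reclaimable on the
old, full OSD under the two strategies discussed above:

    # Toy model, not Ceph code: when does space become free on a nearly
    # full OSD after one PG is remapped to a new, empty OSD?
    # All numbers are illustrative.

    PG_SIZE_GB = 100         # size of the remapped PG (made-up)
    BACKFILL_GB_PER_MIN = 2  # backfill throughput (made-up)

    def freed_on_old_osd(strategy: str, minutes: float) -> float:
        """GB reclaimable on the old OSD after `minutes` of backfill."""
        if strategy == "drop-then-backfill":
            # The outgoing replica is discarded up front, so its space is
            # reclaimable from minute zero (the behaviour described above).
            return float(PG_SIZE_GB)
        if strategy == "keep-until-copied":
            # All old replicas are kept until the new copy is complete, so
            # space is freed only once the whole PG has been backfilled.
            copied = minutes * BACKFILL_GB_PER_MIN
            return float(PG_SIZE_GB) if copied >= PG_SIZE_GB else 0.0
        raise ValueError(strategy)

    for t in (0, 10, 49, 50):
        print(f"t={t:>3} min"
              f"  drop-then-backfill: {freed_on_old_osd('drop-then-backfill', t):6.1f} GB"
              f"  keep-until-copied: {freed_on_old_osd('keep-until-copied', t):6.1f} GB")

With drop-then-backfill the remapped PG's space is reclaimable right away,
at the cost of running with one replica fewer (hence the degraded "error"
style reporting); with keep-until-copied nothing is freed until the whole
PG has been copied, which is the "very long time" for large drives and
large PGs.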