Re: Ceph OSD reported Slow operations

Janne Johansson <icepic.dz@xxxxxxxxx> · Fri, 3 Nov 2023 07:08:50 +0100

On Thu, 2 Nov 2023 at 23:46, V A Prabha <prabhav@xxxxxxx> wrote:
>
> Is it possible to move the OSDs safely (mark the OSDs out, let their content
> move to other OSDs, then remove them and map them fresh onto other nodes that
> are less loaded)?

> As the client feels that by using 3 replicas and holding this much spare
> storage, we are not using the storage in an optimal way?

These two sentences don't really add up.

Ceph defaults to replica=3 so that drives and hosts can crash and you
can still recover without losing redundancy. As soon as you lose
redundancy, nothing is "safe" anymore: you are immediately in danger
of losing data in a way that you can never get it back from Ceph.

If the client thinks you are wasting space, then they must not care
about the data, because ANY random hiccup, any broken sector somewhere,
becomes a data-loss event if the other copy is being moved, or that
server is down for maintenance, or whatever. With only two copies of
the data you can never safely reboot a server, and upgrades mean the
cluster stops serving data.
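
To put numbers on the "wasted space" argument, here is a minimal Python
sketch of how much raw capacity ends up usable under replication versus
erasure coding. The 100 TB raw figure and the 4+2 erasure-coding profile
are assumptions picked for illustration, not values from this thread, and
the function names are hypothetical.

```python
def usable_replicated(raw_tb: float, size: int) -> float:
    """Usable capacity when every object is stored `size` times."""
    return raw_tb / size


def usable_erasure_coded(raw_tb: float, k: int, m: int) -> float:
    """Usable capacity with k data chunks plus m coding chunks per object."""
    return raw_tb * k / (k + m)


RAW_TB = 100.0  # assumed raw capacity, for illustration only

for label, usable in [
    ("replica=3", usable_replicated(RAW_TB, 3)),
    ("replica=2", usable_replicated(RAW_TB, 2)),
    ("EC 4+2 (assumed profile)", usable_erasure_coded(RAW_TB, 4, 2)),
]:
    print(f"{label:26s} {usable:6.1f} TB usable of {RAW_TB:.0f} TB raw")
```

Going from three copies to two frees up roughly 17 percentage points of
raw capacity, and the price is that a single bad copy during maintenance
is no longer survivable. The sketch ignores the free space a cluster
needs in order to re-replicate after a failure.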

The joke in the 80s (might be older than that of course) was:
"Data is binary, either it is important and backed up, or it is not important"

Ceph chooses to treat your data as important. You can lower the
expectations by reducing replicas, or trade performance for space with
erasure coding (though I understand that this whole thread is about
poor overall performance of client traffic, scrubs and so on), but the
defaults are there to protect you from a random bit flip on one of
the disks, and this will happen. Not "perhaps": with enough drives
and/or enough time, it is a certainty. The design of your storage
decides how you will survive such an event.
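
The "certainty" part is just compounding probabilities. A minimal
sketch, assuming each drive independently has some small probability of
developing a bad sector or flipped bit in a given year. The 1% figure is
a made-up placeholder, not a measured rate, and real drive errors are not
perfectly independent.

```python
def p_at_least_one_error(p_per_drive_year: float, drives: int, years: float) -> float:
    """Probability that at least one of `drives` independent drives
    sees an error within `years`, given a per-drive yearly probability."""
    p_none = (1.0 - p_per_drive_year) ** (drives * years)
    return 1.0 - p_none


P_ASSUMED = 0.01  # hypothetical per-drive yearly error probability

for drives in (10, 100, 1000):
    for years in (1, 5):
        p = p_at_least_one_error(P_ASSUMED, drives, years)
        print(f"{drives:5d} drives, {years} year(s): {p:.3f}")
```

Whatever per-drive rate you plug in, the result climbs towards 1 as the
drive count or the time span grows.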

If you have three copies of all data, you can be doing (planned or
unplanned) maintenance on one OSD box, notice an error somewhere on
another box and still recover from this using the third copy.

With only two replicas, you can't. The maintenance OSD host will have
missed IO that went on while it was away, and the second copy is known
bad. You could hope that you never need to do maintenance, but with
few exceptions, hosts will reboot at times, whether you plan it or
not.
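
The two scenarios above can be written down as a toy counting model.
This is only a sketch of the argument, not of how Ceph actually tracks
placement groups; the function and its parameters are hypothetical.

```python
def good_copies_left(replicas: int, hosts_in_maintenance: int, copies_found_bad: int) -> int:
    """Count copies that are both up to date and known good.

    A copy on a host in maintenance is treated as stale, since it
    missed the writes that happened while the host was away.
    """
    remaining = replicas - hosts_in_maintenance - copies_found_bad
    return max(remaining, 0)


for replicas in (3, 2):
    left = good_copies_left(replicas, hosts_in_maintenance=1, copies_found_bad=1)
    verdict = "recoverable" if left >= 1 else "data loss"
    print(f"replicas={replicas}: {left} good copy(ies) left -> {verdict}")
```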

-- 
May the most significant bit of your life be positive.