Re: List pg with heavily degraded objects

On 10/09/2021 15:54, Janne Johansson wrote:
On Fri, 10 Sep 2021 at 14:39, George Shuklin <george.shuklin@xxxxxxxxx> wrote:
On 10/09/2021 14:49, George Shuklin wrote:
Hello.

I wonder if there is a way to see how many replicas are available for
each object (or, at least, PG-level statistics). Basically, if I have a
damaged cluster, I want to see the scale of the damage, starting with
the most degraded objects (those with only 1 copy, then objects with 2
copies, etc.).

Is there a way? pg list is not very informative, as it does not show
how badly 'unreplicated' the data are.
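
The obvious PG-level views only tell me *that* PGs are degraded, not how
many copies are actually left (and, as far as I can tell, these are the
stock commands; output details vary by release):

  ceph pg ls degraded            # PGs currently flagged degraded
  ceph pg ls undersized          # PGs with fewer copies than the pool's 'size'
  ceph pg dump_stuck undersized  # undersized PGs that have been stuck for a while
  ceph health detail             # cluster-wide degraded/misplaced summary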

Actually, the problem is more complicated than I expected. Here is an
artificial cluster where a sizable chunk of the data exists only as a
single copy (a cluster of three servers with 2 OSDs each: put some
data, shut down server #1, put some more data, kill server #3, start
server #1; it's guaranteed that server #2 holds the only copy of some
data). This is a snapshot of the ceph pg dump for it, taken as soon as
#2 booted, and I can't find any proof in it that some data exist in a
single copy:
https://gist.github.com/amarao/fbc8ef3538f66a9f2c264f8555f5c29a
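
For reference, the reproduction was roughly along these lines (the pool
name, host names and OSD ids below are made up, adjust to your own
layout; rados bench is just a convenient way to write test data):

  ceph osd pool create testpool 32 32 replicated
  ceph osd pool set testpool size 3
  rados -p testpool bench 30 write --no-cleanup        # put some data
  ssh server1 'systemctl stop ceph-osd@0 ceph-osd@1'   # shut down server #1
  rados -p testpool bench 30 write --no-cleanup        # put some more data
  ssh server3 'systemctl stop ceph-osd@4 ceph-osd@5'   # kill server #3
  ssh server1 'systemctl start ceph-osd@0 ceph-osd@1'  # start server #1 again
  # only server #2 is now guaranteed to hold the second batch of data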

In this case, where you have both made PGs undersized and also degraded
them by letting one OSD pick up some changes, then removing it and
bringing another one back in (I didn't see where #2 stopped in your
example), I guess you will have to take a deep dive into
ceph pg <name of pg> query to see ALL the info about it.

By the time you are stacking multiple error scenarios on top of each
other, I don't think there is a simple "show me a short, understandable
list of what is almost-but-not-quite working".
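
If you only want the relevant parts of that query output, something
along these lines helps (the PG id 1.2a is just an example, and the jq
paths are a guess against the query JSON of recent releases, so check
them against your own output first):

  ceph pg 1.2a query \
    | jq '{state, up, acting,
           peers: [.peer_info[]? | {peer, objects: .stats.stat_sum.num_objects}]}'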


No, I'm worried about the observability of the situation when data exist in only a single copy (which I consider a bit of an emergency). I've just created a scenario where only a single server (2 OSDs) holds the data, and right after replication started I can't detect that things are THAT bad. I've updated the gist (https://gist.github.com/amarao/fbc8ef3538f66a9f2c264f8555f5c29a) with a snapshot taken after the cluster, with only a single copy of the data available, found enough space to make all PGs 'well-sized'. Replication is underway, but the data are single-copy at the moment of the snapshot.
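
One rough way I can think of to surface this from the pg dump JSON is to estimate surviving copies per PG as size minus degraded-objects-per-object (if I read the counters right, num_objects_degraded counts missing replica copies). This is only a heuristic, the field layout differs between releases (pg_stats may sit under .pg_map), and the pool size of 3 is hard-coded here as an assumption:

  ceph pg dump --format=json 2>/dev/null \
    | jq -r '(.pg_map.pg_stats // .pg_stats)[]
             | select(.stat_sum.num_objects > 0)
             | [.pgid,
                (3 - (.stat_sum.num_objects_degraded / .stat_sum.num_objects)),
                .stat_sum.num_objects,
                .state]
             | @tsv' \
    | sort -t$'\t' -k2,2n | head -20

The PGs with the lowest estimated copy count end up at the top, which is roughly the "most degraded first" view I was after.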


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


