Re: Sudden loss of all SSD OSDs in a cluster, immediate abort on restart [Mimic 13.2.6]

For me, it was the .rgw.meta pool that had very dense placement groups. The OSDs would fail to start and would then hit the suicide timeout while trying to scan the PGs. We had to remove all references to those placement groups just to get the OSDs to start. It wasn't pretty.
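
Roughly, that kind of removal is done with ceph-objectstore-tool against the stopped OSD. A sketch of the sequence (the data path and pgid below are placeholders, not our exact commands, and flags can differ slightly between releases; take an export first, since the remove permanently discards that OSD's copy of the PG):

    # With the OSD stopped, list the PGs it holds:
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 --op list-pgs

    # Keep a copy of the offending PG before touching it:
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --pgid 7.1a --op export --file /root/pg-7.1a.export

    # Remove it so the OSD can start without scanning it:
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --pgid 7.1a --op remove --force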


On Mon, Aug 19, 2019, 2:09 AM Troy Ablan <tablan@xxxxxxxxx> wrote:
Yes, it's possible that they do, but since all of the affected OSDs are
still down and the monitors have been restarted since then, all of those
pools have PGs in an unknown state and don't return anything in
ceph pg ls.

There weren't that many placement groups for the SSDs, but also I don't
know that there were that many objects.  There were of course a ton of
omap key/values.

-Troy

On 8/18/19 10:57 PM, Brett Chancellor wrote:
> This sounds familiar. Do any of these pools on the SSD have fairly dense
> placement group to object ratios? Like more than 500k objects per pg?
> (ceph pg ls)
>
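
(For anyone else wanting to run the check described above: the ratio can be pulled out of ceph pg ls. A rough sketch, assuming the OBJECTS count is the second column of the plain output, which may vary by release:)

    # Show the 20 PGs with the most objects (field number is an assumption;
    # adjust it to match the OBJECTS column in your release's ceph pg ls output):
    ceph pg ls | awk 'NR>1 {print $2, $1}' | sort -rn | head -20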
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
