Forgot to mention: we are running Jewel, 10.2.10.
On 26/03/18 11:30, Josef Zelenka wrote:
Hi everyone, I'm currently fighting an issue in a cluster we run for a
customer. It's used for a large number of small files (113M currently)
that are pulled via radosgw. We have 3 nodes, 24 OSDs in total. The
index and other non-data pools are migrated to a separate CRUSH root
called "ssd", which contains only SSD drives - one SSD per node. We did
this because we previously had an issue where the entire RGW stopped
working whenever a regular (HDD) OSD crashed.

Today one of the SSDs failed. After replacing the drive and starting
recovery, RGW halted writes: reads worked fine, but we couldn't upload
any more files. The non-data pools all have size set to 3, so there
should still have been 2 healthy copies of the index data. Also, when
recovery started, no recovery I/O was shown in the ceph -s output, so we
tracked progress via ceph df instead; once the SSD finished backfilling,
ceph -s went from X degraded PGs back to HEALTH_OK instantly.

Does anyone know how to fix this? I don't think writes should be halted
during recovery.
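For anyone hitting something similar, a hedged first check (this is a
diagnostic sketch, not a confirmed diagnosis - the pool name below is
the default RGW index pool and may differ in your deployment) is whether
min_size on the index pool is set such that losing one of the three SSDs
drops its PGs below the writeable threshold, and whether any PGs sat in
a blocked state during the backfill:

```shell
# Hypothetical diagnostic sketch - pool name assumed; adjust to your cluster.
POOL=default.rgw.buckets.index

# If min_size equals size (3), any PG that loses one of its three SSD
# replicas refuses I/O until recovery completes; min_size=2 would let
# writes continue on the remaining two copies.
ceph osd pool get "$POOL" size
ceph osd pool get "$POOL" min_size

# During backfill, check for PGs stuck degraded/backfilling and for
# client requests blocked behind recovery.
ceph health detail
ceph pg dump_stuck unclean
```

These commands only read cluster state, so they are safe to run while
the issue is ongoing.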
Thanks
Josef Z
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com