Hi everyone,

I'm currently fighting an issue in a cluster we run for a customer. It's
used for a large number of small files (currently ~113 million) that are
pulled via radosgw. We have 3 nodes with 24 OSDs in total. The index and
other non-data pools were migrated to a separate CRUSH root called "ssd",
which contains only SSD drives - one SSD per node. We did this because we
previously had an issue where a single crashed HDD OSD brought the entire
RGW down.

Today one of those SSDs failed. After replacing the drive and starting
recovery, RGW halted all writes: reads kept working, but we could not
upload any new files. The non-data pools all have size set to 3, so there
should still have been 2 healthy copies of the index data. Also, ceph -s
showed no recovery I/O while recovery was running, so we tracked progress
through ceph df instead; once the SSD had backfilled, ceph -s jumped from
X degraded PGs straight back to OK.

Does anyone know how to fix this? I don't think writes should be halted
during recovery.
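For reference, this is roughly how the replication settings on the index
pools can be checked (the pool names below are the defaults and may differ
on other setups; if min_size is set equal to size, losing a single replica
is enough to make the affected PGs block I/O):

```shell
# List all pools with their size/min_size and other details
ceph osd pool ls detail

# Check replication settings on the RGW index pool
# (pool name assumed to be the default; adjust to your setup)
ceph osd pool get default.rgw.buckets.index size
ceph osd pool get default.rgw.buckets.index min_size
```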
Thanks
Josef Z
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com