incomplete PG for erasure coding pool after OSD failure

Anton Aleksandrov <anton@xxxxxxxxxxxxxx> · Tue, 26 Jun 2018 17:12:23 +0300



    Hello,
    We have a small cluster, initially on 4 hosts (1 OSD per host,
    8 TB each), with erasure coding for the data pool (k=3, m=1).
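
    For reference, the arithmetic behind a k=3, m=1 profile can be
    sketched roughly as below. This is only an illustration and not
    Ceph code: the function name ec_profile_summary and its hosts
    parameter are hypothetical, and it assumes each chunk of an object
    is placed on a different host (one OSD per host, as in this setup).

```python
from fractions import Fraction


def ec_profile_summary(k, m, hosts):
    """Basic arithmetic for an erasure-coding profile (illustrative only).

    k     -- number of data chunks per object
    m     -- number of coding (parity) chunks per object
    hosts -- hosts currently available (assumption: one chunk per host)
    """
    if k < 1 or m < 0 or hosts < 0:
        raise ValueError("k must be >= 1, m and hosts must be >= 0")
    chunks = k + m
    return {
        # Every object is split into k data chunks plus m coding chunks.
        "chunks_per_object": chunks,
        # Any k of the k + m chunks are enough to rebuild the object,
        # so up to m chunks may be lost.
        "tolerated_chunk_losses": m,
        # Raw space consumed per unit of stored data.
        "raw_to_usable_ratio": Fraction(chunks, k),
        # With one chunk per host, k + m hosts are needed for full placement.
        "enough_hosts_for_full_placement": hosts >= chunks,
    }


# The cluster described here after one of the five hosts is removed.
summary = ec_profile_summary(k=3, m=1, hosts=4)
for name, value in summary.items():
    print(name, "=", value)
```

    By this count alone, four remaining hosts are still enough to hold
    all four chunks, and one lost chunk is within what m=1 can rebuild.
    The sketch says nothing about pool settings or CRUSH rules, which
    may still prevent the PGs from becoming active.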

      
    After some time I added one more small host (1 OSD, 2 TB). Ceph
    synced fine.
    Then I powered off one of the original 8 TB hosts and terminated
    it. I also removed it from the CRUSH map, basically simulating that
    the OSD has died. But no matter what, Ceph stays in HEALTH_WARN
    state and reports incomplete PGs, reduced data availability, PGs
    inactive and incomplete, and also slow requests (even though we
    are not writing there right now).

      
    Used disk space is small, just several gigabytes. With this test
    scenario I would expect Ceph to recalculate the missing data from
    the removed OSD and, after some time, become healthy again.
    This did not happen automatically. Is there any special command
    for this? Is there a specific procedure to recalculate the data?

      
    We are testing on Luminous with BlueStore and CephFS.
    Anton.

    
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com