Hi,

On 02/10/2015 18:15, Christian Balzer wrote:
> Hello,
>
> On Fri, 2 Oct 2015 15:31:11 +0200 Javier C.A. wrote:
>
> Firstly, this has been discussed countless times here.
> For one of the latest recurrences, check the archive for:
>
> "calculating maximum number of disk and node failure that can
> be handled by cluster with out data loss"
>
>> A classic RAID5 system takes a looong time to rebuild the raid, so I
>> would say NO, but how long does it take for Ceph to rebuild the
>> placement group?
>>
> A placement group resides on an OSD.
> Until the LAST PG on a failed OSD has been recovered, you are prone to
> data loss.
> And a single lost PG might affect ALL your images...

True.

> So while your OSDs are mostly empty, recovery will be faster than a RAID5.
>
> Once it gets fuller AND you realize that rebuilding OSDs SEVERELY impacts
> your cluster performance (at least in your smallish example) you are
> likely to tune down the recovery and backfill parameters to a level where
> it takes LONGER than a typical RAID controller recovery.

No, it doesn't. At least it shouldn't: in a RAID5 array you need to read
every block of all the other devices to rebuild the data on the
replacement device. To rebuild an OSD, you only have to read the amount of
data that will end up on the replacement device, which is (n-1) times
fewer reads and the same amount of writes as a RAID5 rebuild. The workload
is closer to what happens when rebuilding a RAID10 array (a rough
back-of-the-envelope comparison is sketched at the end of this mail).

But if you care about redundancy more than about minimizing the total
amount of IO involved in rebalancing the cluster, you won't rebuild the
OSD at all: you let the failed one go out, and the data is reorganized in
addition to the missing replicas being reconstructed. In this case both
the reads and the writes are distributed across all the remaining devices.
PGs will be moved around, which adds some read/write load on the cluster
(this is why the overall IO pressure is higher). One of the jobs of the
CRUSH algorithm is to minimize the amount of such movement. That said,
these additional movements don't help with redundancy: the only process
that matters for redundancy is the missing replica being rebuilt for each
PG in degraded state, and that should be far faster than what RAID5 allows
(if Ceph prioritizes the backfills and recoveries that move PGs from
degraded to clean, which I suppose it does but can't find a reference for,
then replace "should be far faster" with "is far faster").

Best regards,

Lionel
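P.S.: a rough back-of-the-envelope sketch of the read/write volumes
discussed above. The device count, data sizes and helper names are purely
illustrative assumptions, not measurements or Ceph internals:

#!/usr/bin/env python
# Compare the IO needed to replace one failed device out of n,
# each holding `tb` TB of data (illustrative numbers only).

def raid5_rebuild_io(n, tb):
    """RAID5: every block of the n-1 surviving disks is read to
    recompute the missing data, and one full disk is written."""
    reads = (n - 1) * tb
    writes = tb
    return reads, writes

def ceph_osd_rebuild_io(n, tb):
    """Ceph with a replacement OSD: only the data that will land on
    the new OSD is read (from surviving replicas spread over the
    other OSDs) and written once."""
    reads = tb
    writes = tb
    return reads, writes

if __name__ == "__main__":
    n, tb = 8, 4  # hypothetical: 8 devices, 4 TB of data each
    print("RAID5 rebuild:    %d TB read, %d TB written" % raid5_rebuild_io(n, tb))
    print("Ceph OSD rebuild: %d TB read, %d TB written" % ceph_osd_rebuild_io(n, tb))
    # RAID5 reads (n-1) times more data, and all of it hits the same
    # array; the Ceph reads are spread over the remaining OSDs.

(If the failed OSD is marked out instead of being replaced, roughly the
same amount of replica data is rewritten, but both reads and writes are
spread over all remaining OSDs, plus whatever extra PG movement CRUSH
produces.)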