Hi,

I'm very interested in this calculation. What assumptions have you made? Network speed, OSD fill level, etc.? Thanks

Wolfgang

On 12/07/2016 11:16 AM, Дмитрий Глушенок wrote:
> Hi,
>
> Let me add a little math to your warning: with an LSE rate of 1 in 10^15 on
> modern 8 TB disks there is a 5.8% chance of hitting an LSE during recovery
> of an 8 TB disk. So roughly every 18th recovery will fail. As with RAID6
> (two parity disks), size=3 mitigates the problem.
> By the way, why is it a common opinion that using RAID (RAID6) with
> Ceph (size=2) is a bad idea? It is cheaper than size=3, all hardware disk
> errors are handled by the RAID controller (instead of OS/Ceph), it decreases
> the OSD count, adds some battery-backed cache and increases the performance
> of a single OSD.
>
>> On Dec 7, 2016, at 11:08, Wido den Hollander <wido@xxxxxxxx
>> <mailto:wido@xxxxxxxx>> wrote:
>>
>> Hi,
>>
>> As a Ceph consultant I get numerous calls throughout the year to help
>> people get their broken Ceph clusters back online.
>>
>> The causes of downtime vary vastly, but one of the biggest is that
>> people use 2x replication: size = 2, min_size = 1.
>>
>> In 2016 the number of cases where data was lost due to these settings
>> grew exponentially.
>>
>> Usually a disk fails, recovery kicks in, and while recovery is
>> happening a second disk fails, causing PGs to become incomplete.
>>
>> There have been too many times where I had to use xfs_repair on broken
>> disks and use ceph-objectstore-tool to export/import PGs.
>>
>> I really don't like these cases, mainly because they can be prevented
>> easily by using size = 3 and min_size = 2 for all pools.
>>
>> With size = 2 you are in the danger zone as soon as a single
>> disk/daemon fails. With size = 3 you always have two additional copies
>> left, thus keeping your data safe(r).
>>
>> If you are running CephFS, at least consider running the 'metadata'
>> pool with size = 3 to keep the MDS happy.
>>
>> Please, let this be a big warning to everybody who is running with
>> size = 2. The downtime and problems caused by missing objects/replicas
>> are usually big, and it takes days to recover from them. Very often
>> data is lost and/or corrupted, which causes even more problems.
>>
>> I can't stress this enough: running with size = 2 in production is a
>> SERIOUS hazard and should not be done, imho.
>>
>> To anyone out there running with size = 2, please reconsider!
>>
>> Thanks,
>>
>> Wido
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx <mailto:ceph-users@xxxxxxxxxxxxxx>
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> --
> Dmitry Glushenok
> Jet Infosystems
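As a partial answer to the question about assumptions, the quoted LSE figure can be roughly reproduced with a short sketch. This is my own reconstruction, not the original poster's math; it assumes 8 TB = 8×10^12 bytes, one full read of the disk during recovery, and independent latent sector errors at a rate of one per 10^15 bits read:

```python
import math

# Assumptions (mine, not stated in the thread): the whole disk is read
# once during recovery, and latent sector errors (LSEs) occur
# independently per bit at the vendor-quoted rate.
disk_bytes = 8e12            # 8 TB disk
bits_read = disk_bytes * 8   # bits read to rebuild the disk
lse_rate = 1e-15             # LSEs per bit read

# P(at least one LSE) = 1 - (1 - rate)^bits.
# Use log1p/expm1 to avoid precision loss with such a tiny rate.
p_fail = -math.expm1(bits_read * math.log1p(-lse_rate))
print(f"P(LSE during recovery) ~ {p_fail:.1%}")  # ~6.2%
```

Under these assumptions the result is about 6.2%, in the same ballpark as the 5.8% quoted above; the gap presumably comes from a slightly different disk-size or error-rate convention.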