Hi Sean,

My use of EC is specifically for slow, bulk storage. I did test jerasure
some years ago, but I don't think I kept my results. I'm having issues
today with arxiv.org, which had papers…

I wanted to reduce disk usage primarily, and network I/O secondarily. In
my case, I preferred the reduced disk I/O of CLAY. If I remember the CLAY
docs right, a single lost chunk can be repaired by reading roughly
d/(d-k+1) chunks' worth of data (about 2.5 chunks with d=5, k=4) rather
than the k=4 full chunks jerasure would read. I recall running a bunch of
scenarios for specific values of k and m in small clusters.

https://tracker.ceph.com/projects/ceph/wiki/Shingled_Erasure_Code_(SHEC) <-- I did not compare against this
https://docs.ceph.com/en/quincy/rados/operations/erasure-code-clay/ <-- has a comparison with LRC

In actual practice, I have no problems running a variety of interactive
services on it, so I ended up using it for CephFS. I use simple replicas
for IOPS-sensitive applications.

d=5 k=4 m=2 plugin=clay

This is about as small as is practical. I'm using an OSD failure domain
due to the physical layout of OSDs per node (some larger, some smaller).
In practice this increases the likelihood of data going offline due to a
host failure, but it is an acceptable level of risk for this application.

My $.02,
Jeremy

On Wed, Nov 16, 2022 at 6:47 PM Sean Matheny <sean.matheny@xxxxxxxxxxx> wrote:

> Hi Jeremy,
>
> Thanks for the feedback, and good to know that CLAY has been stable for
> you. Would you mind sharing what your motivation was for going with CLAY?
> Was it the recovery tail performance of CLAY versus jerasure, or some
> other reason(s)? Did you happen to do any benchmarking of CLAY vs.
> jerasure (either in normal write and read, or in recovery scenarios)?
>
> Ngā mihi,
>
> Sean Matheny
> HPC Cloud Platform DevOps Lead
> New Zealand eScience Infrastructure (NeSI)
>
> e: sean.matheny@xxxxxxxxxxx
>
> On 12/11/2022, at 9:43 AM, Jeremy Austin <jhaustin@xxxxxxxxx> wrote:
>
> I'm running 16.2.9 and have been using CLAY for 3 or 4 years. I can't
> speak to your scale, but I have had no long-term reliability problems at
> small scale, including one or two hard power-down scenarios. (Alaska
> power is not too great! Not so much a grid as a very short stepladder.)
>
> On Thu, Oct 20, 2022 at 12:05 PM Sean Matheny <sean.matheny@xxxxxxxxxxx>
> wrote:
>
>> Hi all,
>>
>> We've deployed a new cluster on Quincy 17.2.3 with 260x 18TB spinners
>> across 11 chassis that will be used exclusively in the next year or so
>> as an S3 store. 100Gb per chassis shared by both cluster and public
>> networks, NVMe DB/WAL, 32 physical cores @ 2.3GHz base, 192GB chassis
>> RAM (per 24 OSDs).
>>
>> We're looking to use the CLAY EC plugin for our RGW (data) pool, as it
>> appears to use fewer reads in recovery, which might be beneficial. I'm
>> going to be benchmarking recovery scenarios ahead of production, but
>> that of course doesn't give a view on longer-term reliability. :) Has
>> anyone heard of any bad experiences, or any reason not to use it over
>> jerasure? Any reason to use cauchy-good instead of reed-solomon for the
>> use case above?
>>
>> Ngā mihi,
>>
>> Sean Matheny
>> HPC Cloud Platform DevOps Lead
>> New Zealand eScience Infrastructure (NeSI)
>>
>> e: sean.matheny@xxxxxxxxxxx
>
> --
> Jeremy Austin
> jhaustin@xxxxxxxxx

--
Jeremy Austin
jhaustin@xxxxxxxxx

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
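
For anyone wanting to try a setup along the lines Jeremy describes, a rough
sketch of the commands is below. The profile name (clay_k4m2), pool name
(cephfs_data_ec), file system name (cephfs), and PG count are placeholders
rather than values from this thread; consult the CLAY documentation linked
above before adapting them.

    # Define a CLAY profile: k=4 data chunks, m=2 coding chunks, and d=5
    # helper chunks read when recovering a single lost chunk, with the
    # failure domain set to individual OSDs as described above.
    ceph osd erasure-code-profile set clay_k4m2 \
        plugin=clay k=4 m=2 d=5 crush-failure-domain=osd

    # Create an erasure-coded pool using that profile
    # (128 PGs is only a placeholder; size it for your cluster).
    ceph osd pool create cephfs_data_ec 128 128 erasure clay_k4m2

    # EC pools must allow overwrites before CephFS (or RBD) can write to them.
    ceph osd pool set cephfs_data_ec allow_ec_overwrites true

    # Attach the pool as an additional data pool on an existing file system.
    ceph fs add_data_pool cephfs cephfs_data_ec

Directories can then be pointed at the EC pool with a file layout, e.g.
setfattr -n ceph.dir.layout.pool -v cephfs_data_ec /mnt/cephfs/bulk (the
mount path is likewise only an example), while the default replicated data
pool keeps serving the IOPS-sensitive paths.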