Hi,

> > Matthias suggests to enable write cache, you suggest to disable it... or I'm
> > cache-confused?! ;-)

there were some discussions about write cache settings last year, e.g.

  https://www.spinics.net/lists/ceph-users/msg73263.html
  https://www.spinics.net/lists/ceph-users/msg69489.html

The entire threads are worth reading, but it is easy to get thoroughly
confused.

In the first linked mail, Dan van der Ster points to this page:

  https://docs.ceph.com/en/latest/start/hardware-recommendations/#write-caches

Essentially, as Frank Schilder said: fio-test the drives with different
settings. Some perform better with the write-back cache disabled, and
some perform better with it enabled.

To avoid losing all of your data at once in a power outage, make sure
your drives are not el-cheapo consumer ones, which might lie to you
about write completion (on either flushes or write-through writes).
If a drive is not advertised for DC (data center) use, it is most
probably such a dangerous consumer-grade drive, and you had better stay
away from it.

On power loss, drives with "power loss protection" (a large capacitor)
have a little more time to persist in-cache data to stable storage.
Such drives may safely signal write completion _before_ the data has
actually reached the flash chips (or platters), because the capacitor
lets them still persist the cached data when power is lost. This gives
them more room for internal optimizations without risking durability,
and can increase IOPS considerably.

Of course, individual testing is still necessary to find the optimal
cache settings for each drive.

Matthias

On Mon, Apr 17, 2023 at 07:32:42AM +0000, Frank Schilder wrote:
> Hi Marco.
>
> >> For your disk type I saw "volatile write cache available = yes" on
> >> "the internet". This looks a bit odd, but maybe these HDDs do have
> >> some volatile cache. Try to disable it with smartctl and do the
> >> benchmark again.
> >
> > Sorry, I'm a bit puzzled here.
> >
> > Matthias suggests to enable write cache, you suggest to disable it... or I'm
> > cache-confused?! ;-)
>
> You need to disable *volatile* write cache. To figure out what is
> volatile and what is not, the best approach is to start by disabling
> everything. This is usually the best way to run ceph anyway. Some
> expensive controllers claim to have non-volatile write cache. This is
> usually activated by disabling cache (yes, I know how this sounds),
> because this option typically refers to the volatile cache and not the
> non-volatile one.
>
> I would start by getting the disks into non-raid mode and disabling
> all caches I can find. Then benchmark again. Then, maybe, enable the
> caches again one by one and see if anything improves. If not, or if it
> gets worse, disable that cache again.
>
> Best regards,
> Frank
>
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Konstantin Shalygin <k0ste@xxxxxxxx>
> Sent: Saturday, April 15, 2023 11:49 AM
> To: Marco Gaiarin
> Cc: Frank Schilder; ceph-users@xxxxxxx
> Subject: Re: Re: Some hint for a DELL PowerEdge T440/PERC H750 Controller...
>
> Hi,
>
> The current controller mode is RAID. You can switch to HBA mode and
> disable the cache in the controller settings in the BIOS.
>
>
> k
>
> > On 15 Apr 2023, at 12:11, Marco Gaiarin <gaio@xxxxxxxxxxxxxxxxx> wrote:
> >
> > Mandi! Frank Schilder
> >   In chel di` si favelave...
> >
> >>>  iops : min= 2, max= 40, avg=21.13, stdev= 6.10, samples=929
> >>>  iops : min= 2, max= 42, avg=21.52, stdev= 6.56, samples=926
> >> That looks horrible.
> >
> > Exactly, horrible.
> >
> > The strange thing is that we came from a homegrown Ceph cluster built using
> > old hardware (HP G6 servers) and spare disks, which performs 'better', or at
> > least performs 'more uniformly', than this 'new' one.
> >
> > With these embarrassing IOPS, sooner or later we reach the point where
> > performance drops to the floor, and sometimes it suffices to launch some
> > 'find' on the filesystem involved...
> >
> >
> >> We also have a few SATA HDDs in Dell servers and they do about
> >> 100-150 IOPS read or write. Originally, I was also a bit afraid that
> >> these disks would drag performance down, but they are on par with
> >> the NL-SAS drives.
> >> For ceph we use the cheapest Dell disk controller one can get (Dell
> >> HBA330 Mini (Embedded)) and it works perfectly. All ceph-disks are
> >> configured non-raid, which is equivalent to JBOD mode or
> >> pass-through. These controllers have no cache options; if yours
> >> does, disable all of them. Mode should be write-through.
> >> For your disk type I saw "volatile write cache available = yes" on
> >> "the internet". This looks a bit odd, but maybe these HDDs do have
> >> some volatile cache. Try to disable it with smartctl and do the
> >> benchmark again.
> >
> > Sorry, I'm a bit puzzled here.
> >
> > Matthias suggests to enable write cache, you suggest to disable it... or I'm
> > cache-confused?! ;-)
> >
> >
> > My actual controller configuration is:
> >
> > root@pppve1:~# perccli /c0 show
> > Generating detailed summary of the adapter, it may take a while to complete.
> >
> > CLI Version = 007.1910.0000.0000 Oct 08, 2021
> > Operating system = Linux 5.4.203-1-pve
> > Controller = 0
> > Status = Success
> > Description = None
> >
> > Product Name = PERC H750 Adapter
> > Serial Number = 23L01Y6
> > SAS Address = 5f4ee0802ba3a400
> > PCI Address = 00:b3:00:00
> > System Time = 04/14/2023 18:03:24
> > Mfg. Date = 03/25/22
> > Controller Time = 04/14/2023 16:03:22
> > FW Package Build = 52.16.1-4405
> > BIOS Version = 7.16.00.0_0x07100501
> > FW Version = 5.160.02-3552
> > Driver Name = megaraid_sas
> > Driver Version = 07.713.01.00-rc1
> > Current Personality = RAID-Mode
> > Vendor Id = 0x1000
> > Device Id = 0x10E2
> > SubVendor Id = 0x1028
> > SubDevice Id = 0x2176
> > Host Interface = PCI-E
> > Device Interface = SAS-12G
> > Bus Number = 179
> > Device Number = 0
> > Function Number = 0
> > Domain ID = 0
> > Security Protocol = None
> > JBOD Drives = 6
> >
> > JBOD LIST :
> > =========
> >
> > ---------------------------------------------------------------------------------
> > ID EID:Slt DID State Intf Med Size       SeSz Model                Vendor Port
> > ---------------------------------------------------------------------------------
> > 0  64:0      6 Onln  SATA SSD 447.130 GB 512B MTFDDAK480TDT        ATA    x1
> > 1  64:1      8 Onln  SATA SSD 447.130 GB 512B MTFDDAK480TDT        ATA    x1
> > 3  64:3      7 Onln  SATA SSD 447.130 GB 512B MZ7KH480HAHQ0D3      ATA    x1
> > 5  64:5      9 Onln  SATA HDD   3.638 TB 512B HGST HUS726T4TALA6L0 ATA    x1
> > 6  64:6     10 Onln  SATA HDD   3.638 TB 512B HGST HUS726T4TALA6L0 ATA    x1
> > 7  64:7     11 Onln  SATA HDD   3.638 TB 512B HGST HUS726T4TALA6L0 ATA    x1
> > ---------------------------------------------------------------------------------
> >
> > ID=JBOD Target ID|EID=Enclosure Device ID|Slt=Slot No|DID=Device ID|Onln=Online
> > Offln=Offline|Intf=Interface|Med=Media Type|SeSz=Sector Size
> > SED=Self Encryptive Drive|PI=Protection Info|Sp=Spun|U=Up|D=Down
> >
> > Physical Drives = 6
> >
> > PD LIST :
> > =======
> >
> > ----------------------------------------------------------------------------------
> > EID:Slt DID State DG       Size Intf Med SED PI SeSz Model                Sp Type
> > ----------------------------------------------------------------------------------
> > 64:0      6 Onln  -  447.130 GB SATA SSD N   N  512B MTFDDAK480TDT        U  JBOD
> > 64:1      8 Onln  -  447.130 GB SATA SSD N   N  512B MTFDDAK480TDT        U  JBOD
> > 64:3      7 Onln  -  447.130 GB SATA SSD N   N  512B MZ7KH480HAHQ0D3      U  JBOD
> > 64:5      9 Onln  -    3.638 TB SATA HDD N   N  512B HGST HUS726T4TALA6L0 U  JBOD
> > 64:6     10 Onln  -    3.638 TB SATA HDD N   N  512B HGST HUS726T4TALA6L0 U  JBOD
> > 64:7     11 Onln  -    3.638 TB SATA HDD N   N  512B HGST HUS726T4TALA6L0 U  JBOD
> > ----------------------------------------------------------------------------------
> >
> > EID=Enclosure Device ID|Slt=Slot No|DID=Device ID|DG=DriveGroup
> > DHS=Dedicated Hot Spare|UGood=Unconfigured Good|GHS=Global Hotspare
> > UBad=Unconfigured Bad|Sntze=Sanitize|Onln=Online|Offln=Offline|Intf=Interface
> > Med=Media Type|SED=Self Encryptive Drive|PI=Protection Info
> > SeSz=Sector Size|Sp=Spun|U=Up|D=Down|T=Transition|F=Foreign
> > UGUnsp=UGood Unsupported|UGShld=UGood shielded|HSPShld=Hotspare shielded
> > CFShld=Configured shielded|Cpybck=CopyBack|CBShld=Copyback Shielded
> > UBUnsp=UBad Unsupported|Rbld=Rebuild
> >
> > Enclosures = 1
> >
> > Enclosure LIST :
> > ==============
> >
> > --------------------------------------------------------------------
> > EID State Slots PD PS Fans TSs Alms SIM Port# ProdID VendorSpecific
> > --------------------------------------------------------------------
> >  64 OK        8  6  0    0   0    0   0 -     BP14G+ ?
> > --------------------------------------------------------------------
> >
> > EID=Enclosure Device ID | PD=Physical drive count | PS=Power Supply count
> > TSs=Temperature sensor count | Alms=Alarm count | SIM=SIM Count | ProdID=Product ID
> >
> >
> > BBU_Info :
> > ========
> >
> > ----------------------------------------------
> > Model State   RetentionTime Temp Mode MfgDate
> > ----------------------------------------------
> > BBU   Optimal 0 hour(s)     48C  -    0/00/00
> > ----------------------------------------------
> >
> >
> > So BBU seems enabled.
> >
> > --
> >   Latex fetish implies the existence of a MS Word fetish.
> >         (@nucleus)
> >
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
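
A footnote to the fio advice at the top of this message: when comparing
runs made with different cache settings, it can help to pull the numbers
out of fio's "iops : ..." summary lines instead of eyeballing them. The
following is only a minimal sketch in Python. The function names and the
run labels are made up for illustration, and the two sample lines are
simply the ones quoted earlier in this thread (the thread does not say
which cache setting they belong to).

```python
import re
from typing import Dict

# Matches fio's human-readable IOPS summary line, for example:
#   iops : min= 2, max= 40, avg=21.13, stdev= 6.10, samples=929
IOPS_LINE = re.compile(r"iops\s*:\s*(.*)")
FIELD = re.compile(r"(\w+)=\s*([0-9.]+)")


def parse_iops_line(line: str) -> Dict[str, float]:
    """Return the key=value fields of one fio 'iops :' line as floats."""
    match = IOPS_LINE.search(line)
    if match is None:
        raise ValueError(f"not a fio iops summary line: {line!r}")
    return {key: float(value) for key, value in FIELD.findall(match.group(1))}


def best_by_avg(runs: Dict[str, str]) -> str:
    """Return the label of the run whose iops line has the highest avg.

    `runs` maps a label of your choosing (hypothetical here) to the
    'iops :' line captured from that run.
    """
    return max(runs, key=lambda label: parse_iops_line(runs[label])["avg"])


if __name__ == "__main__":
    # Placeholder labels; in practice use one label per cache setting tested.
    runs = {
        "run-a": "iops : min= 2, max= 40, avg=21.13, stdev= 6.10, samples=929",
        "run-b": "iops : min= 2, max= 42, avg=21.52, stdev= 6.56, samples=926",
    }
    for label, line in runs.items():
        print(label, parse_iops_line(line))
    print("highest average IOPS:", best_by_avg(runs))
```

A difference as small as the one between these two sample lines is well
inside their reported stdev, so treat the "best" label as a hint and
repeat the runs before changing any cache setting for good.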