In theory it should be possible to do this (to change the "Block SSD Write Disk Cache Change = Yes" setting):

1. Run MegaSCU -adpsettings -write -f mfc.ini -a0
2. Edit the mfc.ini file, setting "blockSSDWriteCacheChange" to 0 instead of 1.
3. Run MegaSCU -adpsettings -read -f mfc.ini -a0

I tried this with MegaCLI, but I get an error. Saving the config file does not work for me. No idea why…

My config:

VD16 Properties :
===============
Strip Size = 256 KB
Number of Blocks = 3749642240
VD has Emulated PD = Yes
Span Depth = 1
Number of Drives Per Span = 1
Write Cache(initial setting) = WriteThrough
Disk Cache Policy = Enabled
Encryption = None
Data Protection = Disabled
Active Operations = None
Exposed to OS = Yes
Creation Date = 25-08-2020
Creation Time = 12:05:41 PM
Emulation type = None

Version :
=======
Firmware Package Build = 23.28.0-0010
Firmware Version = 3.400.05-3175
Bios Version = 5.46.02.0_4.16.08.00_0x06060900
NVDATA Version = 2.1403.03-0128
Boot Block Version = 2.05.00.00-0010
Bootloader Version = 07.26.26.219
Driver Name = megaraid_sas
Driver Version = 07.703.05.00-rc1

Supported Adapter Operations :
============================
Rebuild Rate = Yes
CC Rate = Yes
BGI Rate = Yes
Reconstruct Rate = Yes
Patrol Read Rate = Yes
Alarm Control = Yes
Cluster Support = No
BBU = Yes
Spanning = Yes
Dedicated Hot Spare = Yes
Revertible Hot Spares = Yes
Foreign Config Import = Yes
Self Diagnostic = Yes
Allow Mixed Redundancy on Array = No
Global Hot Spares = Yes
Deny SCSI Passthrough = No
Deny SMP Passthrough = No
Deny STP Passthrough = No
Support more than 8 Phys = Yes
FW and Event Time in GMT = No
Support Enhanced Foreign Import = Yes
Support Enclosure Enumeration = Yes
Support Allowed Operations = Yes
Abort CC on Error = Yes
Support Multipath = Yes
Support Odd & Even Drive count in RAID1E = No
Support Security = No
Support Config Page Model = Yes
Support the OCE without adding drives = Yes
support EKM = No
Snapshot Enabled = No
Support PFK = Yes
Support PI = Yes
Support LDPI Type1 = No
Support LDPI Type2 = No
Support LDPI Type3 = No
Support Ld BBM Info = No
Support Shield State = Yes
Block SSD Write Disk Cache Change = Yes -> this is not good, as it prevents changing the SSD cache! Stupid!
Support Suspend Resume BG ops = Yes
Support Emergency Spares = Yes
Support Set Link Speed = Yes
Support Boot Time PFK Change = No
Support JBOD = Yes
Disable Online PFK Change = No
Support Perf Tuning = Yes
Support SSD PatrolRead = Yes
Real Time Scheduler = Yes
Support Reset Now = Yes
Support Emulated Drives = Yes
Headless Mode = Yes
Dedicated HotSpares Limited = No
Point In Time Progress = Yes

Supported VD Operations :
=======================
Read Policy = Yes
Write Policy = Yes
IO Policy = Yes
Access Policy = Yes
Disk Cache Policy = Yes (but only HDDs in this case)
Reconstruction = Yes
Deny Locate = No
Deny CC = No
Allow Ctrl Encryption = No
Enable LDBBM = No
Support FastPath = Yes
Performance Metrics = Yes
Power Savings = No
Support Powersave Max With Cache = No
Support Breakmirror = No
Support SSC WriteBack = No
Support SSC Association = No

From: Reed Dier <reed.dier@xxxxxxxxxxx>
Sent: Wednesday, 2 September 2020 19:34
To: VELARTIS Philipp Dürhammer <p.duerhammer@xxxxxxxxxxx>
Cc: ceph-users@xxxxxxx
Subject: Re: Can 16 server grade ssd's be slower then 60 hdds? (no extra journals)

Just for the sake of curiosity, if you do a show all on /cX/vX, what is shown for the VD properties?
VD0 Properties :
==============
Strip Size = 256 KB
Number of Blocks = 1953374208
VD has Emulated PD = No
Span Depth = 1
Number of Drives Per Span = 1
Write Cache(initial setting) = WriteBack
Disk Cache Policy = Disabled
Encryption = None
Data Protection = Disabled
Active Operations = None
Exposed to OS = Yes
Creation Date = 17-06-2016
Creation Time = 02:49:02 PM
Emulation type = default
Cachebypass size = Cachebypass-64k
Cachebypass Mode = Cachebypass Intelligent
Is LD Ready for OS Requests = Yes
SCSI NAA Id = 600304801bb4c0001ef6ca5ea0fcb283

I'm wondering if the pdcache value must be set at VD creation, as it is a creation option as well.
If that's the case, maybe consider blowing away one of the SSD VDs, recreating the VD and OSD, and seeing if you can measure a difference on that disk specifically in testing.

It might also be helpful to document some of these values from /cX show all:

Version :
=======
Firmware Package Build = 24.7.0-0026
Firmware Version = 4.270.00-3972
Bios Version = 6.22.03.0_4.16.08.00_0x060B0200
Ctrl-R Version = 5.08-0006
Preboot CLI Version = 01.07-05:#%0000
NVDATA Version = 3.1411.00-0009
Boot Block Version = 3.06.00.00-0001
Driver Name = megaraid_sas
Driver Version = 07.703.05.00-rc1

Supported Adapter Operations :
============================
Support Shield State = Yes
Block SSD Write Disk Cache Change = Yes
Support Suspend Resume BG ops = Yes
Support Emergency Spares = Yes
Support Set Link Speed = Yes
Support Boot Time PFK Change = No
Support JBOD = Yes

Supported VD Operations :
=======================
Read Policy = Yes
Write Policy = Yes
IO Policy = Yes
Access Policy = Yes
Disk Cache Policy = Yes
Reconstruction = Yes
Deny Locate = No
Deny CC = No
Allow Ctrl Encryption = No
Enable LDBBM = No
Support FastPath = Yes
Performance Metrics = Yes
Power Savings = No
Support Powersave Max With Cache = No
Support Breakmirror = Yes
Support SSC WriteBack = No
Support SSC Association = No
Support VD Hide = Yes
Support VD Cachebypass = Yes
Support VD discardCacheDuringLDDelete = Yes

Advanced Software Option :
========================
----------------------------------------
Adv S/W Opt        Time Remaining  Mode
----------------------------------------
MegaRAID FastPath  Unlimited       -
MegaRAID RAID6     Unlimited       -
MegaRAID RAID5     Unlimited       -
----------------------------------------

Namely, on my 3108 controller, "Block SSD Write Disk Cache Change = Yes" stands out to me.
My controller has SAS HDDs behind it, though, so I may just not be running into the same issue, even if it pertains to me.

Also wondering if FastPath is enabled as well.
I know that on some of the older controllers it was a paid feature, but they then opened it up for free, though you may need a software key to enable it (for free).

Just looking to widen the net and hope we catch something.

Reed

On Sep 2, 2020, at 7:38 AM, VELARTIS Philipp Dürhammer <p.duerhammer@xxxxxxxxxxx> wrote:

> I assume you are referencing this parameter?
> storcli /c0/v0 set ssdcaching=<on|off>
> If so, this is for CacheCade, which is LSI's cache tiering solution, which should both be off and not in use for ceph.

No, storcli /cx/vx set pdcache=off is denied because of the LSI setting "Block SSD Write Disk Cache Change = Yes".
I cannot find any firmware to upload or any way to change this.

Do you think that also disabling the write cache on the SSDs helps a lot? (Ceph is not aware of it, because 'smartctl -g wcache /dev/sdX' shows the cache as disabled, since the cache on the LSI is already disabled.)

The only way would be to buy some HBA cards and add them to the servers. But that's a lot of work without knowing whether it will improve the speed much.

I am using RBD with hyperconverged nodes (4 at the moment); the pools are 2x and 3x replicated. Actually, the performance for Windows and Linux VMs with the HDD OSD pool was OK, but it has been getting a little slower over time. I just want to get ready for the future.
We also plan to put some bigger database servers on the cluster (they are on local storage at the moment), and therefore I want to increase the small random IOPS of the cluster a lot.

-----Original Message-----
From: Reed Dier <reed.dier@xxxxxxxxxxx>
Sent: Tuesday, 1 September 2020 23:44
To: VELARTIS Philipp Dürhammer <p.duerhammer@xxxxxxxxxxx>
Cc: ceph-users@xxxxxxx
Subject: Re: Can 16 server grade ssd's be slower then 60 hdds? (no extra journals)

> There is an option set in the controller, "Block SSD Write Disk Cache Change = Yes", which does not permit deactivating the SSD cache.
> I could not find any solution on Google for changing this setting on this controller (LSI MegaRAID SAS 9271-8i).

I assume you are referencing this parameter?
storcli /c0/v0 set ssdcaching=<on|off>
If so, this is for CacheCade, which is LSI's cache tiering solution, which should both be off and not in use for ceph.

Single-thread, single-iodepth benchmarks will tend to be underwhelming.
Ceph shines with aggregate performance from lots of clients.
And in an odd twist of fate, I typically see better performance on RBD for random benchmarks than for sequential benchmarks, as random I/O distributes the load across more OSDs.

It might also help others offer some pointers for tuning if you describe the pool/application a bit more,
i.e. RBD vs CephFS vs RGW, 3x replicated vs EC, etc.

At least things are trending in a positive direction.

Reed

On Sep 1, 2020, at 4:21 PM, VELARTIS Philipp Dürhammer <p.duerhammer@xxxxxxxxxxx> wrote:

Thank you. I was working in this direction. The situation is a lot better, but I think I can still get far better.
I could set the controller to writethrough, direct and no read ahead for the SSDs.
But I cannot disable the pdcache ☹. There is an option set in the controller, "Block SSD Write Disk Cache Change = Yes", which does not permit deactivating the SSD cache.
I could not find any solution on Google for changing this setting on this controller (LSI MegaRAID SAS 9271-8i).

I don't know how much performance gain deactivating the SSD cache would bring. At least the Micron 5200 MAX has capacitors, so I hope it is safe against data loss in case of a power failure. I sent a request to LSI / Broadcom asking whether they know how I can change this setting. This is really annoying.

I will check the CPU power settings. I also read somewhere that they can improve IOPS a lot (if they are badly set).

At the moment I get 600 IOPS for 4k random write with 1 thread and iodepth 1. I get 40k 4k random IOPS for some instances with iodepth 32. It's not the world, but a lot better than before. Reads are around 100k IOPS. That is for 16 SSDs and 2 x dual 10G NICs.

I was reading that good tuning and hardware configuration can get more than 2000 IOPS on a single thread out of the SSDs. I know that Ceph does not shine with a single thread, but 600 IOPS is not very much...

philipp

-----Original Message-----
From: Reed Dier <reed.dier@xxxxxxxxxxx>
Sent: Tuesday, 1 September 2020 22:37
To: VELARTIS Philipp Dürhammer <p.duerhammer@xxxxxxxxxxx>
Cc: ceph-users@xxxxxxx
Subject: Re: Can 16 server grade ssd's be slower then 60 hdds? (no extra journals)

If using storcli/perccli for manipulating the LSI controller, you can disable the on-disk write cache with:
storcli /cx/vx set pdcache=off

You can also ensure that you turn off write caching at the controller level with:
storcli /cx/vx set iopolicy=direct
storcli /cx/vx set wrcache=wt

You can also tweak the readahead value for the VD if you want, though with an SSD, I don't think it will be much of an issue.
storcli /cx/vx set rdcache=nora

I'm sure the megacli alternatives are available with some quick searches.

You may also want to check your C-states and P-states to make sure there aren't any aggressive power-saving features getting in the way.

Reed

On Aug 31, 2020, at 7:44 AM, VELARTIS Philipp Dürhammer <p.duerhammer@xxxxxxxxxxx> wrote:

We have older LSI RAID controllers with no HBA/JBOD option, so we expose the single disks as RAID0 devices. Ceph should not be aware of the cache status?
But digging deeper into it, it seems that 1 out of 4 servers is performing a lot better and has super low commit/apply rates, while the others have a lot more (20+) on heavy writes. This only applies to the SSDs. For the HDDs I can't see a difference...

-----Original Message-----
From: Frank Schilder <frans@xxxxxx>
Sent: Monday, 31 August 2020 13:19
To: VELARTIS Philipp Dürhammer <p.duerhammer@xxxxxxxxxxx>; 'ceph-users@xxxxxxx' <ceph-users@xxxxxxx>
Subject: Re: Can 16 server grade ssd's be slower then 60 hdds? (no extra journals)

Yes, they can - if the volatile write cache is not disabled. There are many threads on this, including recent ones. Search for "disable write cache" and/or "disable volatile write cache".

You will also find different methods of doing this automatically.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: VELARTIS Philipp Dürhammer <p.duerhammer@xxxxxxxxxxx>
Sent: 31 August 2020 13:02:45
To: 'ceph-users@xxxxxxx'
Subject: Can 16 server grade ssd's be slower then 60 hdds? (no extra journals)

I have a production cluster with 60 OSDs. No extra journals. It's performing okay.
Now I added an extra SSD pool with 16 Micron 5100 MAX. And the performance is a little slower than or equal to that of the 60 HDD pool.
This holds for 4K random as well as sequential reads. Everything is on a dedicated 2 x 10G network. The HDDs are still on FileStore, the SSDs on BlueStore. Ceph Luminous.
What should be possible with 16 SSDs vs. 60 HDDs, with no extra journals?
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
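
Editorial sketch, not part of any message in the thread: the two checks discussed here can be wrapped in a small shell helper. This is only a sketch under stated assumptions. The function names pdcache_change_blocked and print_vd_cache_cmds are hypothetical, the first function only inspects text it is given on stdin (for example the output of "storcli /cX show all"), and the second only prints the storcli commands quoted in the thread without running them.

```shell
#!/bin/sh
# Hypothetical helper names; neither function is part of storcli.

# Succeeds (exit 0) if the controller report read from stdin contains
# "Block SSD Write Disk Cache Change = Yes", the flag that the thread
# identifies as the reason "set pdcache=off" is denied on SSD-backed VDs.
pdcache_change_blocked() {
    grep -Eq 'Block SSD Write Disk Cache Change[[:space:]]*=[[:space:]]*Yes'
}

# Prints, without executing them, the per-VD cache commands quoted in the
# thread for controller index $1 and virtual drive index $2.
print_vd_cache_cmds() {
    ctrl=$1
    vd=$2
    for setting in pdcache=off iopolicy=direct wrcache=wt rdcache=nora; do
        printf 'storcli /c%s/v%s set %s\n' "$ctrl" "$vd" "$setting"
    done
}
```

One possible use is to pipe the output of "storcli /c0 show all" into pdcache_change_blocked before attempting any change, and to review the lines printed by print_vd_cache_cmds 0 16 by hand before running them. Printing rather than executing is a deliberate choice here, since the thread shows that the controller may reject the pdcache change.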