Re: Disk write cache - safe?

Hi Steven,

Le 16/03/2018 à 17:26, Steven Vacaroaia a écrit :
Hi All,

Can someone please confirm that, for the best performance/safety compromise, the following would be the right settings (target id 0 is the SSD, target id 1 is the HDD)?
Alternatively, any suggestions, configuration sharing, or advice would be greatly appreciated.
 
Note:
the server is a DELL R620 with a PERC 710, 1GB cache
the SSD is an enterprise Toshiba PX05SMB040Y
the HDD is an enterprise Seagate ST600MM0006


megacli -LDGetProp -DskCache -Lall -a0

Adapter 0-VD 0(target id: 0): Disk Write Cache : Enabled
Adapter 0-VD 1(target id: 1): Disk Write Cache : Disabled

Sounds good to me, as the Toshiba PX05SMB040Y SSDs include power-loss protection (https://toshiba.semicon-storage.com/eu/product/storage-products/enterprise-ssd/px05smbxxx.html).
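For reference, toggling that per-VD drive cache setting with megacli should look roughly like this (same adapter and target ids as above; a sketch from memory, so double-check against your MegaCli version's syntax):

 megacli -LDSetProp -EnDskCache -L0 -a0   # enable the drive cache on VD 0 (SSD with power-loss protection)
 megacli -LDSetProp -DisDskCache -L1 -a0  # keep it disabled on VD 1 (HDD)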

megacli -LDGetProp -Cache -Lall -a0

Adapter 0-VD 0(target id: 0): Cache Policy:WriteBack, ReadAdaptive, Direct, No Write Cache if bad BBU
Adapter 0-VD 1(target id: 1): Cache Policy:WriteBack, ReadAdaptive, Cached, Write Cache OK if bad BBU

I've always wondered about ReadAdaptive without ever finding a real answer. This would need clarification from the RHCS / Ceph performance team.

With a 1GB PERC cache, my guess is that you should set the SSDs to write-through whatever your workload, so that the whole cache is dedicated to the HDDs and your nodes don't hit a hard-to-diagnose full PERC cache issue. Besides, write caching should always be avoided with a bad BBU.
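If you go that way, switching the SSD virtual disk to write-through with megacli would look something like this (assuming VD 0 / adapter 0 as above; a sketch, not tested on your setup):

 megacli -LDSetProp WT -L0 -a0       # SSD VD to write-through, leaving the PERC cache to the HDDs
 megacli -LDGetProp -Cache -L0 -a0   # verify the new cache policy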

Regards,

Frédéric.

Many thanks

Steven





On 16 March 2018 at 06:20, Frédéric Nass <frederic.nass@xxxxxxxxxxxxxxxx> wrote:
Hi Tim,

I wanted to share our experience here, as we've been in a situation in the past (on a Friday afternoon, of course...) where injecting a snaptrim priority of 40 into all OSDs in the cluster (to speed up snaptrimming) resulted in all OSD nodes crashing at the same time, in all 3 datacenters. My first thought at that particular moment was: call your wife and tell her you'll be late home. :-D

And this event was not related to a power outage.

Fortunately I had spent some time (when building the cluster) thinking about how each option should be set along the I/O path, for #1 data consistency and #2 best possible performance, and that was:

- Single-disk RAID0 virtual disks with writeback PERC caching on each SATA disk
- write barriers kept enabled on XFS mounts (I had measured only a 1.5% performance gap, so disabling barriers was not a good choice, and it never actually is)
- SATA disk write buffers disabled (volatile)
- SSD journal disk write buffers enabled (persistent), toggled per drive as sketched below
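For reference, this is roughly how those per-drive write buffers can be checked and toggled with hdparm (sdX/sdY are placeholders; drives sitting behind a PERC virtual disk may need megacli or sdparm instead, depending on how they are exposed):

 hdparm -W /dev/sdX    # show the current write-caching (buffer) setting
 hdparm -W0 /dev/sdX   # disable the volatile write buffer on a SATA data disk
 hdparm -W1 /dev/sdY   # enable it on an SSD journal disk with power-loss protection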

We could hardly believe it, but when all nodes came back online, all OSDs rejoined the cluster and service resumed as before. We didn't face any XFS errors, nor did we have any further scrub or deep-scrub errors.

My assumption was that the extra power demand of snaptrimming may have led to node power instability, or that we hit a SATA firmware or maybe a kernel bug.

We also had the SSDs as RAID0 with the writeback PERC cache on, but changed that to write-through as we could get more IOPS from them with our workloads.

Thanks for sharing the information about DELL changing the default disk buffer policy. What's odd is that all buffers were disabled after the node rebooted, including the SSDs!
I am now changing them back to enabled for the SSDs only.

As said by others, you'd better keep the disk buffers disabled and rebuild the OSDs after setting the disks up as RAID0 virtual disks with writeback enabled.
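As a rough sketch (the enclosure:slot pair [E:S] and the adapter number are placeholders to adapt), creating such a single-disk RAID0 virtual disk with writeback while keeping the drive's own buffer disabled could look like:

 megacli -CfgLdAdd -r0 [E:S] WB NORA Direct NoCachedBadBBU -a0   # one-drive RAID0, writeback, direct I/O
 megacli -LDSetProp -DisDskCache -Lall -a0                       # keep the physical drive buffers disabled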

Best,

Frédéric.

Le 14/03/2018 à 20:42, Tim Bishop a écrit :
I'm using Ceph on Ubuntu 16.04 on Dell R730xd servers. A recent [1]
update to the PERC firmware disabled the disk write cache by default,
which made a noticeable difference to the latency on my disks (spinning
disks, not SSD) - by as much as a factor of 10.

For reference their change list says:

"Changes default value of drive cache for 6 Gbps SATA drive to disabled.
This is to align with the industry for SATA drives. This may result in a
performance degradation especially in non-Raid mode. You must perform an
AC reboot to see existing configurations change."

It's fairly straightforward to re-enable the cache either in the PERC
BIOS or by using hdparm, and doing so returns the latency to what it
was before.

Checking the Ceph documentation, I can see that older versions [2]
recommended disabling the write cache for older kernels. But given that
I'm using a newer kernel, and there's no mention of this in the Luminous
docs, is it safe to assume it's OK to enable the disk write cache now?

If it makes a difference, I'm using a mixture of filestore and bluestore
OSDs - migration is still ongoing.

Thanks,

Tim.

[1] - https://www.dell.com/support/home/uk/en/ukdhs1/Drivers/DriversDetails?driverId=8WK8N
[2] - http://docs.ceph.com/docs/jewel/rados/configuration/filesystem-recommendations/




_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
