Re: Disk write cache - safe?

Tim,

Enabling the drive write cache is a recipe for disaster.  In the event of a power interruption, any in-flight data held in the cache has not yet been committed to the disk media itself.  Because the drive cache has no battery or supercap to keep it powered, that data is lost.  Now, if this is just a single node and you have size=3 or a decent EC scheme in place, Ceph should be able to recover and keep going.  However, if more than one node loses power, you start running the risk of corrupting multiple, or dare I say *all*, copies of the data that was supposed to be written, with the result being data loss.  This is why it is standard practice to disable drive caches, not just with Ceph, but with any enterprise storage offering.
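For anyone who wants to audit this on their own nodes, here is a rough Python sketch built around hdparm's -W option (the tool mentioned in the original question).  The helper names are made up for illustration, the regular expression assumes the usual "write-caching =  1 (on)" status line that hdparm prints, and a drive sitting behind a RAID controller may not answer hdparm at all, so treat it as a starting point rather than a finished tool.

```python
import re
import subprocess

# Assumed format of the status line printed by `hdparm -W <device>`,
# for example: " write-caching =  1 (on)"
_WRITE_CACHE_RE = re.compile(r"write-caching\s*=\s*(\d)\s*\((on|off)\)")


def parse_write_cache(hdparm_output):
    """Return True if the cache is reported on, False if off, None if unknown."""
    match = _WRITE_CACHE_RE.search(hdparm_output)
    if match is None:
        return None
    return match.group(1) == "1"


def write_cache_command(device, enable):
    """Build the hdparm argument list that turns the drive write cache on or off."""
    return ["hdparm", "-W1" if enable else "-W0", device]


def query_write_cache(device):
    """Run `hdparm -W <device>` (needs root) and parse the reported state."""
    result = subprocess.run(
        ["hdparm", "-W", device],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_write_cache(result.stdout)


def disable_write_cache(device):
    """Turn the drive write cache off (needs root)."""
    subprocess.run(write_cache_command(device, enable=False), check=True)
```

Calling query_write_cache("/dev/sda") as root should return True, False, or None, and disable_write_cache("/dev/sda") turns the cache off.  Whether the setting survives a power cycle depends on the drive and controller, so it is worth re-checking after a reboot.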

In testing that I've done, using a battery-backed cache on the RAID controller with each drive as its own RAID-0 gave positive performance results.  This is something to try to see if you can regain some of the performance, but as always in storage, YMMV.

David Byte
Sr. Technology Strategist
SCE Enterprise Linux 
SCE Enterprise Storage
Alliances and SUSE Embedded
dbyte@xxxxxxxx
918.528.4422
On 3/14/18, 2:43 PM, "ceph-users on behalf of Tim Bishop" <ceph-users-bounces@xxxxxxxxxxxxxx on behalf of tim-lists@xxxxxxxxxxx> wrote:

    I'm using Ceph on Ubuntu 16.04 on Dell R730xd servers. A recent [1]
    update to the PERC firmware disabled the disk write cache by default
    which made a noticeable difference to the latency on my disks (spinning
    disks, not SSD) - by as much as a factor of 10.
    
    For reference their change list says:
    
    "Changes default value of drive cache for 6 Gbps SATA drive to disabled.
    This is to align with the industry for SATA drives. This may result in a
    performance degradation especially in non-Raid mode. You must perform an
    AC reboot to see existing configurations change."
    
    It's fairly straightforward to re-enable the cache either in the PERC
    BIOS, or by using hdparm, and doing so returns the latency back to what
    it was before.
    
    Checking the Ceph documentation I can see that older versions [2]
    recommended disabling the write cache for older kernels. But given I'm
    using a newer kernel, and there's no mention of this in the Luminous
    docs, is it safe to assume it's ok to enable the disk write cache now?
    
    If it makes a difference, I'm using a mixture of filestore and bluestore
    OSDs - migration is still ongoing.
    
    Thanks,
    
    Tim.
    
    [1] - https://www.dell.com/support/home/uk/en/ukdhs1/Drivers/DriversDetails?driverId=8WK8N
    [2] - http://docs.ceph.com/docs/jewel/rados/configuration/filesystem-recommendations/
    
    -- 
    Tim Bishop
    http://www.bishnet.net/tim/
    PGP Key: 0x6C226B37FDF38D55
    
    _______________________________________________
    ceph-users mailing list
    ceph-users@xxxxxxxxxxxxxx
    http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
    
    
