Re: Disabling write cache on SATA HDDs reduces write latency 7 times

Don't have any SSDs in the cluster to test with.

Also, without knowing the exact reason why having the write cache enabled has such a negative effect, I wouldn't be sure whether the same would apply to SSDs.

On Sun, 11 Nov 2018 at 6:41 PM, Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx> wrote:
 

Does it make sense to test disabling this on an HDD cluster only?


-----Original Message-----
From: Ashley Merrick [mailto:singapore@xxxxxxxxxxxxxx]
Sent: Sunday, 11 November 2018 6:24
To: vitalif@xxxxxxxxxx
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: Disabling write cache on SATA HDDs reduces write latency 7 times

I've just worked out I had the same issue - I've been trying to work out
the cause for the past few days!

However I am using brand new enterprise Toshiba drives with 256MB write
cache, and was seeing I/O wait peaks of 40% even during small write
operations to Ceph, with commit/apply latencies of 40ms+.

Just went through and disabled the write cache on each drive, did a few
tests and got the exact same write performance, but with I/O wait under
1% and commit/apply latencies of 1-3ms at most.
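
For anyone else who wants to try the same thing, it boils down to
something like this (the device glob is only an example - adjust it so
you don't hit your SSDs, and note that -W 0 may not survive a power
cycle on every drive):

for d in /dev/sd[a-z]; do
    hdparm -W 0 "$d"   # disable the volatile on-disk write cache
    hdparm -W "$d"     # read back the current write-caching state
done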

Something somewhere definitely doesn't seem to like the write cache
being enabled on the disks. This is an EC pool on the latest Mimic
release.

On Sun, Nov 11, 2018 at 5:34 AM Vitaliy Filippov <vitalif@xxxxxxxxxx> wrote:


        Hi

        A weird thing happens in my test cluster made from desktop
        hardware.

        The command `for i in /dev/sd?; do hdparm -W 0 $i; done` increases
        single-thread write iops (reduces latency) 7 times!

        It is a 3-node cluster with Ryzen 2700 CPUs, 3x SATA 7200rpm HDDs +
        1x SATA desktop SSD for system and ceph-mon + 1x SATA server SSD
        for block.db/wal in each host. Hosts are linked by 10gbit ethernet
        (not the fastest one though, average RTT according to flood-ping is
        0.098ms). Ceph and OpenNebula are installed on the same hosts, OSDs
        are prepared with ceph-volume and bluestore with default options.
        SSDs have capacitors ('power-loss protection'), write cache is
        turned off for them since the very beginning (hdparm -W 0
        /dev/sdb). They're quite old, but each of them is capable of
        delivering ~22000 iops in journal mode (fio -sync=1 -direct=1
        -iodepth=1 -bs=4k -rw=write).
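
        (For reference, a complete invocation of that journal-mode test
        could look like the following - sdX is a placeholder for an idle
        test SSD, and the job name and runtime are arbitrary:

        # WARNING: writes raw 4k sync data to the device and destroys
        # its contents - only point it at a disk you can wipe
        fio -name=journal-test -filename=/dev/sdX -sync=1 -direct=1 \
            -iodepth=1 -bs=4k -rw=write -runtime=60
        )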

        However, RBD single-threaded random-write benchmark originally gave
        awful results - when testing with `fio -ioengine=libaio -size=10G
        -sync=1 -direct=1 -name=test -bs=4k -iodepth=1 -rw=randwrite
        -runtime=60 -filename=./testfile` from inside a VM, the result was
        only 58 iops average (17ms latency). This was not what I expected
        from the HDD+SSD setup.

        But today I tried to play with cache settings for data disks. And I
        was really surprised to discover that just disabling HDD write
        cache (hdparm -W 0 /dev/sdX for all HDD devices) increases
        single-threaded performance ~7 times! The result from the same VM
        (without even rebooting it) is iops=405, avg lat=2.47ms. That's a
        magnitude faster and in fact 2.5ms seems sort of an expected
        number.
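
        (Side note: hdparm -W 0 is not guaranteed to persist across a
        power cycle, so if this turns out to be the fix, a udev rule along
        these lines should reapply it for every rotational disk at boot -
        the rule file name and the hdparm path are only examples:

        # /etc/udev/rules.d/99-hdd-write-cache.rules
        ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", RUN+="/usr/sbin/hdparm -W 0 /dev/%k"
        )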

        As I understand 4k writes are always deferred at the default
        setting of prefer_deferred_size_hdd=32768, this means they should
        only get written to the journal device before OSD acks the write
        operation.
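
        (The full option name is bluestore_prefer_deferred_size_hdd; to
        double-check what a running OSD is actually using, something like
        this should work on the OSD's host - osd.0 is just an example id:

        ceph daemon osd.0 config get bluestore_prefer_deferred_size_hdd
        )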

        So my question is WHY? Why does HDD write cache affect commit
        latency with WAL on an SSD?

        I would also appreciate if anybody with similar setup (HDD+SSD with
        desktop SATA controllers or HBA) could test the same thing...

        --
        With best regards,
           Vitaliy Filippov



_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
