Re: Preconditioning an RBD image

Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx> · Sat, 25 Mar 2017 04:21:47 +0000

On Wed, Mar 22, 2017 at 6:05 AM Peter Maloney <peter.maloney@xxxxxxxxxxxxxxxxxxxx> wrote:

    Does iostat (e.g. iostat -xmy 1 /dev/sd[a-z]) show high %util or
      await during these problems?

It does, judging from what I see in atop.
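
For logging those two figures over time instead of watching them live, both can be derived from two samples of /proc/diskstats. Below is a minimal Python sketch of that calculation. The names DiskSample, parse_diskstats and await_and_util are made up for illustration, and the field positions assume the standard Linux diskstats layout (reads completed, ms reading, writes completed, ms writing, ms doing I/O).

```python
from typing import Dict, NamedTuple


class DiskSample(NamedTuple):
    ios: int      # reads + writes completed since boot
    io_ms: int    # ms spent on reads + writes since boot
    busy_ms: int  # ms the device had I/O in flight since boot


def parse_diskstats(text: str) -> Dict[str, DiskSample]:
    """Parse the contents of /proc/diskstats into per-device counters."""
    samples = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 14:
            continue  # skip blank or truncated lines
        samples[f[2]] = DiskSample(
            ios=int(f[3]) + int(f[7]),
            io_ms=int(f[6]) + int(f[10]),
            busy_ms=int(f[12]),
        )
    return samples


def await_and_util(before: DiskSample, after: DiskSample, interval_ms: float):
    """Return (average ms per completed I/O, percent of interval busy)."""
    d_ios = after.ios - before.ios
    d_io_ms = after.io_ms - before.io_ms
    await_ms = d_io_ms / d_ios if d_ios else 0.0
    util_pct = 100.0 * (after.busy_ms - before.busy_ms) / interval_ms
    return await_ms, util_pct
```

Feed it two reads of /proc/diskstats taken interval_ms apart (one second, say) and compare the sd* entries for the OSD disks.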

      Ceph filestore requires lots of metadata writing (directory
      splitting for example), xattrs, leveldb, etc. which are small sync
      writes that HDDs are bad at (100-300 iops), and SSDs are good at
      (cheapo would be 6k iops, and not so crazy DC/NVMe would be
      20-200k iops and more). So in theory, these things are mitigated
      by using an SSD, like bcache on your osd device. You could also
      try something like that, at least to test.

That explains our previous performance gains with Areca HBAs in NVRAM/supercap-backed write cache mode.  We moved to an SSD journal design to be more resilient under sustained write workloads, but this added latency on small/random write I/O.

      I have tested with bcache in writeback mode and found hugely
      obvious differences seen by iostat, for example here's my before
      and after (heavier load due to converting week 49-50 or so, and
      the highest spikes being the scrub infinite loop bug in 10.2.3): 

http://www.brockmann-consult.de/ganglia/graph.php?cs=10%2F25%2F2016+10%3A27&ce=03%2F09%2F2017+17%3A26&z=xlarge&hreg[]=ceph.*&mreg[]=sd[c-z]_await&glegend=show&aggregate=1&x=100

      But when you share a cache device, you get a single point of
      failure (and bcache, like all software, can be assumed to have
      bugs too). And I recommend vanilla kernel 4.9 or later which has
      many bcache fixes, or Ubuntu's 4.4 kernel which has the specific
      fixes I checked for.

Yep, I am scared of that, and would therefore prefer a vendor-based solid state design (e.g. Areca), all-SSD OSDs whenever these become affordable, or experimenting with cache pools. It does not seem like SSDs are getting any cheaper; new technologies like 3DXP just keep showing up.

      On 03/21/17 23:22, Alex Gorbachev wrote:

I wanted to share a recent experience, in which a
      few RBD volumes, formatted as XFS and exported via Ubuntu
      nfs-kernel-server, performed poorly and even generated "out of
      space" warnings on a nearly empty filesystem.  I tried a variety
      of hacks and fixes to no effect, until things started magically
      working just after some dd write testing.

      The only explanation I can come up with is that
        preconditioning, or thickening, the images with this
        benchmarking is what caused the improvement.
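
To make the idea concrete, here is a rough Python sketch of what a sequential preconditioning pass could look like. It is not the dd invocation used above (that was not recorded in this thread); the function name precondition, the 0xA5 fill pattern and the 4 MiB block size are assumptions chosen for illustration, with 4 MiB picked because it matches the default RBD object size. It overwrites whatever is on the target, so it only makes sense on a new, empty image.

```python
import os


def precondition(path: str, size_bytes: int,
                 block_size: int = 4 * 1024 * 1024) -> int:
    """Sequentially overwrite the first size_bytes of path.

    WARNING: destroys any existing data in that range.
    A non-zero pattern is used so no layer can treat the writes as holes.
    """
    block = b"\xa5" * block_size
    written = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        while written < size_bytes:
            chunk = block[: min(block_size, size_bytes - written)]
            view = memoryview(chunk)
            # os.write may write fewer bytes than asked; loop until done.
            while len(view) > 0:
                n = os.write(fd, view)
                view = view[n:]
            written += len(chunk)
        os.fsync(fd)  # make sure the data actually reaches the device
    finally:
        os.close(fd)
    return written
```

On a mapped image this would be pointed at the block device before mkfs, with size_bytes set to the image size. Whether a pass like this is really what fixed the behaviour described here remains a guess.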

      Ceph is Hammer 0.94.7 running on Ubuntu 14.04, kernel 4.10 on
        OSD nodes and 4.4 on NFS nodes.

      Regards,
      Alex
      Storcium

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

    -- 

--------------------------------------------
Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300
Fax: +49 4152 889 333
E-mail: peter.maloney@xxxxxxxxxxxxxxxxxxxx
Internet: http://www.brockmann-consult.de
--------------------------------------------

-- 
--Alex Gorbachev
Storcium