backfilling on a single OSD and caching controllers

Hi,

just a tip I validated on our hardware. I'm currently converting an
OSD from xfs with the journal on the same platter to btrfs with the
journal on an SSD. To avoid any unwanted data movement, I reused the
same OSD number, weight and placement, so Ceph is simply backfilling
all the PGs previously stored on the old version of this OSD.
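
For reference, the general shape of such an in-place swap looks
something like the outline below (a rough sketch only; the exact
commands, keyring handling, mount options and ceph.conf details depend
on your setup):

# keep Ceph from remapping PGs while the OSD is down
ceph osd set noout
# stop the OSD and reformat its data device
service ceph stop osd.<id>
mkfs.btrfs /dev/<data-device>
mount /dev/<data-device> /var/lib/ceph/osd/ceph-<id>
# recreate the OSD data dir and its (now SSD-backed) journal, reusing the id
ceph-osd -i <id> --mkfs --mkjournal
# put the OSD's keyring back in the data directory, then restart it
service ceph start osd.<id>
ceph osd unset noout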

The problem is that all the other OSDs on the same server (which has a
total of 6) suffer greatly (a >10x jump in apply latencies). I
half-expected this: the RAID card has 2GB of battery-backed RAM, of
which ~1.6-1.7GB is used as write cache. Obviously, pushing the entire
content of an OSD (~500GB currently) through this cache can't help
much: the first few GBs land in the cache, but the backfill writes
arrive faster than the HDD can drain them, so the cache fills up and
becomes useless for the backfilling.
Worse, once the cache is full, writes to the other HDDs have to compete
with the backfilling OSD for access to it instead of getting the full
benefit of a BBWC.
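
To put rough numbers on it (the throughput figures here are
illustrative assumptions, not measurements): with ~1.7GB of write
cache, backfill data arriving at ~300MB/s while the platter drains
~120MB/s, the cache fills at a net ~180MB/s and is saturated within
seconds:

echo $(( 1700 / (300 - 120) ))   # ~9 seconds until the BBWC is full

After that every write is effectively limited by the HDD anyway.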

I had already taken the precaution of excluding the SSDs from the
controller's cache (which alone roughly halves the cache pressure,
since journal writes no longer go through it). But just now I also
disabled the cache for the HDD behind the OSD being backfilled, and I
saw an immediate gain: apply latencies for the other OSDs on the same
server dropped back from >100ms to <10ms.
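
If you want to watch the effect yourself, the per-OSD latencies are
visible with something like:

watch -n 5 ceph osd perf

(the fs_apply_latency column is the one that jumps during backfilling).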

AFAIK the Ceph OSD code doesn't bypass the kernel page cache when
backfilling. If that's really the case, it might be a good idea to do
so (or at least to make it configurable): data written during
backfilling is less likely to be re-read soon than data written by
normal client accesses.
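
Just to illustrate what such a bypass does (the path below is an
arbitrary test file, nothing Ceph-specific): a buffered write goes
through the page cache, a direct one does not:

# buffered write, goes through the kernel page cache
dd if=/dev/zero of=/srv/ddtest bs=4M count=256 conv=fdatasync
# direct I/O, bypasses the page cache entirely
dd if=/dev/zero of=/srv/ddtest bs=4M count=256 oflag=direct

Something along those lines (O_DIRECT, or posix_fadvise with
POSIX_FADV_DONTNEED after the write) applied to backfill traffic would
keep it from evicting hot data.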

For reference, this is how to disable the cache for a single logical
drive on an HP Smart Array controller:

hpacucli> ctrl slot=<n> ld <d> modify caching=disable

and how to re-enable it once the backfilling is done:

hpacucli> ctrl slot=<n> ld <d> modify caching=enable
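
The current setting for a logical drive can be checked at any time
with:

hpacucli> ctrl slot=<n> ld <d> show

(the array accelerator / caching status is listed in the output).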

This is not usable during large-scale rebalancing (where nearly all
OSDs are hit by PG movements), but in this particular case it helps a
*lot*.

Best regards,

Lionel


