Hi,

On 28/11/2015 04:24, Brian Felton wrote:
> Greetings Ceph Community,
>
> We are running a Hammer cluster (0.94.3-1) in production that recently
> experienced asymptotic performance degradation. We've been migrating
> data from an older non-Ceph cluster at a fairly steady pace for the
> past eight weeks (about 5TB a week). Overnight, the ingress rate
> dropped by 95%. Upon investigation, we found we were receiving
> hundreds of thousands of 'slow request' warnings.
> [...] Each storage server contains 72 6TB SATA drives for Ceph (648
> OSDs, ~3.5PB in total). Each disk is set up as its own ZFS zpool.
> Each OSD has a 10GB journal, located within the disk's zpool.

This behavior is similar to what you get with a default BTRFS setup:
performance is good initially and degrades over time. As BTRFS and ZFS
are both CoW filesystems, the causes might be the same.

In our case, we had two problems with BTRFS:
- snapshot removal is costly, so we use "filestore btrfs snap = false",
- fragmentation gets really bad over time, even with autodefrag:
  . we created the journals NoCoW to keep them from becoming fragmented,
    and later moved them to SSD,
  . we developed our own defragmentation scheduler.

Fragmentation was ultimately the biggest cause of performance problems
for us (snapshots only caused manageable spikes of writes).

If you can, I'd advise doing what we initially did: run a mix of
XFS-based OSDs (probably the most common setup with Ceph) and ZFS-based
OSDs. You'll be able to find out whether ZFS is slower than XFS in your
case by checking which OSDs are involved in the slow requests (you
should probably monitor your commit and apply latencies too).

Best regards,

Lionel
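PS: in case it helps with that last point, here is a minimal sketch of
what I mean by checking latencies per OSD. It is just Python shelling
out to "ceph osd perf --format json"; the JSON field names below are
the ones I believe Hammer emits, so treat them as an assumption and
verify against your cluster's output. Mapping OSD ids back to their
backing filesystem (ZFS vs XFS) is left to you.

#!/usr/bin/env python
# Sketch: poll "ceph osd perf" and list the OSDs with the worst
# commit/apply latencies, to compare ZFS-backed and XFS-backed OSDs.
# The field names ("osd_perf_infos", "perf_stats", "commit_latency_ms",
# "apply_latency_ms") are assumptions; check them on your release.

import json
import subprocess

def osd_latencies():
    """Return a list of (osd_id, commit_ms, apply_ms) tuples."""
    out = subprocess.check_output(
        ["ceph", "osd", "perf", "--format", "json"])
    perf = json.loads(out.decode("utf-8"))
    rows = []
    for info in perf["osd_perf_infos"]:
        stats = info["perf_stats"]
        rows.append((info["id"],
                     stats["commit_latency_ms"],
                     stats["apply_latency_ms"]))
    return rows

if __name__ == "__main__":
    # Print the 20 slowest OSDs by apply latency.
    rows = sorted(osd_latencies(), key=lambda r: r[2], reverse=True)
    print("%6s %12s %12s" % ("osd", "commit (ms)", "apply (ms)"))
    for osd_id, commit_ms, apply_ms in rows[:20]:
        print("%6d %12d %12d" % (osd_id, commit_ms, apply_ms))

If your ZFS-backed OSDs consistently sit at the top of that list while
the XFS ones do not, the filesystem (and most likely fragmentation) is
the prime suspect rather than the hardware.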