Write throughput drops to zero

Hi,

I recently got my first Ceph cluster up and running and have been doing some stress tests. I quickly found that during sequential write benchmarks the throughput often drops to zero. Initially I saw this inside QEMU virtual machines, but I can also reproduce the issue with "rados bench" within 5-10 minutes of sustained writes. If left alone the writes eventually start flowing again, but it takes quite a while (at least a couple of minutes). If I stop and restart the benchmark, the write throughput immediately returns to where it should be.
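
In case it helps, here is a rough sketch of one way I could pinpoint when a stall begins, using the python-rados bindings to time each synchronous write so a stall shows up as one or more very long write latencies. The pool name and the 4MB write size are just placeholders for illustration, not what rados bench does internally.

# Sketch: time each synchronous 4MB write; a stall shows up as a
# long-latency sample. Pool name "rbd" and the 4MB size are examples.
import time
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')   # assumed pool name

payload = b'\0' * (4 * 1024 * 1024)
i = 0
try:
    while True:
        t0 = time.time()
        ioctx.write_full('stall-test-%d' % i, payload)
        dt = time.time() - t0
        if dt > 5.0:
            print('%s: write %d blocked for %.1f s'
                  % (time.strftime('%H:%M:%S'), i, dt))
        i += 1
finally:
    ioctx.close()
    cluster.shutdown()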

I have convinced myself it is not a network hardware issue.  I can load up the network with a bunch of parallel iperf benchmarks and it keeps chugging along happily. When the issue occurs with Ceph I don't see any indications of network issues (e.g. dropped packets).  Adding additional network load during the rados bench (using iperf) doesn't seem to trigger the issue any faster or more often.
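
For completeness, this is roughly how I watch the NIC drop/error counters while the bench runs (just a sketch; the interface name is an example, substitute whatever the IPoIB or Ethernet device is called):

# Sketch: poll the kernel's per-interface drop/error counters every few
# seconds and print any change. The interface name is an example.
import time

IFACE = 'ib0'  # assumed interface name

def read_stat(name):
    with open('/sys/class/net/%s/statistics/%s' % (IFACE, name)) as f:
        return int(f.read())

STATS = ('rx_dropped', 'tx_dropped', 'rx_errors', 'tx_errors')
prev = {s: read_stat(s) for s in STATS}
while True:
    time.sleep(5)
    cur = {s: read_stat(s) for s in STATS}
    delta = {s: cur[s] - prev[s] for s in STATS}
    if any(delta.values()):
        print(time.strftime('%H:%M:%S'), delta)
    prev = cur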

I have also convinced myself it isn't an issue with a journal filling up or an OSD being too busy. The amount of data written before the problem occurs is much larger than the total journal capacity. Watching the load on the OSD servers with top/iostat I don't see anything being overloaded; rather, the load everywhere drops to essentially zero when the writes stall. Before the writes stall the load is well distributed with no visible hot spots. The OSDs and hosts that report slow requests are random, so I don't think it is a failing disk or server. I don't see anything interesting in the logs so far (I am just about to do some tests with Ceph's debug logging cranked up).
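
Before cranking up the debug logging, my rough plan for the next stall is the sketch below: run it on each OSD host and dump any in-flight ops that have been pending for a while via the admin socket. I'm assuming the default admin socket path and that dump_ops_in_flight reports an "age" per op (field names may differ slightly between releases):

# Sketch: on an OSD host, query each local OSD's admin socket for in-flight
# ops and print the ones pending longer than a threshold. Assumes the
# default /var/run/ceph socket path; JSON field names may vary by release.
import glob
import json
import subprocess
import time

AGE_THRESHOLD = 10.0  # seconds, arbitrary

while True:
    for sock in sorted(glob.glob('/var/run/ceph/ceph-osd.*.asok')):
        out = subprocess.check_output(
            ['ceph', '--admin-daemon', sock, 'dump_ops_in_flight'])
        ops = json.loads(out).get('ops', [])
        stuck = [op for op in ops if op.get('age', 0) > AGE_THRESHOLD]
        if stuck:
            print(time.strftime('%H:%M:%S'), sock)
            for op in stuck:
                print('  %6.1fs  %s' % (op['age'], op.get('description', '?')))
    time.sleep(5)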

The cluster specs are:

OS: Ubuntu 14.04 with 3.16 kernel
Ceph: 9.1.0
OSD Filesystem: XFS
Replication: 3X
Two racks with IPoIB network
10Gbps Ethernet between racks
8 OSD servers with:
  * Dual Xeon E5-2630L (12 cores @ 2.4GHz)
  * 128GB RAM
  * 12 x 6TB Seagate drives (connected to LSI 2208 chip in JBOD mode)
  * Two 400GB Intel P3600 NVMe drives (OS on a RAID1 partition, plus six OSD journal partitions on each)
  * Mellanox ConnectX-3 NIC (for both Infiniband and 10Gbps Ethernet)
3 Mons co-located on OSD servers

Any advice is greatly appreciated. I am planning to try this with Hammer too.

Thanks,
Brendan
