Slow ceph fs performance

"Bryan K. Wright" <bkw1a@xxxxxxxxxxxxxxxxxxxxxxxx> · Wed, 26 Sep 2012 10:50:04 -0400

Hi folks,

	I'm seeing reasonable performance when I run rados
benchmarks, but really slow I/O when reading from or writing
to a mounted ceph filesystem.  The rados benchmarks
show about 150 MB/s for both read and write, but when I
go to a client machine with a mounted ceph filesystem
and try to rsync a large (60 GB) directory tree onto
the ceph fs, I'm getting rates of only 2-5 MB/s.
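
	To put those rates in perspective, here is a rough
back-of-the-envelope sketch in Python.  It assumes decimal units
(1 GB = 1000 MB) and a perfectly steady rate, neither of which is
exact, and the name transfer_hours is just made up for illustration.

```python
def transfer_hours(size_gb, rate_mb_s):
    """Hours needed to move size_gb gigabytes at rate_mb_s MB/s.

    Assumption: decimal units (1 GB = 1000 MB) and a constant rate.
    """
    return size_gb * 1000.0 / rate_mb_s / 3600.0


if __name__ == "__main__":
    # 2 and 5 MB/s are the observed rsync rates; 150 MB/s is the rados
    # benchmark figure, included only for comparison.
    for rate in (2, 5, 150):
        print("%3d MB/s -> %.1f hours" % (rate, transfer_hours(60, rate)))
```

	For the 60 GB tree that works out to roughly 8.3 hours at
2 MB/s and 3.3 hours at 5 MB/s, versus under 7 minutes at the
rados benchmark rate.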

	The OSDs and MDSs are all running 64-bit CentOS 6.3
with the stock CentOS 2.6.32 kernel.  The client is also
64-bit CentOS 6.3, but it's running the "elrepo" 3.5.4 kernel.
There are four OSDs, each with a hardware RAID 5 array
and an SSD for the OSD journal.  The primary network
is a gigabit network, and the OSD, MDS and MON 
machines have a dedicated backend gigabit network on a 
second network interface.

	Locally on the OSD, "hdparm -t -T" reports read rates 
of ~350 MB/s, and bonnie++ shows:

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
osd-local    23800M  1037  99 316048  92 131023  19  2272  98 312781  21 521.0  24
Latency             13103us     183ms     123ms   15316us     100ms   75899us
Version  1.96       ------Sequential Create------ --------Random Create--------
osd-local           -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 16817  55 +++++ +++ 28786  77 23890  78 +++++ +++ 27128  75
Latency             21549us     105us     134us     902us      12us     104us

	While rsyncing the files, the ceph logs show lots
of warnings of the form:

[WRN] : slow request 91.848407 seconds old, received at 2012-09-26 09:30:52.252449: osd_op(client.5310.1:56400 1000026eda0.00001ec8 [write 2093056~4096] 0.aa047db8 snapc 1=[]) currently waiting for sub ops
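
	To tally these warnings (for example, to see how old the
requests get, or which state they are stuck in), something like the
following Python sketch might help.  The regular expression is
inferred only from the single sample line above, so it is an
assumption about the log format rather than a documented one, and
the names SLOW_RE and parse_slow_request are hypothetical.

```python
import re

# Pattern inferred from one sample "slow request" line; other Ceph
# versions or op types may log a different layout.
SLOW_RE = re.compile(
    r"slow request (?P<age>[\d.]+) seconds old, "
    r"received at (?P<received>\S+ \S+): "
    r"osd_op\((?P<client>\S+) (?P<object>\S+) "
    r"\[(?P<op>\w+) (?P<offset>\d+)~(?P<length>\d+)\]"
    r".*\) currently (?P<state>.+)$"
)


def parse_slow_request(line):
    """Return a dict of fields from a slow-request warning, or None."""
    m = SLOW_RE.search(line.strip())
    if m is None:
        return None
    fields = m.groupdict()
    fields["age"] = float(fields["age"])        # seconds
    fields["offset"] = int(fields["offset"])    # byte offset in object
    fields["length"] = int(fields["length"])    # bytes written
    return fields


SAMPLE = ("[WRN] : slow request 91.848407 seconds old, received at "
          "2012-09-26 09:30:52.252449: osd_op(client.5310.1:56400 "
          "1000026eda0.00001ec8 [write 2093056~4096] 0.aa047db8 "
          "snapc 1=[]) currently waiting for sub ops")

if __name__ == "__main__":
    print(parse_slow_request(SAMPLE))
```

	Running each log line through parse_slow_request and grouping
on the "state" field would show whether the requests are all
stuck "waiting for sub ops" or in a mix of states.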

	Snooping on traffic with wireshark shows bursts of 
activity separated by long periods (30-60 sec) of idle time.

	My first thought was that I was seeing a kind of 
"bufferbloat". The SSDs are 120 GB, so they could easily contain 
enough data to take a long time to dump.  I switched to a
journal file limited to 1 GB, but I still see the same slow
behavior.
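
	As a rough check on the "bufferbloat" idea, here is a small
Python sketch of how long a completely full journal would take to
drain.  The 300 MB/s drain rate is an assumption, loosely rounded
from the local bonnie++ block write figure above; real flush rates
under a small-write workload could be far lower.  The name
drain_seconds is made up for illustration.

```python
def drain_seconds(journal_gb, drain_mb_s=300.0):
    """Seconds to flush a full journal of journal_gb gigabytes.

    Assumptions: decimal units (1 GB = 1000 MB) and a constant,
    sequential drain rate of drain_mb_s MB/s (hypothetical value).
    """
    return journal_gb * 1000.0 / drain_mb_s


if __name__ == "__main__":
    for size_gb in (120, 1):
        print("%4d GB journal -> %.1f s" % (size_gb, drain_seconds(size_gb)))
```

	Under those assumptions a full 120 GB journal would take
several minutes to flush, while a 1 GB journal would take only a
few seconds.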

	Any advice about how to go about debugging this would
be appreciated.

					Thanks,
					Bryan

-- 
========================================================================
Bryan Wright              |"If you take cranberries and stew them like 
Physics Department        | applesauce, they taste much more like prunes
University of Virginia    | than rhubarb does."  --  Groucho 
Charlottesville, VA  22901|			
(434) 924-7218            |         bryan@xxxxxxxxxxxx
========================================================================
