On Tue, Oct 30, 2012 at 9:27 AM, Roman Alekseev <rs.alekseev@xxxxxxxxx> wrote:
> On 29.10.2012 22:57, Sam Lang wrote:
>>
>> Hi Roman,
>>
>> Is this with the ceph fuse client or the ceph kernel module?
>>
>> It's not surprising that the local file system (/home) is so much faster
>> than a mounted ceph volume, especially the first time the directory tree
>> is traversed (metadata results are cached at the client to improve
>> performance). Try running the same find command on the ceph volume again
>> and see whether the cached results at the client improve performance at
>> all.
>>
>> To understand what performance ceph should be capable of with your
>> deployment for this specific workload, run iperf between two nodes to
>> get an idea of your latency limits.
>>
>> Also, I noticed that the real timings you listed for ceph and /home are
>> offset by exactly 17 minutes (user and sys are identical). Was that a
>> copy/paste error, by chance?
>>
>> -sam
>>
>> On 10/29/2012 09:01 AM, Roman Alekseev wrote:
>>>
>>> Hi,
>>>
>>> Kindly guide me on how to improve performance of a cluster that
>>> consists of 5 dedicated servers:
>>>
>>> - ceph.conf: http://pastebin.com/hT3qEhUF
>>> - file system on all drives is ext4
>>> - mount option "user_xattr"
>>> - each server has:
>>>   CPU: Intel® Xeon® E5335 (8M cache, 2.00 GHz, 1333 MHz FSB) x2
>>>   MEM: 4 GB DDR2
>>> - 1 Gb network
>>>
>>> Simple test:
>>>
>>> mounted as ceph:
>>> root@client1:/mnt/mycephfs# time find . | wc -l
>>> 83932
>>>
>>> real    17m55.399s
>>> user    0m0.152s
>>> sys     0m1.528s
>>>
>>> on 1 HDD:
>>>
>>> root@client1:/home# time find . | wc -l
>>> 83932
>>>
>>> real    0m55.399s
>>> user    0m0.152s
>>> sys     0m1.528s
>>>
>>> Please help me to find out the issue. Thanks.
>>>
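One way to check Sam's caching point, assuming the same kernel mount at
/mnt/mycephfs used in the test above (the drop_caches step is a cold-cache
control and needs root; osd_server is a placeholder hostname as in the iperf
output below):

# warm-cache run: repeat the traversal back-to-back, so the client's
# cached metadata can serve the second pass
time find /mnt/mycephfs | wc -l
time find /mnt/mycephfs | wc -l

# cold-cache control: flush page/dentry/inode caches, then time again
sync
echo 3 > /proc/sys/vm/drop_caches
time find /mnt/mycephfs | wc -l

# for a metadata walk, per-op round-trip latency matters more than
# bandwidth; ping gives a rough floor for it
ping -c 10 osd_server
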
> Hi Sam,
>
> I use the Ceph kernel client only, because we need its higher
> performance, but as far as I can see it is slower than a FUSE-based
> distributed file system: MooseFS, for example, performed the same test
> in 3 minutes.
>
> Here is the result of an iperf test between the client and an osd server:
>
> root@asrv151:~# iperf -c client -i 1
> ------------------------------------------------------------
> Client connecting to clientIP, TCP port 5001
> TCP window size: 96.1 KByte (default)
> ------------------------------------------------------------
> [  3] local osd_server port 50106 connected with clientIP port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0- 1.0 sec   112 MBytes   941 Mbits/sec
> [  3]  1.0- 2.0 sec   110 MBytes   924 Mbits/sec
> [  3]  2.0- 3.0 sec   108 MBytes   905 Mbits/sec
> [  3]  3.0- 4.0 sec   109 MBytes   917 Mbits/sec
> [  3]  4.0- 5.0 sec   110 MBytes   926 Mbits/sec
> [  3]  5.0- 6.0 sec   109 MBytes   915 Mbits/sec
> [  3]  6.0- 7.0 sec   110 MBytes   926 Mbits/sec
> [  3]  7.0- 8.0 sec   108 MBytes   908 Mbits/sec
> [  3]  8.0- 9.0 sec   107 MBytes   897 Mbits/sec
> [  3]  9.0-10.0 sec   106 MBytes   886 Mbits/sec
> [  3]  0.0-10.0 sec  1.06 GBytes   914 Mbits/sec
>
> ceph -w results:
>
>    health HEALTH_OK
>    monmap e3: 3 mons at {a=mon.a:6789/0,b=mon.b:6789/0,c=mon.c:6789/0},
> election epoch 10, quorum 0,1,2 a,b,c
>    osdmap e132: 5 osds: 5 up, 5 in
>    pgmap v11720: 384 pgs: 384 active+clean; 1880 MB data, 10679 MB used,
> 5185 GB / 5473 GB avail
>    mdsmap e4: 1/1/1 up {0=a=up:active}
>
> 2012-10-30 12:23:09.830677 osd.2 [WRN] slow request 30.135787 seconds
> old, received at 2012-10-30 12:22:39.694780: osd_op(mds.0.1:309216
> 10000017163.00000000 [setxattr path (69),setxattr parent (196),tmapput
> 0~596] 1.724c80f7) v4 currently waiting for sub ops
> 2012-10-30 12:23:10.109637 mon.0 [INF] pgmap v11720: 384 pgs: 384
> active+clean; 1880 MB data, 10679 MB used, 5185 GB / 5473 GB avail
> 2012-10-30 12:23:12.918038 mon.0 [INF] pgmap v11721: 384 pgs: 384
> active+clean; 1880 MB data, 10680 MB used, 5185 GB / 5473 GB avail
> 2012-10-30 12:23:13.977044 mon.0 [INF] pgmap v11722: 384 pgs: 384
> active+clean; 1880 MB data, 10681 MB used, 5185 GB / 5473 GB avail
> 2012-10-30 12:23:10.587391 osd.3 [WRN] 6 slow requests, 6 included below;
> oldest blocked for > 30.808352 secs
> 2012-10-30 12:23:10.587398 osd.3 [WRN] slow request 30.808352 seconds
> old, received at 2012-10-30 12:22:39.778971: osd_op(mds.0.1:308701
> 200.000002e5 [write 976010~5402] 1.adbeb1a) v4 currently waiting for sub ops
> 2012-10-30 12:23:10.587403 osd.3 [WRN] slow request 30.796417 seconds
> old, received at 2012-10-30 12:22:39.790906: osd_op(mds.0.1:308702
> 200.000002e5 [write 981412~6019] 1.adbeb1a) v4 currently waiting for sub ops
> 2012-10-30 12:23:10.587408 osd.3 [WRN] slow request 30.796347 seconds
> old, received at 2012-10-30 12:22:39.790976: osd_op(mds.0.1:308703
> 200.000002e5 [write 987431~61892] 1.adbeb1a) v4 currently waiting for sub ops
> 2012-10-30 12:23:10.587413 osd.3 [WRN] slow request 30.530228 seconds
> old, received at 2012-10-30 12:22:40.057095: osd_op(mds.0.1:308704
> 200.000002e5 [write 1049323~6630] 1.adbeb1a) v4 currently waiting for sub ops
> 2012-10-30 12:23:10.587417 osd.3 [WRN] slow request 30.530027 seconds
> old, received at 2012-10-30 12:22:40.057296: osd_op(mds.0.1:308705
> 200.000002e5 [write 1055953~20679] 1.adbeb1a) v4 currently waiting for sub ops
>
> At the same time I am copying data onto the ceph-mounted storage.
>
> I don't know what I can do to resolve this problem. :(
> Any advice would be greatly appreciated.
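The slow requests above are all "waiting for sub ops", i.e. on replica
writes, and with replication every client write hits disks on multiple OSD
nodes at once. A rough way to confirm disk saturation while the copy runs,
assuming sysstat is installed on the OSD hosts:

# on each osd host: %util pinned near 100 and a growing await column
# mean the data disk is saturated by the ingest
iostat -x 1

# cross-check memory/swap pressure too; these nodes have only 4 GB RAM
vmstat 1
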
Is it the same client copying data into cephfs or a different one? I see
here that you have several slow requests; it looks like maybe you're
overloading your disks. That could impact metadata lookups if the MDS
doesn't have everything cached; have you tried running this test without
data ingest? (Obviously we'd like it to be faster even so, but if it's
disk contention there's not a lot we can do.)

-Greg
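A minimal sketch of the experiment Greg suggests, assuming the bulk copy
can be paused (same mount point as in the original test):

# 1. pause the data ingest, then time the traversal on its own
time find /mnt/mycephfs | wc -l

# 2. resume the copy and repeat; if the timing balloons again, the
#    bottleneck is OSD disk contention rather than the MDS or network
time find /mnt/mycephfs | wc -l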