On 03/24/2014 08:02 AM, Yan, Zheng wrote:
> On Sun, Mar 23, 2014 at 8:39 PM, Sascha Frey <sf@xxxxxxxxxxx> wrote:
>> Hi list,
>>
>> I'm new to Ceph, so I installed a four-node Ceph cluster for testing
>> purposes.
>>
>> Each node has two 6-core Sandy Bridge Xeons, 64 GiB of RAM, six 15k rpm
>> SAS drives, one SSD drive for journals and 10G Ethernet.
>> We're using Debian GNU/Linux 7.4 (Wheezy) with kernel 3.13 from the
>> Debian backports repository and Ceph 0.72.2-1~bpo70+1.
>>
>> Every node runs six OSDs (one per SAS disk). The SSD is partitioned
>> into six parts for the journals.
>> Three of the same nodes act as monitors (no extra hardware for mons and
>> MDS during testing). At first I used node #4 as an MDS; later I installed
>> Ceph-MDS on all four nodes with set_max_mds=3.
>>
>> I increased pg_num and pgp_num to 1200 each for both the data and
>> metadata pools.
>>
>> I mounted CephFS on one node using the kernel client.
>> Writing to a single big file is fast:
>>
>> $ dd if=/dev/zero of=bigfile bs=1M count=1M
>> 1048576+0 records in
>> 1048576+0 records out
>> 1099511627776 bytes (1.1 TB) copied, 1240.52 s, 886 MB/s
>>
>> Reading is less fast:
>>
>> $ dd if=bigfile of=/dev/null bs=1M
>> 1048576+0 records in
>> 1048576+0 records out
>> 1099511627776 bytes (1.1 TB) copied, 3226.8 s, 341 MB/s
>>
>> (During reading, the nodes are mostly idle: >90% idle, 1-1.8% wa.)
>>
>> After this, I tried to copy the Linux kernel source tree (source and
>> destination directories both on CephFS, 600 MiB, 45k files):
>>
>> $ time cp -a linux-3.13.6 linux-3.13.6-copy
>>
>> real    35m34.184s
>> user    0m1.884s
>> sys     0m11.372s
>>
>> That's much too slow.
>> The same copy takes just a few seconds on a single desktop-class SATA
>> drive.
>>
>> I can't see any load or I/O wait on any of the four nodes. I tried
>> different mount options:
>>
>> mon1,mon2,mon3:/ on /export type ceph (rw,relatime,name=someuser,secret=<hidden>,nodcache,nofsc)
>> mon1,mon2,mon3:/ on /export type ceph (rw,relatime,name=someuser,secret=<hidden>,dcache,fsc,wsize=10485760,rsize=10485760)
>>
>> Output of 'ceph status':
>>
>> ceph status
>>     cluster 32ea6593-8cd6-40d6-ac3b-7450f1d92d16
>>      health HEALTH_OK
>>      monmap e1: 3 mons at {howard=xxx.yyy.zzz.199:6789/0,leonard=xxx.yyy.zzz.196:6789/0,penny=xxx.yyy.zzz.198:6789/0}, election epoch 32, quorum 0,1,2 howard,leonard,penny
>>      mdsmap e107: 1/1/1 up {0=penny=up:active}, 3 up:standby
>>      osdmap e276: 24 osds: 24 up, 24 in
>>       pgmap v8932: 2464 pgs, 3 pools, 1028 GB data, 514 kobjects
>>             2061 GB used, 11320 GB / 13382 GB avail
>>                 2464 active+clean
>>   client io 119 MB/s rd, 509 B/s wr, 43 op/s
>>
>> I'd appreciate it if someone could help me find the reason for this
>> odd behaviour.
>
> In your case, copying each file requires sending several requests to
> the MDS/OSD, and each request can take several to tens of milliseconds.
> That's why only about 20 files were copied per second. One option to
> improve the overall speed is to perform a parallel copy (you can find
> some scripts via Google).

I have observed the same behavior in our cluster. Using GNU parallel to
copy the source tree,

/mnt/ceph/linux-3.13.6# time parallel -j10 cp -r {} /mnt/ceph/copy/ ::: *

reduced the time to

real    14m22.721s
user    0m1.208s
sys     0m7.200s

Hope it helps.

- Gurvinder
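
For file-level parallelism, rather than parallelising over the top-level
directories as in the parallel invocation above, here is a minimal sketch
using find and GNU xargs. SRC, DST and the worker count -P16 are
placeholders to adapt to your own CephFS mount and cluster:

  # Hypothetical paths for illustration; point SRC/DST at your CephFS mount.
  SRC=/mnt/ceph/linux-3.13.6
  DST=/mnt/ceph/copy/linux-3.13.6

  # Recreate the directory tree first (serially, so the parallel workers
  # below never race to create the same directory).
  cd "$SRC" && find . -type d -print0 | xargs -0 -I{} mkdir -p "$DST/{}"

  # Copy regular files with up to 16 workers, so many MDS/OSD round trips
  # are in flight at once instead of one file at a time.
  cd "$SRC" && find . -type f -print0 | xargs -0 -P16 -I{} cp -p {} "$DST/{}"

Copying file by file keeps all workers busy even when most of the data sits
in one or two large top-level directories, at the cost of spawning one cp
per file; symlinks and hard links are not handled in this sketch.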

> Regards
> Yan, Zheng
>
>> Cheers,
>> Sascha

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com