Re: cephfs fast on a single big file but very slow on many files

On Sun, Mar 23, 2014 at 8:39 PM, Sascha Frey <sf@xxxxxxxxxxx> wrote:
> Hi list,
>
> I'm new to Ceph, so I installed a four-node Ceph cluster for testing
> purposes.
>
> Each node has two 6-core Sandy Bridge Xeons, 64 GiB of RAM, six 15k rpm
> SAS drives, one SSD for journals and 10G Ethernet.
> We're using Debian GNU/Linux 7.4 (Wheezy) with kernel 3.13 from Debian
> backports repository and Ceph 0.72.2-1~bpo70+1.
>
> Every node runs six OSDs (one per SAS disk). The SSD is partitioned
> into six parts for the journals.
> Three of the same nodes act as monitors (no extra hardware for mons and
> MDS for testing). First I used node #4 as an MDS; later I installed
> Ceph-MDS on all four nodes with set_max_mds=3.
>
> I increased pg_num and pgp_num to 1200 each for both the data and
> metadata pools.
>
> I mounted the cephfs on one node using the kernel client.
> Writing to a single big file is fast:
>
> $ dd if=/dev/zero of=bigfile bs=1M count=1M
> 1048576+0 records in
> 1048576+0 records out
> 1099511627776 bytes (1.1 TB) copied, 1240.52 s, 886 MB/s
>
> Reading is slower:
> $ dd if=bigfile of=/dev/null bs=1M
> 1048576+0 records in
> 1048576+0 records out
> 1099511627776 bytes (1.1 TB) copied, 3226.8 s, 341 MB/s
> (during reading, the nodes are mostly idle: >90% idle, 1-1.8% wa)
>
> After this, I tried to copy the linux kernel source tree (source and
> dest dirs both on cephfs, 600 MiB, 45k files):
>
> $ time cp -a linux-3.13.6 linux-3.13.6-copy
>
> real    35m34.184s
> user    0m1.884s
> sys     0m11.372s
>
> That's much too slow.
> The same copy takes just a few seconds on a single desktop-class SATA
> drive.
>
> I can't see any load or I/O wait on any of the four nodes. I tried
> different mount options:
>
> mon1,mon2,mon3:/ on /export type ceph (rw,relatime,name=someuser,secret=<hidden>,nodcache,nofsc)
> mon1,mon2,mon3:/ on /export type ceph (rw,relatime,name=someuser,secret=<hidden>,dcache,fsc,wsize=10485760,rsize=10485760)
>
> Output of 'ceph status':
> ceph status
>     cluster 32ea6593-8cd6-40d6-ac3b-7450f1d92d16
>      health HEALTH_OK
>      monmap e1: 3 mons at {howard=xxx.yyy.zzz.199:6789/0,leonard=xxx.yyy.zzz.196:6789/0,penny=xxx.yyy.zzz.198:6789/0}, election epoch 32, quorum 0,1,2 howard,leonard,penny
>      mdsmap e107: 1/1/1 up {0=penny=up:active}, 3 up:standby
>      osdmap e276: 24 osds: 24 up, 24 in
>       pgmap v8932: 2464 pgs, 3 pools, 1028 GB data, 514 kobjects
>             2061 GB used, 11320 GB / 13382 GB avail
>                 2464 active+clean
>   client io 119 MB/s rd, 509 B/s wr, 43 op/s
>
>
> I'd appreciate it if someone could help me find the reason for this
> odd behaviour.

In your case, copying each file requires sending several requests to
the MDS/OSDs, and each request can take several to tens of milliseconds.
That's why only about 20 files were copied per second. One option to
improve the overall speed is to perform a parallel copy (you can find
scripts for this via Google).
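To put a number on it: 45k files in 35m34s is roughly 21 files/s, i.e.
about 47 ms of round trips per file. As a minimal sketch of such a
parallel copy (not a tested recipe; it assumes GNU find and xargs with
the -P option and reuses the directory names from the test above), one
could recreate the directory tree first and then copy the non-directory
entries eight at a time:

$ cd linux-3.13.6
$ find . -type d -print0 | xargs -0 -I{} mkdir -p ../linux-3.13.6-copy/{}
$ find . \! -type d -print0 | xargs -0 -P 8 -I{} cp -a {} ../linux-3.13.6-copy/{}

With eight copies in flight the per-file MDS/OSD round trips overlap, so
the wall-clock time should drop roughly in proportion to the parallelism,
until the MDS itself becomes the bottleneck.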

Regards
Yan, Zheng

>
>
> Cheers,
> Sascha
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



