cephfs fast on a single big file but very slow on many files

Hi list,

I'm new to Ceph, so I installed a four-node Ceph cluster for testing
purposes.

Each node has two 6-core Sandy Bridge Xeons, 64 GiB of RAM, six 15k rpm
SAS drives, one SSD for the journals, and 10G Ethernet.
We're running Debian GNU/Linux 7.4 (Wheezy) with kernel 3.13 from the
Debian backports repository and Ceph 0.72.2-1~bpo70+1.

Every node runs six OSDs (one per SAS disk), and the SSD is partitioned
into six parts for the journals.
Three of the same nodes act as monitors (no extra hardware for mons and
MDS in this test setup). At first I used node #4 as the only MDS; later
I installed ceph-mds on all four nodes and set max_mds to 3.

I increased pg_num and pgp_num to 1200 each for both the data and
metadata pools.
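
For completeness, that resize was done with commands roughly like these
(pool names are the default 'data' and 'metadata'):

$ ceph osd pool set data pg_num 1200
$ ceph osd pool set data pgp_num 1200
$ ceph osd pool set metadata pg_num 1200
$ ceph osd pool set metadata pgp_num 1200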

I mounted CephFS on one node using the kernel client.
Writing a single big file is fast:

$ dd if=/dev/zero of=bigfile bs=1M count=1M
1048576+0 records in
1048576+0 records out
1099511627776 bytes (1.1 TB) copied, 1240.52 s, 886 MB/s

Reading is noticeably slower:
$ dd if=bigfile of=/dev/null bs=1M
1048576+0 records in
1048576+0 records out
1099511627776 bytes (1.1 TB) copied, 3226.8 s, 341 MB/s
(during the read, the nodes are mostly idle: >90% idle, 1-1.8% wa)

After this, I tried copying the Linux kernel source tree (source and
destination directories both on CephFS, 600 MiB, 45k files):

$ time cp -a linux-3.13.6 linux-3.13.6-copy

real    35m34.184s
user    0m1.884s
sys     0m11.372s

That's much too slow. The same copy takes just a few seconds on a
single desktop-class SATA drive.
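
To isolate the metadata path, a simple sketch (the test directory name
is made up) would be to time a loop of small file creates directly on
the mount:

$ mkdir /export/touchtest
# each create needs at least one round trip to the MDS
$ time bash -c 'for i in $(seq 1 1000); do touch /export/touchtest/file$i; done'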

I can't see any load or I/O wait on any of the four nodes. I tried
different mount options:

mon1,mon2,mon3:/ on /export type ceph (rw,relatime,name=someuser,secret=<hidden>,nodcache,nofsc)
mon1,mon2,mon3:/ on /export type ceph (rw,relatime,name=someuser,secret=<hidden>,dcache,fsc,wsize=10485760,rsize=10485760)
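
The second line corresponds to a mount command roughly like this (the
secret is of course a placeholder):

$ mount -t ceph mon1,mon2,mon3:/ /export \
    -o name=someuser,secret=<key>,dcache,fsc,wsize=10485760,rsize=10485760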

Output of 'ceph status':
$ ceph status
    cluster 32ea6593-8cd6-40d6-ac3b-7450f1d92d16
     health HEALTH_OK
     monmap e1: 3 mons at {howard=xxx.yyy.zzz.199:6789/0,leonard=xxx.yyy.zzz.196:6789/0,penny=xxx.yyy.zzz.198:6789/0}, election epoch 32, quorum 0,1,2 howard,leonard,penny
     mdsmap e107: 1/1/1 up {0=penny=up:active}, 3 up:standby
     osdmap e276: 24 osds: 24 up, 24 in
      pgmap v8932: 2464 pgs, 3 pools, 1028 GB data, 514 kobjects
            2061 GB used, 11320 GB / 13382 GB avail
                2464 active+clean
  client io 119 MB/s rd, 509 B/s wr, 43 op/s


I would appreciate it if someone could help me find the reason for this
odd behaviour.


Cheers,
Sascha
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



