rbd directory listing performance issues

Hello,

We currently have a 12-node (3 monitor + 9 OSD) Ceph cluster, made up of 107 x 4 TB drives formatted with xfs. The cluster is running Ceph version 0.80.7:

Cluster health:
cluster 504b5794-34bd-44e7-a8c3-0494cf800c23
     health HEALTH_WARN crush map has legacy tunables
     monmap e1: 3 mons at {hqceph1=10.35.1.201:6789/0,hqceph2=10.35.1.203:6789/0,hqceph3=10.35.1.205:6789/0}, election epoch 156, quorum 0,1,2 hqceph1,hqceph2,hqceph3
     osdmap e19568: 107 osds: 107 up, 107 in
      pgmap v10117422: 2952 pgs, 15 pools, 77202 GB data, 19532 kobjects
            226 TB used, 161 TB / 388 TB avail

Relevant ceph.conf entries:
osd_journal_size = 10240
filestore_xattr_use_omap = true
osd_mount_options_xfs = "rw,noatime,nodiratime,logbsize=256k,logbufs=8,inode64"
osd_mkfs_options_xfs = "-f -i size=2048"


A while back I created an 80 TB rbd image to use as an archive repository for some of our audio and video files. We are still seeing good rados and rbd read and write throughput; however, response times are quite long when we list directories that contain a large number of files and subfolders.

Subsequent listings of the same directory run much faster (although I am not sure how long that remains true before we see another instance of slowness); the initial listing, however, can take 20 to 45 seconds.
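
In case it is useful, the slow case can be reproduced on demand by dropping the client's VFS caches between runs (standard Linux knobs, run as root on the rbd client; the path is just the example directory from the benchmark below):

sync
echo 3 > /proc/sys/vm/drop_caches   # drop page cache plus dentry/inode caches
time ls /mnt/rbd/music_library/D > /dev/null   # cold listing (the slow case)
time ls /mnt/rbd/music_library/D > /dev/null   # warm listing (the fast case)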

The rbd kernel client is running on Ubuntu 14.04 with kernel version 3.18.0-031800-generic.

Benchmarks:

root@rbdmount1:/mnt/rbd/music_library/D# time ls
(file names removed)
real    0m18.045s
user    0m0.000s
sys    0m0.011s
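
If a per-syscall breakdown would help with diagnosis, something like the following could be run on a cold listing (strace is stock Ubuntu; a large share of the time landing in getdents or the stat calls would point at metadata reads going back to the image):

strace -c ls > /dev/null   # -c prints a count/time summary per syscall on exit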

root@rbdmount1:/mnt/rbd# dd bs=1M count=1024 if=/dev/zero of=test conv=fdatasync
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 9.94287 s, 108 MB/s
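
For completeness, a matching sequential read test would look like this (dropping caches first, as above, so the data is read back from the cluster rather than from the client's page cache):

sync; echo 3 > /proc/sys/vm/drop_caches
dd bs=1M if=test of=/dev/null   # reads back the 1 GB test file written above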


My questions are:

1) Is there anything inherent in our setup/configuration that would prevent us from getting fast directory listings in these large directories (on an rbd image of that size, for example)?

2) Have there been any changes in Giant that would warrant upgrading the cluster as a fix for this issue?

Any suggestions would be greatly appreciated.

Thanks,

Shain


Shain Miley | Manager of Systems and Infrastructure, Digital Media | smiley@xxxxxxx | 202.513.3649
