It does seem like the entries get cached for a certain period of time.

Here is the memory listing for the rbd client server:

root@cephmount1:~# free -m
             total       used       free     shared    buffers     cached
Mem:         11965      11816        149          3        139      10823
-/+ buffers/cache:        853      11112
Swap:         4047          0       4047

I can add more memory to the server if I need to; I have 2 or 4 16GB DIMMs lying around here someplace.

Here are some of the pagecache sysctl settings:

vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 10
vm.dirty_writeback_centisecs = 500

In terms of the number of files:

root@cephmount1:/mnt/ceph-block-device-archive/library/E# time ls

real    0m8.073s
user    0m0.000s
sys     0m0.012s

root@cephmount1:/mnt/ceph-block-device-archive/library/E# ls |wc
    228     510    3413

However, looking at some other directories, I see numbers in the range of 500 to 600, so they will vary based on the name of the artist. If I had to guess, we would not have any more than 800 - 1000 in the very heavy directories at this point.

Also, one thing I just noticed is that 'ls |wc' returns right away, even in cases when an 'ls -l' run right after it takes a while.

Thanks,

Shain

Shain Miley | Manager of Systems and Infrastructure, Digital Media | smiley@xxxxxxx | 202.513.3649

________________________________________
From: Robert LeBlanc [robert@xxxxxxxxxxxxx]
Sent: Tuesday, January 06, 2015 1:57 PM
To: Shain Miley
Cc: ceph-users@xxxxxxxx
Subject: Re: rbd directory listing performance issues

I would think that the RBD mounter would cache the directory listing, which should always make it fast, unless there is so much memory pressure that it is dropping it frequently. How many entries are in your directory and in total on the RBD?

ls | wc -l
find . | wc -l

What does your memory look like?
free -h

I'm not sure how much help I can be, but if memory pressure is causing buffers to be freed, then it can cause the system to have to go to disk to get the directory listing. I'm guessing that if the directory is large enough, it could cause the system to have to go back to the RBD many times. Very small I/O on RBD is very expensive compared to big sequential access.

On Tue, Jan 6, 2015 at 11:33 AM, Shain Miley <SMiley@xxxxxxx> wrote:
> Robert,
>
> xfs on the rbd image as well:
>
> /dev/rbd0 on /mnt/ceph-block-device-archive type xfs (rw)
>
> However looking at the mount options...it does not look like I've enabled anything special in terms of mount options.
>
> Thanks,
>
> Shain
>
>
> Shain Miley | Manager of Systems and Infrastructure, Digital Media | smiley@xxxxxxx | 202.513.3649
>
> ________________________________________
> From: Robert LeBlanc [robert@xxxxxxxxxxxxx]
> Sent: Tuesday, January 06, 2015 1:27 PM
> To: Shain Miley
> Cc: ceph-users@xxxxxxxx
> Subject: Re: rbd directory listing performance issues
>
> What fs are you running inside the RBD?
>
> On Tue, Jan 6, 2015 at 8:29 AM, Shain Miley <SMiley@xxxxxxx> wrote:
>> Hello,
>>
>> We currently have a 12 node (3 monitor+9 OSD) ceph cluster, made up of 107 x
>> 4TB drives formatted with xfs.
>> The cluster is running ceph version 0.80.7:
>>
>> Cluster health:
>> cluster 504b5794-34bd-44e7-a8c3-0494cf800c23
>> health HEALTH_WARN crush map has legacy tunables
>> monmap e1: 3 mons at
>> {hqceph1=10.35.1.201:6789/0,hqceph2=10.35.1.203:6789/0,hqceph3=10.35.1.205:6789/0},
>> election epoch 156, quorum 0,1,2 hqceph1,hqceph2,hqceph3
>> osdmap e19568: 107 osds: 107 up, 107 in
>> pgmap v10117422: 2952 pgs, 15 pools, 77202 GB data, 19532 kobjects
>> 226 TB used, 161 TB / 388 TB avail
>>
>> Relevant ceph.conf entries:
>> osd_journal_size = 10240
>> filestore_xattr_use_omap = true
>> osd_mount_options_xfs =
>> "rw,noatime,nodiratime,logbsize=256k,logbufs=8,inode64"
>> osd_mkfs_options_xfs = "-f -i size=2048"
>>
>>
>> A while back I created an 80 TB rbd image to be used as an archive
>> repository for some of our audio and video files. We are still seeing good
>> rados and rbd read and write throughput performance, however we seem to be
>> having quite a long delay in response times when we try to list out the
>> files in directories with a large number of folders, files, etc.
>>
>> Subsequent directory listing times seem to run a lot faster (but I am not
>> sure for how long that is the case before we see another instance of slowness),
>> however the initial directory listings can take 20 to 45 seconds.
>>
>> The rbd kernel client is running on ubuntu 14.04 using kernel version
>> '3.18.0-031800-generic'.
>>
>> Benchmarks:
>>
>> root@rbdmount1:/mnt/rbd/music_library/D# time ls (file names removed):
>> real 0m18.045s
>> user 0m0.000s
>> sys 0m0.011s
>>
>> root@rbdmount1:/mnt/rbd# dd bs=1M count=1024 if=/dev/zero of=test
>> conv=fdatasync
>> 1024+0 records in
>> 1024+0 records out
>> 1073741824 bytes (1.1 GB) copied, 9.94287 s, 108 MB/s
>>
>>
>> My questions are:
>>
>> 1) Is there anything inherent in our setup/configuration that would prevent
>> us from having fast directory listings on these larger directories (using an
>> rbd image of that size for example)?
>>
>> 2) Have there been any changes made in Giant that would warrant upgrading
>> the cluster as a fix to resolve this issue?
>>
>> Any suggestions would be greatly appreciated.
>>
>> Thanks,
>>
>> Shain
>>
>>
>> Shain Miley | Manager of Systems and Infrastructure, Digital Media |
>> smiley@xxxxxxx | 202.513.3649
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
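
A note on the 'ls |wc' versus 'ls -l' observation at the top of this thread: a plain 'ls' into a pipe only has to read the directory itself, while 'ls -l' also has to stat every entry, which means reading each inode. One way to check whether that per-entry stat is where the time goes is a rough sketch like the one below. It is only an illustration, not something anyone in the thread ran; the function name time_listing is made up for this example, and it assumes Python 3 is available on the rbd client.

```python
import os
import sys
import time


def time_listing(path):
    """Return (entry_count, names_seconds, stat_seconds) for one directory.

    Hypothetical helper for illustration; not part of any ceph tooling.
    """
    # Phase 1: read the directory entries only, similar to a plain 'ls | wc'.
    start = time.perf_counter()
    names = os.listdir(path)
    names_seconds = time.perf_counter() - start

    # Phase 2: one lstat per entry, roughly the extra work 'ls -l' does.
    start = time.perf_counter()
    for name in names:
        os.lstat(os.path.join(path, name))
    stat_seconds = time.perf_counter() - start

    return len(names), names_seconds, stat_seconds


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    count, names_s, stat_s = time_listing(target)
    print(f"{count} entries: names {names_s:.3f}s, lstat {stat_s:.3f}s")
```

Running it once against a directory that has not been touched for a while, and then again right away, should show whether the slow first pass is dominated by the lstat phase. If it is, the delay is more likely from small inode reads going back to the RBD than from reading the directory listing itself.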