Re: rbd directory listing performance issues

Robert LeBlanc <robert@xxxxxxxxxxxxx> · Tue, 6 Jan 2015 11:57:40 -0700

I would expect the client that mounts the RBD to cache the directory
listing, which should keep it fast unless there is so much memory
pressure that the cache is being dropped frequently.

How many entries are in your directory and total on the RBD?
ls | wc -l
find . | wc -l
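
If the directory is very large, plain ls can add its own delay because it
sorts the names before printing them. A minimal sketch of a helper that
counts without sorting, assuming GNU find is available; count_entries is
only a name I made up for illustration:

```shell
# count_entries DIR: print "<direct entries> <total entries under DIR>".
# Hypothetical helper, not part of any Ceph or XFS tooling.
count_entries() {
    # find emits names in directory order, so nothing is sorted in memory.
    # -mindepth 1 leaves DIR itself out of the count.
    direct=$(find "$1" -mindepth 1 -maxdepth 1 | wc -l)
    total=$(find "$1" -mindepth 1 | wc -l)
    # Arithmetic expansion strips any padding that wc may add.
    echo "$((direct)) $((total))"
}
```

For example: count_entries /mnt/ceph-block-device-archive
File names that contain newlines would be counted more than once.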

What does your memory look like?
free -h

I'm not sure how much help I can be, but if memory pressure is causing
buffers to be freed, then the system has to go back to disk to get the
directory listing. I'm guessing that if the directory is large enough,
the system has to go back to the RBD many times. Very small I/O on RBD
is very expensive compared to big sequential access.
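
One way to check whether this is a cache effect is to time the same
listing twice, once cold and once warm. This is only a rough sketch,
assuming GNU date (for %N) and a name, time_listing, that I made up:

```shell
# time_listing DIR: print the milliseconds one unsorted listing takes.
# Hypothetical helper for comparing cold and warm runs.
time_listing() {
    start=$(date +%s%N)        # GNU date: nanoseconds since the epoch
    ls -f "$1" > /dev/null     # -f skips sorting; names are still read
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}

# To force a cold run first (as root), drop the dentry and inode caches:
#   sync; echo 2 > /proc/sys/vm/drop_caches
# To see how eagerly the kernel reclaims those caches (default is 100):
#   cat /proc/sys/vm/vfs_cache_pressure
```

If the first time_listing run after dropping caches takes many seconds
and the second is near zero, the delay is the metadata reads from the
RBD rather than anything ls itself is doing.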

On Tue, Jan 6, 2015 at 11:33 AM, Shain Miley <SMiley@xxxxxxx> wrote:
> Robert,
>
> xfs on the rbd image as well:
>
> /dev/rbd0 on /mnt/ceph-block-device-archive type xfs (rw)
>
> However looking at the mount options...it does not look like I've enabled anything special in terms of mount options.
>
> Thanks,
>
> Shain
>
>
> Shain Miley | Manager of Systems and Infrastructure, Digital Media | smiley@xxxxxxx | 202.513.3649
>
> ________________________________________
> From: Robert LeBlanc [robert@xxxxxxxxxxxxx]
> Sent: Tuesday, January 06, 2015 1:27 PM
> To: Shain Miley
> Cc: ceph-users@xxxxxxxx
> Subject: Re:  rbd directory listing performance issues
>
> What fs are you running inside the RBD?
>
> On Tue, Jan 6, 2015 at 8:29 AM, Shain Miley <SMiley@xxxxxxx> wrote:
>> Hello,
>>
>> We currently have a 12 node (3 monitor+9 OSD) ceph cluster, made up of 107 x
>> 4TB drives formatted with xfs. The cluster is running ceph version 0.80.7:
>>
>> Cluster health:
>> cluster 504b5794-34bd-44e7-a8c3-0494cf800c23
>>      health HEALTH_WARN crush map has legacy tunables
>>      monmap e1: 3 mons at
>> {hqceph1=10.35.1.201:6789/0,hqceph2=10.35.1.203:6789/0,hqceph3=10.35.1.205:6789/0},
>> election epoch 156, quorum 0,1,2 hqceph1,hqceph2,hqceph3
>>      osdmap e19568: 107 osds: 107 up, 107 in
>>       pgmap v10117422: 2952 pgs, 15 pools, 77202 GB data, 19532 kobjects
>>             226 TB used, 161 TB / 388 TB avail
>>
>> Relevant ceph.conf entries:
>> osd_journal_size = 10240
>> filestore_xattr_use_omap = true
>> osd_mount_options_xfs = "rw,noatime,nodiratime,logbsize=256k,logbufs=8,inode64"
>> osd_mkfs_options_xfs = "-f -i size=2048"
>>
>>
>> A while back I created an 80 TB rbd image to be used as an archive
>> repository for some of our audio and video files. We are still seeing good
>> rados and rbd read and write throughput performance, however we seem to be
>> having quite a long delay in response times when we try to list out the
>> files in directories with a large number of folders, files, etc.
>>
>> Subsequent directory listing times seem to run a lot faster (but I am not
>> sure how long that is the case before we see another instance of slowness),
>> however the initial directory listings can take 20 to 45 seconds.
>>
>> The rbd kernel client is running on ubuntu 14.04 using kernel version
>> '3.18.0-031800-generic'.
>>
>> Benchmarks:
>>
>> root@rbdmount1:/mnt/rbd/music_library/D# time ls (file names removed):
>> real    0m18.045s
>> user    0m0.000s
>> sys    0m0.011s
>>
>> root@rbdmount1:/mnt/rbd# dd bs=1M count=1024 if=/dev/zero of=test conv=fdatasync
>> 1024+0 records in
>> 1024+0 records out
>> 1073741824 bytes (1.1 GB) copied, 9.94287 s, 108 MB/s
>>
>>
>> My questions are:
>>
>> 1) Is there anything inherent in our setup/configuration that would prevent
>> us from having fast directory listings on these larger directories (using an
>> rbd image of that size for example)?
>>
>> 2) Have there been any changes made in Giant that would warrant upgrading
>> the cluster as a fix to resolve this issue?
>>
>> Any suggestions would be greatly appreciated.
>>
>> Thanks,
>>
>> Shain
>>
>>
>> Shain Miley | Manager of Systems and Infrastructure, Digital Media |
>> smiley@xxxxxxx | 202.513.3649
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>