Re: rbd directory listing performance issues

Shain Miley <SMiley@xxxxxxx> · Tue, 6 Jan 2015 19:18:15 +0000

It does seem like the entries get cached for a certain period of time.

Here is the memory listing for the rbd client server:

root@cephmount1:~# free -m
             total       used       free     shared    buffers     cached
Mem:         11965      11816        149          3        139      10823
-/+ buffers/cache:        853      11112
Swap:         4047          0       4047

I can add more memory to the server if I need to; I have 2 or 4 16GB DIMMs lying around here someplace.

Here are some of the pagecache sysctl settings:
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 10
vm.dirty_writeback_centisecs = 500

In terms of the number of files:

root@cephmount1:/mnt/ceph-block-device-archive/library/E# time ls
real	0m8.073s
user	0m0.000s
sys	0m0.012s

root@cephmount1:/mnt/ceph-block-device-archive/library/E# ls |wc
    228     510    3413

However, looking at some other directories, I see counts in the range of 500 to 600, so they vary based on the name of the artist. If I had to guess, we would not have more than 800 - 1000 entries in the very heavy directories at this point.

Also...one thing I just noticed is that the 'ls |wc' returns right away...even in cases when right after that I do an 'ls -l' and it takes a while.
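
[Editor's sketch] One possible explanation for the difference, offered as an assumption rather than something confirmed in this thread: a plain 'ls' piped into 'wc' only needs the entry names from the directory, while 'ls -l' also has to stat every entry, which means reading each inode from the RBD if it is not already cached. An interactive 'ls' can behave like the slow case if it is aliased to 'ls --color=auto' (common on Ubuntu, but not shown here), since colouring needs the file mode. The Python sketch below times the two phases separately; the function name time_listing is made up for illustration.

```python
import os
import sys
import time


def time_listing(path):
    """Time a names-only listing and a per-entry lstat pass separately.

    Returns (entry_count, seconds_for_names, seconds_for_lstat).
    """
    start = time.perf_counter()
    names = os.listdir(path)  # names only, roughly what 'ls | wc' needs
    names_secs = time.perf_counter() - start

    start = time.perf_counter()
    for name in names:
        # Per-entry metadata lookup, which is what 'ls -l' has to do.
        os.lstat(os.path.join(path, name))
    lstat_secs = time.perf_counter() - start

    return len(names), names_secs, lstat_secs


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    count, names_secs, lstat_secs = time_listing(target)
    print(f"{count} entries: names {names_secs:.3f}s, lstat {lstat_secs:.3f}s")
```

If the lstat pass dominates on a cold directory and becomes fast on an immediate second run, that would point at inode reads rather than the directory read itself.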

Thanks,

Shain

Shain Miley | Manager of Systems and Infrastructure, Digital Media | smiley@xxxxxxx | 202.513.3649

________________________________________
From: Robert LeBlanc [robert@xxxxxxxxxxxxx]
Sent: Tuesday, January 06, 2015 1:57 PM
To: Shain Miley
Cc: ceph-users@xxxxxxxx
Subject: Re:  rbd directory listing performance issues

I would think that the RBD mounter would cache the directory listing
which should always make it fast, unless there is so much memory
pressure that it is dropping it frequently.

How many entries are in your directory and total on the RBD?
ls | wc -l
find . | wc -l
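
[Editor's sketch] The same two counts can be collected from Python, which may be handy when gathering them for many directories at once. This is only an illustration; count_entries is a hypothetical helper name. Unlike 'ls', it includes dotfiles in the direct count, and like 'find .' it counts the starting directory itself in the total.

```python
import os


def count_entries(path):
    """Return (direct_entries, total_entries_in_tree) for path."""
    # Direct children only, comparable to 'ls -A | wc -l'.
    with os.scandir(path) as entries:
        direct = sum(1 for _ in entries)

    # Whole tree, comparable to 'find . | wc -l'.
    total = 1  # 'find .' also prints the starting directory
    for _root, dirs, files in os.walk(path):
        total += len(dirs) + len(files)
    return direct, total


if __name__ == "__main__":
    direct, total = count_entries(".")
    print(f"direct entries: {direct}, total in tree: {total}")
```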

What does your memory look like?
free -h

I'm not sure how much help I can be, but if memory pressure is causing
buffers to be freed, then it can cause the system to have to go to disk
to get the directory listing. I'm guessing that if the directory is
large enough it could cause the system to have to go back to the RBD
many times. Very small I/O on RBD is very expensive compared to big
sequential access.
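
[Editor's sketch] To check the memory-pressure theory, it may help to watch the kernel's dentry and inode caches on the client before and after a slow listing. The sketch below is a minimal Python example that reads standard Linux /proc files (/proc/meminfo, /proc/sys/fs/dentry-state, /proc/sys/vm/vfs_cache_pressure); the helper names are made up for illustration, and nothing here is specific to RBD.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            info[key.strip()] = int(parts[0])
    return info


def parse_dentry_state(text):
    """Return (nr_dentry, nr_unused) from /proc/sys/fs/dentry-state text."""
    fields = text.split()
    return int(fields[0]), int(fields[1])


def read_file(path):
    with open(path) as handle:
        return handle.read()


if __name__ == "__main__":
    mem = parse_meminfo(read_file("/proc/meminfo"))
    nr_dentry, nr_unused = parse_dentry_state(
        read_file("/proc/sys/fs/dentry-state"))
    pressure = int(read_file("/proc/sys/vm/vfs_cache_pressure"))

    print(f"page cache: {mem['Cached'] // 1024} MB")
    print(f"reclaimable slab: {mem.get('SReclaimable', 0) // 1024} MB")
    print(f"dentries: {nr_dentry} total, {nr_unused} unused")
    print(f"vm.vfs_cache_pressure = {pressure}")
```

If the dentry count drops sharply between a fast listing and a later slow one, the cached metadata is probably being reclaimed. Lowering vm.vfs_cache_pressure below its default of 100 makes the kernel more inclined to keep dentries and inodes; whether that helps for this workload is untested here.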

On Tue, Jan 6, 2015 at 11:33 AM, Shain Miley <SMiley@xxxxxxx> wrote:
> Robert,
>
> xfs on the rbd image as well:
>
> /dev/rbd0 on /mnt/ceph-block-device-archive type xfs (rw)
>
> However looking at the mount options...it does not look like I've enabled anything special in terms of mount options.
>
> Thanks,
>
> Shain
>
>
> Shain Miley | Manager of Systems and Infrastructure, Digital Media | smiley@xxxxxxx | 202.513.3649
>
> ________________________________________
> From: Robert LeBlanc [robert@xxxxxxxxxxxxx]
> Sent: Tuesday, January 06, 2015 1:27 PM
> To: Shain Miley
> Cc: ceph-users@xxxxxxxx
> Subject: Re:  rbd directory listing performance issues
>
> What fs are you running inside the RBD?
>
> On Tue, Jan 6, 2015 at 8:29 AM, Shain Miley <SMiley@xxxxxxx> wrote:
>> Hello,
>>
>> We currently have a 12 node (3 monitor+9 OSD) ceph cluster, made up of 107 x
>> 4TB drives formatted with xfs. The cluster is running ceph version 0.80.7:
>>
>> Cluster health:
>> cluster 504b5794-34bd-44e7-a8c3-0494cf800c23
>>      health HEALTH_WARN crush map has legacy tunables
>>      monmap e1: 3 mons at
>> {hqceph1=10.35.1.201:6789/0,hqceph2=10.35.1.203:6789/0,hqceph3=10.35.1.205:6789/0},
>> election epoch 156, quorum 0,1,2 hqceph1,hqceph2,hqceph3
>>      osdmap e19568: 107 osds: 107 up, 107 in
>>       pgmap v10117422: 2952 pgs, 15 pools, 77202 GB data, 19532 kobjects
>>             226 TB used, 161 TB / 388 TB avail
>>
>> Relevant ceph.conf entries:
>> osd_journal_size = 10240
>> filestore_xattr_use_omap = true
>> osd_mount_options_xfs =
>> "rw,noatime,nodiratime,logbsize=256k,logbufs=8,inode64"
>> osd_mkfs_options_xfs = "-f -i size=2048"
>>
>>
>> A while back I created an 80 TB rbd image to be used as an archive
>> repository for some of our audio and video files. We are still seeing good
>> rados and rbd read and write throughput performance, however we seem to be
>> having quite a long delay in response times when we try to list out the
>> files in directories with a large number of folders, files, etc.
>>
>> Subsequent directory listing times seem to run a lot faster (but I am not
>> sure for how long that is the case before we see another instance of slowness),
>> however the initial directory listings can take 20 to 45 seconds.
>>
>> The rbd kernel client is running on ubuntu 14.04 using kernel version
>> '3.18.0-031800-generic'.
>>
>> Benchmarks:
>>
>> root@rbdmount1:/mnt/rbd/music_library/D# time ls (file names removed):
>> real    0m18.045s
>> user    0m0.000s
>> sys    0m0.011s
>>
>> root@rbdmount1:/mnt/rbd# dd bs=1M count=1024 if=/dev/zero of=test
>> conv=fdatasync
>> 1024+0 records in
>> 1024+0 records out
>> 1073741824 bytes (1.1 GB) copied, 9.94287 s, 108 MB/s
>>
>>
>> My questions are:
>>
>> 1) Is there anything inherent in our setup/configuration that would prevent
>> us from having fast directory listings on these larger directories (using an
>> rbd image of that size for example)?
>>
>> 2) Have there been any changes made in Giant that would warrant upgrading
>> the cluster as a fix to resolve this issue?
>>
>> Any suggestions would be greatly appreciated.
>>
>> Thanks,
>>
>> Shain
>>
>>
>> Shain Miley | Manager of Systems and Infrastructure, Digital Media |
>> smiley@xxxxxxx | 202.513.3649
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>