Re: rbd directory listing performance issues

Just to follow up on this thread: the main reason the rbd directory listing latency was an issue for us was that we were seeing a large amount of IO delay in a PHP app that reads from that rbd image.

It occurred to me (based on Robert's cache_dir suggestion below) that doing a recursive find or a recursive directory listing inside the one folder in question might speed things up.

After doing the recursive find, the directory listing seems much faster and the responsiveness of the PHP app has improved as well.

Hopefully nothing else will need to be done here; worst case, a daily or weekly cron job that traverses the directory tree in that folder might be all we need (see the sketch below).
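
For reference, that cron job would be something along these lines; the schedule, the /mnt/archive mount point and the cron file name are just placeholders, not what we actually have in place:

    # /etc/cron.d/rbd-cache-warm (hypothetical)
    # Walk the archive tree nightly so its dentries/inodes stay warm in cache;
    # /mnt/archive stands in for the real rbd mount point.
    0 3 * * * root /usr/bin/find /mnt/archive -mindepth 1 > /dev/null 2>&1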

Thanks again for all the help.

Shain 



Shain Miley | Manager of Systems and Infrastructure, Digital Media | smiley@xxxxxxx | 202.513.3649

________________________________________
From: ceph-users [ceph-users-bounces@xxxxxxxxxxxxxx] on behalf of Shain Miley [SMiley@xxxxxxx]
Sent: Tuesday, January 06, 2015 8:16 PM
To: Christian Balzer; ceph-users@xxxxxxxx
Subject: Re:  rbd directory listing performance issues

Christian,

Each of the OSD server nodes is a Dell R720xd with 64 GB of RAM.

We have 107 OSDs, so I have not checked all of them; however, the ones I have checked with xfs_db have shown anywhere from 1% to 4% fragmentation.
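
(For anyone else who wants to check, the test I ran was along these lines; the device name below is just an example, substitute each OSD's data partition:)

    # Read-only fragmentation report for one OSD filesystem (safe while mounted)
    xfs_db -r -c frag /dev/sdb1
    # prints something like: actual 123456, ideal 120000, fragmentation factor 2.80%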

I'll try to upgrade the client server to 32 or 64 GB of RAM at some point soon; however, the tuning I have done so far has not yielded much in the way of results.

It may simply be that I need to look into adding some SSDs, and that the overall bottleneck here is the 4 TB 7200 rpm disks we are using.

In general, the Calamari graphs show around 20 ms latency (await) for our OSDs, but there are also plenty of times when we see spikes of 250 ms to 400 ms.
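
(As a sanity check against Calamari, running iostat directly on an OSD node shows similar numbers; this assumes the sysstat package is installed:)

    # Extended per-device stats every 5 seconds; the await column is the average
    # I/O latency in milliseconds and should roughly track the Calamari graphs.
    iostat -x 5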

Thanks again,

Shain


Shain Miley | Manager of Systems and Infrastructure, Digital Media | smiley@xxxxxxx | 202.513.3649

________________________________________
From: Christian Balzer [chibi@xxxxxxx]
Sent: Tuesday, January 06, 2015 7:34 PM
To: ceph-users@xxxxxxxx
Cc: Shain Miley
Subject: Re:  rbd directory listing performance issues

Hello,

On Tue, 6 Jan 2015 15:29:50 +0000 Shain Miley wrote:

> Hello,
>
> We currently have a 12 node (3 monitor+9 OSD) ceph cluster, made up of
> 107 x 4TB drives formatted with xfs. The cluster is running ceph version
> 0.80.7:
>
I assume journals on the same HDD then.

How much memory per node?

[snip]
>
> A while back I created an 80 TB rbd image to be used as an archive
> repository for some of our audio and video files. We are still seeing
> good rados and rbd read and write throughput performance, however we
> seem to be having quite a long delay in response times when we try to
> list out the files in directories with a large number of folders, files,
> etc.
>
> Subsequent directory listing times seem to run a lot faster (but I am
> not sure for long that is the case before we see another instance of
> slowness), however the initial directory listings can take 20 to 45
> seconds.
>

Basically the same thing(s) that Robert said.
How big is "large"?
How much memory on the machine you're mounting this image?
Ah, never mind, just saw your follow-up.

Definitely add memory to this machine if you can.

The initial listing is always going to be somewhat slow, depending on a number of
things in the cluster.

As in, how busy is it (IOPS)? With journals on disk your HDDs are going to
be sluggish individually and your directory information might reside
mostly in one object (on one OSD), thus limiting you to the speed of that
particular disk.

And this is also where the memory of your storage nodes comes in: if it is
large enough, your "hot" objects will get cached there as well.
To see if that's the case (at least temporarily), drop the caches on all
of your storage nodes (echo 3 > /proc/sys/vm/drop_caches), mount your
image, do the "ls -l" until it's "fast", umount it, mount it again and do
the listing again.
In theory, unless your cluster is extremely busy or your storage nodes have
very little pagecache, the re-mounted image should get all the info it
needs from said pagecache on your storage nodes, never having to go to the
actual OSD disks, and thus be faster than the initial test.
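
Roughly, the test sequence would look like this (the rbd device and mount point
are only placeholders for whatever you actually use):

    # On every storage node: flush the pagecache so nothing is served from RAM
    sync; echo 3 > /proc/sys/vm/drop_caches

    # On the client: mount, time a cold listing, repeat until it is fast,
    # then unmount, remount and time it again
    mount /dev/rbd0 /mnt/archive
    time ls -lR /mnt/archive/big/dir > /dev/null
    umount /mnt/archive
    mount /dev/rbd0 /mnt/archive
    time ls -lR /mnt/archive/big/dir > /dev/null
    # If the listing after the second mount is fast, the hot objects are being
    # served from the storage nodes' pagecache rather than the OSD disks.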

Finally, to potentially improve the initial scan, which obviously has to come
from the disks, see how fragmented your OSDs are and, depending on the
results, defrag them.
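
On XFS that would be xfs_fsr, run per OSD during a quiet period; the mount point
below is just an example and the time limit keeps each run short:

    # Defragment one OSD filesystem in place; -t limits the run to 600 seconds
    # so it can be done incrementally, -v prints what is being reorganised.
    xfs_fsr -v -t 600 /var/lib/ceph/osd/ceph-0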

Christian
--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
