Performance optimization tips Gluster 3.3? (small files / directory listings)

luxis2012 at gmail.com (olav johansen) · Fri, 8 Jun 2012 14:23:57 -0400

Hi Brian,

This is a single thread processing a sequential task, which is where the
latency really becomes a problem. With ls -aR I get similar speed:

[@web1 files]# time ls -aR|wc -l
1968316
real    27m23.432s
user    0m5.523s
sys     0m35.369s
[@web1 files]# time ls -aR|wc -l
1968316
real    26m2.728s
user    0m5.529s
sys     0m33.779s
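
To put a number on the latency, wall-clock time can be divided by the number
of lines printed. This is only a rough sketch: the line count from ls -aR
also includes directory headers and blank lines, so it is an approximation of
the entry count, and per_entry_ms is just a helper name made up for this
mail. The last line uses the figures from the local SSD run quoted further
down (which was ls -alR, so it is not a like-for-like comparison).

```shell
# Rough per-entry latency in milliseconds: wall-clock time / lines printed.
# usage: per_entry_ms <minutes> <seconds> <lines>
per_entry_ms() {
    awk -v m="$1" -v s="$2" -v n="$3" \
        'BEGIN { printf "%.3f\n", (m * 60 + s) / n * 1000 }'
}

per_entry_ms 27 23.432 1968316   # first Gluster run  -> 0.835
per_entry_ms 26 2.728 1968316    # second Gluster run -> 0.794
per_entry_ms 0 15.761 2260484    # local SSD run      -> 0.007
```

So each entry costs a bit under one millisecond over the Gluster mount, which
looks like round-trip time rather than disk time.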

I understand "ls -alR" isn't truly our use case, but we use similar
functions: the application we're supporting calls opendir() / file_exists()
a lot in PHP. Ideally we wouldn't have either, but that is not the situation
I have. We have been pushing NFS to its limits and are looking for better,
more scalable performance, and for feedback / suggestions on this.

Also, when we rsync the folders to backup servers we hit the same speed
issue as with ls -alR.  (I understand in this case I could use the raw
/data/ folder.)
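
A minimal sketch of what the raw-folder backup might look like. The
assumptions here: the brick directory is /data, the destination
backup1:/backups/files is a made-up name, and Gluster 3.3's internal
.glusterfs directory at the top of the brick should be left out of the copy.
backup_brick is a hypothetical helper, not something Gluster ships.

```shell
# Back up straight from one brick's local directory, skipping Gluster's
# internal metadata directory. Paths and host names are hypothetical.
# usage: backup_brick <brick-dir> <rsync-destination>
backup_brick() {
    # The trailing slash on the source copies the brick's contents rather
    # than the directory itself; the leading slash in the exclude pattern
    # anchors it to the top of the transfer.
    rsync -a --exclude=/.glusterfs "${1%/}/" "$2"
}

# Example (not run here):
#   backup_brick /data backup1:/backups/files
```

One caveat I'm aware of: on a replicated volume a brick that has been offline
may be missing recent writes until self-heal has caught up, so the brick being
read should be known to be in sync.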

Going from a single server to a replicated Gluster cluster, what slowdown
do others see compared to NFS?

Don't get me wrong, Gluster rocks, but in our current case latency is
killing us, and I'm looking for help solving this.

One idea I haven't had a chance to try for reducing latency is to split the
6x1TB RAID 10 behind each brick into 3x (2x1TB RAID 1). I'm not sure whether
Gluster can even do this.  (A1->B1, A2->B2, A3->B3 as one volume)
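
As far as I understand the CLI, with "replica 2" the bricks are grouped into
mirrored pairs in the order they are listed, so the A1->B1, A2->B2, A3->B3
layout comes down to listing the bricks in alternating order. A sketch of
that ordering, where the brick paths /bricks/r1..r3 are made up and
replica_pair_bricks is only a helper name for illustration:

```shell
# Print bricks as "hostA:path hostB:path ..." so that each consecutive
# pair forms one mirrored set when passed to a "replica 2" volume.
# usage: replica_pair_bricks <hostA> <hostB> <brick-path>...
replica_pair_bricks() {
    local host_a host_b bricks path
    host_a="$1"; host_b="$2"; shift 2
    bricks=""
    for path in "$@"; do
        bricks="$bricks $host_a:$path $host_b:$path"
    done
    echo "${bricks# }"
}

replica_pair_bricks fs1 fs2 /bricks/r1 /bricks/r2 /bricks/r3
# -> fs1:/bricks/r1 fs2:/bricks/r1 fs1:/bricks/r2 fs2:/bricks/r2 fs1:/bricks/r3 fs2:/bricks/r3
```

The output would then go on the end of something like
"gluster volume create <volname> replica 2 <bricks...>". That should give a
distributed-replicated volume where each file lives on one pair. Whether it
helps single-threaded latency at all is exactly what I don't know.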

Any ideas / suggestions are much appreciated.

Thanks again,

On Fri, Jun 8, 2012 at 4:20 AM, Brian Candler <B.Candler at pobox.com> wrote:

> On Fri, Jun 08, 2012 at 12:19:58AM -0400, olav johansen wrote:
> >    # mount -t glusterfs fs1:/data-storage /storage
> >    I've copied over my data to it again and doing a ls several times,
> >    takes ~0.5 seconds:
> >    [@web1 files]# time ls -all|wc -l
>
> Like I said before, please also try without the "-l" flags and compare the
> results.
>
> My guess is that ls -al or ls -alR are not representative of the *real*
> workload you are going to ask of your system (i.e. "scan all the files in
> this directory, sequentially, and perform a stat() call on each one in
> turn") - but please contradict me if I'm wrong.
>
> However you need to measure how much cost that "-l" is giving you.
>
> >    Doing the same thing on the raw os files on one node takes 0.021s
> >    [@fs2 files]# time ls -all|wc -l
> >    1989
> >    real    0m0.021s
> >    user    0m0.007s
> >    sys     0m0.015s
>
> In that case it's probably all coming from cache. If you wanted to test
> actual disk performance then you would do
>
> echo "3" >/proc/sys/vm/drop_caches
>
> before each test (on both client and server, if they are different
> machines).
>
> But from what you say, it sounds like you are actually more interested in
> the cached answers anyway.
>
> >    Just as a crazy reference, on another single server with SSD (RAID 10)
> >    drives I get:
> >    files# time ls -alR|wc -l
> >    2260484
> >    real    0m15.761s
> >    user    0m5.170s
> >    sys     0m7.670s
> >    For the same operation. (This server even has more files...)
>
> You are not comparing like-for-like. A replicated volume behaves very
> differently from a single brick or distributed volume, as explained before.
>
> If you compared a two-brick (HD) setup with an identical two-brick (SSD)
> setup then that would be meaningful.  I would expect that if everything is
> cacheable then you'd get the same results for both.  In that case, what
> you'd show is that the latency for open/stat and heal is the cause of the
> delay.
>
> Like I said before, I expect that adding the "-l" flag to ls is giving you
> lots of cumulative latency.
>
> This means that the server is actually idle for a lot of the time, while
> it's waiting for the next request. So the server has spare capacity for
> handling other clients.
>
> In other words: if your real workload is actually lots of clients accessing
> the system concurrently, you'll get a much better total throughput than the
> simple tests you are doing, which are a single client performing single
> operations one after the other.
>
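
One way to check the concurrency point from a single client is to walk each
top-level subdirectory in its own process, so that several requests are in
flight at once. This is only a sketch, assuming GNU find and xargs;
parallel_count is a made-up name, and files sitting directly in the top-level
directory are not counted. Whether one FUSE mount parallelises as well as
several separate clients is something that would have to be measured.

```shell
# Walk each top-level subdirectory in a separate process and sum the
# entry counts. Files directly under <dir> are not included.
# usage: parallel_count <dir> [jobs]
parallel_count() {
    local dir="$1" jobs="${2:-8}"
    find "$dir" -mindepth 1 -maxdepth 1 -type d -print0 |
        xargs -0 -r -n 1 -P "$jobs" sh -c 'find "$1" | wc -l' _ |
        awk '{ total += $1 } END { print total + 0 }'
}

# Example (not run here):
#   time parallel_count /storage/files 8
```
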
> >    If I added two more bricks to the cluster / replicated, would this
> >    double read speed?
>
> Definitely not. The latency would be the same, it's just that some requests
> would go to bricks A and B, and other requests would go to bricks C and D.
> For any single request the other two bricks would be idle, and would not
> speed things up.
>
> However, if you had concurrent accesses from multiple clients, the extra
> bricks would give extra capacity so that the total *throughput* would be
> higher when there are multiple clients active.
>
> So I repeat my earlier advice. If you really want to understand where the
> performance issues are coming from, these two tests may highlight them:
>
> * Compare the same 2-brick replicated volume,
>  using "ls -aR" versus "ls -laR"
>
> * Compare a 2-brick replicated volume to a 2-brick distributed volume,
>  using "ls -laR" on both
>
> Regards,
>
> Brian.
>
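
The two comparisons suggested in the quoted message could be scripted roughly
like this, so that each run reports its line count and elapsed time in the
same format. It is a sketch only: time_listing is a made-up helper,
date +%s%N assumes GNU date, and /storage/files in the example stands in for
wherever the volume is mounted.

```shell
# Count the lines a listing produces and report how long it took.
# usage: time_listing <dir> <ls-flags>
time_listing() {
    local dir="$1" flags="$2" start end lines
    start=$(date +%s%N)                        # nanoseconds (GNU date)
    lines=$(cd "$dir" && ls "$flags" | wc -l)
    end=$(date +%s%N)
    echo "ls $flags: $(($lines)) lines in $(( ($end - $start) / 1000000 )) ms"
}

# Examples (not run here):
#   time_listing /storage/files -aR     # names only
#   time_listing /storage/files -laR    # names plus a stat() per entry
```

Running the same two calls against a replicated mount and a distributed mount
would cover the second comparison.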