time du -ksh binno/
3.7G binno/
real 117m45.733s
user 0m1.635s
sys 0m6.430s

time du -ksh binno/
3.7G binno/
real 2m5.595s
user 0m0.767s
sys 0m4.437s
Fop Call Count Avg-Latency Min-Latency Max-Latency
--- ---------- ----------- ----------- -----------
STAT 153 90.72 us 5.00 us 666.00 us
STATFS 3 677.67 us 620.00 us 709.00 us
OPENDIR 149 1213.81 us 519.00 us 28777.00 us
LOOKUP 552 8493.01 us 3.00 us 79689.00 us
READDIRP 3518 5351.76 us 11.00 us 341877.00 us
FORGET 10050351 0 us 0 us 0 us
RELEASE 9062130 0 us 0 us 0 us
RELEASEDIR 5395 0 us 0 us 0 us
------ ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
After the update:
Interval 8 Stats:
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
--------- ----------- ----------- ----------- ------------ ----
0.00 0.00 us 0.00 us 0.00 us 2 RELEASEDIR
0.08 118.00 us 113.00 us 123.00 us 2 STATFS
0.13 190.00 us 189.00 us 191.00 us 2 LOOKUP
0.29 422.00 us 422.00 us 422.00 us 2 OPENDIR
99.49 28539.60 us 1698.00 us 48655.00 us 10 READDIRP
0.00 0.00 us 0.00 us 0.00 us 5217 UPCALL
0.00 0.00 us 0.00 us 0.00 us 5217 CI_FORGET
Duration: 22 seconds
Data Read: 0 bytes
Data Written: 0 bytes
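For reference, stats in this format come from Gluster's io-stats translator; whether the dump above was taken on the client or the server, a minimal way to produce a comparable server-side dump (assuming the volume name atlasglust from the volume info further down this thread) would be:

# start per-FOP latency accounting on the bricks
gluster volume profile atlasglust start

# run the workload, e.g. the du test above
time du -ksh binno/

# print cumulative and per-interval stats; each call closes the current interval
gluster volume profile atlasglust info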
I am not sure about the profiling results as I don't fully understand them.
Thanks
Kashif
Hi Kashif,
Thank you for your feedback! Do you have any data on the nature of the performance improvement observed with 3.11 in the new setup?
Adding Raghavendra and Poornima for validation of configuration and help with identifying why certain files disappeared from the mount point after enabling readdir-optimize.
Regards,
Vijay
On 07/11/2017 11:06 AM, mohammad kashif wrote:
Hi Vijay and Experts
I didn't want to experiment with my production setup, so I started a parallel
system with two servers and around 80TB of storage. I first configured it with
gluster 3.8 and had the same lookup performance issue. I then upgraded to 3.11
as you suggested and it made a huge improvement in lookup time. I also did some
more optimizations as suggested in other threads.
Now I am going to update my production servers. I am planning to use the
following optimization options (a sketch of applying them follows the list);
it would be very useful if you could point out any inconsistency or suggest
some other options. My production setup has 5 servers consisting of 400TB of
storage and around 80 million files of varying sizes.
Options Reconfigured:
server.event-threads: 4
client.event-threads: 4
cluster.lookup-optimize: on
cluster.readdir-optimize: off
performance.client-io-threads: on
performance.cache-size: 1GB
performance.parallel-readdir: on
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
auth.allow: 163.1.136.*
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
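For reference, a minimal sketch of how these tunables would be applied, assuming the production volume keeps the name atlasglust from the volume info quoted further down in this thread:

# apply each planned tunable with 'gluster volume set' (volume name assumed: atlasglust)
gluster volume set atlasglust server.event-threads 4
gluster volume set atlasglust client.event-threads 4
gluster volume set atlasglust cluster.lookup-optimize on
gluster volume set atlasglust performance.parallel-readdir on
gluster volume set atlasglust performance.md-cache-timeout 600

# review what ended up under 'Options Reconfigured'
gluster volume info atlasglust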
I found that setting cluster.readdir-optimize to 'on' made some files
disappear from the client!
Thanks
Kashif
On Sun, Jun 18, 2017 at 4:57 PM, Vijay Bellur <vbellur@xxxxxxxxxx> wrote:
Hi Mohammad,
A lot of time is being spent addressing metadata calls, as expected. Can you
consider testing with 3.11, which has the md-cache [1] and readdirp [2]
improvements?
Adding Poornima and Raghavendra who worked on these enhancements to
help out further.
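As a rough illustration only (not taken from the linked notes verbatim): the md-cache and readdirp work referenced in [1] and [2] is typically exercised by enabling the caching/invalidation options that also appear in the option list further up in this thread; a sketch, with <VOLNAME> as a placeholder:

# md-cache with upcall-based cache invalidation
gluster volume set <VOLNAME> features.cache-invalidation on
gluster volume set <VOLNAME> features.cache-invalidation-timeout 600
gluster volume set <VOLNAME> performance.cache-invalidation on
gluster volume set <VOLNAME> performance.md-cache-timeout 600
gluster volume set <VOLNAME> performance.stat-prefetch on

# readdirp-side options
gluster volume set <VOLNAME> performance.readdir-ahead on
gluster volume set <VOLNAME> performance.parallel-readdir on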
Thanks,
Vijay
[1] https://gluster.readthedocs.io/en/latest/release-notes/3.9.0/
[2] https://github.com/gluster/glusterfs/issues/166
On Fri, Jun 16, 2017 at 2:49 PM, mohammad kashif <kashif.alig@xxxxxxxxx> wrote:
Hi Vijay
Did you manage to look into the gluster profile logs ?
Thanks
Kashif
On Mon, Jun 12, 2017 at 11:40 AM, mohammad kashif wrote:
Hi Vijay
I have enabled client profiling and used this script
https://github.com/bengland2/gluster-profile-analysis/blob/master/gvp-client.sh
to extract data. I am attaching output files. I don't have
any reference data to compare with my output. Hopefully you
can make some sense out of it.
On Sat, Jun 10, 2017 at 10:47 AM, Vijay Bellur <vbellur@xxxxxxxxxx> wrote:
Would it be possible for you to turn on client profiling
and then run du? Instructions for turning on client
profiling can be found at [1]. Providing the client
profile information can help us figure out where the
latency could be stemming from.
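A minimal sketch of the client-side profiling steps that [1] describes, as I understand them (the volume name and output path are placeholders; /data is the mount point mentioned later in the thread):

# let the io-stats translator record latency and FOP counts
gluster volume set <VOLNAME> diagnostics.latency-measurement on
gluster volume set <VOLNAME> diagnostics.count-fop-hits on

# run the workload on the client, e.g.
time du -sh /data/aa/bb/cc

# dump the client-side io-stats counters to a file via a virtual xattr on the mount point
setfattr -n trusted.io-stats-dump -v /tmp/client-profile.txt /data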
Regards,
Vijay
[1] https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Performance%20Testing/#client-side-profiling
On Fri, Jun 9, 2017 at 7:22 PM, mohammad kashif <kashif.alig@xxxxxxxxx> wrote:
Hi Vijay
Thanks for your quick response. I am using gluster 3.8.11 on CentOS 7 servers
(glusterfs-3.8.11-1.el7.x86_64).
The clients are CentOS 6, but I tested with a CentOS 7 client as well and the
results didn't change.
gluster volume info

Volume Name: atlasglust
Type: Distribute
Volume ID: fbf0ebb8-deab-4388-9d8a-f722618a624b
Status: Started
Snapshot Count: 0
Number of Bricks: 5
Transport-type: tcp
Bricks:
Brick1: pplxgluster01.x.y.z:/glusteratlas/brick001/gv0
Brick2: pplxgluster02.x.y.z:/glusteratlas/brick002/gv0
Brick3: pplxgluster03.x.y.z:/glusteratlas/brick003/gv0
Brick4: pplxgluster04.x.y.z:/glusteratlas/brick004/gv0
Brick5: pplxgluster05.x.y.z:/glusteratlas/brick005/gv0
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
auth.allow: x.y.z
I am not using directory quota.
Please let me know if you require some more info
Thanks
Kashif
On Fri, Jun 9, 2017 at 2:34 PM, Vijay Bellur <vbellur@xxxxxxxxxx> wrote:
Can you please provide more details about your
volume configuration and the version of gluster
that you are using?
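A minimal sketch of gathering those details on one of the servers (plain CLI calls, nothing beyond what the thread itself uses):

# installed gluster version
gluster --version

# volume layout and any reconfigured options
gluster volume info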
Regards,
Vijay
On Fri, Jun 9, 2017 at 5:35 PM, mohammad kashif <kashif.alig@xxxxxxxxx> wrote:
Hi
I have just moved our 400 TB HPC storage from Lustre to Gluster. It is part of
a research institute, and users have files ranging from very small to big
(a few KB to 20 GB). Our setup consists of 5 servers, each with 96 TB of
RAID 6 disks. All servers are connected through 10G Ethernet, but not all
clients are. The Gluster volumes are distributed without any replication.
There are approximately 80 million files in the file system.
I am mounting using the glusterfs client on the clients.
I have copied everything from Lustre to Gluster, but the old file system still
exists so I can compare.
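For completeness, a distribute-only volume and mount like the one described here would look roughly as follows; this is only a sketch, with the server and brick names taken from the volume info quoted above and an illustrative mount path:

# 5-brick pure distribute volume (no replication)
gluster volume create atlasglust transport tcp \
    pplxgluster01.x.y.z:/glusteratlas/brick001/gv0 \
    pplxgluster02.x.y.z:/glusteratlas/brick002/gv0 \
    pplxgluster03.x.y.z:/glusteratlas/brick003/gv0 \
    pplxgluster04.x.y.z:/glusteratlas/brick004/gv0 \
    pplxgluster05.x.y.z:/glusteratlas/brick005/gv0
gluster volume start atlasglust

# native glusterfs (FUSE) mount on a client
mount -t glusterfs pplxgluster01.x.y.z:/atlasglust /data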
The problem I am facing is extremely slow du even on a small directory. Also,
the time taken is substantially different each time. I tried du from the same
client on a particular directory twice and got these results:
time du -sh /data/aa/bb/cc
3.7G /data/aa/bb/cc
real 7m29.243s
user 0m1.448s
sys 0m7.067s
time du -sh /data/aa/bb/cc
3.7G /data/aa/bb/cc
real 16m43.735s
user 0m1.097s
sys 0m5.802s
16m and 7m are too long for a 3.7 GB directory. I should mention that the
directory contains a huge number of files (208736).
Running du on the same directory on the old file system gives this result:
time du -sh /olddata/aa/bb/cc
4.0G /olddata/aa/bb/cc
real 3m1.255s
user 0m0.755s
sys 0m38.099s
It is much better if I run the same command again:
time du -sh /olddata/aa/bb/cc
4.0G /olddata/aa/bb/cc
real 0m8.309s
user 0m0.313s
sys 0m7.755s
Is there anything I can do to improve this performance? I would also like to
hear from someone who is running the same kind of setup.
Thanks
Kashif
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users