Re: Extremely slow du

Hi Kashif,

Thank you for your feedback! Do you have any data on the nature of the performance improvement observed with 3.11 in the new setup?

Adding Raghavendra and Poornima to validate the configuration and to help identify why certain files disappeared from the mount point after readdir-optimize was enabled.

Regards,
Vijay

On 07/11/2017 11:06 AM, mohammad kashif wrote:
Hi Vijay and Experts

I didn't want to experiment with my production setup, so I started a
parallel system with two servers and around 80TB of storage. I first
configured it with Gluster 3.8 and had the same lookup performance
issue. Then I upgraded to 3.11 as you suggested, and it made a huge
improvement in lookup time. I also did some more optimization as
suggested in other threads.
Now I am going to update my production servers. I am planning to use
the following optimization options; it would be very useful if you
could point out any inconsistency or suggest other options. My
production setup has 5 servers with 400TB of storage and around 80
million files of varying sizes.

Options Reconfigured:
server.event-threads: 4
client.event-threads: 4
cluster.lookup-optimize: on
cluster.readdir-optimize: off
performance.client-io-threads: on
performance.cache-size: 1GB
performance.parallel-readdir: on
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
auth.allow: 163.1.136.*
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
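
These would be applied on the production volume with gluster volume
set; a minimal sketch, assuming the production volume keeps the name
atlasglust shown elsewhere in this thread:

gluster volume set atlasglust cluster.lookup-optimize on
gluster volume set atlasglust performance.parallel-readdir on
gluster volume set atlasglust performance.md-cache-timeout 600
gluster volume set atlasglust features.cache-invalidation on
gluster volume set atlasglust features.cache-invalidation-timeout 600

The full effective option set can then be checked with 'gluster
volume get atlasglust all'.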

I found that setting cluster.readdir-optimize to 'on' made some files
disappear from the client!

Thanks

Kashif



On Sun, Jun 18, 2017 at 4:57 PM, Vijay Bellur <vbellur@xxxxxxxxxx> wrote:

    Hi Mohammad,

    A lot of time is being spent addressing metadata calls, as
    expected. Could you consider testing with 3.11, which includes
    the md-cache [1] and readdirp [2] improvements?

    Adding Poornima and Raghavendra who worked on these enhancements to
    help out further.

    Thanks,
    Vijay

    [1] https://gluster.readthedocs.io/en/latest/release-notes/3.9.0/

    [2] https://github.com/gluster/glusterfs/issues/166

    On Fri, Jun 16, 2017 at 2:49 PM, mohammad kashif <kashif.alig@xxxxxxxxx> wrote:

        Hi Vijay

        Did you manage to look into the gluster profile logs ?

        Thanks

        Kashif

        On Mon, Jun 12, 2017 at 11:40 AM, mohammad kashif <kashif.alig@xxxxxxxxx> wrote:

            Hi Vijay

            I have enabled client profiling and used this script to
            extract data:
            https://github.com/bengland2/gluster-profile-analysis/blob/master/gvp-client.sh
            I am attaching the output files. I don't have any
            reference data to compare my output with, but hopefully
            you can make some sense of it.

            On Sat, Jun 10, 2017 at 10:47 AM, Vijay Bellur <vbellur@xxxxxxxxxx> wrote:

                Would it be possible for you to turn on client profiling
                and then run du? Instructions for turning on client
                profiling can be found at [1]. Providing the client
                profile information can help us figure out where the
                latency could be stemming from.

                Regards,
                Vijay

                [1] https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Performance%20Testing/#client-side-profiling
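
                Concretely, that boils down to something like the
                following sketch (using your volume name atlasglust
                and a client mount point of /data, as elsewhere in
                this thread):

                gluster volume profile atlasglust start
                time du -sh /data/aa/bb/cc
                setfattr -n trusted.io-stats-dump -v /tmp/du-profile.txt /data

                The setfattr call asks the client's io-stats
                translator to dump its counters to the named file,
                which you could then attach here.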

                On Fri, Jun 9, 2017 at 7:22 PM, mohammad kashif <kashif.alig@xxxxxxxxx> wrote:

                    Hi Vijay

                    Thanks for your quick response. I am using
                    Gluster 3.8.11 on CentOS 7 servers:
                    glusterfs-3.8.11-1.el7.x86_64

                    The clients are CentOS 6, but I tested with a
                    CentOS 7 client as well and the results didn't
                    change.

                    gluster volume info

                    Volume Name: atlasglust
                    Type: Distribute
                    Volume ID: fbf0ebb8-deab-4388-9d8a-f722618a624b
                    Status: Started
                    Snapshot Count: 0
                    Number of Bricks: 5
                    Transport-type: tcp
                    Bricks:
                    Brick1: pplxgluster01.x.y.z:/glusteratlas/brick001/gv0
                    Brick2: pplxgluster02.x.y.z:/glusteratlas/brick002/gv0
                    Brick3: pplxgluster03.x.y.z:/glusteratlas/brick003/gv0
                    Brick4: pplxgluster04.x.y.z:/glusteratlas/brick004/gv0
                    Brick5: pplxgluster05.x.y.z:/glusteratlas/brick005/gv0
                    Options Reconfigured:
                    nfs.disable: on
                    performance.readdir-ahead: on
                    transport.address-family: inet
                    auth.allow: x.y.z

                    I am not using directory quota.

                    Please let me know if you require any more info.

                    Thanks

                    Kashif



                    On Fri, Jun 9, 2017 at 2:34 PM, Vijay Bellur <vbellur@xxxxxxxxxx> wrote:

                        Can you please provide more details about your
                        volume configuration and the version of gluster
                        that you are using?

                        Regards,
                        Vijay

                        On Fri, Jun 9, 2017 at 5:35 PM, mohammad kashif <kashif.alig@xxxxxxxxx> wrote:

                            Hi

                            I have just moved our 400 TB HPC storage
                            from Lustre to Gluster. It is part of a
                            research institute, and users have files
                            ranging from very small to big (a few KB
                            to 20GB). Our setup consists of 5
                            servers, each with 96TB of RAID 6 disks.
                            All servers are connected through 10G
                            ethernet, but not all clients are.
                            Gluster volumes are distributed without
                            any replication. There are approximately
                            80 million files in the file system.
                            I am mounting with glusterfs on the
                            clients.
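
                            For reference, a typical client mount
                            would look something like this, using the
                            volume name and servers from the volume
                            info elsewhere in this thread:

                            mount -t glusterfs pplxgluster01.x.y.z:/atlasglust /data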

                            I have copied everything from Lustre to
                            Gluster, but the old file system still
                            exists so I can compare.

                            The problem I am facing is an extremely
                            slow du, even on a small directory. The
                            time taken also varies substantially from
                            run to run. I ran du from the same client
                            on a particular directory twice and got
                            these results:

                            time du -sh /data/aa/bb/cc
                            3.7G /data/aa/bb/cc
                            real 7m29.243s
                            user 0m1.448s
                            sys 0m7.067s

                            time du -sh /data/aa/bb/cc
                            3.7G      /data/aa/bb/cc
                            real 16m43.735s
                            user 0m1.097s
                            sys 0m5.802s

                            16m and 7m are too long for a 3.7G
                            directory. I should mention that the
                            directory contains a huge number of files
                            (208736).
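
                            As a rough sanity check: the 7m29s run is
                            about 449 seconds for 208736 files, i.e.
                            roughly 2.2 ms per file, which looks like
                            one network round trip of metadata
                            latency per stat() rather than any bulk
                            data transfer.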

                            But running du on the same directory on
                            the old file system gives this result:

                            time du -sh /olddata/aa/bb/cc
                            4.0G /olddata/aa/bb/cc
                            real 3m1.255s
                            user 0m0.755s
                            sys 0m38.099s

                            Much better if I run the same command
                            again:

                            time du -sh /olddata/aa/bb/cc
                            4.0G /olddata/aa/bb/cc
                            real 0m8.309s
                            user 0m0.313s
                            sys 0m7.755s

                            Is there anything I can do to improve
                            this performance? I would also like to
                            hear from someone who is running the same
                            kind of setup.

                            Thanks

                            Kashif



_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users


