Hi Vijay,
Thanks for your reply.
Unfortunately, I checked every brick in my storage pool and couldn't find any backup file... what a pity!
Thank you again! Good luck and see you, Geoffrey
------------------------------------------------------
Geoffrey Letessier
Responsable informatique & ingénieur système
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier@xxxxxxx
On Tuesday 09 June 2015 01:08 PM, Geoffrey Letessier wrote:
Hi,
Yes of course:
[root@lucifer ~]# pdsh -w cl-storage[1,3] du -s /export/brick_home/brick*/amyloid_team
cl-storage1: 1608522280 /export/brick_home/brick1/amyloid_team
cl-storage3: 1619630616 /export/brick_home/brick1/amyloid_team
cl-storage1: 1614057836 /export/brick_home/brick2/amyloid_team
cl-storage3: 1602653808 /export/brick_home/brick2/amyloid_team
The sum is 6,444,864,540 (1K blocks), i.e. around 6.4-6.5TB, while the quota list displays 7.7TB. So the discrepancy is roughly 1.2-1.3TB, in other words around 16%, which is far too large, isn't it?
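(For reference, a quick way to total the per-brick usage, reusing the same pdsh call as above:)

# Sum the du figures (1K blocks) across all bricks:
pdsh -w cl-storage[1,3] du -s /export/brick_home/brick*/amyloid_team 2>/dev/null \
  | awk '{sum += $2} END {print "total:", sum, "1K-blocks"}'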
In addition, since the quota was exceeded, I notice a lot of files like the following:
[root@lucifer ~]# pdsh -w cl-storage[1,3] "cd /export/brick_home/brick2/amyloid_team/tarus/project/ab1-40-x1_sen304-x2_inh3-x2/remd_charmm22star_scripts/; ls -ail remd_100.sh 2> /dev/null" 2>/dev/null
cl-storage3: 133325688 ---------T 2 tarus amyloid_team 0 16 févr. 10:20 remd_100.sh
Note the 'T' at the end of the permissions and the 0-byte file size.
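(For what it's worth, zero-length files with only the sticky bit set, like the one above, are usually DHT link files rather than real data. A sketch to check, run directly on the brick; the xattr name is standard GlusterFS, the path is taken from the listing above:)

# If this is a DHT link file, the xattr names the subvolume holding the real data:
getfattr -n trusted.glusterfs.dht.linkto -e text \
  /export/brick_home/brick2/amyloid_team/tarus/project/ab1-40-x1_sen304-x2_inh3-x2/remd_charmm22star_scripts/remd_100.sh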
And yesterday some files were duplicated, but they are not anymore... The worst part is that all these files were fine before. In other words, exceeding the quota caused file deletions or content corruption... What can I do to prevent this situation in the future? Because I guess there is nothing I can do to roll back the situation now, right?
Hi Geoffrey,
I tried re-creating the problem. Here is the behaviour of the vi editor: when a file is saved, vi creates a backup file under the home directory and re-opens the original file with the 'O_TRUNC' flag, hence the file gets truncated.
Here is the strace of vi when it gets the 'EDQUOT' error:
open("hello", O_WRONLY|O_CREAT|O_TRUNC, 0644) = 3
write(3, "line one\nline two\n", 18) = 18
fsync(3) = 0
close(3) = -1 EDQUOT (Disk
quota exceeded)
chmod("hello", 0100644) = 0
open("/root/hello~", O_RDONLY) = 3
open("hello", O_WRONLY|O_CREAT|O_TRUNC, 0644) = 7
read(3, "line one\n", 256) = 9
write(7, "line one\n", 9) = 9
read(3, "", 256) = 0
close(7) = -1 EDQUOT (Disk
quota exceeded)
close(3) = 0
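(The data-loss window can be sketched outside vi too; a minimal shell illustration, assuming the current directory is already over quota:)

cp file file~        # the backup vi writes first (may itself fail over quota)
: > file             # open(O_TRUNC): the old contents are discarded here
cat file~ > file     # the rewrite then fails with EDQUOT,
                     # leaving 'file' empty or partial while 'file~' is intact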
To recover the truncated file, please check whether there is a backup file 'remd_115.sh~' under '~/' or in the same directory as the original file. If it exists, you can copy it back.
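(For example, a search over the team's brick trees; the paths are taken from the listings earlier in this thread:)

find /export/brick_home/brick*/amyloid_team/tarus -name 'remd_115.sh~' 2>/dev/null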
Thanks,
Vijay
Geoffrey
------------------------------------------------------
Geoffrey Letessier
Responsable informatique & ingénieur système
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier@xxxxxxx
On Monday 08 June 2015 07:11 PM, Geoffrey Letessier wrote:
In addition, I notice a very big difference between the sum of du over each brick and the "quota list" output, as you can see below:
[root@lucifer ~]# pdsh -w cl-storage[1,3] du -sh /export/brick_home/brick*/amyloid_team
cl-storage1: 1,6T /export/brick_home/brick1/amyloid_team
cl-storage3: 1,6T /export/brick_home/brick1/amyloid_team
cl-storage1: 1,6T /export/brick_home/brick2/amyloid_team
cl-storage3: 1,6T /export/brick_home/brick2/amyloid_team

[root@lucifer ~]# gluster volume quota vol_home list /amyloid_team
                 Path                    Hard-limit  Soft-limit    Used  Available
--------------------------------------------------------------------------------
/amyloid_team                              9.0TB       90%        7.8TB    1.2TB
As you can see, the sum over all bricks gives me roughly 6.4TB while "quota list" reports around 7.8TB, so there is a difference of 1.4TB that I cannot explain... Do you have any idea?
There were a few issues with quota accounting of sizes; we have fixed some of these issues in 3.7.
'df -h' rounds off the values; can you please provide the output of 'df' without the -h option?
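(For instance, where the mount point below is just an example:)

df /mnt/vol_home      # exact 1K-block counts
df -h /mnt/vol_home   # rounded, human-readable view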
Thanks,
Geoffrey
------------------------------------------------------
Geoffrey Letessier
Responsable informatique & ingénieur système
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier@xxxxxxx
Hello,
With the 3.5.3 version of GlusterFS, I hit a strange issue this morning: writing to a file when the quota is exceeded.
A person in my lab, whose quota is exceeded (but she didn't know it), tried to modify a file but, because of the exceeded quota, she was unable to and decided to exit vi. Now her file is empty/blank, as you can see below:
We suspect 'vi' might have created a tmp file before writing to the file. We are working on re-creating this problem and will update you on the same.
pdsh@lucifer: cl-storage3: ssh exited with exit code 2
cl-storage1: ---------T 2 tarus amyloid_team 0 19 févr. 12:34 /export/brick_home/brick1/amyloid_team/tarus/project/ab1-40-x1_sen304-x2_inh3-x2/remd_charmm22star_scripts/remd_115.sh
cl-storage1: -rwxrw-r-- 2 tarus amyloid_team 0 8 juin 12:38 /export/brick_home/brick2/amyloid_team/tarus/project/ab1-40-x1_sen304-x2_inh3-x2/remd_charmm22star_scripts/remd_115.sh
In addition, I don't understand why, my volume being a distributed volume on top of replicas (cl-storage[1,3] is replicated onto cl-storage[2,4]), I have two "same" files (identical full path) on two different bricks (as you can see above).
Thanks in advance for your help and clarification.
Geoffrey
------------------------------------------------------
Geoffrey Letessier
Responsable informatique &
ingénieur système
UPR 9080 - CNRS - Laboratoire de
Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005
Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier@xxxxxxx
Hi Ben,
I just checked my messages log files, both on client and server, and I don't find any of the hung tasks you noticed on yours.
As you can see below, I don't observe the performance issue with a simple dd, but I think my issue concerns sets of small files (tens of thousands, if not more)...
[root@nisus test]# ddt -t 10g /mnt/test/
Writing to /mnt/test/ddt.8362 ... syncing ... done.
sleeping 10 seconds ... done.
Reading from /mnt/test/ddt.8362 ... done.
        10240MiB   KiB/s   CPU%
Write             114770      4
Read               40675      4
FYI, /mnt/test is the single v2 GlusterFS volume.
[root@nisus test]# ddt -t 10g /mnt/fhgfs/
Writing to /mnt/fhgfs/ddt.8380 ... syncing ... done.
sleeping 10 seconds ... done.
Reading from /mnt/fhgfs/ddt.8380 ... done.
        10240MiB   KiB/s   CPU%
Write             102591      1
Read               98079      2
Do you have any idea how to tune/optimize the performance settings, and/or the TCP settings (MTU, etc.)?
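(Some commonly tried starting points I could experiment with — a sketch, not a recipe; the option names are standard GlusterFS volume options, but the values are guesses, and jumbo frames need switch support:)

gluster volume set vol_home performance.io-thread-count 32
gluster volume set vol_home performance.cache-size 1GB
gluster volume set vol_home client.event-threads 4
gluster volume set vol_home server.event-threads 4
# On every node and the switch (assuming interface eth0):
ip link set dev eth0 mtu 9000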
---------------------------------------------------------------
|             | UNTAR  |  DU   |  FIND  |  TAR   |   RM   |
---------------------------------------------------------------
| single      | ~3m45s | ~43s  |  ~47s  | ~3m10s | ~3m15s |
| replicated  | ~5m10s | ~59s  | ~1m6s  | ~1m19s | ~1m49s |
| distributed | ~4m18s | ~41s  |  ~57s  | ~2m24s | ~1m38s |
| dist-repl   | ~8m18s | ~1m4s | ~1m11s | ~1m24s | ~2m40s |
| native FS   |  ~11s  |  ~4s  |   ~2s  |  ~56s  |  ~10s  |
| BeeGFS      | ~3m43s | ~15s  |   ~3s  | ~1m33s |  ~46s  |
| single (v2) | ~3m6s  | ~14s  |  ~32s  | ~1m2s  |  ~44s  |
---------------------------------------------------------------
For info:
- BeeGFS is a distributed FS (4 bricks: 2 bricks per server, 2 servers)
- single (v2): simple gluster volume with default settings
I also note that I get the same tar/untar performance issue with FhGFS/BeeGFS, but the rest (du, find, rm) looks OK.
Thank you very much for your reply and help.
Geoffrey
-----------------------------------------------
Geoffrey Letessier
Responsable informatique & ingénieur système
CNRS - UPR 9080 - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier@xxxxxxx
I am seeing problems on 3.7 as well. Can you check /var/log/messages on both the clients and servers for hung tasks like:
Jun 2 15:23:14 gqac006 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 2 15:23:14 gqac006 kernel: iozone D 0000000000000001 0 21999 1 0x00000080
Jun 2 15:23:14 gqac006 kernel: ffff880611321cc8 0000000000000082 ffff880611321c18 ffffffffa027236e
Jun 2 15:23:14 gqac006 kernel: ffff880611321c48 ffffffffa0272c10 ffff88052bd1e040 ffff880611321c78
Jun 2 15:23:14 gqac006 kernel: ffff88052bd1e0f0 ffff88062080c7a0 ffff880625addaf8 ffff880611321fd8
Jun 2 15:23:14 gqac006 kernel: Call Trace:
Jun 2 15:23:14 gqac006 kernel: [<ffffffffa027236e>] ? rpc_make_runnable+0x7e/0x80 [sunrpc]
Jun 2 15:23:14 gqac006 kernel: [<ffffffffa0272c10>] ? rpc_execute+0x50/0xa0 [sunrpc]
Jun 2 15:23:14 gqac006 kernel: [<ffffffff810aaa21>] ? ktime_get_ts+0xb1/0xf0
Jun 2 15:23:14 gqac006 kernel: [<ffffffff811242d0>] ? sync_page+0x0/0x50
Jun 2 15:23:14 gqac006 kernel: [<ffffffff8152a1b3>] io_schedule+0x73/0xc0
Jun 2 15:23:14 gqac006 kernel: [<ffffffff8112430d>] sync_page+0x3d/0x50
Jun 2 15:23:14 gqac006 kernel: [<ffffffff8152ac7f>] __wait_on_bit+0x5f/0x90
Jun 2 15:23:14 gqac006 kernel: [<ffffffff81124543>] wait_on_page_bit+0x73/0x80
Jun 2 15:23:14 gqac006 kernel: [<ffffffff8109eb80>] ? wake_bit_function+0x0/0x50
Jun 2 15:23:14 gqac006 kernel: [<ffffffff8113a525>] ? pagevec_lookup_tag+0x25/0x40
Jun 2 15:23:14 gqac006 kernel: [<ffffffff8112496b>] wait_on_page_writeback_range+0xfb/0x190
Jun 2 15:23:14 gqac006 kernel: [<ffffffff81124b38>] filemap_write_and_wait_range+0x78/0x90
Jun 2 15:23:14 gqac006 kernel: [<ffffffff811c07ce>] vfs_fsync_range+0x7e/0x100
Jun 2 15:23:14 gqac006 kernel: [<ffffffff811c08bd>] vfs_fsync+0x1d/0x20
Jun 2 15:23:14 gqac006 kernel: [<ffffffff811c08fe>] do_fsync+0x3e/0x60
Jun 2 15:23:14 gqac006 kernel: [<ffffffff811c0950>] sys_fsync+0x10/0x20
Jun 2 15:23:14 gqac006 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Do you see a perf problem with just a simple dd, or do you need a more complex workload to hit the issue? I think I saw an issue with metadata performance that I am trying to run down; let me know if you can see the problem with simple dd reads/writes, or if we need to do some sort of dir/metadata access as well.
-b
----- Original Message -----
From: "Geoffrey Letessier" <geoffrey.letessier@xxxxxxx>
To: "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>
Cc: gluster-users@xxxxxxxxxxx
Sent: Tuesday, June 2, 2015 8:09:04 AM
Subject: Re: GlusterFS 3.7 - slow/poor performances
Hi Pranith,
I'm sorry but I cannot give you any comparison, because it would be distorted by the fact that in my production HPC cluster the network technology is InfiniBand QDR and my volumes are quite different (bricks in RAID6 (12x2TB), 2 bricks per server and 4 servers in my pool).
Concerning your request, in the attachments you can find all the expected results, hoping they help you solve this serious performance issue (maybe I need to play with glusterfs parameters?).
Thank you very much in advance,
Geoffrey
------------------------------------------------------
Geoffrey Letessier
Responsable informatique & ingénieur système
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier@xxxxxxx
On 2 June 2015 at 10:09, Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> wrote:
hi Geoffrey,
Since you are saying it happens on all types of volumes, let's do the following:
1) Create a dist-repl volume
2) Set the options etc. you need
3) Enable profiling using "gluster volume profile <volname> start"
4) Run the workload
5) Give the output of "gluster volume profile <volname> info"
Repeat the steps above on the new and the old version you are comparing. That should give us insight into what could be causing the slowness.
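(Concretely, the sequence would look like this, with an illustrative volume name:)

gluster volume profile vol_test start
# ... run the untar/du/find/tar/rm workload on the mount ...
gluster volume profile vol_test info > profile_vol_test.txt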
Pranith
On 06/02/2015 03:22 AM,
Geoffrey Letessier
wrote:
Dear all,
I have a crash test
cluster where i’ve
tested the new version
of GlusterFS
(v3.7) before upgrading
my HPC cluster in
production.
But… all my tests show
me very very low
performances.
For my benches, as you
can read below, I do
some actions (untar, du,
find,
tar, rm) with linux
kernel sources, dropping
cache, each on
distributed,
replicated,
distributed-replicated,
single (single brick)
volumes and the
native FS of one brick.
# time (echo 3 > /proc/sys/vm/drop_caches; tar xJf ~/linux-4.1-rc5.tar.xz; sync; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3 > /proc/sys/vm/drop_caches; du -sh linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3 > /proc/sys/vm/drop_caches; find linux-4.1-rc5/ | wc -l; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3 > /proc/sys/vm/drop_caches; tar czf linux-4.1-rc5.tgz linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3 > /proc/sys/vm/drop_caches; rm -rf linux-4.1-rc5.tgz linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
And here are the process times:
---------------------------------------------------------------
|             | UNTAR  |  DU   |  FIND  |  TAR   |   RM   |
---------------------------------------------------------------
| single      | ~3m45s | ~43s  |  ~47s  | ~3m10s | ~3m15s |
| replicated  | ~5m10s | ~59s  | ~1m6s  | ~1m19s | ~1m49s |
| distributed | ~4m18s | ~41s  |  ~57s  | ~2m24s | ~1m38s |
| dist-repl   | ~8m18s | ~1m4s | ~1m11s | ~1m24s | ~2m40s |
| native FS   |  ~11s  |  ~4s  |   ~2s  |  ~56s  |  ~10s  |
---------------------------------------------------------------
I get the same results with default configurations as with custom configurations.
If I look at the ifstat output, I note that my IO write processes never exceed 3MB/s...
The EXT4 native FS seems to be faster than the XFS one (roughly 15-20%, but no more).
My [test] storage cluster is composed of 2 identical servers (bi-CPU Intel Xeon X5355, 8GB of RAM, 2x2TB HDD (no RAID) and Gb Ethernet).
My volume settings:
- single: 1 server, 1 brick
- replicated: 2 servers, 1 brick each
- distributed: 2 servers, 2 bricks each
- dist-repl: 2 bricks on the same server, with replica 2
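(A sketch of how these layouts map onto create commands — server/brick names are illustrative, not the exact ones used:)

gluster volume create vol_single srv1:/export/b1
gluster volume create vol_repl replica 2 srv1:/export/b1 srv2:/export/b1
gluster volume create vol_dist srv1:/export/b1 srv1:/export/b2 srv2:/export/b1 srv2:/export/b2
# replica pairs land on the same server here, so gluster will ask for 'force':
gluster volume create vol_dr replica 2 srv1:/export/b1 srv1:/export/b2 srv2:/export/b1 srv2:/export/b2 force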
Everything looks OK in the gluster status command-line output.
Do you have any idea why I get such bad results?
Thanks in advance.
Geoffrey
-----------------------------------------------
Geoffrey Letessier
Responsable informatique & ingénieur système
CNRS - UPR 9080 - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier@xxxxxxx
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users