Hi,
Yes, of course:

[root@lucifer ~]# pdsh -w cl-storage[1,3] du -s /export/brick_home/brick*/amyloid_team
cl-storage1: 1608522280 /export/brick_home/brick1/amyloid_team
cl-storage3: 1619630616 /export/brick_home/brick1/amyloid_team
cl-storage1: 1614057836 /export/brick_home/brick2/amyloid_team
cl-storage3: 1602653808 /export/brick_home/brick2/amyloid_team
The sum is 6444864540 (around 6.4-6.5TB), while « quota list » displays 7.7TB. So the discrepancy is roughly 1.2-1.3TB, in other words around 16%, which seems far too large, no?
In addition, since the quota is exceeded, I notice a lot of files like the following:

[root@lucifer ~]# pdsh -w cl-storage[1,3] "cd /export/brick_home/brick2/amyloid_team/tarus/project/ab1-40-x1_sen304-x2_inh3-x2/remd_charmm22star_scripts/; ls -ail remd_100.sh 2> /dev/null" 2>/dev/null
cl-storage3: 133325688 ---------T 2 tarus amyloid_team 0 16 févr. 10:20 remd_100.sh

Note the 'T' at the end of the permissions and the file size of 0 bytes.
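If it helps, I can also dump the extended attributes of such an entry directly on the brick; I assume the trusted.* xattrs are what you would want to see (this is just a sketch, run against the same file as above):

pdsh -w cl-storage[1,3] "getfattr -d -m . -e hex /export/brick_home/brick2/amyloid_team/tarus/project/ab1-40-x1_sen304-x2_inh3-x2/remd_charmm22star_scripts/remd_100.sh"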
Also, yesterday, some files were duplicated, but they are not anymore…
The worst part is that all these files were previously OK. In other words, exceeding the quota caused file or content deletions or corruption… What can I do to prevent this situation in the future? Because I guess I cannot do anything to roll back the situation now, right?
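For the future, would something like the following help, so that users see their own quota usage in df on the client instead of the whole volume size? This is just a guess on my side from the documentation (vol_home is my volume):

# make df on the client report the quota limit/usage of the directory
gluster volume set vol_home features.quota-deem-statfs on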
Geoffrey
------------------------------------------------------
Geoffrey Letessier
Responsable informatique & ingénieur système
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier@xxxxxxx
On Monday 08 June 2015 07:11 PM, Geoffrey Letessier wrote:
In addition, I notice a very big difference between the sum of du on each brick and the « quota list » display, as you can read below:
[root@lucifer ~]# pdsh -w cl-storage[1,3] du -sh /export/brick_home/brick*/amyloid_team
cl-storage1: 1,6T /export/brick_home/brick1/amyloid_team
cl-storage3: 1,6T /export/brick_home/brick1/amyloid_team
cl-storage1: 1,6T /export/brick_home/brick2/amyloid_team
cl-storage3: 1,6T /export/brick_home/brick2/amyloid_team
[root@lucifer ~]# gluster volume quota vol_home list /amyloid_team
                  Path                   Hard-limit  Soft-limit    Used   Available
--------------------------------------------------------------------------------
/amyloid_team                               9.0TB        90%       7.8TB     1.2TB
As you can notice, the sum over all bricks gives roughly 6.4TB while « quota list » reports around 7.8TB, so there is a difference of 1.4TB that I am not able to explain… Do you have any idea?
There were a few issues with how quota accounts for size; we have fixed some of them in 3.7.
'df -h' will round off the values; can you please provide the output of 'df' without the -h option?
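Something like the following should give us the raw numbers (node list taken from your earlier commands):

pdsh -w cl-storage[1,3] df /export/brick_home/brick*
pdsh -w cl-storage[1,3] du -s /export/brick_home/brick*/amyloid_team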
Thanks,
Geoffrey
------------------------------------------------------
Geoffrey Letessier
Responsable informatique & ingénieur système
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier@xxxxxxx
Hello,
Concerning version 3.5.3 of GlusterFS, I ran into a strange issue this morning when writing a file while the quota is exceeded.

One person in my lab, whose quota is exceeded (but she didn't know it), tried to modify a file; because of the exceeded quota she was unable to, and decided to exit vi. Now her file is empty/blank, as you can read below:
We suspect 'vi' might have created a tmp file before writing to the file. We are working on re-creating this problem and will keep you updated.
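Roughly, the reproduction we have in mind looks like this (the directory, mount point and sizes below are only placeholders, not your exact setup):

# set a small quota on a test directory, fill it, then try to edit and save a file with vi
gluster volume quota vol_home limit-usage /quota_test 10MB
dd if=/dev/zero of=/mnt/vol_home/quota_test/filler bs=1M count=10
vi /mnt/vol_home/quota_test/test.sh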
pdsh@lucifer: cl-storage3: ssh exited with exit code 2
cl-storage1: ---------T 2 tarus amyloid_team 0 19 févr. 12:34 /export/brick_home/brick1/amyloid_team/tarus/project/ab1-40-x1_sen304-x2_inh3-x2/remd_charmm22star_scripts/remd_115.sh
cl-storage1: -rwxrw-r-- 2 tarus amyloid_team 0 8 juin 12:38 /export/brick_home/brick2/amyloid_team/tarus/project/ab1-40-x1_sen304-x2_inh3-x2/remd_charmm22star_scripts/remd_115.sh
In addition, I don't understand why, my volume being a distributed volume inside a replica (cl-storage[1,3] is replicated only onto cl-storage[2,4]), I have two « identical » files (same complete path) on two different bricks (as you can read above).
Thanks in advance for your help and clarification.
Geoffrey
------------------------------------------------------
Geoffrey Letessier
Responsable informatique & ingénieur système
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier@xxxxxxx
Hi Ben,
I just checked my messages log files, both on the client and on the servers, and I don't find any of the hung tasks you noticed on yours.

As you can read below, I don't see the performance issue with a simple dd, but I think my issue concerns sets of small files (tens of thousands, if not more)…
[root@nisus test]# ddt -t 10g /mnt/test/
Writing to /mnt/test/ddt.8362 ... syncing ... done.
sleeping 10 seconds ... done.
Reading from /mnt/test/ddt.8362 ... done.
         10240MiB    KiB/s   CPU%
Write              114770      4
Read                40675      4
For info: /mnt/test is the single (v2) GlusterFS volume.
[root@nisus test]# ddt -t 10g /mnt/fhgfs/
Writing to /mnt/fhgfs/ddt.8380 ... syncing ... done.
sleeping 10 seconds ... done.
Reading from /mnt/fhgfs/ddt.8380 ... done.
         10240MiB    KiB/s   CPU%
Write              102591      1
Read                98079      2
Do you have an idea how I could tune/optimize the performance settings and/or the TCP settings (MTU, etc.)?
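For what it's worth, here are the kinds of knobs I was thinking of trying; the values are only guesses on my side, and <volname> / eth0 are placeholders:

gluster volume set <volname> performance.cache-size 256MB
gluster volume set <volname> performance.io-thread-count 32
gluster volume set <volname> performance.write-behind-window-size 4MB
# jumbo frames on the storage network (would have to be set on every node and switch port)
ip link set eth0 mtu 9000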
---------------------------------------------------------------
|             | UNTAR  |  DU   | FIND   |  TAR   |   RM   |
---------------------------------------------------------------
| single      | ~3m45s | ~43s  | ~47s   | ~3m10s | ~3m15s |
| replicated  | ~5m10s | ~59s  | ~1m6s  | ~1m19s | ~1m49s |
| distributed | ~4m18s | ~41s  | ~57s   | ~2m24s | ~1m38s |
| dist-repl   | ~8m18s | ~1m4s | ~1m11s | ~1m24s | ~2m40s |
| native FS   | ~11s   | ~4s   | ~2s    | ~56s   | ~10s   |
| BeeGFS      | ~3m43s | ~15s  | ~3s    | ~1m33s | ~46s   |
| single (v2) | ~3m6s  | ~14s  | ~32s   | ~1m2s  | ~44s   |
---------------------------------------------------------------
For info:
- BeeGFS is a distributed FS (4 bricks, 2 bricks per server, 2 servers)
- single (v2): a simple Gluster volume with default settings
I also note that I get the same tar/untar performance issue with FhGFS/BeeGFS, but the rest (du, find, rm) looks OK.

Thank you very much for your reply and help.
Geoffrey
-----------------------------------------------
Geoffrey Letessier
Responsable informatique & ingénieur système
CNRS - UPR 9080 - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier@xxxxxxx
I am seeing problems on 3.7 as well. Can you check /var/log/messages on both the clients and servers for hung tasks like:
Jun 2 15:23:14 gqac006 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 2 15:23:14 gqac006 kernel: iozone D 0000000000000001 0 21999 1 0x00000080
Jun 2 15:23:14 gqac006 kernel: ffff880611321cc8 0000000000000082 ffff880611321c18 ffffffffa027236e
Jun 2 15:23:14 gqac006 kernel: ffff880611321c48 ffffffffa0272c10 ffff88052bd1e040 ffff880611321c78
Jun 2 15:23:14 gqac006 kernel: ffff88052bd1e0f0 ffff88062080c7a0 ffff880625addaf8 ffff880611321fd8
Jun 2 15:23:14 gqac006 kernel: Call Trace:
Jun 2 15:23:14 gqac006 kernel: [<ffffffffa027236e>] ? rpc_make_runnable+0x7e/0x80 [sunrpc]
Jun 2 15:23:14 gqac006 kernel: [<ffffffffa0272c10>] ? rpc_execute+0x50/0xa0 [sunrpc]
Jun 2 15:23:14 gqac006 kernel: [<ffffffff810aaa21>] ? ktime_get_ts+0xb1/0xf0
Jun 2 15:23:14 gqac006 kernel: [<ffffffff811242d0>] ? sync_page+0x0/0x50
Jun 2 15:23:14 gqac006 kernel: [<ffffffff8152a1b3>] io_schedule+0x73/0xc0
Jun 2 15:23:14 gqac006 kernel: [<ffffffff8112430d>] sync_page+0x3d/0x50
Jun 2 15:23:14 gqac006 kernel: [<ffffffff8152ac7f>] __wait_on_bit+0x5f/0x90
Jun 2 15:23:14 gqac006 kernel: [<ffffffff81124543>] wait_on_page_bit+0x73/0x80
Jun 2 15:23:14 gqac006 kernel: [<ffffffff8109eb80>] ? wake_bit_function+0x0/0x50
Jun 2 15:23:14 gqac006 kernel: [<ffffffff8113a525>] ? pagevec_lookup_tag+0x25/0x40
Jun 2 15:23:14 gqac006 kernel: [<ffffffff8112496b>] wait_on_page_writeback_range+0xfb/0x190
Jun 2 15:23:14 gqac006 kernel: [<ffffffff81124b38>] filemap_write_and_wait_range+0x78/0x90
Jun 2 15:23:14 gqac006 kernel: [<ffffffff811c07ce>] vfs_fsync_range+0x7e/0x100
Jun 2 15:23:14 gqac006 kernel: [<ffffffff811c08bd>] vfs_fsync+0x1d/0x20
Jun 2 15:23:14 gqac006 kernel: [<ffffffff811c08fe>] do_fsync+0x3e/0x60
Jun 2 15:23:14 gqac006 kernel: [<ffffffff811c0950>] sys_fsync+0x10/0x20
Jun 2 15:23:14 gqac006 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
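If it is easier, a quick way to look for these on your nodes (assuming the default syslog location) is something like:

grep -B2 -A25 "hung_task_timeout_secs" /var/log/messages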
Do you see a perf problem with just a simple dd, or do you need a more complex workload to hit the issue? I think I saw an issue with metadata performance that I am trying to run down; let me know if you can see the problem with simple dd reads / writes or if we need to do some sort of dir / metadata access as well.
-b
----- Original Message -----
From: "Geoffrey Letessier" <geoffrey.letessier@xxxxxxx>
To: "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>
Cc: gluster-users@xxxxxxxxxxx
Sent: Tuesday, June 2, 2015 8:09:04 AM
Subject: Re: GlusterFS 3.7 - slow/poor performances
Hi Pranith,
I'm sorry, but I cannot give you any comparison, because it would be distorted by the fact that in my HPC cluster in production the network technology is InfiniBand QDR and the volumes are quite different (bricks in RAID6 (12x2TB), 2 bricks per server and 4 servers in the pool).

Concerning your request, you can find all the expected results in the attachments; I hope it helps you solve this serious performance issue (maybe I need to play with the GlusterFS parameters?).

Thank you very much in advance,
Geoffrey
------------------------------------------------------
Geoffrey Letessier
Responsable informatique & ingénieur système
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier@xxxxxxx
On 2 June 2015 at 10:09, Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> wrote:
Hi Geoffrey,

Since you are saying it happens on all types of volumes, let's do the following:
1) Create a dist-repl volume
2) Set the options etc. that you need
3) Enable gluster volume profiling using "gluster volume profile <volname> start"
4) Run the workload
5) Give the output of "gluster volume profile <volname> info"

Repeat the steps above on both the new and the old version you are comparing it with. That should give us insight into what could be causing the slowness.
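For example, a minimal run could look like this (the volume name, the output file and the workload itself are placeholders):

gluster volume profile <volname> start
# ... run the untar/du/find/tar/rm workload on the mounted volume ...
gluster volume profile <volname> info > profile-<volname>.txt
gluster volume profile <volname> stop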
Pranith
On 06/02/2015 03:22 AM, Geoffrey Letessier wrote:
Dear all,
I have a crash-test cluster where I tested the new version of GlusterFS (v3.7) before upgrading my HPC cluster in production. But… all my tests show very, very low performance.

For my benchmarks, as you can read below, I perform a few actions (untar, du, find, tar, rm) on the Linux kernel sources, dropping the caches each time, on distributed, replicated, distributed-replicated and single (single brick) volumes, as well as on the native FS of one brick.
# time (echo 3 > /proc/sys/vm/drop_caches; tar xJf ~/linux-4.1-rc5.tar.xz; sync; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3 > /proc/sys/vm/drop_caches; du -sh linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3 > /proc/sys/vm/drop_caches; find linux-4.1-rc5/ | wc -l; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3 > /proc/sys/vm/drop_caches; tar czf linux-4.1-rc5.tgz linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3 > /proc/sys/vm/drop_caches; rm -rf linux-4.1-rc5.tgz linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
And here are the process times:
---------------------------------------------------------------
|             | UNTAR  |  DU   | FIND   |  TAR   |   RM   |
---------------------------------------------------------------
| single      | ~3m45s | ~43s  | ~47s   | ~3m10s | ~3m15s |
| replicated  | ~5m10s | ~59s  | ~1m6s  | ~1m19s | ~1m49s |
| distributed | ~4m18s | ~41s  | ~57s   | ~2m24s | ~1m38s |
| dist-repl   | ~8m18s | ~1m4s | ~1m11s | ~1m24s | ~2m40s |
| native FS   | ~11s   | ~4s   | ~2s    | ~56s   | ~10s   |
---------------------------------------------------------------
I get the same results with default configurations as with custom configurations.

If I look at the ifstat command output, I note that my IO write processes never exceed 3MB/s...
The native EXT4 FS seems to be faster than the XFS one (roughly 15-20%, but no more).
My [test] storage cluster is composed of 2 identical servers (bi-CPU Intel Xeon X5355, 8GB of RAM, 2x2TB HDD (no RAID) and Gb Ethernet).
My volume settings (a rough creation sketch follows below):
- single: 1 server, 1 brick
- replicated: 2 servers, 1 brick each
- distributed: 2 servers, 2 bricks each
- dist-repl: 2 bricks on the same server and replica 2
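For reference, the test volumes were created more or less as follows; the hostnames and brick paths here are placeholders rather than my exact ones, and with replica 2 the consecutive bricks on the command line form the replica pairs:

gluster volume create single server1:/export/brick1/single
gluster volume create replicated replica 2 server1:/export/brick1/repl server2:/export/brick1/repl
gluster volume create distributed server1:/export/brick1/dist server1:/export/brick2/dist server2:/export/brick1/dist server2:/export/brick2/dist
gluster volume create dist-repl replica 2 server1:/export/brick1/dr server1:/export/brick2/dr server2:/export/brick1/dr server2:/export/brick2/dr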
Everything seems to be OK in the gluster status command output.

Do you have an idea why I am getting such bad results?
Thanks in advance.
Geoffrey
-----------------------------------------------
Geoffrey Letessier
Responsable informatique & ingénieur système
CNRS - UPR 9080 - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier@xxxxxxx
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users