As the crawl was shut down, a kill must have been issued. It's not an OOM kill, since you didn't see the message in /var/log/messages, so it was most likely killed through some means on your side, like a node restart, a volume stop, or an explicit kill. To fix this, we need to restart quota.

On Mon, Nov 26, 2018 at 10:07 PM Gudrun Mareike Amedick <g.amedick@xxxxxxxxxxxxxx> wrote:
>
> Hi Hari,
>
> I think I have indeed found a hint as to where the error is. As in, the script gives me an error. This is what happens:
>
> # python /root/glusterscripts/quotas/quota_fsch.py --sub-dir $broken_dir $brick
> getfattr: Removing leading '/' from absolute path names
> getfattr: Removing leading '/' from absolute path names
> getfattr: Removing leading '/' from absolute path names
> getfattr: Removing leading '/' from absolute path names
> getfattr: Removing leading '/' from absolute path names
> getfattr: Removing leading '/' from absolute path names
> getfattr: Removing leading '/' from absolute path names
> mismatch
> Traceback (most recent call last):
>   File "/root/glusterscripts/quotas/quota_fsch.py", line 371, in <module>
>     walktree(os.path.join(brick_path, sub_dir), hard_link_dict)
>   File "/root/glusterscripts/quotas/quota_fsch.py", line 286, in walktree
>     subtree_size = walktree(pathname, descendent_hardlinks)
>   File "/root/glusterscripts/quotas/quota_fsch.py", line 325, in walktree
>     verify_dir_xattr(t_dir, aggr_size[t_dir])
>   File "/root/glusterscripts/quotas/quota_fsch.py", line 260, in verify_dir_xattr
>     print_msg(QUOTA_SIZE_MISMATCH, path, xattr_dict, stbuf, dir_size)
>   File "/root/glusterscripts/quotas/quota_fsch.py", line 60, in print_msg
>     print '%24s %60s %12s %12s' % ("Size Mismatch", path, xattr_dict['contri_size'],
> KeyError: 'contri_size'
>
> This looks kind of wrong, so I ran the script with --full-logs. The result is longer and it contains this:
>
> Verbose /$somefile_1
> xattr_values: {'parents': {}}
> posix.stat_result(st_mode=33188, st_ino=8120161795, st_dev=65034, st_nlink=2, st_uid=1052, st_gid=1032, st_size=512, st_atime=1538539390, st_mtime=1538213613, st_ctime=1538539392)
>
> getfattr: Removing leading '/' from absolute path names
> Verbose /somefile_2
> xattr_values: {'parents': {}}
> posix.stat_result(st_mode=33188, st_ino=8263802208, st_dev=65034, st_nlink=2, st_uid=1052, st_gid=1032, st_size=46139430400, st_atime=1542640461, st_mtime=1542645844, st_ctime=1542709397)
>
> This looks even more wrong, so I took a look at the file attributes:
>
> # getfattr -e hex -d -m. --no-dereference /$somefile_1
> getfattr: Removing leading '/' from absolute path names
> # file: $somefile_1
> trusted.ec.config=0x0000080602000200
> trusted.ec.dirty=0x00000000000000000000000000000000
> trusted.ec.size=0x0000002af87f5800
> trusted.ec.version=0x0000000000234ba30000000000234ba3
> trusted.gfid=0x270a5939c1fe40d5aa13d943209eedab
> trusted.gfid2path.7bae7a7a6d9b6e99=0x36666433306232342d396536352d346339322d613030662d3533393662393131343830662f686f6d652d3138313131392e746172
>
> # getfattr -e hex -d -m. --no-dereference /$somefile_2
> getfattr: Removing leading '/' from absolute path names
> # file: $somefile_2
> trusted.ec.config=0x0000080602000200
> trusted.ec.dirty=0x00000000000000000000000000000000
> trusted.ec.size=0x00000000000006eb
> trusted.ec.version=0x00000000000000010000000000000005
> trusted.gfid=0xcfc7641415ae46899b7cb1035491d706
> trusted.gfid2path.13dac9b562af3c0d=0x36666433306232342d396536352d346339322d613030662d3533393662393131343830662f6d6166446966663575362e747874
>
> So, no quota file attributes. This doesn't look good to me.
> I also took a look at the attributes of $broken_dir and I think dirty is already set:
>
> # getfattr -e hex -d -m. --no-dereference $broken_dir
> getfattr: Removing leading '/' from absolute path names
> # file: $broken_dir
> trusted.ec.version=0x00000000000000180000000000000022
> trusted.gfid=0xccc9615e9bc94b5fb27a1db54c66cd3c
> trusted.glusterfs.dht=0x00000001000000006aaaaaa97ffffffd
> trusted.glusterfs.quota.2631bcce-32bd-4e3e-9953-6412063a9fca.contri.3=0x000000000a64200000000000000000140000000000000012
> trusted.glusterfs.quota.dirty=0x3000
> trusted.glusterfs.quota.limit-set.3=0x0000010000000000ffffffffffffffff
> trusted.glusterfs.quota.size.3=0x000000000a64200000000000000000140000000000000012
>
> Does that mean that the crawlers didn't finish their jobs?
>
> Kind regards
>
> Gudrun
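
[For reference, a rough, untested sketch of how that check could be done in bulk on one brick. $brick and $broken_dir are the same placeholders used above; a file the quota crawl has accounted for should carry a trusted.glusterfs.quota.<parent-gfid>.contri xattr:]

  # list regular files under the broken directory that carry no quota contri xattr
  find "$brick/$broken_dir" -type f -not -path "*/.glusterfs/*" -print0 |
  while IFS= read -r -d '' f; do
      getfattr --absolute-names -d -m 'trusted.glusterfs.quota' -e hex "$f" 2>/dev/null \
          | grep -q '\.contri' || echo "missing contri xattr: $f"
  done
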
> On Monday, 26.11.2018, 20:20 +0530, Hari Gowtham wrote:
> > Comments inline.
> >
> > On Mon, Nov 26, 2018 at 7:25 PM Gudrun Mareike Amedick <g.amedick@xxxxxxxxxxxxxx> wrote:
> > >
> > > Hi Hari,
> > >
> > > I'm sorry to bother you again, but I have a few questions concerning the script.
> > >
> > > Do I understand correctly that I have to execute it once per brick on each server?
> > Yes. On all the servers.
> > >
> > > It is a dispersed volume, so the file size on brick side and on client side can differ. Is that a problem?
> > The size on the brick is aggregated by the quota daemon and then displayed on the client. If there was a problem with the aggregation (caused by missed updates on the brick) then we see a different size reported. As it's a distributed file system, this is how it works.
> > To fix these missed updates, we need to find where the updates were missed, and then on these missed directories we need to set a dirty flag (on the directories of all bricks) and then do a stat on this directory from the client. As the client will see the dirty flag in the xattr, it will try to fix the values that are accounted wrong and update the right value.
> > >
> > > Is it a reasonable course of action if I first run "python quota_fsck.py --subdir $broken_dir $brickpath" to see if it reports something and if yes, run
> > The script can be run with the fix-issues argument in a single go, but we haven't tested the fix-issues side intensively.
> > As the above command shows you where the problem is, we can explicitly set the dirty flag and then do a lookup to fix the issue.
> > This will help you understand where the issue is.
> > >
> > > "python quota_fsck.py --subdir $broken_dir --fix-issues $mountpoint $brickpath" to correct them?
> > The fix-issues argument actually makes changes to the brick (changes only related to quota, which can be fixed with a restart).
> > But as the restart is not crawling completely, we will come back to the script in case an abnormality is seen.
> >
> > So you can run the script in two ways:
> > 1) Without --fix-issues: then see where the issue is, set the dirty flag on that directory on all the bricks, and then do a stat from the client.
> > 2) With --fix-issues: this should take care of both setting the dirty flag and then doing a stat on it.
> >
> > You can choose any one of the above. Both have their own benefits:
> > Without --fix-issues you need to do a lot of work, but it's scrutinized. So the changes you make to the backend are fine.
> >
> > With --fix-issues (you just need to run it once along with this argument) it changes the xattr values related to quota on the backend and fixes them. Looking at the size of your volume, if something goes wrong then we are left with useless quota values. To clean this up is where a restart of quota comes into play. And with your volume size, the restart doesn't fix the whole system.
> >
> > So both ways are fine.
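
[A rough sketch of the manual route described above. The dirty value is an assumption here: the clean state shown earlier is 0x3000 (ASCII "0" plus a trailing NUL), so 0x3100 is presumed to mark it dirty — please cross-check against what quota_fsck.py itself sets before running this:]

  # on every server, for each brick that holds the directory:
  setfattr -n trusted.glusterfs.quota.dirty -v 0x3100 "$brick/$broken_dir"

  # then trigger a lookup from a client mount so the values get re-aggregated:
  stat "$mountpoint/$broken_dir"
  du -sh "$mountpoint/$broken_dir"   # a recursive lookup, as discussed below
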
> > >
> > > I'd run "du -h $mountpoint/broken_dir" from client side as a lookup. Is that sufficient?
> > Yep. Necessary only if you are running the script without --fix-issues.
> > >
> > > Will further action be required or should this be enough?
> > >
> > > Kind regards
> > >
> > > Gudrun
> > > On Monday, 26.11.2018, 17:26 +0530, Hari Gowtham wrote:
> > > > Yes. In that case you can run the script and see what errors it is throwing and then clean that directory up by setting dirty and then doing a lookup.
> > > > Again, for such a huge size, it will consume a lot of resources.
> > > >
> > > > On Mon, Nov 26, 2018 at 3:56 PM Gudrun Mareike Amedick <g.amedick@xxxxxxxxxxxxxx> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > we have no notifications of OOM kills in /var/log/messages. So if I understood this correctly, the crawls finished but my attributes weren't set correctly? And this script should fix them?
> > > > >
> > > > > Thanks for your help so far
> > > > >
> > > > > Gudrun
> > > > > On Thursday, 22.11.2018, 13:03 +0530, Hari Gowtham wrote:
> > > > > > On Wed, Nov 21, 2018 at 8:55 PM Gudrun Mareike Amedick <g.amedick@xxxxxxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > Hi Hari,
> > > > > > >
> > > > > > > I disabled and re-enabled the quota and I saw the crawlers starting. However, this caused a pretty high load on my servers (200+) and this seems to have gotten them killed again. At least, I have no crawlers running, the quotas are not matching the output of du -h, and the crawler logs all contain this line:
> > > > > > The quota crawl is an intensive process as it has to crawl the entire file system. The intensity varies based on the number of bricks, the number of files, the depth of the filesystem, ongoing I/O to the filesystem and so on.
> > > > > > Being a disperse volume it will have to talk to all the bricks, and with the huge size the increase in CPU is expected.
> > > > > > >
> > > > > > > [2018-11-20 14:16:35.180467] W [glusterfsd.c:1375:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x7494) [0x7f0e3d6fe494] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xf5) [0x561eb7952d45] -->/usr/sbin/glusterfs(cleanup_and_exit+0x54) [0x561eb7952ba4] ) 0-: received signum (15), shutting down
> > > > > > This can mean that the file attributes were set and then it stopped; as you said, the process was killed while it still had attributes to be set on a few files.
> > > > > >
> > > > > > This message is common for every shutdown (both the one triggered after the job is finished and one triggered to stop the process).
> > > > > > Can you check the /var/log/messages file for "OOM" kills?
> > > > > > If you see those messages then the shutdown is because of the increase in memory consumption, which is expected.
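
[Two illustrative ways to look for OOM kills on a Debian system; the exact log file and wording can vary between setups:]

  grep -iE 'out of memory|oom-killer|killed process' /var/log/messages /var/log/kern.log 2>/dev/null
  dmesg | grep -iE 'out of memory|oom'
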
> > > > > > >
> > > > > > > I suspect this means my file attributes are not set correctly. Would the script you sent me fix that? And the script seems to be part of the Git GlusterFS 5.0 repo. We are running 3.12. Would it still work on 3.12 (or 4.1, since we'll be upgrading soon) or could it break things?
> > > > > > Quota is not actively developed because of its performance issues, which need a major redesign. So the script holds true for newer versions as well, because no changes have gone into the code for it.
> > > > > > The advantage of the script is that it can be run over just a certain faulty directory (it need not be the root; this reduces the number of directories, the file depth and so on).
> > > > > > The crawl is necessary for the quota to work fine. The script can help only if the xattrs are set by the crawl, which I think isn't the case here.
> > > > > > (To verify whether the xattrs are set on all the directories, we need to do a getxattr and see.) So we can't use the script.
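
[In the same spirit as the getfattr calls earlier in the thread, an untested sketch that reports directories under the broken path that carry no quota size xattr on this brick, i.e. directories the crawl apparently never accounted:]

  # run on each brick; a crawled directory should have a trusted.glusterfs.quota.size.* xattr
  find "$brick/$broken_dir" -type d -not -path "*/.glusterfs/*" -print0 |
  while IFS= read -r -d '' d; do
      getfattr --absolute-names -d -m 'trusted.glusterfs.quota' -e hex "$d" 2>/dev/null \
          | grep -q 'quota\.size' || echo "no quota size xattr: $d"
  done
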
> > > > > > >
> > > > > > > Kind regards
> > > > > > >
> > > > > > > Gudrun Amedick
> > > > > > > On Tuesday, 20.11.2018, 16:59 +0530, Hari Gowtham wrote:
> > > > > > > > Reply inline.
> > > > > > > > On Tue, Nov 20, 2018 at 3:53 PM Gudrun Mareike Amedick <g.amedick@xxxxxxxxxxxxxx> wrote:
> > > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > I think I know what happened. According to the logs, the crawlers received a signum(15). They seem to have died before having finished. Probably too much to do simultaneously. I have disabled and re-enabled quota and will set the quotas again with more time.
> > > > > > > > >
> > > > > > > > > Is there a way to restart a crawler that was killed too soon?
> > > > > > > > No. The disable and enable of quota starts a new crawl.
> > > > > > > > >
> > > > > > > > > If I restart a server while a crawler is running, will the crawler be restarted, too? We'll need to do some hardware fixing on one of the servers soon and I need to know whether I have to check the crawlers first before shutting it down.
> > > > > > > > During the shutdown of the server the crawl will be killed (the data usage shown will be updated as per what has been crawled).
> > > > > > > > The crawl won't be restarted on starting the server. Only quotad will be restarted (which is not the same as the crawl).
> > > > > > > > For the crawl to happen you will have to restart the quota.
> > > > > > > > >
> > > > > > > > > Thanks for the pointers
> > > > > > > > >
> > > > > > > > > Gudrun Amedick
> > > > > > > > > On Tuesday, 20.11.2018, 11:38 +0530, Hari Gowtham wrote:
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > Can you check if the quota crawl finished? Without it having finished, the quota list will show incorrect values.
> > > > > > > > > > Looking at the under-accounting, it looks like the crawl is not yet finished (it does take a lot of time as it has to crawl the whole filesystem).
> > > > > > > > > >
> > > > > > > > > > If the crawl has finished and the usage is still showing wrong values then there should be an accounting issue.
> > > > > > > > > > The easy way to fix this is to try restarting quota. This will not cause any problems. The only downside is the limits won't hold true while the quota is disabled, till it's enabled and the crawl finishes.
> > > > > > > > > > Or you can try using the quota fsck script https://review.gluster.org/#/c/glusterfs/+/19179/ to fix your accounting issue.
> > > > > > > > > >
> > > > > > > > > > Regards,
> > > > > > > > > > Hari.
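
["Restarting quota" here means disabling and re-enabling it on the volume. Note that disabling drops the configured limits, so they have to be set again afterwards (as described above); roughly:]

  gluster volume quota $VOLUME disable
  gluster volume quota $VOLUME enable                      # kicks off a fresh crawl on all bricks
  gluster volume quota $VOLUME limit-usage $PATH $QUOTA    # re-apply each limit
  gluster volume quota $VOLUME list                        # values settle once the crawl finishes
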
> > > > > > > > > > On Mon, Nov 19, 2018 at 10:05 PM Frank Ruehlemann <f.ruehlemann@xxxxxxxxxxxxxx> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Hi,
> > > > > > > > > > >
> > > > > > > > > > > we're running a Distributed Dispersed volume with Gluster 3.12.14 on Debian 9.6 (Stretch).
> > > > > > > > > > >
> > > > > > > > > > > We migrated our data (>300TB) from a pure Distributed volume into this Dispersed volume with cp, followed by multiple rsyncs.
> > > > > > > > > > > After the migration was successful we enabled quotas again with "gluster volume quota $VOLUME enable", which finished successfully.
> > > > > > > > > > > And we set our required quotas with "gluster volume quota $VOLUME limit-usage $PATH $QUOTA", which finished without errors too.
> > > > > > > > > > >
> > > > > > > > > > > But our "gluster volume quota $VOLUME list" shows wrong values.
> > > > > > > > > > > For example:
> > > > > > > > > > > A directory with ~170TB of data shows only 40.8TB Used.
> > > > > > > > > > > When we sum up all quota'd directories we're way under the ~310TB that "df -h /$volume" shows.
> > > > > > > > > > > And "df -h /$volume/$directory" shows wrong values for nearly all directories.
> > > > > > > > > > >
> > > > > > > > > > > All 72 8TB bricks and all quota daemons of the 6 servers are visible and online in "gluster volume status $VOLUME".
> > > > > > > > > > >
> > > > > > > > > > > In quotad.log I found multiple warnings like this:
> > > > > > > > > > > > [2018-11-16 09:21:25.738901] W [dict.c:636:dict_unref] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/features/quotad.so(+0x1d58) [0x7f6844be7d58] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/features/quotad.so(+0x2b92) [0x7f6844be8b92] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_unref+0xc0) [0x7f684b0db640] ) 0-dict: dict is NULL [Invalid argument]
> > > > > > > > > > >
> > > > > > > > > > > In some brick logs I found these:
> > > > > > > > > > > > [2018-11-19 07:23:30.932327] I [MSGID: 120020] [quota.c:2198:quota_unlink_cbk] 0-$VOLUME-quota: quota context not set inode (gfid:f100f7a9-0779-4b4c-880f-c8b3b4bdc49d) [Invalid argument]
> > > > > > > > > > >
> > > > > > > > > > > and (replaced the volume name with "$VOLUME") these:
> > > > > > > > > > > > The message "W [MSGID: 120003] [quota.c:821:quota_build_ancestry_cbk] 0-$VOLUME-quota: parent is NULL [Invalid argument]" repeated 13 times between [2018-11-19 15:28:54.089404] and [2018-11-19 15:30:12.792175]
> > > > > > > > > > > > [2018-11-19 15:31:34.559348] W [MSGID: 120003] [quota.c:821:quota_build_ancestry_cbk] 0-$VOLUME-quota: parent is NULL [Invalid argument]
> > > > > > > > > > >
> > > > > > > > > > > I already found that setting the flag "trusted.glusterfs.quota.dirty" might help, but I'm unsure about the consequences that will be triggered.
> > > > > > > > > > > And I'm unsure about the necessary version flag.
> > > > > > > > > > >
> > > > > > > > > > > Does anyone have an idea how to fix this?
> > > > > > > > > > >
> > > > > > > > > > > Best Regards,
> > > > > > > > > > > --
> > > > > > > > > > > Frank Rühlemann
> > > > > > > > > > > IT-Systemtechnik
> > > > > > > > > > >
> > > > > > > > > > > UNIVERSITÄT ZU LÜBECK
> > > > > > > > > > > IT-Service-Center
> > > > > > > > > > >
> > > > > > > > > > > Ratzeburger Allee 160
> > > > > > > > > > > 23562 Lübeck
> > > > > > > > > > > Tel +49 451 3101 2034
> > > > > > > > > > > Fax +49 451 3101 2004
> > > > > > > > > > > ruehlemann@xxxxxxxxxxxxxxxxxxx
> > > > > > > > > > > www.itsc.uni-luebeck.de
> > > > > > > > > > >
> > > > > > > > > > > _______________________________________________
> > > > > > > > > > > Gluster-users mailing list
> > > > > > > > > > > Gluster-users@xxxxxxxxxxx
> > > > > > > > > > > https://lists.gluster.org/mailman/listinfo/gluster-users
> > > > > >
> > > > > > --
> > > > > > Regards,
> > > > > > Hari Gowtham.

--
Regards,
Hari Gowtham.