Hi,

we have no notifications of OOM kills in /var/log/messages. So if I understood this correctly, the crawls finished but my attributes weren't set correctly? And this script should fix them?

Thanks for your help so far

Gudrun

On Thursday, 22.11.2018, at 13:03 +0530, Hari Gowtham wrote:
> On Wed, Nov 21, 2018 at 8:55 PM Gudrun Mareike Amedick
> <g.amedick@xxxxxxxxxxxxxx> wrote:
> >
> > Hi Hari,
> >
> > I disabled and re-enabled the quota and I saw the crawlers starting. However, this caused a pretty high load on my servers (200+) and this seems to
> > have gotten them killed again. At least, I have no crawlers running, the quotas are not matching the output of du -h, and the crawler logs all
> > contain this line:
>
> The quota crawl is an intensive process as it has to crawl the entire
> file system. The intensity varies based on the number of bricks, the
> number of files, the depth of the filesystem, ongoing I/O to the filesystem and so on.
> Being a disperse volume, it will have to talk to all the bricks, and
> with the huge size, the increase in CPU load is expected.
>
> > [2018-11-20 14:16:35.180467] W [glusterfsd.c:1375:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x7494) [0x7f0e3d6fe494] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xf5) [0x561eb7952d45] -->/usr/sbin/glusterfs(cleanup_and_exit+0x54) [0x561eb7952ba4] ) 0-: received signum (15), shutting down
>
> This can mean that the file attributes were set and then it stopped, or,
> as you said, that the process was killed while it still had attributes
> to set on some of the files.
>
> This message is common to all shutdowns (the one triggered after the
> job is finished as well as the one triggered to stop the process).
> Can you check the /var/log/messages file for "OOM" kill?
> If you see those messages then the shutdown is because of the increase
> in memory consumption, which is expected.
>
> > I suspect this means my file attributes are not set correctly. Would the script you sent me fix that? And the script seems to be part of the Git
> > GlusterFS 5.0 repo. We are running 3.12. Would it still work on 3.12 (or 4.1, since we'll be upgrading soon) or could it break things?
>
> Quota is not actively developed because of its performance issues,
> which need a major redesign. So the script holds true for newer
> versions as well, because no changes have gone into the code for it.
> The advantage of the script is that it can be run over a certain
> directory which is faulty (it need not be the root; this reduces the
> number of directories/files, the depth and so on).
> The crawl is necessary for the quota to work fine. The script can help
> only if the xattrs are set by the crawl, which I think isn't the case
> here.
> (To verify whether the xattrs are set on all the directories we need to do
> a getxattr and see.) So we can't use the script.
>
> > Kind regards
> >
> > Gudrun Amedick
> >
> > On Tuesday, 20.11.2018, at 16:59 +0530, Hari Gowtham wrote:
> > > reply inline.
> > > On Tue, Nov 20, 2018 at 3:53 PM Gudrun Mareike Amedick
> > > <g.amedick@xxxxxxxxxxxxxx> wrote:
> > > >
> > > > Hi,
> > > >
> > > > I think I know what happened. According to the logs, the crawlers received a signum(15). They seem to have died before having finished.
> > > > Probably too much to do simultaneously. I have disabled and re-enabled quota and will set the quotas again with more time.
> > > >
> > > > Is there a way to restart a crawler that was killed too soon?
> > >
> > > No. The disable and enable of quota starts a new crawl.
> > >
> > > > If I restart a server while a crawler is running, will the crawler be restarted, too? We'll need to do some hardware fixing on one of the
> > > > servers soon and I need to know whether I have to check the crawlers first before shutting it down.
> > >
> > > During the shutdown of the server the crawl will be killed. (The data
> > > usage shown will be updated as per what has been crawled.)
> > > The crawl won't be restarted on starting the server. Only quotad will
> > > be restarted (which is not the same as the crawl).
> > > For the crawl to happen you will have to restart the quota.
> > >
> > > > Thanks for the pointers
> > > >
> > > > Gudrun Amedick
> > > >
> > > > On Tuesday, 20.11.2018, at 11:38 +0530, Hari Gowtham wrote:
> > > > > Hi,
> > > > >
> > > > > Can you check if the quota crawl finished? Without it having finished
> > > > > the quota list will show incorrect values.
> > > > > Looking at the under-accounting, it looks like the crawl is not yet
> > > > > finished (it does take a lot of time as it has to crawl the whole
> > > > > filesystem).
> > > > >
> > > > > If the crawl has finished and the usage is still showing wrong values
> > > > > then there should be an accounting issue.
> > > > > The easy way to fix this is to try restarting quota. This will not
> > > > > cause any problems. The only downside is that the limits won't hold true
> > > > > while the quota is disabled,
> > > > > until it's enabled and the crawl finishes.
> > > > > Or you can try using the quota fsck script
> > > > > https://review.gluster.org/#/c/glusterfs/+/19179/ to fix your
> > > > > accounting issue.
> > > > >
> > > > > Regards,
> > > > > Hari.
> > > > > On Mon, Nov 19, 2018 at 10:05 PM Frank Ruehlemann
> > > > > <f.ruehlemann@xxxxxxxxxxxxxx> wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > we're running a Distributed Dispersed volume with Gluster 3.12.14 on
> > > > > > Debian 9.6 (Stretch).
> > > > > >
> > > > > > We migrated our data (>300TB) from a pure Distributed volume into this
> > > > > > Dispersed volume with cp, followed by multiple rsyncs.
> > > > > > After the migration was successful we enabled quotas again with "gluster
> > > > > > volume quota $VOLUME enable", which finished successfully.
> > > > > > And we set our required quotas with "gluster volume quota $VOLUME
> > > > > > limit-usage $PATH $QUOTA", which finished without errors too.
> > > > > >
> > > > > > But our "gluster volume quota $VOLUME list" shows wrong values.
> > > > > > For example:
> > > > > > A directory with ~170TB of data shows only 40.8TB Used.
> > > > > > When we sum up all directories with quotas we're way under the ~310TB that
> > > > > > "df -h /$volume" shows.
> > > > > > And "df -h /$volume/$directory" shows wrong values for nearly all
> > > > > > directories.
> > > > > >
> > > > > > All 72 8TB-bricks and all quota daemons of the 6 servers are visible and
> > > > > > online in "gluster volume status $VOLUME".
> > > > > >
> > > > > > In quotad.log I found multiple warnings like this:
> > > > > >
> > > > > > > [2018-11-16 09:21:25.738901] W [dict.c:636:dict_unref] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/features/quotad.so(+0x1d58) [0x7f6844be7d58] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/features/quotad.so(+0x2b92) [0x7f6844be8b92] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_unref+0xc0) [0x7f684b0db640] ) 0-dict: dict is NULL [Invalid argument]
> > > > > >
> > > > > > In some brick logs I found those:
> > > > > >
> > > > > > > [2018-11-19 07:23:30.932327] I [MSGID: 120020] [quota.c:2198:quota_unlink_cbk] 0-$VOLUME-quota: quota context not set inode (gfid:f100f7a9-0779-4b4c-880f-c8b3b4bdc49d) [Invalid argument]
> > > > > >
> > > > > > and (replaced the volume name with "$VOLUME") those:
> > > > > >
> > > > > > > The message "W [MSGID: 120003] [quota.c:821:quota_build_ancestry_cbk] 0-$VOLUME-quota: parent is NULL [Invalid argument]" repeated 13 times between [2018-11-19 15:28:54.089404] and [2018-11-19 15:30:12.792175]
> > > > > > > [2018-11-19 15:31:34.559348] W [MSGID: 120003] [quota.c:821:quota_build_ancestry_cbk] 0-$VOLUME-quota: parent is NULL [Invalid argument]
> > > > > >
> > > > > > I already found that setting the flag "trusted.glusterfs.quota.dirty" might help, but I'm unsure about the consequences that will be
> > > > > > triggered.
> > > > > > And I'm unsure about the necessary version flag.
> > > > > >
> > > > > > Does anyone have an idea how to fix this?
> > > > > >
> > > > > > Best Regards,
> > > > > > --
> > > > > > Frank Rühlemann
> > > > > > IT-Systemtechnik
> > > > > >
> > > > > > UNIVERSITÄT ZU LÜBECK
> > > > > > IT-Service-Center
> > > > > >
> > > > > > Ratzeburger Allee 160
> > > > > > 23562 Lübeck
> > > > > > Tel +49 451 3101 2034
> > > > > > Fax +49 451 3101 2004
> > > > > > ruehlemann@xxxxxxxxxxxxxxxxxxx
> > > > > > www.itsc.uni-luebeck.de
> > > > > >
> > > > > > _______________________________________________
> > > > > > Gluster-users mailing list
> > > > > > Gluster-users@xxxxxxxxxxx
> > > > > > https://lists.gluster.org/mailman/listinfo/gluster-users
>
> --
> Regards,
> Hari Gowtham.
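
The two log checks discussed in this thread (looking for OOM kills in /var/log/messages, and for the "received signum (15)" shutdown message in the crawler logs) could be scripted roughly as below. This is only an illustrative sketch: the function names are made up for this example, and the OOM patterns assume the usual Linux kernel wording ("Out of memory", "oom-killer"), which may differ on a given system.

```python
import re

# Assumption: typical kernel OOM-killer wording; adjust for your distribution.
OOM_RE = re.compile(r"oom-killer|out of memory", re.IGNORECASE)
# Matches the glusterfs shutdown message quoted in this thread.
SIGNUM_RE = re.compile(r"received\s+signum\s*\((\d+)\)")


def find_oom_lines(lines):
    """Return the log lines that look like kernel OOM-killer activity."""
    return [line.rstrip("\n") for line in lines if OOM_RE.search(line)]


def find_shutdown_signals(lines):
    """Return the signal numbers found in 'received signum (N)' messages."""
    signals = []
    for line in lines:
        match = SIGNUM_RE.search(line)
        if match:
            signals.append(int(match.group(1)))
    return signals


def scan_file(path, finder):
    """Apply one of the finders above to a log file, tolerating odd bytes."""
    with open(path, "r", errors="replace") as handle:
        return finder(handle)
```

For example, scan_file("/var/log/messages", find_oom_lines) would list candidate OOM lines. As Hari points out, signum (15) is logged on every shutdown, including a normal one after the crawl has finished, so finding it does not by itself show that a crawl was killed early.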