What would happen if I:
- Did not disable quotas
- Did not stop the volume (140T volume takes at least 3-4 days to do
  any find operations, which is too much downtime)
- Found and removed all xattrs:
  trusted.glusterfs.quota.242dcfd9-6aea-4cb8-beb2-c0ed91ad70d3.contri
  on the /brick/volumename/modules
- Set the dirty bit on /brick/volumename/modules

As far as an upgrade to 3.7, I'm not comfortable with running the
newest release - which version is RHGS based on? I typically like to
follow supported product version if I can, so I know most of the kinks
are worked out :)

On Wed, Feb 10, 2016 at 11:02 PM, Manikandan Selvaganesh
<mselvaga@xxxxxxxxxx> wrote:
> Hi Steve,
>
> We suspect the mismatching in accounting is probably because of the
> xattr's being not cleaned up properly. Please ensure you do the following
> steps and make sure the xattr's are cleaned up properly before quota
> is enabled for the next time.
>
> 1) stop the volume
> 2) on each brick in the backend do
>    Find and remove all the xattrs and make sure they are not present
>      # find <brickpath>/module | xargs getfattr -d -m . -e hex | grep quota | grep -E 'contri|size'
>      # setfattr -x xattrname <path>
>
> 3) set dirty on <brickpath>/
>      # setfattr -n trusted.glusterfs.quota.dirty -v 0x3100 <brickpath>/
>    By setting dirty value on root as 1(0x3100), the contri will be calculated again
>    and the proper contri will be crawled and updated again.
>
> 4) Start volume and from a fuse mount
>      # stat /mountpath
>
> If you have ever performed a rename, then there is a possibility of two contributions
> getting created for a single entry.
>
> We have fixed quite some rename issues and have refactored the marker approach. Also
> as I have mentioned already we have also done Versioning of xattr's which solves the
> issue you are facing in 3.7. It would be really helpful in a production environment if
> you could upgrade to 3.7
>
> --
> Thanks & Regards,
> Manikandan Selvaganesh.
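A minimal Python sketch of step 2 as a dry run, in case it helps anyone reading along. It only parses text in the format that `getfattr -d -m quota -e hex` prints (the sample is copied from the 3.2.2 and 2.14.2 output quoted further down in this thread) and builds the `setfattr -x` command strings for one ID; it never touches a brick. The function names are hypothetical, not part of gluster, and treating 242dcfd9-6aea-4cb8-beb2-c0ed91ad70d3 as the stale ID is an assumption based on that output.

```python
import re
import shlex
from collections import defaultdict

# Matches a contribution xattr name and captures the ID embedded in it.
CONTRI_RE = re.compile(r"^trusted\.glusterfs\.quota\.([0-9a-f-]{36})\.contri$")

# Sample copied from the getfattr output quoted in this thread.
SAMPLE = """\
# file: 3.2.2
trusted.glusterfs.quota.242dcfd9-6aea-4cb8-beb2-c0ed91ad70d3.contri=0x0000000009af6000
trusted.glusterfs.quota.c890be20-1bb9-4aec-a8d0-eacab0446f16.contri=0x0000000013fda800
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size=0x0000000013fda800

# file: 2.14.2
trusted.glusterfs.quota.c890be20-1bb9-4aec-a8d0-eacab0446f16.contri=0x0000000000692800
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size=0x0000000000692800
"""


def parse_getfattr_dump(text):
    """Parse `getfattr -d -e hex` text into {path: {xattr_name: int_value}}."""
    xattrs = defaultdict(dict)
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# file:"):
            current = line.split(":", 1)[1].strip()
        elif "=" in line and current is not None:
            name, value = line.split("=", 1)
            xattrs[current][name] = int(value, 16)
    return dict(xattrs)


def multi_contri_paths(xattrs):
    """Paths carrying more than one contribution xattr (candidates for cleanup)."""
    return [
        path
        for path, attrs in xattrs.items()
        if sum(1 for name in attrs if CONTRI_RE.match(name)) > 1
    ]


def stale_contri_commands(xattrs, stale_id):
    """Dry run: build `setfattr -x` command strings for one stale ID."""
    commands = []
    for path, attrs in xattrs.items():
        for name in attrs:
            match = CONTRI_RE.match(name)
            if match and match.group(1) == stale_id:
                commands.append(f"setfattr -x {name} {shlex.quote(path)}")
    return commands


if __name__ == "__main__":
    parsed = parse_getfattr_dump(SAMPLE)
    print("more than one contri:", multi_contri_paths(parsed))
    # Assumption: this is the leftover ID, as suspected in the thread.
    for cmd in stale_contri_commands(parsed, "242dcfd9-6aea-4cb8-beb2-c0ed91ad70d3"):
        print(cmd)
```

Printing the commands instead of running them keeps the sketch safe to try against a saved dump before anything is removed on a live brick.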
>
> ----- Original Message -----
> From: "Steve Dainard" <sdainard@xxxxxxxx>
> To: "Manikandan Selvaganesh" <mselvaga@xxxxxxxxxx>
> Cc: "Vijaikumar Mallikarjuna" <vmallika@xxxxxxxxxx>, "Gluster Devel" <gluster-devel@xxxxxxxxxxx>, "gluster-users@xxxxxxxxxxx List" <gluster-users@xxxxxxxxxxx>
> Sent: Thursday, February 11, 2016 1:48:19 AM
> Subject: Re: Quota list not reflecting disk usage
>
> So after waiting out the process of disabling quotas, waiting for the
> xattrs to be cleaned up, re-enabling quotas and waiting for the
> xattr's to be created, then applying quotas I'm running into the same
> issue.
>
> Yesterday at ~2pm one of the quotas was listed as:
> /modules|100.0GB|18.3GB|81.7GB
>
> I initiated a copy from that glusterfs fuse mount to another fuse
> mount for a different volume, and now I'm seeing:
> /modules|100.0GB|27.4GB|72.6GB
>
> So an increase of 9GB usage.
>
> There were no writes at all to this directory during or after the cp.
>
> I did a bit of digging through the /modules directory on one of the
> gluster nodes and created this spreadsheet:
> https://docs.google.com/spreadsheets/d/1l_6ze68TCOcx6LEh9MFwmqPZ9bM-70CUlSM_8tpQ654/edit?usp=sharing
>
> The /modules/R/3.2.2 directory quota value doesn't come close to
> matching the du value.
>
> Funny bit, there are TWO quota contribution attributes:
> # getfattr -d -m quota -e hex 3.2.2
> # file: 3.2.2
> trusted.glusterfs.quota.242dcfd9-6aea-4cb8-beb2-c0ed91ad70d3.contri=0x0000000009af6000
> trusted.glusterfs.quota.c890be20-1bb9-4aec-a8d0-eacab0446f16.contri=0x0000000013fda800
> trusted.glusterfs.quota.dirty=0x3000
> trusted.glusterfs.quota.size=0x0000000013fda800
>
> For reference, another directory /modules/R/2.14.2 has only one
> contribution attribute:
> # getfattr -d -m quota -e hex 2.14.2
> # file: 2.14.2
> trusted.glusterfs.quota.c890be20-1bb9-4aec-a8d0-eacab0446f16.contri=0x0000000000692800
> trusted.glusterfs.quota.dirty=0x3000
> trusted.glusterfs.quota.size=0x0000000000692800
>
> Questions:
> 1. Why wasn't the
> trusted.glusterfs.quota.242dcfd9-6aea-4cb8-beb2-c0ed91ad70d3.contri=0x0000000009af6000
> cleaned up?
> 2A. How can I remove old attributes from the fs, and then force a
> re-calculation of contributions for the quota path /modules once I've
> done this on all gluster nodes?
> 2B. Or am I stuck yet again removing quotas completely, waiting for
> the automated setfattr to remove the quotas for
> c890be20-1bb9-4aec-a8d0-eacab0446f16 ID, manually removing attrs for
> 242dcfd9-6aea-4cb8-beb2-c0ed91ad70d3, re-enabling quotas, waiting for
> xattrs to be generated, then enabling limits?
> 3. Shouldn't there be a command to re-trigger quota accounting on a
> directory that confirms the attrs are set correctly and checks that
> the contribution attr actually match disk usage?
>
> On Tue, Feb 2, 2016 at 3:00 AM, Manikandan Selvaganesh
> <mselvaga@xxxxxxxxxx> wrote:
>> Hi Steve,
>>
>> As you have mentioned, if you are using a glusterfs version lesser than 3.7,
>> then you are doing it right.
>> We are sorry to say but unfortunately that's the only
>> way(manually going and cleaning up the xattr's before enabling quota or wait for
>> the process to complete itself, which would take quite some time depending upon the
>> files) that can be done so as not to mess up quota enforcing/accounting. Also, we could
>> not find anything that could help us with the logs too. Thanks for the
>> point. We are in the process of writing blogs and documenting clearly about quota and
>> it's internal working. There is an initial blog[1] which we have written. More blogs will
>> follow.
>>
>> With glusterfs-3.7, we have introduced something called "Quota versioning".
>> So whenever you enable quota, we are suffixing a number(1..N) with the quota xattr's,
>> say you enable quota for the first time and the xattr will be like,
>> "trusted.glusterfs.quota.size.<suffix number from 1..N>". So all the quota related xattr's
>> will have the number suffixed to the xattr. With the versioning patch[2], when you disable and
>> enable quota again for the next time, it will be "trusted.glusterfs.quota.size.2"(Similarly
>> for other quota related xattr's). So quota accounting can happen independently depending on
>> the suffix and the cleanup process can go on independently which solves the issue that you
>> have.
>>
>> [1] https://manikandanselvaganesh.wordpress.com/
>>
>> [2] http://review.gluster.org/12386
>>
>> --
>> Thanks & Regards,
>> Manikandan Selvaganesh.
>>
>> ----- Original Message -----
>> From: "Vijaikumar Mallikarjuna" <vmallika@xxxxxxxxxx>
>> To: "Steve Dainard" <sdainard@xxxxxxxx>
>> Cc: "Manikandan Selvaganesh" <mselvaga@xxxxxxxxxx>
>> Sent: Tuesday, February 2, 2016 10:12:51 AM
>> Subject: Re: Quota list not reflecting disk usage
>>
>> Hi Steve,
>>
>> Sorry for the delay. Mani and myself was busy with something else at work,
>> we will update you on this by eod.
>>
>> Many quota issues has been fixed in 3.7, also version numbers are added to
>> quota xattrs, so when quota is disabled we don't need to cleanup the xattrs.
>>
>> Thanks,
>> Vijay
>>
>> On Tue, Feb 2, 2016 at 12:26 AM, Steve Dainard <sdainard@xxxxxxxx> wrote:
>>
>>> I haven't heard anything back on this thread so here's where I've landed:
>>>
>>> It appears that the quota xattr's are not being cleared when quota's
>>> are disabled, so when they are disabled and re-enabled the value for
>>> size is added to the previous size, making it appear that the 'Used'
>>> space is significantly greater than it should be. This seems like a
>>> bug, but I don't know what to file it against, or if the logs I
>>> attached prove this.
>>>
>>> Also; the documentation doesn't make mention of how the quota system
>>> works, and what happens when quotas are enabled/disabled. There seems
>>> to be a background task for both settings:
>>> On enable: "/usr/bin/find . -exec /usr/bin/stat {} \ ;"
>>> On disable: setfattr is removing quota xattrs
>>>
>>> The thing is neither of these tasks are listed in 'gluster volume
>>> status <volume>' ie:
>>>
>>> Status of volume: storage
>>> Gluster process                                 Port    Online  Pid
>>> ------------------------------------------------------------------------------
>>> Brick 10.0.231.50:/mnt/raid6-storage/storage    49156   Y       24899
>>> Brick 10.0.231.51:/mnt/raid6-storage/storage    49156   Y       2991
>>> Brick 10.0.231.52:/mnt/raid6-storage/storage    49156   Y       28853
>>> Brick 10.0.231.53:/mnt/raid6-storage/storage    49153   Y       2705
>>> NFS Server on localhost                         N/A     N       N/A
>>> Quota Daemon on localhost                       N/A     Y       30066
>>> NFS Server on 10.0.231.52                       N/A     N       N/A
>>> Quota Daemon on 10.0.231.52                     N/A     Y       24976
>>> NFS Server on 10.0.231.53                       N/A     N       N/A
>>> Quota Daemon on 10.0.231.53                     N/A     Y       30334
>>> NFS Server on 10.0.231.51                       N/A     N       N/A
>>> Quota Daemon on 10.0.231.51                     N/A     Y       15781
>>>
>>> Task Status of Volume storage
>>>
>>> ------------------------------------------------------------------------------
>>> ******There are no active volume tasks*******
>>>
>>> (I added the asterisks above)
>>> So without any visibility into these running tasks, or knowing of
>>> their existence (not documented) it becomes very difficult to know
>>> what's going on. On any reasonably large storage system these tasks
>>> take days to complete and there should be some indication of this.
>>>
>>> Where I'm at right now:
>>> - I disabled the quota's on volume 'storage'
>>> - I started to manually remove xattrs until I realized there is an
>>> automated task to do this.
>>> - After waiting for 'ps aux | grep setfattr' to return nothing, I
>>> re-enabled quotas
>>> - I'm currently waiting for the stat tasks to complete
>>> - Once the entire filesystem has been stat'ed, I'm going to set limits
>>> again.
>>>
>>> As a note, this is a pretty brutal process on a system with 140T of
>>> storage, and I can't imagine how much worse this would be if my nodes
>>> had more than 12 disks per, or if I was at PB scale.
>>>
>>> On Mon, Jan 25, 2016 at 12:31 PM, Steve Dainard <sdainard@xxxxxxxx> wrote:
>>> > Here's a link to a tarball of one of the gluster hosts logs:
>>> > https://dl.dropboxusercontent.com/u/21916057/gluster01.tar.gz
>>> >
>>> > I wanted to include past logs in case they were useful.
>>> >
>>> > Also, the volume I'm trying to get quota's working on is 'storage'
>>> > you'll notice I have a brick issue on a different volume 'vm-storage'.
>>> >
>>> > In regards to the 3.7 upgrade. I'm a bit hesitant to move to the
>>> > current release, I prefer to stay on a stable release with maintenance
>>> > updates if possible.
>>> >
>>> > On Mon, Jan 25, 2016 at 12:09 PM, Manikandan Selvaganesh
>>> > <mselvaga@xxxxxxxxxx> wrote:
>>> >> Hi Steve,
>>> >>
>>> >> Also, do you have any plans to upgrade to the latest version.
>>> >> With 3.7, we have refactored some approaches used in quota and marker and that have
>>> >> fixed quite some issues.
>>> >>
>>> >> --
>>> >> Thanks & Regards,
>>> >> Manikandan Selvaganesh.
>>> >>
>>> >> ----- Original Message -----
>>> >> From: "Manikandan Selvaganesh" <mselvaga@xxxxxxxxxx>
>>> >> To: "Steve Dainard" <sdainard@xxxxxxxx>
>>> >> Cc: "gluster-users@xxxxxxxxxxx List" <gluster-users@xxxxxxxxxxx>
>>> >> Sent: Tuesday, January 26, 2016 1:31:10 AM
>>> >> Subject: Re: Quota list not reflecting disk usage
>>> >>
>>> >> Hi Steve,
>>> >>
>>> >> Could you send us the glusterfs logs, it could help us debug the issue!!
>>> >>
>>> >> --
>>> >> Thanks & Regards,
>>> >> Manikandan Selvaganesh.
>>> >>
>>> >> ----- Original Message -----
>>> >> From: "Steve Dainard" <sdainard@xxxxxxxx>
>>> >> To: "Manikandan Selvaganesh" <mselvaga@xxxxxxxxxx>
>>> >> Cc: "gluster-users@xxxxxxxxxxx List" <gluster-users@xxxxxxxxxxx>
>>> >> Sent: Tuesday, January 26, 2016 12:56:22 AM
>>> >> Subject: Re: Quota list not reflecting disk usage
>>> >>
>>> >> Something is seriously wrong with the quota output:
>>> >>
>>> >> # gluster volume quota storage list
>>> >> Path                            Hard-limit  Soft-limit  Used     Available  Soft-limit exceeded?  Hard-limit exceeded?
>>> >> ---------------------------------------------------------------------------------------------------------------------------
>>> >> /projects-CanSISE               10.0TB      80%         27.8TB   0Bytes     Yes                   Yes
>>> >> /data4/climate                  105.0TB     80%         307.1TB  0Bytes     Yes                   Yes
>>> >> /data4/forestry                 50.0GB      80%         61.9GB   0Bytes     Yes                   Yes
>>> >> /data4/projects                 800.0GB     80%         2.0TB    0Bytes     Yes                   Yes
>>> >> /data4/strays                   85.0GB      80%         230.5GB  0Bytes     Yes                   Yes
>>> >> /data4/gis                      2.2TB       80%         6.3TB    0Bytes     Yes                   Yes
>>> >> /data4/modperl                  1.0TB       80%         953.2GB  70.8GB     Yes                   No
>>> >> /data4/dem                      1.0GB       80%         0Bytes   1.0GB      No                    No
>>> >> /projects-hydrology-archive0    5.0TB       80%         14.4TB   0Bytes     Yes                   Yes
>>> >> /climate-downscale-idf-ec       7.5TB       80%         5.1TB    2.4TB      No                    No
>>> >> /climate-downscale-idf          5.0TB       80%         6.1TB    0Bytes     Yes                   Yes
>>> >> /home                           5.0TB       80%         11.8TB   0Bytes     Yes                   Yes
>>> >> /projects-hydrology-scratch0    7.0TB       80%         169.1GB  6.8TB      No                    No
>>> >> /projects-rci-scratch           10.0TB      80%         1.9TB    8.1TB      No                    No
>>> >> /projects-dataportal            1.0TB       80%         775.4GB  248.6GB    No                    No
>>> >> /modules                        1.0TB       80%         36.1GB   987.9GB    No                    No
>>> >> /data4/climate/downscale/CMIP5  65.0TB      80%         56.4TB   8.6TB      Yes                   No
>>> >>
>>> >> Gluster is listing 'Used' space of over 307TB on /data4/climate, but
>>> >> the volume capacity is only 146T.
>>> >>
>>> >> This has happened after disabling quotas on the volume, re-enabling
>>> >> quotas, and then setting quotas again. There was a lot of glusterfsd
>>> >> CPU usage afterwards, and now 3 days later the quota's I set were all
>>> >> missing except
>>> >>
>>> >> /data4/projects|800.0GB|2.0TB|0Bytes
>>> >>
>>> >> So I re-set the quotas and the output above is what I have.
>>> >>
>>> >> Previous to disabling quota's this was the output:
>>> >> # gluster volume quota storage list
>>> >> Path                            Hard-limit  Soft-limit  Used     Available  Soft-limit exceeded?  Hard-limit exceeded?
>>> >> ---------------------------------------------------------------------------------------------------------------------------
>>> >> /data4/climate                  105.0TB     80%         151.6TB  0Bytes     Yes                   Yes
>>> >> /data4/forestry                 50.0GB      80%         45.4GB   4.6GB      Yes                   No
>>> >> /data4/projects                 800.0GB     80%         753.1GB  46.9GB     Yes                   No
>>> >> /data4/strays                   85.0GB      80%         80.8GB   4.2GB      Yes                   No
>>> >> /data4/gis                      2.2TB       80%         2.1TB    91.8GB     Yes                   No
>>> >> /data4/modperl                  1.0TB       80%         948.1GB  75.9GB     Yes                   No
>>> >> /data4/dem                      1.0GB       80%         0Bytes   1.0GB      No                    No
>>> >> /projects-CanSISE               10.0TB      80%         11.9TB   0Bytes     Yes                   Yes
>>> >> /projects-hydrology-archive0    5.0TB       80%         4.8TB    174.0GB    Yes                   No
>>> >> /climate-downscale-idf-ec       7.5TB       80%         5.0TB    2.5TB      No                    No
>>> >> /climate-downscale-idf          5.0TB       80%         3.8TB    1.2TB      No                    No
>>> >> /home                           5.0TB       80%         4.7TB    283.8GB    Yes                   No
>>> >> /projects-hydrology-scratch0    7.0TB       80%         95.9GB   6.9TB      No                    No
>>> >> /projects-rci-scratch           10.0TB      80%         1.7TB    8.3TB      No                    No
>>> >> /projects-dataportal            1.0TB       80%         775.4GB  248.6GB    No                    No
>>> >> /modules                        1.0TB       80%         14.6GB   1009.4GB   No                    No
>>> >> /data4/climate/downscale/CMIP5  65.0TB      80%         56.4TB   8.6TB      Yes                   No
>>> >>
>>> >> I was so focused on the /projects-CanSISE quota not being accurate
>>> >> that I missed that the 'Used' space on /data4/climate is listed higher
>>> >> than the total gluster volume capacity.
>>> >>
>>> >> On Mon, Jan 25, 2016 at 10:52 AM, Steve Dainard <sdainard@xxxxxxxx> wrote:
>>> >>> Hi Manikandan
>>> >>>
>>> >>> I'm using 'du' not df in this case.
>>> >>>
>>> >>> On Thu, Jan 21, 2016 at 9:20 PM, Manikandan Selvaganesh
>>> >>> <mselvaga@xxxxxxxxxx> wrote:
>>> >>>> Hi Steve,
>>> >>>>
>>> >>>> If you would like disk usage using df utility by taking quota limits into
>>> >>>> consideration, then you are expected to run the following command.
>>> >>>>
>>> >>>> 'gluster volume set VOLNAME quota-deem-statfs on'
>>> >>>>
>>> >>>> with older versions where quota-deem-statfs is OFF by default. However with
>>> >>>> the latest versions, quota-deem-statfs is by default ON. In this case, the total
>>> >>>> disk space of the directory is taken as the quota hard limit set on the directory
>>> >>>> of the volume and disk utility would display accordingly. This answers why there is
>>> >>>> a mismatch in disk utility.
>>> >>>>
>>> >>>> Next, answering to quota mechanism and accuracy: There is something called timeouts
>>> >>>> in quota. For performance reasons, quota caches the directory size on client. You can
>>> >>>> set timeout indicating the maximum valid duration of directory sizes in cache,
>>> >>>> from the time they are populated. By default the hard-timeout is 5s and soft timeout
>>> >>>> is 60s. Setting a timeout of zero will do a force fetching of directory sizes from server
>>> >>>> for every operation that modifies file data and will effectively disables directory size
>>> >>>> caching on client side. If you do not have a timeout of 0(which we do not encourage due to
>>> >>>> performance reasons), then till you reach soft-limit, soft timeout will be taken into
>>> >>>> consideration, and only for every 60s operations will be synced and that could cause the
>>> >>>> usage to exceed more than the hard-limit specified. If you would like quota to
>>> >>>> strictly enforce then please run the following commands,
>>> >>>>
>>> >>>> 'gluster v quota VOLNAME hard-timeout 0s'
>>> >>>> 'gluster v quota VOLNAME soft-timeout 0s'
>>> >>>>
>>> >>>> Appreciate your curiosity in exploring and if you would like to know more about quota
>>> >>>> please refer[1]
>>> >>>>
>>> >>>> [1] http://gluster.readthedocs.org/en/release-3.7.0-1/Administrator%20Guide/Directory%20Quota/
>>> >>>>
>>> >>>> --
>>> >>>> Thanks & Regards,
>>> >>>> Manikandan Selvaganesh.
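One way to read the timeout description above, sketched in Python. This is only a model of the explanation in this message, not gluster code: the function name is hypothetical, the defaults (60s soft timeout, 5s hard timeout) come from the message, and the 80% soft limit is the value shown in the quota listings in this thread.

```python
def effective_timeout(used, hard_limit, soft_limit_pct=80.0,
                      soft_timeout=60, hard_timeout=5):
    """Seconds a cached directory size may be trusted, per the description above.

    Below the soft limit the soft timeout applies; once usage reaches the
    soft limit the shorter hard timeout applies. `used` and `hard_limit`
    only need to share a unit.
    """
    soft_limit = hard_limit * soft_limit_pct / 100.0
    return soft_timeout if used < soft_limit else hard_timeout


if __name__ == "__main__":
    # /modules with a 100.0GB limit and 18.3GB used is below the 80GB soft limit.
    print(effective_timeout(18.3, 100.0))
    # Hypothetical usage above the soft limit.
    print(effective_timeout(85.0, 100.0))
    # With both timeouts set to 0s nothing is cached in either regime.
    print(effective_timeout(18.3, 100.0, soft_timeout=0, hard_timeout=0))
```

Read this way, a directory well under its soft limit can have up to a minute of writes go unaccounted, which is why setting both timeouts to 0s is described as strict enforcement.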
>>> >>>>
>>> >>>> ----- Original Message -----
>>> >>>> From: "Steve Dainard" <sdainard@xxxxxxxx>
>>> >>>> To: "gluster-users@xxxxxxxxxxx List" <gluster-users@xxxxxxxxxxx>
>>> >>>> Sent: Friday, January 22, 2016 1:40:07 AM
>>> >>>> Subject: Re: Quota list not reflecting disk usage
>>> >>>>
>>> >>>> This is gluster 3.6.6.
>>> >>>>
>>> >>>> I've attempted to disable and re-enable quota's on the volume, but
>>> >>>> when I re-apply the quotas on each directory the same 'Used' value is
>>> >>>> present as before.
>>> >>>>
>>> >>>> Where is quotad getting its information from, and how can I clean
>>> >>>> up/regenerate that info?
>>> >>>>
>>> >>>> On Thu, Jan 21, 2016 at 10:07 AM, Steve Dainard <sdainard@xxxxxxxx> wrote:
>>> >>>>> I have a distributed volume with quota's enabled:
>>> >>>>>
>>> >>>>> Volume Name: storage
>>> >>>>> Type: Distribute
>>> >>>>> Volume ID: 26d355cb-c486-481f-ac16-e25390e73775
>>> >>>>> Status: Started
>>> >>>>> Number of Bricks: 4
>>> >>>>> Transport-type: tcp
>>> >>>>> Bricks:
>>> >>>>> Brick1: 10.0.231.50:/mnt/raid6-storage/storage
>>> >>>>> Brick2: 10.0.231.51:/mnt/raid6-storage/storage
>>> >>>>> Brick3: 10.0.231.52:/mnt/raid6-storage/storage
>>> >>>>> Brick4: 10.0.231.53:/mnt/raid6-storage/storage
>>> >>>>> Options Reconfigured:
>>> >>>>> performance.cache-size: 1GB
>>> >>>>> performance.readdir-ahead: on
>>> >>>>> features.quota: on
>>> >>>>> diagnostics.brick-log-level: WARNING
>>> >>>>>
>>> >>>>> Here is a partial list of quotas:
>>> >>>>> # /usr/sbin/gluster volume quota storage list
>>> >>>>> Path                            Hard-limit  Soft-limit  Used     Available  Soft-limit exceeded?  Hard-limit exceeded?
>>> >>>>> ---------------------------------------------------------------------------------------------------------------------------
>>> >>>>> ...
>>> >>>>> /projects-CanSISE               10.0TB      80%         11.9TB   0Bytes     Yes                   Yes
>>> >>>>> ...
>>> >>>>>
>>> >>>>> If I du on that location I do not get 11.9TB of space used (fuse mount point):
>>> >>>>> [root@storage projects-CanSISE]# du -hs
>>> >>>>> 9.5T .
>>> >>>>>
>>> >>>>> Can someone provide an explanation for how the quota mechanism tracks
>>> >>>>> disk usage? How often does the quota mechanism check its accuracy? And
>>> >>>>> how could it get so far off?
>>> >>>>>
>>> >>>>> Can I get gluster to rescan that location and update the quota usage?
>>> >>>>>
>>> >>>>> Thanks,
>>> >>>>> Steve
>>> >>>> _______________________________________________
>>> >>>> Gluster-users mailing list
>>> >>>> Gluster-users@xxxxxxxxxxx
>>> >>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>> >> _______________________________________________
>>> >> Gluster-users mailing list
>>> >> Gluster-users@xxxxxxxxxxx
>>> >> http://www.gluster.org/mailman/listinfo/gluster-users
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users@xxxxxxxxxxx
>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
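
The "Used is larger than the whole volume" symptom reported in this thread can be checked for mechanically. Below is a rough Python sketch that parses the size strings as they appear in the `gluster volume quota ... list` output and flags any path whose 'Used' value exceeds the volume capacity. The helper names are hypothetical, the three rows are copied from the listing quoted in the thread, the 146T capacity is written as "146TB", and treating the units as powers of 1024 is an assumption (the comparison itself does not depend on it as long as both sides use the same convention).

```python
# Assumption: units are binary multiples; only consistency matters for the check.
UNITS = {
    "Bytes": 1,
    "KB": 1024,
    "MB": 1024 ** 2,
    "GB": 1024 ** 3,
    "TB": 1024 ** 4,
    "PB": 1024 ** 5,
}


def parse_size(text):
    """Convert strings such as '307.1TB' or '0Bytes' to a number of bytes."""
    for unit in sorted(UNITS, key=len, reverse=True):
        if text.endswith(unit):
            return float(text[: -len(unit)]) * UNITS[unit]
    raise ValueError(f"unrecognised size: {text!r}")


def over_capacity(rows, capacity):
    """Return the paths whose 'Used' value is larger than the volume capacity."""
    limit = parse_size(capacity)
    return [path for path, used in rows if parse_size(used) > limit]


# (path, Used) pairs copied from the quota listing quoted in this thread.
ROWS = [
    ("/projects-CanSISE", "27.8TB"),
    ("/data4/climate", "307.1TB"),
    ("/home", "11.8TB"),
]

if __name__ == "__main__":
    for path in over_capacity(ROWS, "146TB"):
        print(f"{path}: Used exceeds volume capacity")
```

A check like this only says that the accounting is impossible, not why; comparing against `du` on the fuse mount, as done at the start of the thread, is still needed for paths that are wrong but below capacity.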