So after waiting out the process of disabling quotas, waiting for the xattrs to be cleaned
up, re-enabling quotas and waiting for the xattrs to be created, then applying quotas, I'm
running into the same issue.

Yesterday at ~2pm one of the quotas was listed as:

/modules|100.0GB|18.3GB|81.7GB

I initiated a copy from that glusterfs fuse mount to another fuse mount for a different
volume, and now I'm seeing:

/modules|100.0GB|27.4GB|72.6GB

So an increase of ~9GB in usage, even though there were no writes at all to this directory
during or after the cp.

I did a bit of digging through the /modules directory on one of the gluster nodes and
created this spreadsheet:
https://docs.google.com/spreadsheets/d/1l_6ze68TCOcx6LEh9MFwmqPZ9bM-70CUlSM_8tpQ654/edit?usp=sharing

The /modules/R/3.2.2 directory quota value doesn't come close to matching the du value. The
funny bit: there are TWO quota contribution attributes:

# getfattr -d -m quota -e hex 3.2.2
# file: 3.2.2
trusted.glusterfs.quota.242dcfd9-6aea-4cb8-beb2-c0ed91ad70d3.contri=0x0000000009af6000
trusted.glusterfs.quota.c890be20-1bb9-4aec-a8d0-eacab0446f16.contri=0x0000000013fda800
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size=0x0000000013fda800

For reference, another directory /modules/R/2.14.2 has only one contribution attribute:

# getfattr -d -m quota -e hex 2.14.2
# file: 2.14.2
trusted.glusterfs.quota.c890be20-1bb9-4aec-a8d0-eacab0446f16.contri=0x0000000000692800
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size=0x0000000000692800

Questions:

1. Why wasn't trusted.glusterfs.quota.242dcfd9-6aea-4cb8-beb2-c0ed91ad70d3.contri=0x0000000009af6000
cleaned up?

2A. How can I remove the old attributes from the filesystem, and then force a
re-calculation of contributions for the quota path /modules once I've done this on all
gluster nodes?

2B. Or am I stuck yet again removing quotas completely, waiting for the automated setfattr
to remove the quota xattrs for the c890be20-1bb9-4aec-a8d0-eacab0446f16 ID, manually
removing the attrs for 242dcfd9-6aea-4cb8-beb2-c0ed91ad70d3, re-enabling quotas, waiting
for the xattrs to be generated, then enabling limits?

3. Shouldn't there be a command to re-trigger quota accounting on a directory, one that
confirms the attrs are set correctly and checks that the contribution attrs actually match
disk usage?

On Tue, Feb 2, 2016 at 3:00 AM, Manikandan Selvaganesh <mselvaga@xxxxxxxxxx> wrote:
> Hi Steve,
>
> As you have mentioned, if you are using a glusterfs version older than 3.7, then you are
> doing it right. We are sorry to say it, but unfortunately that is the only approach that
> does not mess up quota enforcing/accounting: either manually clean up the xattrs before
> enabling quota, or wait for the cleanup process to complete on its own, which can take
> quite some time depending on the number of files. Also, we could not find anything in the
> logs that would help here. Thanks for raising the point about documentation. We are in
> the process of writing blogs and documenting clearly how quota and its internals work.
> There is an initial blog[1] which we have written. More blogs will follow.
>
> With glusterfs-3.7, we have introduced something called "quota versioning". Whenever you
> enable quota, we suffix a number (1..N) to the quota xattrs; say you enable quota for the
> first time, then the xattr will look like
> "trusted.glusterfs.quota.size.<suffix number from 1..N>". All of the quota-related xattrs
> will carry that suffix. With the versioning patch[2], when you disable and enable quota
> again, it will be "trusted.glusterfs.quota.size.2" (and similarly for the other
> quota-related xattrs). So quota accounting can happen independently, keyed on the suffix,
> and the cleanup process can go on independently, which solves the issue that you have.
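>
> To illustrate (this is only a sketch of the naming scheme; the actual GFIDs, values and
> suffix depend on how many times quota has been enabled on the volume), after a second
> enable the same getfattr call you ran above would show something like:
>
> # getfattr -d -m quota -e hex <directory-on-brick>
> trusted.glusterfs.quota.<parent-gfid>.contri.2=0x<contribution>
> trusted.glusterfs.quota.dirty.2=0x3000
> trusted.glusterfs.quota.size.2=0x<size>
>
> The xattrs from the previous suffix can then be cleaned up in the background without
> disturbing the accounting done against the new suffix.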
>
> [1] https://manikandanselvaganesh.wordpress.com/
>
> [2] http://review.gluster.org/12386
>
> --
> Thanks & Regards,
> Manikandan Selvaganesh.
>
> ----- Original Message -----
> From: "Vijaikumar Mallikarjuna" <vmallika@xxxxxxxxxx>
> To: "Steve Dainard" <sdainard@xxxxxxxx>
> Cc: "Manikandan Selvaganesh" <mselvaga@xxxxxxxxxx>
> Sent: Tuesday, February 2, 2016 10:12:51 AM
> Subject: Re: [Gluster-users] Quota list not reflecting disk usage
>
> Hi Steve,
>
> Sorry for the delay; Mani and I were busy with something else at work. We will update you
> on this by EOD.
>
> Many quota issues have been fixed in 3.7. Version numbers are also added to the quota
> xattrs, so when quota is disabled we don't need to clean up the xattrs.
>
> Thanks,
> Vijay
>
> On Tue, Feb 2, 2016 at 12:26 AM, Steve Dainard <sdainard@xxxxxxxx> wrote:
>
>> I haven't heard anything back on this thread, so here's where I've landed:
>>
>> It appears that the quota xattrs are not being cleared when quotas are disabled, so when
>> quotas are disabled and re-enabled the new size value is added onto the previous size,
>> making it appear that the 'Used' space is significantly greater than it should be. This
>> seems like a bug, but I don't know what to file it against, or whether the logs I
>> attached prove this.
>>
>> Also, the documentation doesn't mention how the quota system works or what happens when
>> quotas are enabled/disabled. There seems to be a background task for each setting:
>> On enable: "/usr/bin/find . -exec /usr/bin/stat {} \;"
>> On disable: setfattr removing the quota xattrs
>>
>> The thing is, neither of these tasks is listed in 'gluster volume status <volume>', i.e.:
>>
>> Status of volume: storage
>> Gluster process                               Port    Online  Pid
>> ------------------------------------------------------------------------------
>> Brick 10.0.231.50:/mnt/raid6-storage/storage  49156   Y       24899
>> Brick 10.0.231.51:/mnt/raid6-storage/storage  49156   Y       2991
>> Brick 10.0.231.52:/mnt/raid6-storage/storage  49156   Y       28853
>> Brick 10.0.231.53:/mnt/raid6-storage/storage  49153   Y       2705
>> NFS Server on localhost                       N/A     N       N/A
>> Quota Daemon on localhost                     N/A     Y       30066
>> NFS Server on 10.0.231.52                     N/A     N       N/A
>> Quota Daemon on 10.0.231.52                   N/A     Y       24976
>> NFS Server on 10.0.231.53                     N/A     N       N/A
>> Quota Daemon on 10.0.231.53                   N/A     Y       30334
>> NFS Server on 10.0.231.51                     N/A     N       N/A
>> Quota Daemon on 10.0.231.51                   N/A     Y       15781
>>
>> Task Status of Volume storage
>> ------------------------------------------------------------------------------
>> ******There are no active volume tasks*******
>>
>> (I added the asterisks above.)
>>
>> So without any visibility into these running tasks, or even knowing of their existence
>> (they are not documented), it becomes very difficult to know what's going on. On any
>> reasonably large storage system these tasks take days to complete, and there should be
>> some indication of that.
>>
>> Where I'm at right now:
>> - I disabled the quotas on volume 'storage'
>> - I started to manually remove xattrs until I realized there is an automated task to do this
>> - After waiting for 'ps aux | grep setfattr' to return nothing, I re-enabled quotas
>> - I'm currently waiting for the stat tasks to complete
>> - Once the entire filesystem has been stat'ed, I'm going to set limits again
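>>
>> For anyone else trying to track these invisible tasks, this is roughly how I've been
>> checking whether the background crawls are still running on each node (the process names
>> are simply what I observed here; they may differ between versions):
>>
>> # crawl kicked off by 'quota enable'
>> ps aux | grep '[/]usr/bin/find'
>> # xattr cleanup kicked off by 'quota disable'
>> ps aux | grep '[s]etfattr'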
>>
>> As a note, this is a pretty brutal process on a system with 140T of storage, and I can't
>> imagine how much worse it would be if my nodes had more than 12 disks each, or if I was
>> at PB scale.
>>
>> On Mon, Jan 25, 2016 at 12:31 PM, Steve Dainard <sdainard@xxxxxxxx> wrote:
>> > Here's a link to a tarball of one of the gluster hosts' logs:
>> > https://dl.dropboxusercontent.com/u/21916057/gluster01.tar.gz
>> >
>> > I wanted to include past logs in case they were useful.
>> >
>> > Also, the volume I'm trying to get quotas working on is 'storage'; you'll notice I
>> > have a brick issue on a different volume, 'vm-storage'.
>> >
>> > In regards to the 3.7 upgrade, I'm a bit hesitant to move to the current release; I
>> > prefer to stay on a stable release with maintenance updates if possible.
>> >
>> > On Mon, Jan 25, 2016 at 12:09 PM, Manikandan Selvaganesh
>> > <mselvaga@xxxxxxxxxx> wrote:
>> >> Hi Steve,
>> >>
>> >> Also, do you have any plans to upgrade to the latest version? With 3.7, we have
>> >> refactored some of the approaches used in quota and marker, and that has fixed quite
>> >> a few issues.
>> >>
>> >> --
>> >> Thanks & Regards,
>> >> Manikandan Selvaganesh.
>> >>
>> >> ----- Original Message -----
>> >> From: "Manikandan Selvaganesh" <mselvaga@xxxxxxxxxx>
>> >> To: "Steve Dainard" <sdainard@xxxxxxxx>
>> >> Cc: "gluster-users@xxxxxxxxxxx List" <gluster-users@xxxxxxxxxxx>
>> >> Sent: Tuesday, January 26, 2016 1:31:10 AM
>> >> Subject: Re: [Gluster-users] Quota list not reflecting disk usage
>> >>
>> >> Hi Steve,
>> >>
>> >> Could you send us the glusterfs logs? They could help us debug the issue.
>> >>
>> >> --
>> >> Thanks & Regards,
>> >> Manikandan Selvaganesh.
>> >>
>> >> ----- Original Message -----
>> >> From: "Steve Dainard" <sdainard@xxxxxxxx>
>> >> To: "Manikandan Selvaganesh" <mselvaga@xxxxxxxxxx>
>> >> Cc: "gluster-users@xxxxxxxxxxx List" <gluster-users@xxxxxxxxxxx>
>> >> Sent: Tuesday, January 26, 2016 12:56:22 AM
>> >> Subject: Re: [Gluster-users] Quota list not reflecting disk usage
>> >>
>> >> Something is seriously wrong with the quota output:
>> >>
>> >> # gluster volume quota storage list
>> >> Path                            Hard-limit  Soft-limit  Used      Available  Soft-limit exceeded?  Hard-limit exceeded?
>> >> ---------------------------------------------------------------------------------------------------------------------------
>> >> /projects-CanSISE               10.0TB      80%         27.8TB    0Bytes     Yes                   Yes
>> >> /data4/climate                  105.0TB     80%         307.1TB   0Bytes     Yes                   Yes
>> >> /data4/forestry                 50.0GB      80%         61.9GB    0Bytes     Yes                   Yes
>> >> /data4/projects                 800.0GB     80%         2.0TB     0Bytes     Yes                   Yes
>> >> /data4/strays                   85.0GB      80%         230.5GB   0Bytes     Yes                   Yes
>> >> /data4/gis                      2.2TB       80%         6.3TB     0Bytes     Yes                   Yes
>> >> /data4/modperl                  1.0TB       80%         953.2GB   70.8GB     Yes                   No
>> >> /data4/dem                      1.0GB       80%         0Bytes    1.0GB      No                    No
>> >> /projects-hydrology-archive0    5.0TB       80%         14.4TB    0Bytes     Yes                   Yes
>> >> /climate-downscale-idf-ec       7.5TB       80%         5.1TB     2.4TB      No                    No
>> >> /climate-downscale-idf          5.0TB       80%         6.1TB     0Bytes     Yes                   Yes
>> >> /home                           5.0TB       80%         11.8TB    0Bytes     Yes                   Yes
>> >> /projects-hydrology-scratch0    7.0TB       80%         169.1GB   6.8TB      No                    No
>> >> /projects-rci-scratch           10.0TB      80%         1.9TB     8.1TB      No                    No
>> >> /projects-dataportal            1.0TB       80%         775.4GB   248.6GB    No                    No
>> >> /modules                        1.0TB       80%         36.1GB    987.9GB    No                    No
>> >> /data4/climate/downscale/CMIP5  65.0TB      80%         56.4TB    8.6TB      Yes                   No
>> >>
>> >> Gluster is listing 'Used' space of over 307TB on /data4/climate, but the volume
>> >> capacity is only 146T.
>> >>
>> >> This has happened after disabling quotas on the volume, re-enabling quotas, and then
>> >> setting quotas again. There was a lot of glusterfsd CPU usage afterwards, and now 3
>> >> days later the quotas I set were all missing except:
>> >>
>> >> /data4/projects|800.0GB|2.0TB|0Bytes
>> >>
>> >> So I re-set the quotas, and the output above is what I have.
>> >>
>> >> Previous to disabling quotas this was the output:
>> >> # gluster volume quota storage list
>> >> Path                            Hard-limit  Soft-limit  Used      Available  Soft-limit exceeded?  Hard-limit exceeded?
>> >> ---------------------------------------------------------------------------------------------------------------------------
>> >> /data4/climate                  105.0TB     80%         151.6TB   0Bytes     Yes                   Yes
>> >> /data4/forestry                 50.0GB      80%         45.4GB    4.6GB      Yes                   No
>> >> /data4/projects                 800.0GB     80%         753.1GB   46.9GB     Yes                   No
>> >> /data4/strays                   85.0GB      80%         80.8GB    4.2GB      Yes                   No
>> >> /data4/gis                      2.2TB       80%         2.1TB     91.8GB     Yes                   No
>> >> /data4/modperl                  1.0TB       80%         948.1GB   75.9GB     Yes                   No
>> >> /data4/dem                      1.0GB       80%         0Bytes    1.0GB      No                    No
>> >> /projects-CanSISE               10.0TB      80%         11.9TB    0Bytes     Yes                   Yes
>> >> /projects-hydrology-archive0    5.0TB       80%         4.8TB     174.0GB    Yes                   No
>> >> /climate-downscale-idf-ec       7.5TB       80%         5.0TB     2.5TB      No                    No
>> >> /climate-downscale-idf          5.0TB       80%         3.8TB     1.2TB      No                    No
>> >> /home                           5.0TB       80%         4.7TB     283.8GB    Yes                   No
>> >> /projects-hydrology-scratch0    7.0TB       80%         95.9GB    6.9TB      No                    No
>> >> /projects-rci-scratch           10.0TB      80%         1.7TB     8.3TB      No                    No
>> >> /projects-dataportal            1.0TB       80%         775.4GB   248.6GB    No                    No
>> >> /modules                        1.0TB       80%         14.6GB    1009.4GB   No                    No
>> >> /data4/climate/downscale/CMIP5  65.0TB      80%         56.4TB    8.6TB      Yes                   No
>> >>
>> >> I was so focused on the /projects-CanSISE quota not being accurate that I missed that
>> >> the 'Used' space on /data4/climate is listed higher than the total gluster volume
>> >> capacity.
>> >>
>> >> On Mon, Jan 25, 2016 at 10:52 AM, Steve Dainard <sdainard@xxxxxxxx> wrote:
>> >>> Hi Manikandan,
>> >>>
>> >>> I'm using 'du', not df, in this case.
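>> >>>
>> >>> To be explicit, the two numbers I'm comparing come from something like the following
>> >>> (the mount point below is just a placeholder for wherever the 'storage' volume is
>> >>> fuse-mounted):
>> >>>
>> >>> gluster volume quota storage list /projects-CanSISE
>> >>> # vs. what du reports for the same directory on the fuse mount
>> >>> du -hs /<fuse-mount>/projects-CanSISE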
>> >>>
>> >>> On Thu, Jan 21, 2016 at 9:20 PM, Manikandan Selvaganesh
>> >>> <mselvaga@xxxxxxxxxx> wrote:
>> >>>> Hi Steve,
>> >>>>
>> >>>> If you would like the df utility to report disk usage taking quota limits into
>> >>>> consideration, you are expected to run the following command:
>> >>>>
>> >>>> 'gluster volume set VOLNAME quota-deem-statfs on'
>> >>>>
>> >>>> on older versions, where quota-deem-statfs is OFF by default. With the latest
>> >>>> versions, quota-deem-statfs is ON by default. In this case, the total disk space of
>> >>>> the directory is taken to be the quota hard limit set on that directory of the
>> >>>> volume, and the disk utility displays usage accordingly. This explains the mismatch
>> >>>> you see with the disk utility.
>> >>>>
>> >>>> Next, on the quota mechanism and its accuracy: there is something called timeouts
>> >>>> in quota. For performance reasons, quota caches directory sizes on the client. You
>> >>>> can set a timeout indicating the maximum duration for which directory sizes in the
>> >>>> cache are considered valid, from the time they are populated. By default the hard
>> >>>> timeout is 5s and the soft timeout is 60s. Setting a timeout of zero forces a fetch
>> >>>> of directory sizes from the server for every operation that modifies file data and
>> >>>> effectively disables directory-size caching on the client side. If you do not have
>> >>>> a timeout of 0 (which we do not encourage, for performance reasons), then until you
>> >>>> reach the soft limit the soft timeout applies, so operations are only synced every
>> >>>> 60s, and that can cause usage to exceed the specified hard limit. If you would like
>> >>>> quota to enforce strictly, please run the following commands:
>> >>>>
>> >>>> 'gluster v quota VOLNAME hard-timeout 0s'
>> >>>> 'gluster v quota VOLNAME soft-timeout 0s'
>> >>>>
>> >>>> We appreciate your curiosity in exploring this; if you would like to know more
>> >>>> about quota, please refer to [1].
>> >>>>
>> >>>> [1] http://gluster.readthedocs.org/en/release-3.7.0-1/Administrator%20Guide/Directory%20Quota/
>> >>>>
>> >>>> --
>> >>>> Thanks & Regards,
>> >>>> Manikandan Selvaganesh.
>> >>>>
>> >>>> ----- Original Message -----
>> >>>> From: "Steve Dainard" <sdainard@xxxxxxxx>
>> >>>> To: "gluster-users@xxxxxxxxxxx List" <gluster-users@xxxxxxxxxxx>
>> >>>> Sent: Friday, January 22, 2016 1:40:07 AM
>> >>>> Subject: Re: [Gluster-users] Quota list not reflecting disk usage
>> >>>>
>> >>>> This is gluster 3.6.6.
>> >>>>
>> >>>> I've attempted to disable and re-enable quotas on the volume, but when I re-apply
>> >>>> the quotas on each directory, the same 'Used' value is present as before.
>> >>>>
>> >>>> Where is quotad getting its information from, and how can I clean up/regenerate
>> >>>> that info?
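>> >>>>
>> >>>> As far as I can tell, the accounting state lives in the trusted.glusterfs.quota.*
>> >>>> xattrs on the brick directories. To make the question concrete, this is the kind of
>> >>>> thing I'm contemplating on each brick (brick path from my setup, directory as an
>> >>>> example; completely untested, and whether any of it is safe is exactly what I'm
>> >>>> asking):
>> >>>>
>> >>>> # inspect the quota xattrs for a directory on the brick backend
>> >>>> getfattr -d -m quota -e hex /mnt/raid6-storage/storage/projects-CanSISE
>> >>>> # remove a stale quota xattr by name...
>> >>>> setfattr -x trusted.glusterfs.quota.size /mnt/raid6-storage/storage/projects-CanSISE
>> >>>> # ...and then somehow force gluster to recalculate and reset it?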
>> >>>>
>> >>>> On Thu, Jan 21, 2016 at 10:07 AM, Steve Dainard <sdainard@xxxxxxxx> wrote:
>> >>>>> I have a distributed volume with quotas enabled:
>> >>>>>
>> >>>>> Volume Name: storage
>> >>>>> Type: Distribute
>> >>>>> Volume ID: 26d355cb-c486-481f-ac16-e25390e73775
>> >>>>> Status: Started
>> >>>>> Number of Bricks: 4
>> >>>>> Transport-type: tcp
>> >>>>> Bricks:
>> >>>>> Brick1: 10.0.231.50:/mnt/raid6-storage/storage
>> >>>>> Brick2: 10.0.231.51:/mnt/raid6-storage/storage
>> >>>>> Brick3: 10.0.231.52:/mnt/raid6-storage/storage
>> >>>>> Brick4: 10.0.231.53:/mnt/raid6-storage/storage
>> >>>>> Options Reconfigured:
>> >>>>> performance.cache-size: 1GB
>> >>>>> performance.readdir-ahead: on
>> >>>>> features.quota: on
>> >>>>> diagnostics.brick-log-level: WARNING
>> >>>>>
>> >>>>> Here is a partial list of quotas:
>> >>>>> # /usr/sbin/gluster volume quota storage list
>> >>>>> Path               Hard-limit  Soft-limit  Used    Available  Soft-limit exceeded?  Hard-limit exceeded?
>> >>>>> ---------------------------------------------------------------------------------------------------------------------------
>> >>>>> ...
>> >>>>> /projects-CanSISE  10.0TB      80%         11.9TB  0Bytes     Yes                   Yes
>> >>>>> ...
>> >>>>>
>> >>>>> If I du on that location I do not get 11.9TB of space used (fuse mount point):
>> >>>>> [root@storage projects-CanSISE]# du -hs
>> >>>>> 9.5T    .
>> >>>>>
>> >>>>> Can someone provide an explanation for how the quota mechanism tracks disk usage?
>> >>>>> How often does the quota mechanism check its accuracy? And how could it get so
>> >>>>> far off?
>> >>>>>
>> >>>>> Can I get gluster to rescan that location and update the quota usage?
>> >>>>>
>> >>>>> Thanks,
>> >>>>> Steve
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel