Hi Joe,

Please find the attached scripts. Check if this solves the problem:

1) 'quota-verify': helps find directories whose quota accounting is wrong
2) 'quota-heal': heals the directories identified by 'quota-verify'

Usage of these scripts: they need to be executed for all the bricks on all
the nodes in the cluster where quota is enabled.

# quota-verify -b <brick_path1> >> logfile_node_1
# quota-verify -b <brick_path2> >> logfile_node_1

This needs to be executed on all the nodes. Please make sure that no I/O is
happening when running the quota-heal script:

# quota-heal -l logfile_node_1
# quota-heal -l logfile_node_2
...

Thanks,
Vijay

On Thursday 22 January 2015 01:03 PM, Raghavendra Gowdappa wrote:
----- Original Message -----
From: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>
To: "Joe Julian" <joe@xxxxxxxxxxxxxxxx>
Cc: "Gluster-devel@xxxxxxxxxxx" <gluster-devel@xxxxxxxxxxx>, pkoro@xxxxxxxxxxxx
Sent: Thursday, January 22, 2015 12:58:47 PM
Subject: Re: Quota problems without a way of fixing them

----- Original Message -----
From: "Joe Julian" <joe@xxxxxxxxxxxxxxxx>
To: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>
Cc: pkoro@xxxxxxxxxxxx, "Gluster-devel@xxxxxxxxxxx" <gluster-devel@xxxxxxxxxxx>
Sent: Thursday, January 22, 2015 11:16:39 AM
Subject: Re: Quota problems without a way of fixing them

On 01/21/2015 09:32 PM, Raghavendra Gowdappa wrote:

> ----- Original Message -----
> From: "Joe Julian" <joe@xxxxxxxxxxxxxxxx>
> To: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
> Cc: "Paschalis Korosoglou" <pkoro@xxxxxxxxxxxx>
> Sent: Thursday, January 22, 2015 12:54:44 AM
> Subject: Quota problems without a way of fixing them
>
> Paschalis (PeterA in #gluster) has reported these bugs and we've tried to
> find the source of the problem, to no avail. Worse yet, as far as I can
> tell, there's no way to simply reset the quotas to match what's actually
> there.
>
> What should we look for to isolate the source of this problem? This is a
> production system with enough activity to make isolating a repro difficult
> at best, and the debug logs have enough noise to make isolation nearly
> impossible.
>
> Finally, isn't there some simple way to trigger quota to rescan a path and
> reset trusted.glusterfs.quota.size?

1. Delete the following xattrs from all the files/directories on all the
   bricks:
   a) trusted.glusterfs.quota.size
   b) trusted.glusterfs.quota.*.contri
   c) trusted.glusterfs.quota.dirty

2. Turn off md-cache:
   # gluster volume set <volname> performance.stat-prefetch off

3. Mount glusterfs, asking it to use readdir instead of readdirp:
   # mount -t glusterfs -o use-readdirp=no <volfile-server>:<volfile-id> /mnt/glusterfs

4.
Do a crawl on the mountpoint:
   # find /mnt/glusterfs -exec stat \{} \; > /dev/null

This should correct the accounting on the bricks. Once done, you should see
correct values in the quota list output. Please let us know if it doesn't
work for you.

> But that could be a months-long process with the size of many of our
> users' volumes. There should be a way to do this with a single directory
> tree.

If you can isolate a sub-directory tree where size accounting has gone bad,
this can be done by setting the xattr trusted.glusterfs.quota.dirty of a
directory to 1 and sending a lookup on that directory. (But the problem
with this approach is: how do we know whether the parents of this
sub-directory have correct sizes? If a subdirectory has a wrong size, then
most likely the accounting of all of its ancestors up to the root has gone
bad as well. Hence I am skeptical about healing just "part" of a directory
tree.)

Basically, what this does is add up the sizes of all immediate children and
set the result as the value of trusted.glusterfs.quota.size on the
directory. The catch here is that the sizes of the immediate children need
not themselves be accounted correctly. Hence this healing should be done
bottom-up, starting with the bottom-most directories and working towards
the top of the isolated subtree. We can have an algorithm like this:

    #include <dirent.h>
    #include <limits.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/stat.h>
    #include <sys/xattr.h>

    void
    heal (const char *path)
    {
            char        value = 1;
            struct stat stbuf = {0, };

            setxattr (path, "trusted.glusterfs.quota.dirty",
                      (const void *) &value, sizeof (value), 0);

            /* now that the dirty xattr has been set, trigger a lookup
             * so that the directory is healed */
            stat (path, &stbuf);
    }

    void
    crawl (DIR *dir, const char *path)
    {
            struct dirent *entry = NULL;

            while ((entry = readdir (dir)) != NULL) {
                    if ((entry->d_type == DT_DIR) &&
                        strcmp (entry->d_name, ".") &&
                        strcmp (entry->d_name, "..")) {
                            char childpath[PATH_MAX];

                            snprintf (childpath, sizeof (childpath),
                                      "%s/%s", path, entry->d_name);

                            DIR *childdir = opendir (childpath);
                            if (childdir != NULL) {
                                    /* recurse first: children must be
                                     * healed before their parent */
                                    crawl (childdir, childpath);
                                    closedir (childdir);
                            }
                    }
            }

            heal (path);
    }

Now call crawl on the isolated sub-directory (on the mountpoint).
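The bottom-up crawl above can also be sketched in shell with `find -depth`,
which visits children before their parents. This is only an illustration:
on a real volume you would run it as root against the suspect subtree on
the glusterfs mountpoint and set trusted.glusterfs.quota.dirty; the scratch
tree, the user.* xattr namespace, and the 0x01 dirty byte below are
stand-ins so the sketch runs anywhere.

```shell
# build a scratch tree standing in for the suspect sub-directory
SUBTREE=$(mktemp -d)/suspect
mkdir -p "$SUBTREE/a/b" "$SUBTREE/c"

# -depth gives post-order traversal: deepest directories first, which is
# exactly the bottom-up order the heal needs
find "$SUBTREE" -depth -type d | while read -r d; do
    # mark the directory dirty (user.* stand-in for the trusted.* xattr),
    # then stat it: on glusterfs the lookup would recompute quota.size
    setfattr -n user.quota.dirty -v 0x01 "$d" 2>/dev/null || true
    stat "$d" > /dev/null
done

find "$SUBTREE" -depth -type d     # print the heal order, deepest first
```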
Note that the above is pseudo-code, and a tool should be written using this
algorithm. We'll try to add a program to extras/utils which does this.

> His production system has been unmanageable for months now. Is it
> possible for someone to spare some cycles to get this looked at?
>
> 2013-03-04 - https://bugzilla.redhat.com/show_bug.cgi?id=917901
> 2013-10-24 - https://bugzilla.redhat.com/show_bug.cgi?id=1023134

We are working on these bugs. We'll update the bugzilla once we find
anything substantial.

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel
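The per-node procedure for the attached scripts (run quota-verify once per
brick into one log, then feed that log to quota-heal) could be wrapped in a
small script. The sketch below assumes only the script names and the -b/-l
options shown in the usage above; the bricks file and the log-file naming
are illustrative assumptions.

```shell
# Hypothetical per-node wrapper around the attached scripts.
run_quota_verify_node() {
    bricks_file=$1                       # one brick path per line
    logfile="logfile_$(hostname -s)"
    : > "$logfile"                       # start with an empty log

    # verify every brick on this node, appending to one log file
    while read -r brick; do
        quota-verify -b "$brick" >> "$logfile"
    done < "$bricks_file"

    printf '%s\n' "$logfile"             # report which log we wrote
}

# Once every node's log has been gathered, and with no I/O on the volume:
#   quota-heal -l logfile_node_1
```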
Attachment: quota-heal.gz
Description: application/gzip

Attachment: quota-verify.gz
Description: application/gzip