----- Original Message -----
> From: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>
> To: "Shyam" <srangana@xxxxxxxxxx>
> Cc: gluster-devel@xxxxxxxxxxx
> Sent: Tuesday, May 19, 2015 11:49:56 AM
> Subject: Re: Moratorium on new patch acceptance
>
> ----- Original Message -----
> > From: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>
> > To: "Shyam" <srangana@xxxxxxxxxx>
> > Cc: gluster-devel@xxxxxxxxxxx
> > Sent: Tuesday, May 19, 2015 11:46:19 AM
> > Subject: Re: Moratorium on new patch acceptance
> >
> > ----- Original Message -----
> > > From: "Shyam" <srangana@xxxxxxxxxx>
> > > To: gluster-devel@xxxxxxxxxxx
> > > Sent: Tuesday, May 19, 2015 6:13:06 AM
> > > Subject: Re: Moratorium on new patch acceptance
> > >
> > > On 05/18/2015 07:05 PM, Shyam wrote:
> > > > On 05/18/2015 03:49 PM, Shyam wrote:
> > > >> On 05/18/2015 10:33 AM, Vijay Bellur wrote:
> > > >>
> > > >> The etherpad did not call out ./tests/bugs/distribute/bug-1161156.t, which did not have an owner, so I took a stab at it; the results are below.
> > > >>
> > > >> I also think the failure in ./tests/bugs/quota/bug-1038598.t is the same as the observation below.
> > > >>
> > > >> NOTE: Anyone with better knowledge of Quota can possibly chip in on what we should expect in this case and how to correct the expectation in these test cases.
> > > >>
> > > >> (Details of ./tests/bugs/distribute/bug-1161156.t)
> > > >>
> > > >> 1) Failure is in TEST #20
> > > >> Failed line: TEST ! dd if=/dev/zero of=$N0/$mydir/newfile_2 bs=1k count=10240 conv=fdatasync
> > > >>
> > > >> 2) The above line is expected to fail (i.e. dd is expected to fail), as the set quota is 20MB and we are attempting to exceed it by another 5MB at this point in the test case.
> > > >>
> > > >> 3) The failure is easily reproducible on my laptop, about 2 out of 10 runs.
> > > >>
> > > >> 4) On debugging, I see that when the above dd succeeds (i.e. the test fails, because dd succeeded in writing more than the set quota), there are no write errors from the bricks or any errors on the final COMMIT RPC call to NFS. As a result the expectation of this test fails.
> > > >>
> > > >> NOTE: Sometimes there is a write failure from one of the bricks (the above test uses AFR as well), but AFR self-healing kicks in and fixes the problem, as expected, since the write succeeded on one of the replicas. I add this observation because the failed regression run logs do have some EDQUOT errors reported in the client xlator, but only from one of the bricks, and there are further AFR self-heal messages noted in the logs.
> > > >>
> > > >> 5) When the test case succeeds, the writes fail with EDQUOT as expected. There are times when the quota is exceeded by, say, 1MB - 4.8MB, but the test case still passes. This means that, if we were to try to exceed the quota by 1MB (instead of the 5MB in the test case), this test case might always fail.
> > > >
> > > > Here is why I think this sometimes slips past quota and sometimes does not, making this and the other test case mentioned below spurious:
> > > > - Each write is 256K from the client (that is what is sent over the wire)
> > > > - If more IO was queued by io-threads after passing the quota checks, which in this 5MB case requires >20 IOs to be queued (16 IOs could be active in io-threads itself), we could end up writing more than the quota amount
> > > >
> > > > So, if quota checks whether a write violates the quota, lets it through, and only updates the space used for future checks on the UNWIND, we could have more IO outstanding than the quota allows, and as a result such a larger write can pass through, considering the io-threads queue and active IOs as well. Would this be a fair assumption of how quota works?
> >
> > Yes, this is a possible scenario. There is a finite time window between:
> >
> > 1. Querying the size of a directory, in other words checking whether the current write can be allowed
> > 2. The "effect" of this write getting reflected in the size of all the parent directories of the file, up to the root
> >
> > If 1 and 2 were atomic, another parallel write which could've exceeded the quota-limit could not have slipped through. Unfortunately, in the current scheme of things they are not atomic. Now, there can be parallel writes in this test case because of nfs-client and/or glusterfs write-back (though we have only one single-threaded application - dd - running). One way of testing this hypothesis is to disable NFS and glusterfs write-back and run the same (unmodified) test; the test should then always succeed (dd should fail). To disable write-back on the NFS client you can use the noac option while mounting.
> >
> > The situation becomes worse in real-life scenarios because of the parallelism involved at many layers:
> >
> > 1. Multiple applications, each possibly multithreaded, writing to one or many files in a quota subtree
> > 2. Write-back in the NFS client and in glusterfs
> > 3. Multiple bricks holding files of a quota subtree, each brick simultaneously processing many write requests through io-threads
> > 4. Background accounting of directory sizes _after_ a write is complete
> >
> > I've tried in the past to fix the issue, though unsuccessfully. It seems to me that one effective strategy is to make enforcement and the update of parent directory sizes atomic, but if we do that we end up adding the latency of accounting to the latency of the fop. Other options can be explored. But our Quota functionality requirements allow a buffer of 10% while enforcing limits, so this issue has not been high on our priority list till now. Our tests should therefore also expect failures, allowing for this 10% buffer. Since most of our tests are a single instance of single-threaded dd running on a single mount, if the hypothesis turns out to be true, we can turn off nfs-client and glusterfs write-back in all tests related to Quota. Comments?
> >
> > > > I believe this is what is happening in this case. I am checking a fix on my machine, and will post the same if it proves to help the situation.
> > >
> > > Posted a patch to fix the problem: http://review.gluster.org/#/c/10811/
> > >
> > > There are arguably other ways to fix/overcome the same, this seemed apt for this test case though.
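On the suggestion above to disable write-back on both sides while testing this hypothesis, here is a rough sketch of what that could look like against a test volume. The volume name "patchy", the server name and the mount point are just placeholders, and the option names are quoted from memory, so please double-check them before wiring anything into the .t files:

  # turn off the glusterfs write-back (write-behind) translator on the volume
  gluster volume set patchy performance.write-behind off
  gluster volume set patchy performance.flush-behind off

  # mount over NFSv3 with noac; besides disabling attribute caching, noac also
  # makes application writes synchronous, so the NFS client cannot keep
  # queuing WRITEs after the quota check has already been passed
  mount -t nfs -o vers=3,nolock,noac server:/patchy /mnt/patchy

  # re-run the unmodified write; with both layers of write-back off, dd should
  # now fail with EDQUOT every time the limit is crossed
  dd if=/dev/zero of=/mnt/patchy/dir/newfile_2 bs=1k count=10240 conv=fdatasync

If that holds, the same two volume-set commands plus a noac mount option would be a fairly small change to make across the Quota-related tests.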
> > > >> 6) Note on dd with conv=fdatasync
> > > >> As one of the fixes attempts to overcome this issue with the addition of "conv=fdatasync", I wanted to cover that behavior here.
> > > >>
> > > >> What the above parameter does is send an NFS_COMMIT (which internally becomes a flush FOP) at the end of writing the blocks to the NFS share. As a result, this commit triggers any pending writes for the file and sends the flush to the brick, all of which succeeds at times, resulting in the failure of the test case.
> > > >>
> > > >> NOTE: In the test case ./tests/bugs/quota/bug-1038598.t the failed line is pretty much in the same context (LINE 26: TEST ! dd if=/dev/zero of=$M0/test_dir/file1.txt bs=1024k count=15), i.e. the hard limit is expected to be exceeded, yet there are no write failures in the logs (which we would expect to see as EDQUOT (122)).
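Following on from the 10% enforcement buffer mentioned above, and separate from the patch that was posted (not reproduced here), one way the expectation in such tests could be made more tolerant is to overshoot the limit by clearly more than that buffer and force the data out before checking. The numbers below are only an illustration based on the 20MB limit in bug-1161156.t, not a proposed change:

  # the quota hard limit on the directory is 20MB and roughly 15MB is already
  # used at this point, so a 20MB write attempt overshoots the limit by far
  # more than the 10% (2MB) enforcement buffer
  TEST ! dd if=/dev/zero of=$N0/$mydir/newfile_2 bs=1k count=20480 conv=fdatasync

  # optionally sanity-check that usage did not grow wildly past the limit
  du -sm $N0/$mydir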