----- Original Message -----
> From: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>
> To: "Shyam" <srangana@xxxxxxxxxx>
> Cc: gluster-devel@xxxxxxxxxxx
> Sent: Tuesday, May 19, 2015 11:49:56 AM
> Subject: Re: Moratorium on new patch acceptance
>
> ----- Original Message -----
> > From: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>
> > To: "Shyam" <srangana@xxxxxxxxxx>
> > Cc: gluster-devel@xxxxxxxxxxx
> > Sent: Tuesday, May 19, 2015 11:46:19 AM
> > Subject: Re: Moratorium on new patch acceptance
> >
> > ----- Original Message -----
> > > From: "Shyam" <srangana@xxxxxxxxxx>
> > > To: gluster-devel@xxxxxxxxxxx
> > > Sent: Tuesday, May 19, 2015 6:13:06 AM
> > > Subject: Re: Moratorium on new patch acceptance
> > >
> > > On 05/18/2015 07:05 PM, Shyam wrote:
> > > > On 05/18/2015 03:49 PM, Shyam wrote:
> > > >> On 05/18/2015 10:33 AM, Vijay Bellur wrote:
> > > >>
> > > >> The etherpad did not call out ./tests/bugs/distribute/bug-1161156.t, which did not have an owner, so I took a stab at it; the results are below.
> > > >>
> > > >> I also think the failure in ./tests/bugs/quota/bug-1038598.t is the same as the observation below.
> > > >>
> > > >> NOTE: Anyone with better knowledge of Quota can possibly chip in on what we should expect in this case and how to correct the expectation in these test cases.
> > > >>
> > > >> (Details of ./tests/bugs/distribute/bug-1161156.t)
> > > >>
> > > >> 1) Failure is in TEST #20
> > > >> Failed line: TEST ! dd if=/dev/zero of=$N0/$mydir/newfile_2 bs=1k count=10240 conv=fdatasync
> > > >>
> > > >> 2) The above line is expected to fail (i.e. dd is expected to fail), as the set quota is 20MB and we are attempting to exceed it by another 5MB at this point in the test case.
> > > >>
> > > >> 3) The failure is easily reproducible on my laptop, about 2 out of 10 runs.
> > > >>
> > > >> 4) On debugging, I see that when the above dd succeeds (i.e. the test fails, because dd succeeded in writing more than the set quota), there are no write errors from the bricks or any errors on the final COMMIT RPC call to NFS. As a result the expectation of this test fails.
> > > >>
> > > >> NOTE: Sometimes there is a write failure from one of the bricks (the above test uses AFR as well), but AFR self-healing kicks in and fixes the problem, as expected, since the write succeeded on one of the replicas. I add this observation because the failed regression run logs do have some EDQUOT errors reported in the client xlator, but only from one of the bricks, and there are further AFR self-heal messages noted in the logs.
> > > >>
> > > >> 5) When the test case succeeds, the writes fail with EDQUOT as expected. There are times when the quota is exceeded by, say, 1MB - 4.8MB, but the test case still passes. This means that, if we were to try to exceed the quota by 1MB (instead of the 5MB in the test case), this test case might always fail.
> > > >
> > > > Here is why I think this sometimes slips past quota and sometimes does not, making this and the other test case mentioned below spurious:
> > > > - Each write is 256K from the client (that is what is sent over the wire)
> > > > - If more IO was queued by io-threads after passing the quota checks, which in this 5MB case requires >20 IOs to be queued (16 IOs could be active in io-threads itself), we could end up writing more than the quota amount
> > > >
> > > > So, if quota checks whether a write violates the quota, lets it through, and only updates the space used for future checks on the UNWIND, we could have more IO outstanding than the quota allows, and as a result such a larger write can pass through, considering the io-threads queue and active IOs as well. Would this be a fair assumption of how quota works?
> >
> > Yes, this is a possible scenario. There is a finite time window between:
> >
> > 1. Querying the size of a directory, in other words checking whether the current write can be allowed
> > 2. The "effect" of this write getting reflected in the size of all the parent directories of the file, up to the root
> >
> > If 1 and 2 were atomic, another parallel write which could've exceeded the quota-limit could not have slipped through. Unfortunately, in the current scheme of things they are not atomic. Now, there can be parallel writes in this test case because of nfs-client and/or glusterfs write-back (though we have only one single-threaded application - dd - running). One way of testing this hypothesis is to disable NFS and glusterfs write-back and run the same (unmodified) test; the test should then always succeed (dd should fail). To disable write-back on the NFS client you can use the noac option while mounting.
> >
> > The situation becomes worse in real-life scenarios because of the parallelism involved at many layers:
> >
> > 1. Multiple applications, each possibly multithreaded, writing to one or many files in a quota subtree
> > 2. Write-back in the NFS client and in glusterfs
> > 3. Multiple bricks holding files of a quota subtree, each brick simultaneously processing many write requests through io-threads
> > 4. Background accounting of directory sizes _after_ a write is complete
> >
> > I've tried in the past to fix the issue, though unsuccessfully. It seems to me that one effective strategy is to make enforcement and the update of parent directory sizes atomic, but if we do that we end up adding the latency of accounting to the latency of the fop. Other options can be explored. But our Quota functionality requirements allow a buffer of 10% while enforcing limits, so this issue has not been high on our priority list till now. Our tests should therefore also expect failures, allowing for this 10% buffer. Since most of our tests are a single instance of single-threaded dd running on a single mount, if the hypothesis turns out to be true, we can turn off nfs-client and glusterfs write-back in all tests related to Quota. Comments?
> >
> > > > I believe this is what is happening in this case. I am checking a fix on my machine, and will post the same if it proves to help the situation.
> > >
> > > Posted a patch to fix the problem: http://review.gluster.org/#/c/10811/
> > >
> > > There are arguably other ways to fix/overcome the same, this seemed apt for this test case though.
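On the suggestion above to disable write-back on both sides while testing this hypothesis, here is a rough sketch of what that could look like against a test volume. The volume name "patchy", the server name and the mount point are just placeholders, and the option names are quoted from memory, so please double-check them before wiring anything into the .t files:

  # turn off the glusterfs write-back (write-behind) translator on the volume
  gluster volume set patchy performance.write-behind off
  gluster volume set patchy performance.flush-behind off

  # mount over NFSv3 with noac; besides disabling attribute caching, noac also
  # makes application writes synchronous, so the NFS client cannot keep
  # queuing WRITEs after the quota check has already been passed
  mount -t nfs -o vers=3,nolock,noac server:/patchy /mnt/patchy

  # re-run the unmodified write; with both layers of write-back off, dd should
  # now fail with EDQUOT every time the limit is crossed
  dd if=/dev/zero of=/mnt/patchy/dir/newfile_2 bs=1k count=10240 conv=fdatasync

If that holds, the same two volume-set commands plus a noac mount option would be a fairly small change to make across the Quota-related tests.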
> > > >> 6) Note on dd with conv=fdatasync
> > > >> As one of the fixes attempts to overcome this issue with the addition of "conv=fdatasync", I wanted to cover that behavior here.
> > > >>
> > > >> What the above parameter does is send an NFS_COMMIT (which internally becomes a flush FOP) at the end of writing the blocks to the NFS share. As a result, this commit triggers any pending writes for the file and sends the flush to the brick, all of which succeeds at times, resulting in the failure of the test case.
> > > >>
> > > >> NOTE: In the test case ./tests/bugs/quota/bug-1038598.t the failed line is pretty much in the same context (LINE 26: TEST ! dd if=/dev/zero of=$M0/test_dir/file1.txt bs=1024k count=15), i.e. the hard limit is expected to be exceeded, yet there are no write failures in the logs (which we would expect to see as EDQUOT (122)).
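Following on from the 10% enforcement buffer mentioned above, and separate from the patch that was posted (not reproduced here), one way the expectation in such tests could be made more tolerant is to overshoot the limit by clearly more than that buffer and force the data out before checking. The numbers below are only an illustration based on the 20MB limit in bug-1161156.t, not a proposed change:

  # the quota hard limit on the directory is 20MB and roughly 15MB is already
  # used at this point, so a 20MB write attempt overshoots the limit by far
  # more than the 10% (2MB) enforcement buffer
  TEST ! dd if=/dev/zero of=$N0/$mydir/newfile_2 bs=1k count=20480 conv=fdatasync

  # optionally sanity-check that usage did not grow wildly past the limit
  du -sm $N0/$mydir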