----- Original Message -----
> From: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>
> To: "Shyam" <srangana@xxxxxxxxxx>
> Cc: gluster-devel@xxxxxxxxxxx
> Sent: Tuesday, May 19, 2015 11:46:19 AM
> Subject: Re: Moratorium on new patch acceptance
>
> ----- Original Message -----
> > From: "Shyam" <srangana@xxxxxxxxxx>
> > To: gluster-devel@xxxxxxxxxxx
> > Sent: Tuesday, May 19, 2015 6:13:06 AM
> > Subject: Re: Moratorium on new patch acceptance
> >
> > On 05/18/2015 07:05 PM, Shyam wrote:
> > > On 05/18/2015 03:49 PM, Shyam wrote:
> > >> On 05/18/2015 10:33 AM, Vijay Bellur wrote:
> > >>
> > >> The etherpad did not call out ./tests/bugs/distribute/bug-1161156.t,
> > >> which did not have an owner, so I took a stab at it; the results are
> > >> below.
> > >>
> > >> I also think the failure in ./tests/bugs/quota/bug-1038598.t has the
> > >> same cause as the observation below.
> > >>
> > >> NOTE: Anyone with better knowledge of quota could chip in on what we
> > >> should expect in this case and how to correct the expectations of
> > >> these test cases.
> > >>
> > >> (Details of ./tests/bugs/distribute/bug-1161156.t)
> > >> 1) The failure is in TEST #20.
> > >> Failed line: TEST ! dd if=/dev/zero of=$N0/$mydir/newfile_2 bs=1k
> > >> count=10240 conv=fdatasync
> > >>
> > >> 2) The above line is expected to fail (i.e. dd is expected to fail),
> > >> as the set quota is 20MB and at this point in the test case we are
> > >> attempting to exceed it by another 5MB.
> > >>
> > >> 3) The failure is easily reproducible on my laptop, 2/10 times.
> > >>
> > >> 4) On debugging, I see that when the above dd succeeds (i.e. the test
> > >> fails, meaning dd succeeded in writing more than the set quota),
> > >> there are no write errors from the bricks, nor any errors on the
> > >> final COMMIT RPC call to NFS.
> > >>
> > >> As a result, the expectation of this test fails.
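The expectation in step 2 above can be sketched as a small Python model. This is a hypothetical illustration of what the test assumes (a synchronous quota check rejecting the write with EDQUOT), not the real quota xlator code; the `enforce` helper and the assumed ~15MB of prior usage are made up for the example.

```python
import errno

QUOTA = 20 * 1024 * 1024         # 20MB hard limit set by the test
WRITE = 10240 * 1024             # dd bs=1k count=10240, i.e. a 10MB write
USED = QUOTA - 5 * 1024 * 1024   # assume ~15MB already written, so this
                                 # write would exceed the limit by 5MB

def enforce(used, limit, size):
    """Hypothetical synchronous check: reject any write that would
    cross the hard limit with EDQUOT (122 on Linux)."""
    if used + size > limit:
        raise OSError(errno.EDQUOT, "Disk quota exceeded")

try:
    enforce(USED, QUOTA, WRITE)
    dd_failed = False
except OSError as e:
    dd_failed = (e.errno == errno.EDQUOT)

print(dd_failed)
```

Under this model dd always fails, which is what `TEST !` asserts; the rest of the thread explains why the real enforcement does not behave this synchronously.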
> > >>
> > >> NOTE: Sometimes there is a write failure from one of the bricks (the
> > >> above test uses AFR as well), but AFR self-heal kicks in and fixes
> > >> the problem, as expected, since the write succeeded on one of the
> > >> replicas. I add this observation because the failed regression run
> > >> logs have some EDQUOT errors reported in the client xlator, but only
> > >> from one of the bricks, and there are further AFR self-heal entries
> > >> in the logs.
> > >>
> > >> 5) When the test case succeeds, the writes fail with EDQUOT as
> > >> expected. There are times when the quota is exceeded by, say,
> > >> 1MB-4.8MB, but the test case still passes. This means that if we were
> > >> to try to exceed the quota by 1MB (instead of the 5MB in the test
> > >> case), this test case might always fail.
> > >
> > > Here is why I think this slips past quota sometimes and not others,
> > > making this and the other test case mentioned below spurious:
> > > - Each write is 256K from the client (that is what is sent over the
> > >   wire).
> > > - If more IO was queued by io-threads after passing the quota checks
> > >   (in this 5MB case that requires >20 IOs to be queued; 16 IOs could
> > >   be active in io-threads itself), we could end up writing more than
> > >   the quota amount.
> > >
> > > So, if quota checks whether a write violates the quota, lets it
> > > through, and only on the UNWIND updates the space used for future
> > > checks, we could have more IO outstanding than the quota allows, and
> > > as a result a larger write could pass through, considering the
> > > io-threads queue and active IOs as well. Would this be a fair
> > > description of how quota works?
>
> Yes, this is a possible scenario. There is a finite time window between:
>
> 1. Querying the size of a directory, in other words checking whether the
>    current write can be allowed.
> 2.
>    The "effect" of this write getting reflected in the size of all the
>    parent directories of the file, up to the root.
>
> If 1 and 2 were atomic, a parallel write that would exceed the quota
> limit could not have slipped through. Unfortunately, in the current
> scheme of things they are not atomic. There can be parallel writes in
> this test case because of nfs-client and/or glusterfs write-back (even
> though we have a single-threaded application, dd, running). One way of
> testing this hypothesis is to disable nfs and glusterfs write-back and
> run the same (unmodified) test; the test should then always succeed (dd
> should fail). To disable write-back in nfs you can use the noac option
> while mounting.
>
> The situation becomes worse in real-life scenarios because of the
> parallelism involved at many layers:
>
> 1. Multiple applications, each possibly multithreaded, writing to many
>    (or a single) file(s) in a quota subtree.
> 2. Write-back in the NFS client and glusterfs.
> 3. Multiple bricks holding files of a quota subtree, each brick
>    simultaneously processing many write requests through io-threads.
> 4. Background accounting of directory sizes _after_ a write is complete.
>
> I have tried to fix this issue in the past, though unsuccessfully. It
> seems to me that one effective strategy is to make enforcement and the
> update of parent sizes atomic, but if we do that we add the latency of
> accounting to the latency of the fop. Other options can be explored.
> However, our quota functionality requirements allow a buffer of 10%
> while enforcing limits, so this issue has not been high on our priority
> list till now. Our tests should therefore also expect failures allowing
> for this 10% buffer.
>
> > > I believe this is what is happening in this case. Checking a fix on
> > > my machine, and I will post the same if it proves to help the
> > > situation.
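The window between enforcement (1) and accounting (2) can be made concrete with a toy Python sketch. This is a hypothetical model, not the real quota translator: `RacyQuota`, the barrier, and the writer counts are invented for the example; the barrier simply forces every writer to perform its check before any accounting lands, which is the worst case of the io-threads queuing described above.

```python
import threading

LIMIT = 20 * 1024 * 1024   # 20MB quota, as in the test case
WRITE = 256 * 1024         # 256KB per wire-level write
NWRITERS = 21              # enough queued IOs to overshoot by 5MB

class RacyQuota:
    """Toy model: the size check and the size update are two separate
    critical sections, so concurrent writers can all see a stale
    'used' value between the two."""
    def __init__(self, limit, used):
        self.limit = limit
        self.used = used
        self.lock = threading.Lock()

    def check(self, size):
        with self.lock:            # step 1: enforcement
            return self.used + size <= self.limit

    def account(self, size):
        with self.lock:            # step 2: accounting, done later
            self.used += size

q = RacyQuota(LIMIT, used=LIMIT - WRITE)  # one write away from the limit
barrier = threading.Barrier(NWRITERS)     # all checks before any accounting
admitted = []

def writer():
    if q.check(WRITE):     # every writer sees used == LIMIT - WRITE
        barrier.wait()     # models IOs sitting in the io-threads queue
        q.account(WRITE)   # size updated only after the write "completes"
        admitted.append(WRITE)

threads = [threading.Thread(target=writer) for _ in range(NWRITERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

overshoot = q.used - LIMIT
print(overshoot // 1024, "KB past the limit")  # prints: 5120 KB past the limit
```

All 21 writers pass the check against the same stale size, so the directory ends up a full 5MB past the hard limit, matching the overshoot the test case relies on; making `check` and `account` one critical section removes the overshoot at the cost of serializing accounting into the fop path.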
> >
> > Posted a patch to fix the problem: http://review.gluster.org/#/c/10811/
> >
> > There are arguably other ways to fix/overcome this; this one seemed apt
> > for this test case, though.
> >
> > >>
> > >> 6) A note on dd with conv=fdatasync
> > >> As one of the fixes attempts to overcome this issue by adding
> > >> "conv=fdatasync", I wanted to cover that behavior here.
> > >>
> > >> What the above parameter does is send an NFS_COMMIT (which
> > >> internally becomes a flush FOP) after writing the blocks to the NFS
> > >> share. This commit triggers any pending writes for the file and
> > >> sends the flush to the brick, all of which succeeds at times,
> > >> resulting in the failure of the test case.
> > >>
> > >> NOTE: In the TC ./tests/bugs/quota/bug-1038598.t the failed line is
> > >> in much the same context (LINE 26: TEST ! dd if=/dev/zero
> > >> of=$M0/test_dir/file1.txt bs=1024k count=15), expecting the hard
> > >> limit to be exceeded, and there are no write failures in the logs
> > >> (which would be expected with EDQUOT (122)).
> > > _______________________________________________
> > > Gluster-devel mailing list
> > > Gluster-devel@xxxxxxxxxxx
> > > http://www.gluster.org/mailman/listinfo/gluster-devel