On 05/19/2015 11:46 AM, Raghavendra Gowdappa wrote:
----- Original Message -----
From: "Shyam" <srangana@xxxxxxxxxx>
To: gluster-devel@xxxxxxxxxxx
Sent: Tuesday, May 19, 2015 6:13:06 AM
Subject: Re: Moratorium on new patch acceptance
On 05/18/2015 07:05 PM, Shyam wrote:
On 05/18/2015 03:49 PM, Shyam wrote:
On 05/18/2015 10:33 AM, Vijay Bellur wrote:
The etherpad did not call out ./tests/bugs/distribute/bug-1161156.t,
which did not have an owner, so I took a stab at it; the results are
below.
I also think the failure in ./tests/bugs/quota/bug-1038598.t has the
same cause as the observation below.
NOTE: Anyone with better knowledge of Quota, please chip in on what we
should expect in this case and how to correct the expectations in these
test cases.
(Details of ./tests/bugs/distribute/bug-1161156.t)
1) The failure is in TEST #20
Failed line: TEST ! dd if=/dev/zero of=$N0/$mydir/newfile_2 bs=1k count=10240 conv=fdatasync
2) The above line is expected to fail (i.e., dd is expected to fail)
because the quota is set to 20MB and, at this point in the test case,
we attempt to exceed it by another 5MB.
3) The failure is easily reproducible on my laptop, in about 2 out of
10 runs.
4) On debugging, I see that when the above dd succeeds (i.e., the test
fails, meaning dd wrote more than the set quota), there are no write
errors from the bricks and no errors on the final NFS COMMIT RPC.
As a result, the test's expectation is not met.
NOTE: Sometimes there is a write failure from one of the bricks (the
above test uses AFR as well), but AFR self-heal kicks in and fixes the
problem, as expected, since the write succeeded on one of the replicas.
I mention this because the logs of the failed regression run show some
EDQUOT errors reported in the client xlator, but only from one of the
client bricks, followed by AFR self-heal messages.
5) When the test case succeeds, the writes fail with EDQUOT as expected.
Even then, there are times when the quota is exceeded by roughly 1MB to
4.8MB and the test case still passes. This means that if we tried to
exceed the quota by only 1MB (instead of 5MB as in the test case), this
test case might fail every time.
Here is why I think the write gets past quota sometimes and not at other
times, which makes this test case and the other one mentioned above
(bug-1038598.t) spurious.
- Each write from the client is 256K (that is the size sent over the wire)
- If more IO was queued by io-threads after passing the quota checks (in
this 5MB case, that requires >20 IOs to be queued, and 16 IOs could be
active in io-threads itself), we could end up writing more than the
quota amount
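To put numbers on the bullets above, here is a back-of-the-envelope calculation. It is a sketch under assumptions taken from this thread, not values read from the code: sizes are in binary units (so bs=1k count=10240 is 10 MiB), each wire write is 256K, and 16 IOs can be active in io-threads.

```python
# Back-of-the-envelope numbers for the bug-1161156.t scenario.
# Assumptions: 1K = 1024 bytes, 256K wire writes, 16 active io-threads.
KIB = 1024
MIB = 1024 * KIB

dd_bytes = 1 * KIB * 10240   # dd bs=1k count=10240
wire_write = 256 * KIB       # size of each WRITE sent by the client
overshoot = 5 * MIB          # amount the test tries to exceed the quota by
active_iot = 16              # IOs that may be active in io-threads

writes_in_dd = dd_bytes // wire_write         # WRITE fops issued for the dd
writes_over_quota = overshoot // wire_write   # WRITEs that land past the limit

# If all active writes passed the quota check before any of them was
# accounted, they alone could push usage past the limit by this much.
overshoot_from_active = active_iot * wire_write

print(f"dd issues {writes_in_dd} writes of 256K")
print(f"{writes_over_quota} of them would exceed the quota")
print(f"16 active writes can overshoot by {overshoot_from_active / MIB} MiB")
```

Under these assumptions the dd is 40 wire writes, 20 of which would have to slip past the check for the full 5MB to be written, while 16 active writes alone already account for 4 MiB, which is in the range of the excess noted in point 5.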
So, if quota checks whether a write violates the quota, lets it
through, and only updates the space used (for future checks) on the
UNWIND, we could have more IO outstanding than the quota allows, and as
a result let such a larger write pass through, counting the IO-threads
queue and the active IOs as well. Would this be a fair assumption of
how quota works?
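The assumption above (check on WIND, account on UNWIND) can be modelled in a few lines. This is a hypothetical, single-threaded model, not the quota xlator's code: `simulate` and `max_in_flight` are illustrative names, `max_in_flight` (which must be at least 1) stands in for the io-threads queue plus active IOs, and the 15MB starting usage is assumed so that the 10MB dd would exceed the 20MB quota by 5MB.

```python
from collections import deque

KIB = 1024
MIB = 1024 * KIB


def simulate(limit, used, write_size, total_writes, max_in_flight):
    """Return the final directory usage when quota is checked on WIND but
    each write's size is only accounted on UNWIND (hypothetical model)."""
    accounted = used      # usage as seen by the quota check
    in_flight = deque()   # admitted writes that have not UNWOUND yet
    for _ in range(total_writes):
        if len(in_flight) == max_in_flight:
            # Pipeline full: the oldest write completes and is accounted.
            accounted += in_flight.popleft()
        if accounted + write_size > limit:
            continue      # this write fails with EDQUOT
        in_flight.append(write_size)
    # Remaining admitted writes complete and are accounted.
    while in_flight:
        accounted += in_flight.popleft()
    return accounted


for depth in (1, 16):
    final = simulate(20 * MIB, 15 * MIB, 256 * KIB, 40, depth)
    print(f"in-flight={depth:2d}: final usage {final / MIB:.2f} MiB")
# in-flight= 1: final usage 20.00 MiB
# in-flight=16: final usage 23.75 MiB
```

With one write in flight the model stops exactly at the limit; with 16 in flight it overshoots by 3.75 MiB. The model ignores NFS write-back and multiple bricks, either of which can add further in-flight writes.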
Yes, this is a possible scenario. There is a finite time window between:
1. Querying the size of a directory, in other words, checking whether the current write can be allowed
2. The "effect" of this write being reflected in the size of all the parent directories of the file, up to the root
If 1 and 2 were atomic, another parallel write that could have exceeded the quota limit could not have slipped through. Unfortunately, in the current scheme of things they are not atomic. There can be parallel writes in this test case because of NFS-client and/or glusterfs write-back, even though only a single-threaded application (dd) is running. One way to test this hypothesis is to disable both NFS and glusterfs write-back and run the same (unmodified) test; the test should then always succeed (i.e., dd should always fail). To disable write-back in NFS, you can use the noac option while mounting.
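The window between steps 1 and 2 is a classic check-then-act race. The Python sketch below illustrates it under stated assumptions and is not the quota translator's implementation: `QuotaNode`, `write_non_atomic`, `write_atomic` and `run` are hypothetical names, and a `threading.Barrier` forces every writer through the check before any size update, which is the worst case for the non-atomic scheme. Holding one lock across both steps corresponds to making enforcement and the size update atomic.

```python
import threading


class QuotaNode:
    """Hypothetical stand-in for a directory's quota accounting."""

    def __init__(self, limit):
        self.limit = limit
        self.size = 0
        self.lock = threading.Lock()

    def write_non_atomic(self, nbytes, window):
        # Step 1: query the size and decide whether the write is allowed.
        allowed = self.size + nbytes <= self.limit
        # The gap between step 1 and step 2: the barrier makes every writer
        # finish its check before any of them updates the size.
        window.wait()
        if allowed:
            # Step 2: reflect the write in the directory size.
            with self.lock:
                self.size += nbytes
        return allowed

    def write_atomic(self, nbytes):
        # Steps 1 and 2 under one lock: no other write can slip in between.
        with self.lock:
            if self.size + nbytes > self.limit:
                return False  # EDQUOT
            self.size += nbytes
            return True


def run(atomic, writers=8, nbytes=256, limit=1024):
    """Start `writers` threads; return (final size, number of admitted writes)."""
    node = QuotaNode(limit)
    window = threading.Barrier(writers)
    results = []

    def writer():
        if atomic:
            results.append(node.write_atomic(nbytes))
        else:
            results.append(node.write_non_atomic(nbytes, window))

    threads = [threading.Thread(target=writer) for _ in range(writers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return node.size, results.count(True)


print("non-atomic:", run(atomic=False))  # (2048, 8)
print("atomic:    ", run(atomic=True))   # (1024, 4)
```

With 8 writers of 256 bytes against a 1024-byte limit, the non-atomic version admits all 8, while the atomic version stops at 4. In this sketch every write holds the lock for the whole check-and-update, which is where the added latency of the atomic approach comes from.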
The situation becomes worse in real-life scenarios because of parallelism involved at many layers:
1. Multiple applications, each possibly multithreaded, writing to one or many files in a quota subtree
2. Write-back in the NFS client and in glusterfs
3. Multiple bricks holding files of a quota subtree, each brick simultaneously processing many write requests through io-threads
I have tried to fix this issue in the past, though unsuccessfully. It seems to me that one effective strategy is to make enforcement and the update of the parents' sizes atomic. But if we do that, we end up adding the latency of accounting to the latency of the fop. Other options can be explored. However, our Quota functionality requirements allow a buffer of 10% while enforcing limits, so this issue has not been high on our priority list so far. So our tests should also allow for this 10% buffer when expecting failures.
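A 10% buffer translates into a simple bound that a test could assert on, instead of requiring dd to fail outright. The helper below is a hypothetical Python sketch (`within_quota_buffer` is not part of the GlusterFS regression framework, whose .t tests are shell scripts); it only shows the arithmetic for the 20MB limit used in bug-1161156.t.

```python
MIB = 1024 * 1024


def within_quota_buffer(used, limit, buffer_pct=10):
    """True if `used` exceeds `limit` by no more than `buffer_pct` percent.

    Integer arithmetic avoids float rounding right at the boundary.
    """
    return used * 100 <= limit * (100 + buffer_pct)


limit = 20 * MIB
print(within_quota_buffer(limit + 1 * MIB, limit))  # True: 21 MiB <= 22 MiB
print(within_quota_buffer(limit + 2 * MIB, limit))  # True: exactly at the buffer
print(within_quota_buffer(limit + 4 * MIB, limit))  # False: 24 MiB > 22 MiB
```

At this limit, a 4 MiB excess (what 16 writes of 256K add on their own) already falls outside a 10% buffer.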
I think we can provide a knob for atomicity and stronger consistency
in the future. Certain use cases could benefit from turning this knob on.
Depending on the configured quota limit and the incoming rate of writes,
we can easily overshoot 10%. We could use a more appropriate margin in
our tests.
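One possible way to pick "a more appropriate margin" is to take the larger of the percentage buffer and the bytes that can be in flight past the quota check. This is a hypothetical sketch: `quota_test_margin` is an illustrative name, and the in-flight count is an assumption that in practice depends on write-back, io-threads and the number of bricks, i.e. on the incoming rate of writes.

```python
KIB = 1024
MIB = 1024 * KIB


def quota_test_margin(limit, write_size, max_in_flight, buffer_pct=10):
    """Hypothetical overshoot margin for a quota test: the larger of the
    percentage buffer and the bytes that may be in flight past the check."""
    pct_margin = limit * buffer_pct // 100
    in_flight_margin = write_size * max_in_flight
    return max(pct_margin, in_flight_margin)


# Small limit: in-flight writes dominate; large limit: the percentage dominates.
print(quota_test_margin(20 * MIB, 256 * KIB, 16) // MIB)    # 4
print(quota_test_margin(1024 * MIB, 256 * KIB, 16) // MIB)  # 102
```

For the 20MB limit in this test, the in-flight term (4 MiB) is twice the 10% buffer (2 MiB), which is consistent with the point that small limits combined with many outstanding writes can easily overshoot 10%.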
-Vijay
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel