Re: Moratorium on new patch acceptance

Vijaikumar M <vmallika@xxxxxxxxxx> · Thu, 21 May 2015 13:34:14 +0530

On Tuesday 19 May 2015 09:50 PM, Shyam wrote:
On 05/19/2015 11:23 AM, Vijaikumar M wrote:

On Tuesday 19 May 2015 08:36 PM, Shyam wrote:
On 05/19/2015 08:10 AM, Raghavendra G wrote:
After discussion with Vijaykumar mallikarjuna and other inputs in this
thread, we are proposing all quota tests to comply to following
criteria:

* use dd always with oflag=append (to make sure there are no parallel
writes) and conv=fdatasync (to make sure errors, if any are 
delivered to
application. Turning off flush-behind is optional since fdatasync acts
as a barrier)

OR

* turn off write-behind in nfs client and glusterfs server.

What do you people think is a better test scenario?

Also, we don't have confirmation on the RCA that parallel writes are
indeed the culprits. We are trying to reproduce the issue locally.
@Shyam, it would be helpful if you can confirm the hypothesis :).

Ummm... I thought we acknowledge that quota checks are done during the
WIND and updated during UNWIND, and we have io threads doing in flight
IOs (as well as possible IOs in io threads queue) and we have 256K
writes in the case mentioned. Put together, in my head this forms a
good RCA that we write more than needed due to the in flight IOs on
the brick. We need to control the in flight IOs as a resolution for
this from the application.

In terms of actual proof, we would need to instrument the code and
check. When you say it does not fail for you, does the file stop once
quota is reached or is a random size greater than quota? Which itself
may explain or point to the RCA.

The basic thing needed from an application is,
- Sync IOs, so that there aren't too many in flight IOs and the
application waits for each IO to complete
- Based on tests below if we keep block size in dd lower and use
oflag=sync we can achieve the same, if we use higher block sizes we
cannot

Test results:
1) noac:
  - NFS sends a COMMIT (internally translates to a flush) post each IO
request (NFS WRITES are still with the UNSTABLE flag)
  - Ensures prior IO is complete before next IO request is sent (due
to waiting on the COMMIT)
  - Fails if IO size is large, i.e in the test case being discussed I
changed the dd line that was failing as "TEST ! dd if=/dev/zero
of=$N0/$mydir/newfile_2 *bs=10M* count=1 conv=fdatasync" and this
fails at times, as the writes here are sent as 256k chunks to the
server and we still see the same behavior
  - noac + performance.nfs.flush-behind: off +
performance.flush-behind: off + performance.nfs.strict-write-ordering:
on + performance.strict-write-ordering: on +
performance.nfs.write-behind: off + performance.write-behind: off
    - Still see similar failures, i.e at times 10MB file is created
successfully in the modified dd command above

Overall, the switch works, but not always. If we are to use this
variant then we need to announce that all quota tests using dd not try
to go beyond the quota limit set in a single IO from dd.

2) oflag=sync:
  - Exactly the same behavior as above.

3) Added all (and possibly the kitches sink) to the test case, as
attached, and still see failures,
  - Yes, I have made the test fail intentionally (of sorts) by using
3M per dd IO and 2 IOs to go beyond the quota limit.
  - The intention is to demonstrate that we still get parallel IOs
from NFS client
  - The test would work if we reduce the block size per IO (reliably
is a border condition here, and we need specific rules like block size
and how many blocks before we state quota is exceeded etc.)
  - The test would work if we just go beyond the quota, and then check
a separate dd instance as being able to *not* exceed the quota. Which
is why I put up that patch.

What next?

Hi Shyam,

I tried running the test with dd option 'oflag=append' and didn't see
the issue.Can you please try this option and see if it works?

Did that (in the attached script that I sent) and it still failed.

Please note:
- This dd command passes (or fails with EDQUOT)
  - dd if=/dev/zero of=$N0/$mydir/newfile_2 bs=512 count=10240 
oflag=append oflag=sync conv=fdatasync
  - We can even drop append and fdatasync, as sync sends a commit per 
block written which is better for the test and quota enforcement, 
whereas fdatasync does one in the end and sometimes fails (with larger 
block sizes, say 1M)
  - We can change bs to [512 - 256k]

- This dd command fails (or writes all the data)
  - dd if=/dev/zero of=$N0/$mydir/newfile_2 bs=3M count=2 oflag=append 
oflag=sync conv=fdatasync

The reasoning is that when we write a larger block size, NFS sends in 
multiple 256k chunks to write and then sends the commit before the 
next block. As a result if we exceed quota in the *last block* that we 
are writing, we *may* fail. If we exceed quota in the last but one 
block we will pass.

Hope this shorter version explains it better.

(VijayM is educating me on quota (over IM), and it looks like the 
quota update happens as a synctask in the background, so post the 
flush (NFS commit) we may still have a race)

Post education solution:
- Quota updates on disk xattr as a sync task, as a result if we 
exceeded quota in the n-1th block there is no guarantee that the nth 
block would fail, as the sync task may not have completed

So I think we need to do the following for the quota based tests 
(expanding on the provided patch, http://review.gluster.org/#/c/10811/ )
- First dd that exceeds quota (with either oflag=sync or 
conv=fdatasync so that we do not see any flush behind or write behind 
effects) to be done without checks
- Next check in an EXPECT_WITHIN that quota is exceeded (maybe add 
checks on the just created/appended file w.r.t its minimum size that 
would make it exceed the quota)
- Then do a further dd to a new file or append to an existing file to 
get the EDQUOT error
- Proceed with whatever the test case needs to do next

Suggestions?

Here is my analysis on spurious failure with testcase: 
tests/bugs/distribute/bug-1161156.t
In release-3.7, marker is re-factored to use synctask to do background 
accounting.
I have done below tests with different combination and found that 
parallel writes is causing the spurious failure.
I have filed a bug# 1223658 to track parallel write issue with quota.

1) Parallel writes and Marker background update (Test always fails)
    TEST ! dd if=/dev/zero of=$N0/$mydir/newfile_2 bs=3M count=2 
conv=fdatasync oflag=sync oflag=append

    NFS client breaks 3M writes into multiple 256k chunks and does 
parallel writes

2) Parallel writes and Marker foreground update (Test always fails)
    TEST ! dd if=/dev/zero of=$N0/$mydir/newfile_2 bs=3M count=2 
conv=fdatasync oflag=sync oflag=append

    Made a marker code change to account quota in foreground (without 
synctask)

3) Serial writes and Marker background update (Test passed 100/100 times)
    TEST ! dd if=/dev/zero of=$N0/$mydir/newfile_2 bs=256k count=24 
conv=fdatasync oflag=sync oflag=append

    Using smaller block size (256k), so that NFS client reduces 
parallel writes

4) Serial writes and Marker foreground update (Test passed 100/100 times)
    TEST ! dd if=/dev/zero of=$N0/$mydir/newfile_2 bs=256k count=24 
conv=fdatasync oflag=sync oflag=append

    Using smaller block size (256k), so that NFS client reduces 
parallel writes
    Made a marker code change to account quota in foreground (without 
synctask)

5) Parallel writes on release-3.6 (Test always fails)
    TEST ! dd if=/dev/zero of=$N0/$mydir/newfile_2 bs=3M count=2 
conv=fdatasync oflag=sync oflag=append
    Moved marker xlator above IO-Threads in the graph.

Thanks,
Vijay

Thanks,
Vijay

regards,
Raghavendra.

On Tue, May 19, 2015 at 5:27 PM, Raghavendra G 
<raghavendra@xxxxxxxxxxx
<mailto:raghavendra@xxxxxxxxxxx>> wrote:

    On Tue, May 19, 2015 at 4:26 PM, Jeff Darcy <jdarcy@xxxxxxxxxx
<mailto:jdarcy@xxxxxxxxxx>> wrote:

        > No, my suggestion was aimed at not having parallel writes.
In this case quota
        > won't even fail the writes with EDQUOT because of reasons
explained above.
        > Yes, we need to disable flush-behind along with this so
that errors are
        > delivered to application.

        Would conv=sync help here?  That should prevent any kind of
        write parallelism.

    An strace of dd shows that

    * fdatasync is issued only once at the end of all writes when
    conv=fdatasync
    * for some strange reason no fsync or fdatasync is issued at all
    when conv=sync

    So, using conv=fdatasync in the test cannot prevent
    write-parallelism induced by write-behind. Parallelism would've 
been
    prevented only if dd had issued fdatasync after each write or 
opened
    the file with O_SYNC.

        If it doesn't, I'd say that's a true test failure somewhere in
        our stack.  A
        similar possibility would be to invoke dd multiple times with
        oflag=append.

    Yes, appending writes curb parallelism (at least in glusterfs, but
    not sure how nfs client behaves) and hence can be used as an
    alternative solution.

    On a slightly unrelated note flush-behind is immaterial in this 
test
    since fdatasync is anyways acting as a barrier.

        _______________________________________________
        Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx <mailto:Gluster-devel@xxxxxxxxxxx>
http://www.gluster.org/mailman/listinfo/gluster-devel

    --
    Raghavendra G

--
Raghavendra G

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel