Nikolay Borisov <nborisov@xxxxxxxx> writes:

> On 3.04.19 г. 12:45 ч., Luis Henriques wrote:
>> Dave Chinner <david@xxxxxxxxxxxxx> writes:
>>
>>> On Tue, Apr 02, 2019 at 11:34:28AM +0100, Luis Henriques wrote:
>>>> Simple set of checks for CephFS max_bytes directory quota implementation.
>>>>
>>>> Signed-off-by: Luis Henriques <lhenriques@xxxxxxxx>
>>>> ---
>>>>  tests/ceph/002     | 147 +++++++++++++++++++++++++++++++++++++++++++++
>>>>  tests/ceph/002.out |   1 +
>>>>  tests/ceph/group   |   1 +
>>>>  3 files changed, 149 insertions(+)
>>>>  create mode 100755 tests/ceph/002
>>>>  create mode 100644 tests/ceph/002.out
>>>>
>>>> diff --git a/tests/ceph/002 b/tests/ceph/002
>>>> new file mode 100755
>>>> index 000000000000..313865dc639e
>>>> --- /dev/null
>>>> +++ b/tests/ceph/002
>>>> @@ -0,0 +1,147 @@
>>>> +#! /bin/bash
>>>> +# SPDX-License-Identifier: GPL-2.0
>>>> +# Copyright (c) 2019 SUSE LLC. All Rights Reserved.
>>>> +#
>>>> +# FS QA Test No. 002
>>>> +#
>>>> +# This tests basic ceph.quota.max_bytes quota features.
>>>> +#
>>>> +
>>>> +seq=`basename $0`
>>>> +seqres=$RESULT_DIR/$seq
>>>> +echo "QA output created by $seq"
>>>> +
>>>> +testdir=$TEST_DIR/quota-test
>>>
>>> Try not to name local variables the same as when known global
>>> variables. When we talk about "test dir", we mean the mount point
>>> for the test device, not the local, tests specific work directory.
>>> i.e. this is a "work dir", not a "test dir".
>>>
>>> And, often, we just name them after the test that is running,
>>> so we can identify what test left them behind. i.e.
>>>
>>> workdir=$TEST_DIR/$seq
>>>
>>>> +
>>>> +tmp=/tmp/$$
>>>> +status=1	# failure is the default!
>>>> +trap "_cleanup; exit \$status" 0 1 2 3 15
>>>> +
>>>> +_cleanup()
>>>> +{
>>>> +	cd /
>>>> +	rm -rf $tmp.*
>>>> +	rm -rf $testdir
>>>
>>> Leave it behind for post-mortem analysis if necessary, remove it
>>> before starting this test execution....
>>>
>>>> +}
>>>> +
>>>> +# get standard environment, filters and checks
>>>> +. ./common/rc
>>>> +. ./common/filter
>>>> +. ./common/attr
>>>> +
>>>> +# real QA test starts here
>>>> +_supported_os Linux
>>>> +_supported_fs ceph
>>>> +
>>>> +_require_attrs
>>>> +
>>>> +set_quota()
>>>> +{
>>>> +	val=$1
>>>> +	dir=$2
>>>> +	$SETFATTR_PROG -n ceph.quota.max_bytes -v $val $dir >/dev/null 2>&1
>>>> +}
>>>> +
>>>> +get_quota()
>>>> +{
>>>> +	dir=$1
>>>> +	$GETFATTR_PROG --only-values -n ceph.quota.max_bytes $dir 2> /dev/null
>>>> +}
>>>> +
>>>> +# function to write a file.  We use a loop because quotas in CephFS is a
>>>> +# "best-effort" implementation, i.e. a write may actually be allowed even if the
>>>> +# quota is being exceeded.  Using a loop reduces the chances of this to happen.
>>>> +#
>>>> +# NOTE: 'size' parameter is in M
>>>
>>> xfs_io accepts "1m" as one megabyte.
>>>
>>>> +write_file()
>>>> +{
>>>> +	file=$1
>>>> +	size=$2	# size in M
>>>> +	for (( i = 1; i < $size; i++ )); do
>>>> +		$XFS_IO_PROG -f -c "pwrite -W $((i * 1048576)) 1048576" \
>>>> +			     $file >/dev/null 2>&1
>>>> +	done
>>>> +}
>>>
>>> Makes no sense to me. xfs_io does a write() loop internally with
>>> this pwrite command of 4kB writes - the default buffer size. If you
>>> want xfs_io to loop doing 1MB sized pwrite() calls, then all you
>>> need is this:
>>>
>>> $XFS_IO_PROG -f -c "pwrite -w -B 1m 0 ${size}m" $file | _filter_xfs_io
>>>
>>
>> Thank you for your review, Dave.  I'll make sure the next revision of
>> these tests will include all your comments implemented... except for
>> this one.
>>
>> The reason I'm using a loop for writing a file is due to the nature of
>> the (very!) loose definition of quotas in CephFS.  Basically, clients
>> will likely write some amount of data over the configured limit because
>> the servers they are communicating with to write the data (the OSDs)
>> have no idea about the concept of quotas (or files even); the filesystem
>> view in the cluster is managed at a different level, with the help of
>> the MDS and the client itself.
>>
>> So, the loop in this function is simply to allow the metadata associated
>> with the file to be updated while we're writing the file.  If I use a
>
> But the metadata will be modified while writing the file even with a
> single invocation of xfs_io.

No, that's not true.  It would be too expensive to keep the metadata
server updated while writing to a file.  So making sure there's actually
an open/close of the file (plus the fsync in pwrite) helps ensure the
metadata is flushed to the MDS.

(And yes, I _did_ try using xfs_io with the 1m loop, and the test fails
because it simply writes all the data and never gets an EDQUOT error.)

Cheers,
--
Luis

> It's just that you are moving the loop inside xfs_io rather than
> having to invoke xfs_io a lot of time. Also, just because you are
> using a single -c "pwrite" command doesn't mean this will translate to
> a single call to pwrite. As Dave mentioned, the default block size is
> 4k meaning :
>
> "pwrite -w -B 1m 0 ${size}m"
>
> will result in 'size / 1m' writes of size 1m, each being a distinct call
> to pwrite.
>
>> single pwrite, the whole file will be written before we get a -EDQUOT.
>>
>> If an admin wants to really enforce some hard quotas in the filesystem,
>> there are other means to do that, but not at the filesystem level.
>> There are some more details on the quota implementation in Ceph here:
>>
>> http://docs.ceph.com/docs/master/cephfs/quota/
>>
>> I hope this makes sense and helps understanding why I need a loop to be
>> used in this test.
>>
>> Cheers
>>
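P.S.: to make the above a bit more concrete, here is a rough, untested
sketch of the idea outside the fstests harness.  The mount point
(/mnt/cephfs/quota-test), the 10m limit, the 20 chunks and the file name
are all made up for illustration; the actual test uses the set_quota and
write_file helpers from the patch quoted above.

#!/bin/bash
# Rough sketch only -- paths and sizes are illustrative, not what the
# test uses.

workdir=/mnt/cephfs/quota-test		# assumed CephFS mount point
mkdir -p $workdir

# CephFS quotas are configured through a virtual xattr on a directory
# (see the ceph documentation link quoted above) -- 10m limit here.
setfattr -n ceph.quota.max_bytes -v $((10 * 1048576)) $workdir

# Write 1m chunks, one xfs_io invocation per chunk, so that every chunk
# gets its own open/pwrite/fsync/close cycle and the client has a chance
# to pick up the updated usage from the MDS.  Somewhere past the 10m mark
# one of these pwrites should start failing with EDQUOT ("Disk quota
# exceeded" in the xfs_io output); a single 20m pwrite would typically
# complete without ever seeing the error.
for (( i = 0; i < 20; i++ )); do
	xfs_io -f -c "pwrite -W $((i * 1048576)) 1048576" $workdir/file
done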