> > 在 2017年12月7日,下午4:17,Amir Goldstein <amir73il@xxxxxxxxx> 写道: > > On Thu, Dec 7, 2017 at 9:42 AM, Chengguang Xu <cgxu519@xxxxxxxxxx> wrote: >> 在 2017年12月7日,下午3:13,Eryu Guan <eguan@xxxxxxxxxx> 写道: >>> >>> On Thu, Dec 07, 2017 at 02:20:26PM +0800, Chengguang Xu wrote: >>>>> >>>>> 在 2017年12月7日,下午1:44,Eryu Guan <eguan@xxxxxxxxxx> 写道: >>>>> >>>>> On Thu, Dec 07, 2017 at 10:22:07AM +0800, Chengguang Xu wrote: >>>>>> Inspired by syncfs bug of overlayfs which does not sync dirtyinodes in >>>>>> underlying filesystem. >>>>>> Run syncfs and shutdown filesystem(or underlying filesystem of overlayfs) >>>>>> to check syncfs result. >>>>>> >>>>>> Signed-off-by: Chengguang Xu <cgxu519@xxxxxxxxxx> >>>>>> --- >>>>>> >>>>>> Changes since v1: >>>>>> Use fs shutdown and fssum to check syncfs result instead of >>>>>> checking delalloc state of extents. >>>>>> >>>>>> tests/generic/470 | 88 +++++++++++++++++++++++++++++++++++++++++++++++++++ >>>>>> tests/generic/470.out | 2 ++ >>>>>> tests/generic/group | 1 + >>>>>> 3 files changed, 91 insertions(+) >>>>>> create mode 100755 tests/generic/470 >>>>>> create mode 100644 tests/generic/470.out >>>>>> >>>>>> diff --git a/tests/generic/470 b/tests/generic/470 >>>>>> new file mode 100755 >>>>>> index 0000000..b488747 >>>>>> --- /dev/null >>>>>> +++ b/tests/generic/470 >>>>>> @@ -0,0 +1,88 @@ >>>>>> +#! /bin/bash >>>>>> +# FS QA Test 470 >>>>>> +# >>>>>> +# Inspired by syncfs bug of overlayfs which does not sync dirty inodes in >>>>>> +# underlying filesystem. >>>>> >>>>> Trailing whitespace in above line. >>>>> >>>>>> +# >>>>>> +# Run syncfs and shutdown filesystem(or underlying filesystem of overlayfs) >>>>>> +# to check syncfs result. >>>>>> +# >>>>>> +# Test will be skipped if filesystem(or underlying filesystem of overlayfs) >>>>>> +# does not support shutdown. >>>>>> +# >>>>>> +#----------------------------------------------------------------------- >>>>>> +# Copyright (c) 2017 Chengguang Xu <cgxu519@xxxxxxxxxx> >>>>>> +# All Rights Reserved. >>>>>> +# >>>>>> +# This program is free software; you can redistribute it and/or >>>>>> +# modify it under the terms of the GNU General Public License as >>>>>> +# published by the Free Software Foundation. >>>>>> +# >>>>>> +# This program is distributed in the hope that it would be useful, >>>>>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of >>>>>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the >>>>>> +# GNU General Public License for more details. >>>>>> +# >>>>>> +# You should have received a copy of the GNU General Public License >>>>>> +# along with this program; if not, write the Free Software Foundation, >>>>>> +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA >>>>>> +#----------------------------------------------------------------------- >>>>>> +# >>>>>> + >>>>>> +seq=`basename $0` >>>>>> +seqres=$RESULT_DIR/$seq >>>>>> +echo "QA output created by $seq" >>>>>> + >>>>>> +here=`pwd` >>>>>> +tmp=/tmp/$$ >>>>>> +status=1 >>>>>> +trap "_cleanup; exit \$status" 0 1 2 3 15 >>>>>> + >>>>>> +_cleanup() >>>>>> +{ >>>>>> + cd / >>>>>> + rm -f $tmp.* >>>>>> +} >>>>>> + >>>>>> +# get standard environment, filters and checks >>>>>> +. ./common/rc >>>>>> +. ./common/filter >>>>>> + >>>>>> +# remove previous $seqres.full before test >>>>>> +rm -f $seqres.full >>>>>> + >>>>>> +# real QA test starts here >>>>>> + >>>>>> +_supported_fs generic >>>>>> +_supported_os Linux >>>>>> +_require_test >>>>>> +_require_fssum >>>>>> +_require_scratch >>>>>> +_require_scratch_shutdown >>>>>> +_require_xfs_io_command "syncfs" >>>>>> + >>>>>> + >>>>>> +FCNT=1000 >>>>>> + >>>>>> +_scratch_mkfs >/dev/null 2>&1 >>>>>> +_scratch_mount >>>>>> + >>>>>> +# In order to mitigate interference of write-back, >>>>>> +# create many files for test. >>>>> >>>>> Sorry, I still don't understand how writeback could interfere this test >>>>> from this comment, what happens if we don't create such files? Why >>>>> writing files starting from offset 1k? >>>> >>>> There is no explicit explanation how writeback interferes this case, >>>> also there are many triggers make writeback starts syncing work. >>>> I just want to increase hit ratio of failure by make many test files, >>>> as many as possible, but it’s also limited by time and other resource. >>>> >>>> The reason of offset 1k is same as above, compare to test a normal file, >>>> I think file with hole can increase failure ratio sometimes. >>> >>> Yeah, increasing the reproducibility would be a good reason too. Do you >>> happen to tune the number of files to see if 1000 is a good fit? e.g. >>> with 100 files test reproduced the overlay bug 20% of times, with 1000 >>> files the reproducibility increased to 80%, etc. And the hole in the >>> beginning too, what's the actual impact on the reproducibility? >>> >>> And you're right about the test time, usually we want to balance between >>> test time and reproducibility too, so we need to tune and measure the >>> numbers like test files, loop counts etc. >>> >>> I think these are all good comments for test :) >> >> I didn’t do much accurate testing about reproducibility, and also don’t have >> a plan to do that. Actually in my testing ENV, the BUG is always reproducible >> even for only a few of test files. > > What is the underlying fs you are testing with? I usually use xfs as underlying. > >> 1000 is just my definition for >> quite many files, and it’s really hard and meaningless to guess what number is >> best suit for variety of testing ENVs. So if you think the number 1000 is >> improper number, then I can modify it to right number that you think. Hole is >> the same. >> > > I agree with Eryu that you should not invent numbers, unless test completes > in a few seconds and reproduces reliably - then you can invent numbers... > > But this got me thinking about the details of overlayfs syncfs bug. > Overlayfs syncfs *will* actually call underlying fs syncfs (I fixed that) > but *will not* flush dirty inode data. What that means depends on > the underlying fs. For ext4 with default jounal=ordered, syncfs will > commit uncommitted journal transactions to disk, that will force data writeback > for all inodes, whose *metadata* is modified in uncommitted transactions. > > Ted, please correct me if I am wrong. > > For xfs, situation is a bit different, create operations are also delayed, > so your test could fail on overlayfs over xfs more easily. > > I think that means that if you create the files and write them in the same > transaction, syncfs *will* actually sync on inodes data, so it is anyway > only the very last files that you write that won't be flushed no matter how > many files you will write. > What you should try to do to increase the changes of the bug on more fs: > - create new files and truncate them to final size but leaving them sparse > - sync > - buffered write to all files > - syncfs > - shutdown > - cycle_mount > > The rules for choosing the right amount of files/data should be: > - After first sync, all inodes in the system is not dirty > - During the time it takes to write all files, flusher thread may kick in > (default 30 seconds), flush data of your tests files and interfere with > the test. Lets rule out another user doing sync, because most test > machines are VMs that just run the tests > - If one loop iteration of the test above takes less than X second for a > valid fs on a slow disk, then if flusher thread does interfere, its > interference will be over in less then X second > (because we are the only ones dirtying data on the system). > - If X < 10 seconds, then running 2 or 3 loops of the iteration should > be enough to guaranty that we run at least one iteration without > interference. I did more detail tests for three different data modes of ext4 and found the overlayfs syncfs bug is reproducible on data=ordered and data=writeback, but on data=journal mode, data is flushed and correct. I only wrote only a few words to a single file and the bug is always reproducible on my test environment. For writeback interferences, AFAIK, from dirty ratio and period. If we drop all dirty caches & sync before the test, I think we can avoid interference from it. So if we don’t have anything else to interference test result, I just want to modify to write a small single file as test target. Am I missing anything? > > So I think if you actually write as little files and data as possible, > but loop several times. > IMO, tune the parameters, so an iteration takes ~1 second on slow disk > and run 3 iterations. > I have a test setup with spinning disk I can run your test on xfs/ext4 if > you like more samples then your own test environment. > > And please write comments about those heuristics... > > Thanks, > Amir. -- To unsubscribe from this list: send the line "unsubscribe linux-unionfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html