Theodore Ts'o <tytso@xxxxxxx> writes:

> On Wed, Nov 12, 2014 at 04:47:42PM +0300, Dmitry Monakhov wrote:
>> Also, sync mtime updates are a great pain for the AIO submitter,
>> because AIO submission may be blocked for seconds (up to 5 seconds
>> in my case) if the inode is part of the currently committing
>> transaction; see do_get_write_access.
>
> 5 seconds?!?  So you're seeing cases where the jbd2 layer is taking
> that long to close a commit?  It might be worth looking at that so we
> can understand why that is happening, and to see if there's anything
> we might do to improve things on that front.  Even if we can get rid
> of most of the mtime updates, there will be other cases where a commit
> that takes a long time to complete will cause all sorts of other very
> nasty latencies on the entire system.

Our chunk server workload is quite generic:

submit_task: performs aio-dio requests to multiple chunk files from
several threads; this task should not block for too long.

sync_task: performs fsync/fdatasync on demand for modified chunk files
before we can ACK the write-op to the user; this task may block.

Here is the chunk server simulation load:

# TEST_CASE assumes that the target fs is mounted at /mnt
# Perform random aio-dio writes (bsz:64k) to preallocated files
# (size:128M) from 32 threads, issuing fdatasync on each 32'th write
# operation
$ fio ./aio-dio.fio

# Measure AIO-DIO write submission latency
$ dd if=/dev/zero of=/mnt/f bs=1M count=1
$ ioping -A -C -D -WWW /mnt/f
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=1 time=410 us
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=2 time=430 us
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=3 time=370 us
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=4 time=400 us
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=5 time=1.9 s
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=6 time=4.2 s
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=7 time=3.8 s
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=8 time=3.7 s
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=9 time=4.1 s
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=10 time=1.9 s

>> Yeah, we also have a ticket for that :)
>> https://jira.sw.ru/browse/PSBM-20411
>
> Is this supposed to be a URL to a publicly visible web page?
>
> Host jira.sw.ru not found: 3(NXDOMAIN)

Ohh, unfortunately this host is not visible from outside.

>> > +	if (flags & S_VERSION)
>> > +		inode_inc_iversion(inode);
> ....
>> Since we want to update all in-memory data, we also have to
>> explicitly update inode->i_version, which was previously updated
>> implicitly here:
>>
>> mark_inode_dirty_sync()
>>  ->__mark_inode_dirty
>>   ->ext4_dirty_inode
>>    ->ext4_mark_inode_dirty
>>     ->ext4_mark_iloc_dirty
>>      ->inode_inc_iversion(inode);
>
> It's not necessary to add another call to inode_inc_iversion(), since
> we already incremented i_version if S_VERSION is set, and S_VERSION
> gets set when it's necessary to handle incrementing i_version.
>
> The inode_inc_iversion() in ext4_mark_iloc_dirty() is probably not
> necessary, since we should already be incrementing i_version whenever
> ctime and mtime get updated.  The inode_inc_iversion() there is more
> of a "belt and suspenders" safety thing, on the theory that the extra
> bump in i_version won't hurt anything.
>
> Cheers,
>
> 					- Ted
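[The actual ./aio-dio.fio job file was not posted in the thread. A
sketch consistent with the description above might look like the
following; the file size, block size, thread count, fdatasync interval,
and mount point come from the description, while the job name, runtime,
and remaining options are assumptions.]

```ini
; aio-dio.fio -- reconstructed sketch, not the original job file
[global]
directory=/mnt       ; test case assumes target fs is mounted at /mnt
ioengine=libaio      ; Linux native AIO submission
direct=1             ; O_DIRECT, i.e. aio-dio
rw=randwrite         ; random writes
bs=64k               ; bsz:64k
size=128m            ; preallocated file size:128M
fdatasync=32         ; issue fdatasync after every 32'th write
runtime=60           ; assumed; any long-enough runtime will do
time_based

[chunk-server-sim]
numjobs=32           ; threads:32, one file per job
thread
```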