Hello Andreas,

>>>
> Gang,
>
> On Thu, May 4, 2017 at 5:33 AM, Gang He <ghe@xxxxxxxx> wrote:
>> Hello Guys,
>>
>> I found an interesting thing on the GFS2 file system: after I did a
>> direct I/O write of a whole file, I could still see some page cache
>> pages attached to the inode.
>> This GFS2 behavior does not seem to follow POSIX file system semantics.
>> I just want to know whether this is a known issue, or something we can
>> fix.
>> By the way, I ran the same test on the EXT4 and OCFS2 file systems, and
>> their results look OK.
>> My test command lines and outputs are pasted below.
>>
>> For the EXT4 file system:
>> tb-nd1:/mnt/ext4 # rm -rf f3
>> tb-nd1:/mnt/ext4 # dd if=/dev/urandom of=./f3 bs=1M count=4 oflag=direct
>> 4+0 records in
>> 4+0 records out
>> 4194304 bytes (4.2 MB, 4.0 MiB) copied, 0.0393563 s, 107 MB/s
>> tb-nd1:/mnt/ext4 # vmtouch -v f3
>> f3
>> [                                                         ] 0/1024
>>
>>            Files: 1
>>      Directories: 0
>>   Resident Pages: 0/1024  0/4M  0%
>>          Elapsed: 0.000424 seconds
>> tb-nd1:/mnt/ext4 #
>>
>> For the OCFS2 file system:
>> tb-nd1:/mnt/ocfs2 # rm -rf f3
>> tb-nd1:/mnt/ocfs2 # dd if=/dev/urandom of=./f3 bs=1M count=4 oflag=direct
>> 4+0 records in
>> 4+0 records out
>> 4194304 bytes (4.2 MB, 4.0 MiB) copied, 0.0592058 s, 70.8 MB/s
>> tb-nd1:/mnt/ocfs2 # vmtouch -v f3
>> f3
>> [                                                         ] 0/1024
>>
>>            Files: 1
>>      Directories: 0
>>   Resident Pages: 0/1024  0/4M  0%
>>          Elapsed: 0.000226 seconds
>>
>> For the GFS2 file system:
>> tb-nd1:/mnt/gfs2 # rm -rf f3
>> tb-nd1:/mnt/gfs2 # dd if=/dev/urandom of=./f3 bs=1M count=4 oflag=direct
>> 4+0 records in
>> 4+0 records out
>> 4194304 bytes (4.2 MB, 4.0 MiB) copied, 0.0579509 s, 72.4 MB/s
>> tb-nd1:/mnt/gfs2 # vmtouch -v f3
>> f3
>> [ oo                                                  oOo ] 48/1024
>
> I cannot reproduce, at least not so easily. What kernel version is
> this? If it's not a mainline kernel, can you reproduce on mainline?

I can always reproduce it. I am using kernel version 4.11.0-rc4-2-default; although that is not the latest version, it is new enough.
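For what it's worth, vmtouch gets its residency numbers from mincore(2), so the check can be reproduced without vmtouch. Below is a minimal sketch of the same query (Linux and CPython assumed; the helper name resident_pages is mine, not from vmtouch):

```python
import ctypes
import mmap
import os

def resident_pages(path):
    """Return (resident, total) page-cache page counts for `path`,
    the same numbers vmtouch -v reports, queried via mincore(2)."""
    size = os.path.getsize(path)
    pagesize = mmap.PAGESIZE
    npages = (size + pagesize - 1) // pagesize
    if npages == 0:
        return 0, 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # MAP_PRIVATE + PROT_WRITE: writable so ctypes can take the
        # mapping's address, private so the file itself cannot be modified.
        mm = mmap.mmap(fd, size, flags=mmap.MAP_PRIVATE,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE)
    finally:
        os.close(fd)
    try:
        vec = (ctypes.c_ubyte * npages)()
        buf = ctypes.c_char.from_buffer(mm)   # page-aligned mapping address
        libc = ctypes.CDLL(None, use_errno=True)
        rc = libc.mincore(ctypes.c_void_p(ctypes.addressof(buf)),
                          ctypes.c_size_t(size), vec)
        del buf                               # release the buffer export
        if rc != 0:
            raise OSError(ctypes.get_errno(), "mincore failed")
        # Bit 0 of each vector byte is set if that page is resident.
        return sum(b & 1 for b in vec), npages
    finally:
        mm.close()
```

Running this on f3 right after the `dd ... oflag=direct` write should, per the results above, report 0 resident pages on the EXT4 and OCFS2 mounts and a nonzero count on the affected GFS2 mount.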
By the way, I added some printk calls to the GFS2 and OCFS2 kernel modules, and I found that GFS2 direct I/O always falls back to buffered I/O; I am not sure whether this behavior is by design. Of course, even when GFS2 falls back to buffered I/O, the code still makes sure the related page cache pages are invalidated, but the test result is not as expected, so I need to look at the code more deeply. The printk outputs look like:

[  198.176774] gfs2_file_write_iter: enter ino 132419 0 - 1048576
[  198.176785] gfs2_direct_IO: enter ino 132419 pages 0 0 - 1048576
[  198.176787] gfs2_direct_IO: exit ino 132419 - (0)  <<== gfs2_direct_IO always returns 0 and then falls back to buffered I/O; is this behavior by design?
[  198.184640] gfs2_file_write_iter: exit ino 132419 - (1048576)  <<== write_iter appears to return the right byte count.
[  198.189151] gfs2_file_write_iter: enter ino 132419 1048576 - 1048576
[  198.189163] gfs2_direct_IO: enter ino 132419 pages 8 1048576 - 1048576  <<== the inode's page count is greater than zero.
[  198.189165] gfs2_direct_IO: exit ino 132419 - (0)
[  198.195901] gfs2_file_write_iter: exit ino 132419 - (1048576)

But for OCFS2:

[  120.331053] ocfs2_file_write_iter: enter ino 297475 0 - 1048576
[  120.331065] ocfs2_direct_IO: enter ino 297475 pages 0 0 - 1048576
[  120.343129] ocfs2_direct_IO: exit ino 297475 (1048576)  <<== ocfs2_direct_IO returns the right byte count.
[  120.343132] ocfs2_file_write_iter: exit ino 297475 - (1048576)
[  120.347705] ocfs2_file_write_iter: enter ino 297475 1048576 - 1048576
[  120.347713] ocfs2_direct_IO: enter ino 297475 pages 0 1048576 - 1048576  <<== the inode's page count is always zero.
[  120.354096] ocfs2_direct_IO: exit ino 297475 (1048576)
[  120.354099] ocfs2_file_write_iter: exit ino 297475 - (1048576)

Thanks
Gang

>
> Thanks,
> Andreas