On Thu 04-04-19 20:46:47, zhangyi (F) wrote: > On 2019/4/4 18:18, Jan Kara Wrote: > > On Thu 04-04-19 17:29:52, zhangyi (F) wrote: > >> Currently we capture a NULL data exposure problem after a crash or > >> poweroff when append writing a file in the data=ordered mode. The > >> problem is that we were not add inode to the transaction's order data > >> list when updating i_disksize without new block allocation no matter > >> the delay allocated block feature is enabled or not. > >> > >> write jbd2 writeback > >> append write in allocated block > >> mark buffer dirty > >> update i_disksize > >> mark inode dirty > >> commit transaction > >> write inode > >> (data exposure after a crash) > >> write dirty buffer > >> > >> It's fine in the case of new block allocation because we do this job in > >> ext4_map_blocks(). To fix this problem, this patch add inode to current > >> transaction's order data list after new data is copied and needing > >> update i_disksize in the case of no block allocation. > >> > >> Fixes: 06bd3c36a733ac ("ext4: fix data exposure after a crash") > >> Fixes: f3b59291a69d0b ("ext4: remove calls to ext4_jbd2_file_inode() from delalloc write path") > >> Signed-off-by: zhangyi (F) <yi.zhang@xxxxxxxxxx> > > > > Thanks for the patch. The current behavior is a deliberate decision. > > data=ordered mode does guarantee there is no stale data visible in case of > > crash. However it does not guarantee you cannot see zeros where data was > > written. > > > > Hi Jan, > Thanks a lot for your explanation. I read the > Documentation/admin-guide/ext4.rst, which said about the ordered mode: > > > ... When it's time to write the new metadata out to disk, the associated data > > blocks are written first... > > So I reckon that the dirty data block should be written to disk before > committing i_disksize and we cannot see the zero data. Now, I don't find > any offical docs to record the behavior you mentioned, do you have some > links talk about this behavior or am I miss something ? Yeah, I agree that paragraph in Documentation/admin-guide/ext4.rst could be interpretted the way you understood it. Ext4 Wiki [1] has this comment: (Note, however, that Ordered Mode does not guarantee that the file will be consistent at an application level; the application must use fsync() at appropriate commit points in order to guarantee application-level consistency.) In addition, there are some applications which depend on data=ordered to automatically force data blocks to be written to disk soon after the file is written. Using Writeback Mode extends the time from when a file is written to when it is pushed out to disk to 30 seconds. This can be surprising for some users; however, it should be noted that such problems can still be an issue with Ordered Mode (although they are much rarer). Again, a careful application or library should always use fsync() at points where the application is at a stable commit point. --- And we generally always made it clear that data=ordered mode is a "security feature" - don't expose potentialy sensitive data after crash. Not app data consistency feature (fsync(2) is for that). I don't have a good reference for that except perhaps bugzilla comment from Ted [2] where he explains some rationale behind data=ordered mode. So overall no, data=ordered mode is not designed to provide the data integrity guarantees you'd like to see. And providing more guarantees has performance (and maintenance) costs so it should be better justified than just "it would be nice for applications". Also when you'd like to provide some guarantees about data integrity, you need to specify your "promise" (currently our promise is: "no uninitialized data visible"). Because full "writes are atomic operation wrt crash" kind of guarantee is very hard (expensive) to achieve. I hope this clears things out a bit. Honza [1] https://ext4.wiki.kernel.org/index.php/Ext3_Data=Ordered_vs_Data=Writeback_mode [2] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/45 -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR