Re: [PATCH] ext4: don't BUG when truncating encrypted inodes on the orphan list

Andreas Dilger <adilger@xxxxxxxxx> · Thu, 9 Mar 2017 12:21:11 -0700

On Mar 9, 2017, at 6:47 AM, Jan Kara <jack@xxxxxxx> wrote:
> 
> On Sat 11-02-17 21:27:38, Ted Tso wrote:
>> On Sat, Feb 11, 2017 at 12:26:52AM -0700, Andreas Dilger wrote:
>>> The reason truncated orphans are on the orphan list is because the
>>> transaction that sets i_size may be restarted if the inode is larger
>>> than can be truncated in a single transaction.  If the system crashes
>>> before the truncate finishes then the truncate should be completed
>>> so that old data is not returned if the file is truncated larger again.
>> 
>> Another way of fixing this is at the time when the file is truncated
>> to a larger size.  Of course the other case we need to handle is what
>> happens if there is data after i_size and the file is mmaped.
>> 
>> One advantage of doing it when the file is truncated larger again is at
>> that point we will have the encryption key.  In the case of an
>> encrypted file, both the kernel and e2fsck *can't* zero fill past
>> i_size if the key is not available.  And during the orphan replay the
>> encryption key won't be available.
>> 
>> The other way to solve the problem would be zero the portion of the
>> last remaining datablock *first* and journal the data block along with
>> the initial transaction which sets the i_size in the inode.  But that
>> gets tricky, since all data writes for that last block must not go to
>> the disk, and then once the journal has been committed we can't write
>> the block via the normal page_io routines (since otherwise it might
>> get overwritten), until we write it back and then revoke the block in
>> the journal, and the revoke is committed.  Messy....
> 
> Going through some old email... I don't think this would be really
> reasonably doable. What would fixup the missing zeroing on orphan cleanup
> though is to zero the tail of the last page on readpage, extending
> truncate and write beyond EOF. That may be acceptable cost for encrypted
> inodes.
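
For reference, the read-side fixup described above amounts to clearing
whatever part of the page lies past i_size before anyone can see it
(for an encrypted inode, presumably after the page has been decrypted).
A minimal user-space sketch of just the arithmetic, not kernel code;
toy_zero_tail() and TOY_PAGE_SIZE are made-up names for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u	/* assumed page size for the sketch */

/*
 * Hypothetical helper: zero the part of an in-memory page that lies
 * beyond i_size.  page_index is the page's index within the file.
 * Returns the number of bytes that were zeroed.
 */
static size_t toy_zero_tail(unsigned char *page, unsigned long page_index,
			    unsigned long long i_size)
{
	unsigned long long page_start =
		(unsigned long long)page_index * TOY_PAGE_SIZE;
	size_t valid;

	assert(page != NULL);

	if (i_size <= page_start)
		valid = 0;			/* page is entirely past EOF */
	else if (i_size - page_start >= TOY_PAGE_SIZE)
		valid = TOY_PAGE_SIZE;		/* page is entirely inside the file */
	else
		valid = (size_t)(i_size - page_start);	/* page straddles EOF */

	memset(page + valid, 0, TOY_PAGE_SIZE - valid);
	return TOY_PAGE_SIZE - valid;
}
```

The same helper would have to be called from each of the three paths
mentioned (read, extending truncate, write beyond EOF), which is where
the extra cost comes from.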

Another option would be to revive the unlink/truncate thread, and dump
the blocks to be truncated over to another (temporary) inode that is put
on the orphan list and will be unlinked.  That means the visible truncate
operation can always complete in a single transaction (including the
partial block write), and everything on the orphan list is essentially an
unlink rather than a truncate.
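
To make the idea concrete, here is a toy user-space model of it.  This
is only a sketch under simplifying assumptions (no sparse files, block
numbers in a flat array, a "transaction" is just one function call) and
it is not taken from the old patch; struct toy_inode and both helpers
are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_BLOCK_SIZE 4096u
#define TOY_MAX_BLOCKS 64u

struct toy_inode {
	unsigned long long size;
	size_t nblocks;
	int blocks[TOY_MAX_BLOCKS];	/* toy block numbers, densely mapped */
	int nlink;
	int on_orphan_list;
};

/*
 * One "transaction": hand every block past new_size to a temporary
 * victim inode and set the visible size.  Only pointers move, so the
 * amount of work does not depend on freeing the blocks themselves.
 */
static void toy_truncate_via_victim(struct toy_inode *inode,
				    struct toy_inode *victim,
				    unsigned long long new_size)
{
	size_t keep = (size_t)((new_size + TOY_BLOCK_SIZE - 1) / TOY_BLOCK_SIZE);
	size_t i;

	assert(new_size <= inode->size);
	if (keep > inode->nblocks)
		keep = inode->nblocks;

	memset(victim, 0, sizeof(*victim));
	for (i = keep; i < inode->nblocks; i++)
		victim->blocks[victim->nblocks++] = inode->blocks[i];
	victim->size = (unsigned long long)victim->nblocks * TOY_BLOCK_SIZE;
	victim->nlink = 0;		/* never visible in the namespace */
	victim->on_orphan_list = 1;	/* cleaned up like an unlinked file */

	inode->nblocks = keep;
	inode->size = new_size;
	/* The partial last block would be zeroed here, in the same step. */
}

/* Orphan processing for the victim is a plain unlink: free everything. */
static size_t toy_orphan_cleanup(struct toy_inode *victim)
{
	size_t freed = victim->nblocks;

	victim->nblocks = 0;
	victim->size = 0;
	victim->on_orphan_list = 0;
	return freed;
}
```

In this model the original inode never needs to go on the orphan list
for a truncate, so orphan replay would not have to touch the contents
of an encrypted file's data blocks.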

The code wasn't too complex, but we dropped it when extents arrived since
it didn't give a huge performance advantage.  That said, there could be a
benefit in terms of code simplification, since there wouldn't be the need
to restart transactions in the middle if the truncate gets too large.

The most recent version I could find is for ext3 in 2.4.29 at:

https://git.hpdd.intel.com/?p=fs/lustre-release.git;a=blob_plain;f=lustre/kernel_patches/patches/ext3-delete_thread-2.4.29.patch;hb=113303973ec9f8484eb2355a1a6ef3c4c7fd6a56

Cheers, Andreas
