Hi Dave,

On Fri, Jul 24, 2020 at 08:07:52AM +1000, Dave Chinner wrote:
> > > > @@ -183,11 +184,16 @@ static void
> > > >  iomap_dio_zero(struct iomap_dio *dio, struct iomap *iomap, loff_t pos,
> > > >  		unsigned len)
> > > >  {
> > > > +	struct inode *inode = file_inode(dio->iocb->ki_filp);
> > > >  	struct page *page = ZERO_PAGE(0);
> > > >  	int flags = REQ_SYNC | REQ_IDLE;
> > > >  	struct bio *bio;
> > > >
> > > >  	bio = bio_alloc(GFP_KERNEL, 1);
> > > > +
> > > > +	/* encrypted direct I/O is guaranteed to be fs-block aligned */
> > > > +	WARN_ON_ONCE(fscrypt_needs_contents_encryption(inode));
> > >
> > > Which means you are now placing a new constraint on this code in
> > > that we cannot ever, in future, zero entire blocks here.
> > >
> > > This code can issue arbitrary sized zeroing bios - multiple entire fs blocks
> > > blocks if necessary - so I think constraining it to only support
> > > partial block zeroing by adding a warning like this is no correct.
> >
> > In v3 and earlier this instead had the code to set an encryption context:
> >
> > 	fscrypt_set_bio_crypt_ctx(bio, inode, pos >> inode->i_blkbits,
> > 				  GFP_KERNEL);
> >
> > Would you prefer that, even though the call to fscrypt_set_bio_crypt_ctx() would
>
> Actually, I have no idea what that function does. It's not in a
> 5.8-rc6 kernel, and it's not in this patchset....

The cover letter mentions that this patchset is based on fscrypt/master.
That is, "master" of https://git.kernel.org/pub/scm/fs/fscrypt/fscrypt.git
fscrypt_set_bio_crypt_ctx() was introduced by
"fscrypt: add inline encryption support" on that branch.

>
> > always be a no-op currently (since for now, iomap_dio_zero() will never be
> > called with an encrypted file) and thus wouldn't be properly tested?
>
> Same can be said for this WARN_ON_ONCE() code :)
>
> But, in the interests of not leaving landmines, if a fscrypt context
> is needed to be attached to the bio for data IO in direct IO, it
> should be attached to all bios that are allocated in the dio path
> rather than leave a landmine for people in future to trip over.

My concern is that if we were to pass the wrong 'lblk' to
fscrypt_set_bio_crypt_ctx(), we wouldn't catch it because it's not tested.
Passing the wrong 'lblk' would cause the data to be encrypted/decrypted
incorrectly.

It's not a big deal though, as it's "obviously correct".  So we can just go with
that if you prefer it.

>
> > BTW, iomap_dio_zero() is actually limited to one page, so it's not quite
> > "arbitrary sizes".
>
> Yup, but that's an implentation detail, not a design constraint.
> i.e. I typically review/talk about how stuff functions at a
> design/architecture level, not how it's been implemented in the
> code.
>
> e.g. block size > page size patches in progress make use of the
> "arbitrary length" capability of the design:
>
> https://lore.kernel.org/linux-xfs/20181107063127.3902-7-david@xxxxxxxxxxxxx/
>
> > iomap is used for other filesystem operations too, so we need to consider when
> > to actually do the limiting.  I don't think we should break up the extents
> > returned FS_IOC_FIEMAP, for example.  FIEMAP already has a defined behavior.
> > Also, it would be weird for the list of extents that FIEMAP returns to change
> > depending on whether the filesystem is mounted with '-o inlinecrypt' or not.
>
> We don't need to care about that in the iomap code. The caller
> controls the behaviour of the mapping callbacks themselves via
> the iomap_ops structure they pass into high level iomap functions.

Sure, I wasn't saying we need to.  I was talking about what we need to do in
ext4.

>
> > That also avoids any confusion between pages and blocks, which is nice.
>
> FWIW, the latest version of the above patchset (which,
> co-incidentally, I was bring up to date yesterday) abstracts away
> page and block sizes. It introduces the concept of "chunk size"
> which is calculated from the combination of the current page's size
> and the current inode's block size.
>
> i.e. in the near future we are going to have both variable page
> sizes (on a per-page basis via Willy's current work) and per-inode
> blocks sizes smaller, the same and larger than the size of the
> current pager. Hence we need to get rid of any assumptions about
> page sizes and block sizes in the iomap code, not introduce new
> ones.
>
> Hence if there is any limitation of filesystem functionality based
> on block size vs page size, it is going to be up to the filesystem
> to detect and enforce those restrictions, not the iomap
> infrastructure.

Sure, again I was talking about what we'll be doing in ext4, since with the
proposed change, it will be ext4 that does fscrypt_limit_io_blocks().  The limit
is based on blocks, not pages, so "fscrypt_limit_io_pages()" was a bit weird.

Note that currently, I don't think iomap_dio_bio_actor() would handle an
encrypted file with blocksize > PAGE_SIZE correctly, as the I/O could be split
in the middle of a filesystem block (even after the filesystem ensures that
direct I/O on encrypted files is fully filesystem-block-aligned, which we do ---
see the rest of this patchset), which isn't allowed on encrypted files.

However we currently don't support blocksize > PAGE_SIZE in ext4, f2fs, or
fs/crypto/ at all, so I don't think we should add extra logic to fs/iomap/ to
try to handle that case for encrypted files when we'd have no way to test it.

- Eric
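
For illustration only, here is a standalone userspace sketch of the block
arithmetic discussed above: the 'lblk' computation (pos >> blkbits), the
fs-block alignment requirement for encrypted direct I/O, and why a split at a
page boundary can land mid-block when blocksize > PAGE_SIZE.  The helper names
(lblk_of, dio_is_block_aligned, split_is_mid_block) are hypothetical and are not
kernel APIs; this is a sketch under those assumptions, not the patchset's code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): logical block number that
 * contains byte offset 'pos', mirroring "pos >> inode->i_blkbits".
 */
static inline uint64_t lblk_of(uint64_t pos, unsigned int blkbits)
{
	return pos >> blkbits;
}

/*
 * Hypothetical helper: true if the range [pos, pos + len) both starts
 * and ends on a filesystem-block boundary.
 */
static inline bool dio_is_block_aligned(uint64_t pos, uint64_t len,
					unsigned int blkbits)
{
	uint64_t mask = ((uint64_t)1 << blkbits) - 1;

	return ((pos | len) & mask) == 0;
}

/*
 * Hypothetical helper: true if splitting an I/O at byte offset
 * 'split_pos' would cut a filesystem block in two.
 */
static inline bool split_is_mid_block(uint64_t split_pos,
				      unsigned int blkbits)
{
	uint64_t mask = ((uint64_t)1 << blkbits) - 1;

	return (split_pos & mask) != 0;
}
```

With 4K blocks (blkbits = 12), offset 8192 maps to lblk 2.  With a hypothetical
16K block size (blkbits = 14), a split at the 4K page boundary falls in the
middle of a block, which is the case described above as not allowed on
encrypted files.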