Hi Eric,

Thanks for your detailed reply. My English is not very good, so I cannot write
as logically and quickly as you do and may use some words improperly; I just
want to express my personal concerns, so please bear with me, thanks. :)

On 2018/8/25 12:16, Eric Biggers wrote:
> Hi Gao,
>
> On Sat, Aug 25, 2018 at 10:29:26AM +0800, Gao Xiang wrote:
>> Hi,
>>
>> On 2018/8/25 0:16, Eric Biggers wrote:
>>> +/**
>>> + * fsverity_verify_page - verify a data page
>>> + *
>>> + * Verify a page that has just been read from a file against that file's Merkle
>>> + * tree.  The page is assumed to be a pagecache page.
>>> + *
>>> + * Return: true if the page is valid, else false.
>>> + */
>>> +bool fsverity_verify_page(struct page *data_page)
>>> +{
>>> +	struct inode *inode = data_page->mapping->host;
>>> +	const struct fsverity_info *vi = get_fsverity_info(inode);
>>> +	struct ahash_request *req;
>>> +	bool valid;
>>> +
>>> +	req = ahash_request_alloc(vi->hash_alg->tfm, GFP_KERNEL);
>>> +	if (unlikely(!req))
>>> +		return false;
>>> +
>>> +	valid = verify_page(inode, vi, req, data_page);
>>> +
>>> +	ahash_request_free(req);
>>> +
>>> +	return valid;
>>> +}
>>> +EXPORT_SYMBOL_GPL(fsverity_verify_page);
>>> +
>>> +/**
>>> + * fsverity_verify_bio - verify a 'read' bio that has just completed
>>> + *
>>> + * Verify a set of pages that have just been read from a file against that
>>> + * file's Merkle tree.  The pages are assumed to be pagecache pages.  Pages that
>>> + * fail verification are set to the Error state.  Verification is skipped for
>>> + * pages already in the Error state, e.g. due to fscrypt decryption failure.
>>> + */
>>> +void fsverity_verify_bio(struct bio *bio)
>>> +{
>>> +	struct inode *inode = bio_first_page_all(bio)->mapping->host;
>>> +	const struct fsverity_info *vi = get_fsverity_info(inode);
>>> +	struct ahash_request *req;
>>> +	struct bio_vec *bv;
>>> +	int i;
>>> +
>>> +	req = ahash_request_alloc(vi->hash_alg->tfm, GFP_KERNEL);
>>> +	if (unlikely(!req)) {
>>> +		bio_for_each_segment_all(bv, bio, i)
>>> +			SetPageError(bv->bv_page);
>>> +		return;
>>> +	}
>>> +
>>> +	bio_for_each_segment_all(bv, bio, i) {
>>> +		struct page *page = bv->bv_page;
>>> +
>>> +		if (!PageError(page) && !verify_page(inode, vi, req, page))
>>> +			SetPageError(page);
>>> +	}
>>> +
>>> +	ahash_request_free(req);
>>> +}
>>> +EXPORT_SYMBOL_GPL(fsverity_verify_bio);
>>
>> Out of curiosity, I quickly scanned the fs-verity source code and have some
>> minor questions....
>>
>> If anything below is wrong, please point it out, thanks in advance...
>>
>> My first question is: is there any way to skip verification for some pages
>> in a bio?  I am thinking about the case where metadata and data pages are
>> mixed in a filesystem: they could be submitted together in one bio, but the
>> metadata could be unsuitable for this kind of verification.
>>
>
> Pages below i_size are verified, pages above are not.
>
> With my patches, ext4 and f2fs won't actually submit pages in both areas in the
> same bio, and they won't call the fs-verity verification function for bios in
> the data area.  But even if they did, there's also a check in verify_page() that
> skips the verification if the page is above i_size.
>

I think you mean the hash area?  Yes, I understand your design; as Ted said,
it is a wonderful job for ext4/f2fs for now.

However, I think it may not be as simple as that for all cases.  If a
filesystem submits contiguous accesses with different mappings (something like
mixed FILE_MAPPING and META_MAPPING pages), page->index is not reliable (it
could be a logical page index for FILE_MAPPING but a physical page index for
META_MAPPING), and the data may be organized by design in multiple bios for
fs-specific uses (such as compression).  You could not do such verification
with "if the page is above i_size", and it could be hard to integrate somehow.
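To make my concern more concrete, here is a rough sketch of what such a read
endio path would have to look like (purely illustrative, not code from any
real filesystem; it only assumes the fsverity_verify_page() prototype and
header from your patch):

#include <linux/bio.h>
#include <linux/fs.h>
#include <linux/pagemap.h>
#include <linux/fsverity.h>

/*
 * A read-endio helper for a filesystem whose bios may carry both file
 * pages and metadata pages.  An "is this page above i_size" test is
 * meaningless for a META_MAPPING page, whose page->index is a physical
 * index, so the only reliable test is the mapping itself.
 */
static void demo_verify_mixed_bio(struct bio *bio, struct inode *inode)
{
	struct bio_vec *bv;
	int i;

	bio_for_each_segment_all(bv, bio, i) {
		struct page *page = bv->bv_page;

		/* metadata page: page->index is not a file offset, skip it */
		if (page->mapping != inode->i_mapping)
			continue;

		/* file page: verify it against the file's Merkle tree */
		if (!PageError(page) && !fsverity_verify_page(page))
			SetPageError(page);
	}
}

In other words, the decision would have to be made per page on the mapping,
not per bio on i_size, which is why I asked whether there is a way to skip
some pages in a bio.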
>> The second question is related to the first one --- is there any way to
>> verify a partial page?
>> Taking scalability into consideration, some files could be totally inlined
>> or partially inlined in metadata.
>> Is there any way to deal with them in a per-file approach?  At least ---
>> support for it in the interface?
>
> Well, one problem is that inline data has its own separate I/O path; see
> ext4_readpage_inline() and f2fs_read_inline_data().  So it would be a large
> effort to support features like encryption and verity which require
> postprocessing after reads, and probably not worthwhile especially for verity
> which is primarily intended for large files.

Yes, for the current users ext4 and f2fs it is absolutely wonderful.  I have
to admit I am curious about Google's fs-verity roadmap for future Android (I
need to figure out whether it is meant to replace dm-verity; currently I think
it is not), since it matters a lot for whether our EROFS should support
fs-verity in the near future...  I could give some EROFS use cases if you have
time to discuss them.

EROFS uses a more aggressive inline approach, which means it does not only
inline data for small files: it is designed to inline the last page of every
file whenever that tail is reasonably small (e.g. only a few bytes), e.g.:

                 IN FILE_MAPPING           IN META_MAPPING
                                      blk-aligned
   +--------+--------+     +-----------+...........
   | page 0 | page 1 | ... | page n-1  .  page n  .
   +--------+--------+     +-----------+...........

   where the inlined page n physically lives in the metadata area together
   with the on-disk inode:

   +------------------------------------------+
   | inode A + inlined last data | inode B ...|
   +------------------------------------------+

In principle, this approach could also be used by read-write filesystems to
save more storage space.  I think it is still easy to handle for uncompressed
files if you do the zero padding as you said below.  But when considering
_compression_..... especially VLE compression, I think verification should not
rely on the `bio' directly, because:

1) The bio completes with compressed data rather than FILE_MAPPING plain data.
   Those pages could come from META_MAPPING (for caching compressed pages on
   purpose) or from FILE_MAPPING (for decompressing in place, to save
   redundant META_MAPPING memory).  I think it should decompress first and
   only then run fs-verity, but more file pages than compressed pages can be
   involved (e.g. 128KB compressed into 32KB: we submit 8 compressed pages but
   decompression ends up with 32 file pages), so it is not the original bio
   any more...  (actually I think it is not the bio concept any more...)

2) EROFS VLE is more complicated: a bio can end with a compressed page that
   only decompresses into part of a file page, e.g.:

         +-------------------+---------------------+
     ... | compressed page X | compressed page X+1 | ...
         +-------------------+---------------------+
          end of bio Y ----^   ^---- start of bio Y+1
                          \     /
                  +------------------------+
                  | plain data (file page) |
                  +------------------------+

   which means one bio may decompress only partial data of a page: the page
   could become Uptodate only after two bios rather than one, and I have no
   idea how to run fs-verity on that...

As Ted said, in that case "it could call fsverity after assembling the page in
the page cache."
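If I understand that suggestion correctly, for EROFS it would mean running the
per-page interface from your patch after decompression, once a file page has
been fully assembled, instead of using fsverity_verify_bio().  A rough sketch
(only my idea, the function name is made up):

#include <linux/pagemap.h>
#include <linux/fsverity.h>

/*
 * Called once *all* bios contributing to this file page have completed and
 * the plain data has been fully decompressed into the pagecache page, i.e.
 * after the page is assembled but before it is marked Uptodate.
 */
static void demo_finish_assembled_page(struct page *page, bool verity_enabled)
{
	/* verity_enabled: whether fs-verity is set up on this inode */
	if (!verity_enabled || fsverity_verify_page(page))
		SetPageUptodate(page);
	else
		SetPageError(page);
	unlock_page(page);
}

That keeps the "verify before Uptodate" rule, at the cost of holding the page
back until the whole cluster has been decompressed.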
>
> A somewhat separate question is whether the zero padding to a block boundary
> after i_size, before the Merkle tree begins, is needed.  The answer is yes,
> since mixing data and metadata in the same page would cause problems.  First,
> userspace would be able to mmap the page and see some of the metadata rather
> than zeroes.  That's not a huge problem, but it breaks the standard behavior.
> Second, any page containing data cannot be set Uptodate until it's been
> verified.  So, a special case would be needed to handle reading the part of the
> metadata that's located in a data page.

Yes, after thinking it over, I agree there should be zero padding to a block
boundary as you said, because of the Uptodate and mmap scenarios, if the
inode's (file's) own mapping is used directly for verification.

>
>> At last, I hope filesystems could select the on-disk position of the hash
>> tree and 'struct fsverity_descriptor' rather than having them fixed at the
>> end of verity files...  I think it would be better if fs-verity prepared
>> such support and interfaces.....hmmm... :(
>
> In theory it would be a much cleaner design to store verity metadata separately
> from the data.  But the Merkle tree can be very large.  For example, a 1 GB file
> using SHA-512 would have a 16.6 MB Merkle tree.  So the Merkle tree can't be an
> extended attribute, since the xattrs API requires xattrs to be small (<= 64 KB),
> and most filesystems further limit xattr sizes in their on-disk format to as
> little as 4 KB.  Furthermore, even if both of these limits were to be increased,
> the xattrs functions (both the syscalls, and the internal functions that
> filesystems have) are all based around getting/setting the entire xattr value.
>
> Also when used with fscrypt, we want the Merkle tree and fsverity_descriptor to
> be encrypted, so they don't leak plaintext hashes.  And we want the Merkle
> tree to be paged into memory, just like the file contents, to take advantage of
> the usual Linux memory management.
>
> What we really need is *streams*, like NTFS has.  But the filesystems we're
> targeting don't support streams, nor does the Linux syscall interface have any
> API for accessing streams, nor does the VFS support them.
>
> Adding streams support to all those things would be a huge multi-year effort,
> controversial, and almost certainly not worth it just for fs-verity.
>
> So simply storing the verity metadata past i_size seems like the best solution
> for now.
>
> That being said, in the future we could pretty easily swap out the calls to
> read_mapping_page() with something else if a particular filesystem wanted to
> store the metadata somewhere else.  We actually even originally had a function
> ->read_metadata_page() in the filesystem's fsverity_operations, but it turned
> out to be unnecessary and I replaced it with directly calling
> read_mapping_page(), but it could be changed back at any time.

OK, I got it.  I have to look into that and think it over again (see the P.S.
below for how I imagine such a hook could look).

Thanks again for your reply. :)

Thanks,
Gao Xiang
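P.S. About the ->read_metadata_page() hook you mentioned: just to check that I
understand the idea, I imagine it would be something roughly like the
following in struct fsverity_operations (only my guess at the shape; the
signature is hypothetical, not necessarily the one from your earlier patches):

	/* let the filesystem decide where a hash-tree page lives on disk */
	struct page *(*read_metadata_page)(struct inode *inode, pgoff_t index);

so that the verification code could call it instead of read_mapping_page() on
the file's own mapping, and a filesystem could then keep the Merkle tree in
its own metadata area.  No hurry at all, of course --- it can be changed back
at any time as you said.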