On Fri, Jan 26, 2024 at 02:54:24AM +0000, Matthew Wilcox wrote:
> On Fri, Jan 26, 2024 at 12:22:32PM +1100, Dave Chinner wrote:
> > On Thu, Jan 25, 2024 at 07:19:45PM +0900, Namjae Jeon wrote:
> > > We need to consider the case that mmap against files with different
> > > valid size and size created from Windows. So it needed to zero out in mmap.
> >
> > That's a different case - that's a "read from a hole" case, not a
> > "extending truncate" case. i.e. the range from 'valid size' to EOF
> > is a range where no data has been written and so contains zeros.
> > It is equivalent to either a hole in the file (no backing store) or
> > an unwritten range (backing store instantiated but marked as
> > containing no valid data).
> >
> > When we consider this range as "reading from a hole/unwritten
> > range", it should become obvious the correct way to handle this case
> > is the same as every other filesystem that supports holes and/or
> > unwritten extents: the page cache page gets zeroed in the
> > readahead/readpage paths when it maps to a hole/unwritten range in
> > the file.
> >
> > There's no special locking needed if it is done this way, and
> > there's no need for special hooks anywhere to zero data beyond valid
> > size because it is already guaranteed to be zeroed in memory if the
> > range is cached in the page cache.....
>
> but the problem is that Microsoft half-arsed their support for holes.
> See my other mail in this thread.

Why does that matter? It's exactly the same problem with any other
filesystem that doesn't support sparse files.

All I said is that IO operations beyond the "valid size" should
be treated like operating in a hole - I pass no judgement on the
filesystem design, implementation or level of sparse file support
it has. All it needs to do is treat the "not valid" size range as if
it was a hole or unwritten, regardless of whether the file is sparse
or not....

> truncate the file up to 4TB
> write a byte at offset 3TB
>
> ... now we have to stream 3TB of zeroes through the page cache so that
> we can write the byte at 3TB.

This behaviour cannot be avoided on filesystems without sparse file
support - the hit of writing zeroes has to be taken somewhere. We
can handle this in truncate(), the write() path or in ->page_mkwrite
*if* the zeroing condition is hit. There's no need to do it at
mmap() time if that range of the file is not actually written to by
the application...

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
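
The "treat everything between valid size and EOF as a hole" read
behaviour discussed above can be illustrated with a small userspace
model. This is only a sketch under stated assumptions, not kernel code
and not any filesystem's actual implementation: struct model_file,
model_read_block() and MODEL_BLOCK are hypothetical names made up for
the illustration, and the "backing store" is just an in-memory array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: one cached block is MODEL_BLOCK bytes. */
enum { MODEL_BLOCK = 4096 };

struct model_file {
    const unsigned char *backing; /* on-disk bytes; may be stale past valid_size */
    size_t valid_size;            /* bytes below this offset hold real data */
    size_t size;                  /* EOF; assumed >= valid_size */
};

/*
 * Fill buf (MODEL_BLOCK bytes) for the block starting at pos, the way a
 * read path could populate a cached page: copy only the bytes below
 * valid_size, and zero everything from valid_size onwards, exactly as
 * if that range were a hole or an unwritten extent.
 * Returns how many bytes of the block lie below EOF.
 */
size_t model_read_block(const struct model_file *f, size_t pos,
                        unsigned char *buf)
{
    size_t in_file = 0, valid = 0;

    assert(f->valid_size <= f->size);

    if (pos < f->size)
        in_file = f->size - pos < MODEL_BLOCK ? f->size - pos : MODEL_BLOCK;
    if (pos < f->valid_size)
        valid = f->valid_size - pos < in_file ? f->valid_size - pos : in_file;

    if (valid)
        memcpy(buf, f->backing + pos, valid);     /* real data */
    memset(buf + valid, 0, MODEL_BLOCK - valid);  /* "hole" and block tail */
    return in_file;
}
```

In this model nothing has to be zeroed at mmap() time: a block is only
zero-filled when it is actually read into the cache, and stale bytes
in the backing store past valid_size never become visible.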