Hou Tao wrote on Thu, Jan 24, 2019: > > hmm, part of me wants to abuse v9fs_i_size_write to also update i_blocks > > as I think we'd usually want to keep both in sync.We can do that for > > v9fs_stat2inode(), but for v9fs_stat2inode_dotl() > > the updates of i_block & i_size are controlled by different flags, so > separated updates are needed (maybe also add an extra flags ?). Is there any case where i_block and i_size are not updated together? I would be happy with an extra flag if required, I just do not see where right now (admitedly didn't look for long, will have a second look tomorrow) > > In v9fs_file_write_iter we do not update i_blocks directly but > > inode_add_bytes basically does the same thing... Speaking of which it's > > assuming inode->i_bytes is set correctly initially; I _think_ we're > > supposed to call inode_set_bytes() instead of writing i_blocks directly > > in stat2inode? > > Sorry, I'm not too familiar with this code and there doesn't seem to be > > many users of inode_set_bytes so it's a bit puzzling to me. > > In my understanding, now only disk quota uses i_bytes, so for 9p there is not so > much different between updating i_blocks directly and using inode_set_bytes(), > but maybe using inode_set_bytes() will be appropriate. I also think only quota uses i_bytes directly, but 9p also uses it indirectly through inode_add_bytes() - i_blocks is incremented by 1 every 512 bytes and i_bytes is reduced to its last few bits (basically %512) > > I'm not using cache mode much but I have some very weird behaviour > > currently if I mount something with cache=fscache : > > - if I create a new file from 9p and stat on the same client, a size of > > 0-511 has block = 0 then it's just always off by 1 > Is this not the expected result ? v9fs_write_end() will call > inode_add_bytes(inode,511) if we write 511 bytes to the new file and > i_blocks will be 0. That is the obvious result from the current code, yes - but it definitely is wrong. i_blocks should be 1 from byte 1 onwards until 512, and bumped to 2 at 513. From a filesystem point of view, a block is already "consumed" from the first byte onward. (This isn't what inode_set_bytes does..) In practice, this actually is fairly important - tools like tar are optimized such that if a file has no blocks it is considered sparse and isn't read at all, so this could lead to data loss. Interestingly I cannot reproduce with a recent gnu tar but I am sure I have seen the behaviour (some lustre bug[1] talks about this as well and the code is still present[2]); we should not report a st_blocks of 0 is there is data -- v9fs_stat2inode (not v9fs_stat2inode_dotl) also rounds up i_blocks so this is the right behaviour, we're just not doing it when creating new files apparently. [1] https://jira.whamcloud.com/browse/LU-3864 [2] http://git.savannah.gnu.org/cgit/tar.git/tree/src/sparse.c#n273 If this optimization didn't exist and given i_bytes existence it does seem more correct to do like inode_set_bytes (e.g. blocks = bytes / 512 and bytes = bytes % 512) but this really strikes me as odd; this might be why hardly anyone uses this... > > - if I create a file outside of 9p and stat it, for example starting > > with 400 bytes, it starts with 8 blocks then the number of blocks > > increase by 1 everytime 512 bytes are written regardless of initial > > size. > > The initial i_blocks comes from the underlying filesystem, so I think 8 blocks > just mean the block size of the underlying filesystem is 4KB. If another 512 bytes > are written into the file, inode_add_bytes() will increase i_blocks by one instead of > fetching i_blocks from server, though i_blocks in server side will > still be 8. Ah, I had only looked at v9fs_stat2inode() which creates the value from i_size, not v9fs_stat2inode_dotl(). v9fs_stat2inode_dotl() does take the value from the RPC, and then 8 is indeed correct -- it will just be very painful to keep synchronized with the underlying fs; we have no ways of knowing when the remote filesystem decides to allocate new blocks. If for example the write iter only writes 0 and the underlying fs is optimized st_blocks might not increase at all. > Maybe we should call v9fs_invalidate_inode_attr() in v9fs_write_end() ? Hmm, That will probably add quite a lot of RPCs and make the cache option rather useless... Honestly as far as I'm concerned the value of i_blocks is mostly informative (it's used by tools like du to give a size estimate, and that's about it?) - with the exception of the 0 value I was talking earlier. I'd advocate to never display the remote st_blocks but do like v9fs_stat2inode() and always just set blocks appropriately from the size (i.e. i_blocks = (i_size + 512 - 1) >> 9 and i_bytes = 0), but I'd like some opinions first. I'll send a mail to fsdevel asking vfs folks for advices tomorrow; let's put this patch on hold for a few days. Thanks, -- Dominique