Filesystem setattr/truncate notes and problems

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jun 03, 2010 at 05:55:38AM +1000, Nick Piggin wrote:
> Well I looked through filesystems and I cannot seem to find anywhere
> they can use ATTR_SIZE|ATTR_[MC]TIME to do anything strange that would
> let them distinguish truncate from ftruncate and O_TRUNC.
> 
> I couldn't quite understand what FUSE was trying to do, or whether I
> improved it.

Looking through the filesystems, I thought I found a number of issues.
I'll try to list what I understand to be important steps required in
setattr. Unless I'm way off, a lot of filesystems get these wrong, so
we may want to add some guideline to docs?

Anyway feedback on these would be appreciated. Below this there are
some questions about specific filesystems too.

Filesystems setattr methods should:

- Perform idempotent error checks as early as possible. Perform actual
  i_size change and pagecache truncation last[*]. Truncate means dirty
  pagecache is fine to be thrown out at any time[**], what is important
  is that it is not thrown out before the operation can complete (eg. by
  doing the on-disk truncation).

  [*] Pagecache truncation is irreversable unless the filesystem can
  guarantee that the page is not mapped and has no dangling
  get_user_pages references (it's almost impossible to check this
  properly unless get_user_pages is disallowed from the start) or the
  file has been readonly. Just writing back the pagecache does not
  guarantee no data loss window. So it's probably easiest to do
  pagecache truncation last.

  [**] Be careful, this might mean the page is subject to writeout
  inside i_size after the on-disk truncation. Buffer based filesystems
  will probably need rework to handle this properly.

- Update mtime and ctime IFF ATTR_MTIME and ATTR_CTIME are set in
  setattr.

- Update mode before owner or size if mode update can fail. Otherwise
  killing of suid could fail after owner/size has changed (ideally
  these would be all done atomically). Although we do already have weird
  races in permission checking in core vfs code anyway.

- Perform time modifications iff others have/will succeeded (these are
  applicable for chmod, chown, etc too). Do not succeed a truncate and
  then error out because the time cannot be set. Do not set the time but
  fail the truncate or chown etc.

- If ATTR_SIZE change cannot succeed, don't ignore it, return something
  (like -EPERM or -EINVAL), right? (lots of filesystems just mask it
  off, maybe that is preferable behaviour sometimes?)

- Remember to actually trim off disk blocks or underlying object past
  i_size in response to setattr shrinking the file.

- In case of write_begin/write_end/direct_IO failure, trim object past
  i_size if it was possible to have been allocated before the operation
  fails.


Interesting specific filesystem questions: (not comprehensive)

ecryptfs improve mtime/ctime handling for truncate of lower inode

fuse, should be setting ctime on truncate?

gfs2, mtime and ctime should only be set in truncate if the ATTR_?TIME
are set.

hostfs, don't ignore ATTR_SIZE. ftruncate/truncate have interesting
different semantics for timestamp updates, so don't use fd != -1 for
these (could use utimes for this, but it causes an atomicity/error
handling issue).

hpfs could use cont_expand to do expanding truncates?

ncpfs, nfs smbfs, cifs seems to perform inode size checks
(inode_newsize_ok) after truncating the file on the server??

ntfs does not do inode size checks properly.

logfs logfs_truncate without doing inode_change_ok, inode_newsize_ok etc
can fail inode_setattr after successful truncate (truncate(2) would
return failure and yet the file would be truncated).

procfs why not return -EPERM rather than setting size?

reiserfs should do all setattr checks early as possible
(inode_change_ok, inode_newsize_ok).

ubifs should not update access times unconditionally when ATTR_SIZE is
set

ufs simple_setsize still destroys the pagecache and causes concurrent
reads to go EOF past updated i_size. Restoring old i_size is too late
I think. Should do those checks first (in fairness lots of filesystems
have bit problems with error handling, but ufs attempts a little bit and
gets it wrong).

xfs in the 0 length file shortcut, isn't this missing suid kill case?
(ftruncate mandatory mtime update seems like it wasn't working right
there either before this patch).

nfsd should not change attributes in the case of truncated file with no
size change?

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux