Re: [PATCH] cifs: fix allocation size on newly created files

Steve French <smfrench@xxxxxxxxx> · Fri, 19 Mar 2021 13:48:26 -0500

Well ... it is a little complicated to query it on close in the
current cifs.ko compounding code and allocation size can change on the
server so returning a 'plausible' allocation size rather than an
impossible one is progress.

On Fri, Mar 19, 2021 at 1:26 PM Tom Talpey <tom@xxxxxxxxxx> wrote:
>
> Hrm. I am still uneasy about making up a number. It could break
> an application. And the issue isn't even in the client!
>
> Did you ping Neal, or contact dochelp about the Windows server
> behavior? I'd be happy to but I don't have any context, including
> which filesystem is doing this.
>
> On 3/19/2021 2:08 PM, Steve French wrote:
> > Yes - the Linux terminology is confusing.  Quoting from Ubuntu docs
> > e.g. about stat output
> >
> >             "IO Block" in stat's output is the preferred number of
> >             bytes that the file system uses for reading and writing files...
> >             "Blocks", on the other hand, counts how many 512-bytes blocks
> >             are allocated on disk for the file.
> >
> > So for NFS and SMB3 mounts they return 1MB for "IO Block" size.
> >
> > statfs on the other hand shows what the server thinks the block size
> > is (often 4K) but
> > that is a different field.
> >
> > And of course number of "blocks" in stat output is meant to return
> > allocation size
> > (in 512 byte units for historical reasons, even if most file systems
> > don't use 512
> > byte blocks anymore)
> >
> >
> > On Fri, Mar 19, 2021 at 12:52 PM Tom Talpey <tom@xxxxxxxxxx> wrote:
> >>
> >> But it's not the block size here, it's the cluster size. Block
> >> size is the per-io hunk, allocation size is the number of blocks
> >> lined up to receive it.
> >>
> >> Perhaps the safest number is the file size itself, unrounded.
> >>
> >> On 3/19/2021 1:46 PM, Steve French wrote:
> >>> e.g. stat reports much larger than 512 byte block size over SMB3
> >>>
> >>> # stat /mnt2/foo
> >>>     File: /mnt2/foo
> >>>     Size: 65536      Blocks: 128        IO Block: 1048576 regular file
> >>> Device: 34h/52d Inode: 88946092640651991  Links: 1
> >>>
> >>> and local file systems do the same ie "blocks" is unrelated to block size
> >>> the fs reports.  Here is an example to XFS locally
> >>>
> >>> # stat Makefile
> >>>     File: Makefile
> >>>     Size: 66247      Blocks: 136        IO Block: 4096   regular file
> >>> Device: 10302h/66306d Inode: 1076242180  Links: 1
> >>>
> >>> On Fri, Mar 19, 2021 at 12:42 PM Steve French <smfrench@xxxxxxxxx> wrote:
> >>>>
> >>>> We report the block size properly (typically much larger) - but the
> >>>> kernel API returns allocation size in 512 byte units no matter what the
> >>>> block size is.   Number of blocks returned for the kernel API
> >>>>        inode->i_blocks
> >>>> is unrelated to the block size (simply allocation_size/512 rounded up by 1).
> >>>>
> >>>> On Fri, Mar 19, 2021 at 12:38 PM Tom Talpey <tom@xxxxxxxxxx> wrote:
> >>>>>
> >>>>> On 3/19/2021 1:25 AM, Steve French wrote:
> >>>>>> Applications that create and extend and write to a file do not
> >>>>>> expect to see 0 allocation size.  When file is extended,
> >>>>>> set its allocation size to a plausible value until we have a
> >>>>>> chance to query the server for it.  When the file is cached
> >>>>>> this will prevent showing an impossible number of allocated
> >>>>>> blocks (like 0).  This fixes e.g. xfstests 614 which does
> >>>>>>
> >>>>>>        1) create a file and set its size to 64K
> >>>>>>        2) mmap write 64K to the file
> >>>>>>        3) stat -c %b for the file (to query the number of allocated blocks)
> >>>>>>
> >>>>>> It was failing because we returned 0 blocks.  Even though we would
> >>>>>> return the correct cached file size, we returned an impossible
> >>>>>> allocation size.
> >>>>>>
> >>>>>> Signed-off-by: Steve French <stfrench@xxxxxxxxxxxxx>
> >>>>>> CC: <stable@xxxxxxxxxxxxxxx>
> >>>>>> ---
> >>>>>>     fs/cifs/inode.c | 12 ++++++++++--
> >>>>>>     1 file changed, 10 insertions(+), 2 deletions(-)
> >>>>>>
> >>>>>> diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c
> >>>>>> index 7c61bc9573c0..17a2c87b811c 100644
> >>>>>> --- a/fs/cifs/inode.c
> >>>>>> +++ b/fs/cifs/inode.c
> >>>>>> @@ -2395,7 +2395,7 @@ int cifs_getattr(struct user_namespace
> >>>>>> *mnt_userns, const struct path *path,
> >>>>>>      * We need to be sure that all dirty pages are written and the server
> >>>>>>      * has actual ctime, mtime and file length.
> >>>>>>      */
> >>>>>> - if ((request_mask & (STATX_CTIME | STATX_MTIME | STATX_SIZE)) &&
> >>>>>> + if ((request_mask & (STATX_CTIME | STATX_MTIME | STATX_SIZE |
> >>>>>> STATX_BLOCKS)) &&
> >>>>>>          !CIFS_CACHE_READ(CIFS_I(inode)) &&
> >>>>>>          inode->i_mapping && inode->i_mapping->nrpages != 0) {
> >>>>>>      rc = filemap_fdatawait(inode->i_mapping);
> >>>>>> @@ -2585,6 +2585,14 @@ cifs_set_file_size(struct inode *inode, struct
> >>>>>> iattr *attrs,
> >>>>>>      if (rc == 0) {
> >>>>>>      cifsInode->server_eof = attrs->ia_size;
> >>>>>>      cifs_setsize(inode, attrs->ia_size);
> >>>>>> + /*
> >>>>>> + * i_blocks is not related to (i_size / i_blksize),
> >>>>>> + * but instead 512 byte (2**9) size is required for
> >>>>>> + * calculating num blocks. Until we can query the
> >>>>>> + * server for actual allocation size, this is best estimate
> >>>>>> + * we have for the blocks allocated for this file
> >>>>>> + */
> >>>>>> + inode->i_blocks = (512 - 1 + attrs->ia_size) >> 9;
> >>>>>
> >>>>> I don't think 512 is a very robust choice, no server uses anything
> >>>>> so small any more. MS-FSA requires the allocation quantum to be the
> >>>>> volume cluster size. Is that value available locally?
> >>>>>
> >>>>> Tom.
> >>>>>
> >>>>>>      /*
> >>>>>>      * The man page of truncate says if the size changed,
> >>>>>> @@ -2912,7 +2920,7 @@ cifs_setattr_nounix(struct dentry *direntry,
> >>>>>> struct iattr *attrs)
> >>>>>>      sys_utimes in which case we ought to fail the call back to
> >>>>>>      the user when the server rejects the call */
> >>>>>>      if ((rc) && (attrs->ia_valid &
> >>>>>> - (ATTR_MODE | ATTR_GID | ATTR_UID | ATTR_SIZE)))
> >>>>>> +     (ATTR_MODE | ATTR_GID | ATTR_UID | ATTR_SIZE)))
> >>>>>>      rc = 0;
> >>>>>>      }
> >>>>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Thanks,
> >>>>
> >>>> Steve
> >>>
> >>>
> >>>
> >
> >
> >
> > --
> > Thanks,
> >
> > Steve
> >

-- 
Thanks,

Steve