Re: Illegal value 0 for -l sunit option ?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Nov 18, 2017 at 10:23:26PM -0600, Eric Sandeen wrote:
> On 11/18/17 3:51 PM, Dave Chinner wrote:
> > On Sat, Nov 18, 2017 at 04:00:08PM +0100, Vladimir Gozora wrote:
> >> Hello XFS team,
> >>
> >> I'm working on ReaR project (https://github.com/rear/rear) which
> >> aims for bare-metal disaster recovery.
> >> Lately I’ve run across behavior of mkfs.xfs which I don’t really
> >> know if is correct or not.
> >> The thing is that when I try to create XFS file system with
> >> xfsprogs-4.5.0 following commands runs just fine:
> >> mkfs.xfs -f -i size=512 -d agcount=4 -s size=512 -i attr=2 -i
> >> projid32bit=1 -m crc=1 -m finobt=1 -b size=4096 -i maxpct=25 -d
> >> sunit=0 -d swidth=0 -l version=2 -l sunit=0 -l lazy-count=1 -n
> >> size=4096 -n version=2 -r extsize=4096 <destination>
> >> when I try same command with xfsprogs-4.10.0, I get following error:
> >> Illegal value 0 for -l sunit option. value is too small
> > 
> > Yup, old mkfs would accept values that were illegal and could do
> > completly unpredictable things with them. We tightened up the input
> > parsing to only accept valid values xfsprogs in 4.7.0, and in
> > particular this commit is relevant:
> > 
> > commit 2942ff49bdb6df08fb56674215b31c07cfc7c1fd
> > Author: Jan Tulak <jtulak@xxxxxxxxxx>
> > Date:   Tue Jun 21 12:52:22 2016 +1000
> > 
> >     mkfs: fix -l su minval
> >     
> >     -l su should be in range BBTOB(1) <= L_SU <= XLOG_MAX_RECORD_BSIZE,
> >     because the upper limit is imposed by kernel on iclogbuf: stripe
> >     unit can't be bigger than the log buffer, but the log buffer can
> >     span multiple stripe units. L_SUNIT is changed in the same way.
> >     
> >     Signed-off-by: Jan Tulak <jtulak@xxxxxxxxxx>
> >     Reviewed-by: Eric Sandeen <sandeen@xxxxxxxxxx>
> >     Signed-off-by: Dave Chinner <david@xxxxxxxxxxxxx>
> 
> Honestly, the minimum changed before that - the commit below fixed an
> incorrect minval of XLOG_MIN_RECORD_BSIZE (16k) to "1" -
> which honestly doesn't make much sense to me.

Yeah, that was just plain wrong - I forgot that log stripe unit
could be any integer multiple of the fliesystem block size, not just
the valid sizes of the in-core log buffers. The intent was to remove
the specification of illegal values - many other illegal
values were also caught by that set of commits in 4.7.

> It was 1974d3f1a6495978419d931020f63c9cac8ba230 which changed
> it from accepting min 0 to accepting min XLOG_MIN_RECORD_BSIZE:
> 
>                 { .index = L_SUNIT,
> +                 .minval = BTOBB(XLOG_MIN_RECORD_BSIZE),
> +                 .maxval = BTOBB(XLOG_MAX_RECORD_BSIZE),
>                   .defaultval = SUBOPT_NEEDS_VAL,
>                 },
> 
> ...
> 
>                                 case L_SUNIT:
> -                                       if (!value || *value == '\0')
> -                                               reqval('l', subopts, L_SUNIT);
>                                         if (lsunit)
>                                                 respec('l', subopts, L_SUNIT);
> -                                       lsunit = getnum(value, 0, 0, false);
> -                                       if (lsunit < 0)
> -                                               illegal(value, "l sunit");
> +                                       lsunit = getnum_checked(value, &lopts,
> +                                                                L_SUNIT);
>                                         lsunitflag = 1;
>                                         break;
> 
> Specifying "0" as a value to mean "no stripe" is how I always
> interpreted these things.

Which you can't actually do for v2 logs.  V2 logs *always* has a
stripe unit set, it's supposed to be a multiple of the filesystem
block size, and it's supposed to be recorded in bytes in the
superblock sb_logsunit.  The thing is, this means that a 4k block
size filesystem would always have a stripe unit of 4k - which is
clearly doesn't - and that would have been a performance regression
vs v1 logs for fsync heavy workloads. I'll come back to this.

ALso, let's not forget that setting lsu = 0 is completely illegal on
devices with a sector size != BBSIZE - the log stripe unit is
supposed to indicate the sector size for such devices, and mkfs will
calculate that automatically based on topology info.

> Not to mention that D_SUNIT accepts 0 without complaint, and  it's
> further compounded/confusing by xfs_info reporting a stripe unit
> of 0 blocks, something it won't accept on a mkfs commandline?
> Makes little sense.

It's perfectly valid to have a zero data stripe unit - it has no
"atomic write" configuration infomration encoded in it. That's why
it's not valid to have a zero log stripe unit for v2 logs - it's
critical information for correct log behaviour.  Let's go
back to the mkfs commit that introduced V2 logs:

commit 73bf5988043fcb9d8981a1d8fb4f9936ba0a0551
Author: Steve Lord <lord@xxxxxxx>
Date:   Tue Jun 18 20:32:20 2002 +0000

    Bump revision number of version 2 log support

[.....]

+       if (logversion == 2)
+               sbp->sb_logsunit = (lsunit == 0) ? 1 : lsunit;
+       else
+               sbp->sb_logsunit = 0;


IOWs, mkfs has always written a non-zero value into sb_logsunit, and
it's always been a value of 1 when no log stripe unit has been
specified. What's that value of 1 mean, though? It's not a multiple
of the filesystem block size - it's a hack to make v2 logs use a
stripe unit of a single sector. i.e. all the code treats a v2 log
with a sb_logsunit == 1 as though it is a v1 log.  e.g.  when
calculating stripe unit alignemnt padding of log writes in
xlog_sync():

       /* Round out the log write size */
        if (v2 && log->l_mp->m_sb.sb_logsunit > 1) {
                /* we have a v2 stripe unit to use */
                count = XLOG_LSUNITTOB(log, XLOG_BTOLSUNIT(log, count_init));
        } else {
                count = BBTOB(BTOBB(count_init));
        }

IOWs, a value of 1 effectively means "use the v1 padding limits".

So, what does XFS_IOC_FSGEOMETRY tell userspace:

        if (new_version >= 4) {
                geo->flags |=
                        (xfs_sb_version_haslogv2(&mp->m_sb) ?
                                XFS_FSOP_GEOM_FLAGS_LOGV2 : 0);
                geo->logsunit = mp->m_sb.sb_logsunit;
        }

It tell userspace whatever is in the superblock.

And what does xfs_info then output?

	 "", geo.logsectsize, geo.logsunit / geo.blocksize, lazycount,

Yeah, it assumes the number is in bytes and a multiple of block
size.  IOWs, a log stripe unit of 0 indicates basic block stripe
alignment (i.e. none) for both v1 and v2 logs without a stripe unit
configured. 

Just because xfs_info reports a value of "X" for some paramter, it
does not mean that value of "X" is a valid mkfs CLI parameter.
You can't say "no log stripe unit" on a v2 log. At best we can
define "lsu=0/lsunit=0" to mean "use the default" but then people
are going to complain that "I set it to 0 but mkfs is reporting
4096 and I can't change it".

To summaries: a log stripe value of 0 is illegal, a value of 1 means
means "behave like a v1 log", and any other value means
"use this byte count as the stripe unit".

> I'm not really convinced that a minval of 1 is correct.  The commitlog
> changes the minval but speaks only to the rationale for the maxval, so
> it's no help.

The information is all there in the git history... :/

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux