Re: [PATCH] mkfs: fix divide-by-zero in align_ag_geometry

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 6/21/18 2:15 PM, Jeff Mahoney wrote:
> On 6/20/18 11:57 PM, Dave Chinner wrote:
>> On Wed, Jun 20, 2018 at 10:55:20PM -0400, jeffm@xxxxxxxx wrote:
>>> From: Jeff Mahoney <jeffm@xxxxxxxx>
>>>
>>> Commit 051b4e37f5e (mkfs: factor AG alignment) factored out the
>>> AG alignment code into a separate function.  It got rid of
>>> redundant checks for dswidth != 0 but did too good a job since now
>>> it doesn't check at all.
>>
>> Of course they got removed - we've already validated the CLI input
>> and guaranteed that cfg->dswidth can only be zero iff cfg->dsunit is
>> zero in calc_stripe_factors().
>>
>> i.e. We do input validation of CLI paramters before anything else so
>> that later users (like align_ag_geometry()) can assume the
>> parameters they are using are valid. In this case, the assumption is
>> that either both dsunit and dswidth are zero or that both are
>> non-zero and dswidth an integer multple of dsunit.
> 
> It's not coming from the CLI parameters.  It's coming from the topology.
>  The blkid topology stuff is returning 8k for minimal i/o and 0 for
> optimal.  Without a CLI config, we have dunit=0 in calc_stripe_factors,
> which takes it from the device.  We set cfg->dsunit=16 and
> cfg->dswidth=0, and then head down to align_ag_geometry.
> 
> The topology on this system looks like:
> 
> ft = {dsunit = 16, dswidth = 0, rtswidth = 0, lsectorsize = 512,
> psectorsize = 512}
> 
> That matches with a few of the dm targets I see reported on this system.
> 
> Since minimal io size isn't sector size, we don't clear it.  Talking
> with Eric, we should probably just extent that check in
> blkid_get_topology to cover that case since it's not like we can just
> reject what blkid gives us.
> 
>>> When we hit the check to see if agsize
>>> is a multiple of stripe width: (cfg->agsize % cfg->dswidth), we crash
>>> on a divide by zero.
>>
>> What CLI config did you use to hit this? I'd like to reproduce it so
>> I can see where calc_stripe_factors() is going wrong....
> 
> It was just "mkfs.xfs <path-to-mpath-device>"

yeah, so in blkid_get_topology we have:

        /*
         * If the reported values are the same as the physical sector size
         * do not bother to report anything.  It will only cause warnings
         * if people specify larger stripe units or widths manually.
         */
        if (*sunit == *psectorsize || *swidth == *psectorsize) {
                *sunit = 0;
                *swidth = 0;
        }

as a sanity check.  We should probably extend that (or add a similar test)
which sets both to zero if either is zero, for the same reason as the CLI
validation does it.  Comments should explain why....

        /*
         * Ensure that if either sunit or stripe width is zero, the other
         * is as well.  Having only one set is not valid stripe geometry.
         */
        if (*sunit == 0 || *swidth == 0) {
                *sunit = 0;
                *swidth = 0;
        }

FWIW if I set 

        *sunit = 8192;
        *swidth = 0;

manually in blkid_get_topology I do get the same floating point exception;
if that's what we get from libblkid for some weird device, we'd go boom.

-Eric

Attachment: signature.asc
Description: OpenPGP digital signature


[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux