On 6/21/18 2:15 PM, Jeff Mahoney wrote: > On 6/20/18 11:57 PM, Dave Chinner wrote: >> On Wed, Jun 20, 2018 at 10:55:20PM -0400, jeffm@xxxxxxxx wrote: >>> From: Jeff Mahoney <jeffm@xxxxxxxx> >>> >>> Commit 051b4e37f5e (mkfs: factor AG alignment) factored out the >>> AG alignment code into a separate function. It got rid of >>> redundant checks for dswidth != 0 but did too good a job since now >>> it doesn't check at all. >> >> Of course they got removed - we've already validated the CLI input >> and guaranteed that cfg->dswidth can only be zero iff cfg->dsunit is >> zero in calc_stripe_factors(). >> >> i.e. We do input validation of CLI paramters before anything else so >> that later users (like align_ag_geometry()) can assume the >> parameters they are using are valid. In this case, the assumption is >> that either both dsunit and dswidth are zero or that both are >> non-zero and dswidth an integer multple of dsunit. > > It's not coming from the CLI parameters. It's coming from the topology. > The blkid topology stuff is returning 8k for minimal i/o and 0 for > optimal. Without a CLI config, we have dunit=0 in calc_stripe_factors, > which takes it from the device. We set cfg->dsunit=16 and > cfg->dswidth=0, and then head down to align_ag_geometry. > > The topology on this system looks like: > > ft = {dsunit = 16, dswidth = 0, rtswidth = 0, lsectorsize = 512, > psectorsize = 512} > > That matches with a few of the dm targets I see reported on this system. > > Since minimal io size isn't sector size, we don't clear it. Talking > with Eric, we should probably just extent that check in > blkid_get_topology to cover that case since it's not like we can just > reject what blkid gives us. > >>> When we hit the check to see if agsize >>> is a multiple of stripe width: (cfg->agsize % cfg->dswidth), we crash >>> on a divide by zero. >> >> What CLI config did you use to hit this? I'd like to reproduce it so >> I can see where calc_stripe_factors() is going wrong.... > > It was just "mkfs.xfs <path-to-mpath-device>" yeah, so in blkid_get_topology we have: /* * If the reported values are the same as the physical sector size * do not bother to report anything. It will only cause warnings * if people specify larger stripe units or widths manually. */ if (*sunit == *psectorsize || *swidth == *psectorsize) { *sunit = 0; *swidth = 0; } as a sanity check. We should probably extend that (or add a similar test) which sets both to zero if either is zero, for the same reason as the CLI validation does it. Comments should explain why.... /* * Ensure that if either sunit or stripe width is zero, the other * is as well. Having only one set is not valid stripe geometry. */ if (*sunit == 0 || *swidth == 0) { *sunit = 0; *swidth = 0; } FWIW if I set *sunit = 8192; *swidth = 0; manually in blkid_get_topology I do get the same floating point exception; if that's what we get from libblkid for some weird device, we'd go boom. -Eric
Attachment:
signature.asc
Description: OpenPGP digital signature