On Thu, Jun 21, 2018 at 02:26:45PM -0500, Eric Sandeen wrote: > On 6/21/18 2:15 PM, Jeff Mahoney wrote: > > On 6/20/18 11:57 PM, Dave Chinner wrote: > >> On Wed, Jun 20, 2018 at 10:55:20PM -0400, jeffm@xxxxxxxx wrote: > >>> From: Jeff Mahoney <jeffm@xxxxxxxx> > >>> > >>> Commit 051b4e37f5e (mkfs: factor AG alignment) factored out the > >>> AG alignment code into a separate function. It got rid of > >>> redundant checks for dswidth != 0 but did too good a job since now > >>> it doesn't check at all. > >> > >> Of course they got removed - we've already validated the CLI input > >> and guaranteed that cfg->dswidth can only be zero iff cfg->dsunit is > >> zero in calc_stripe_factors(). > >> > >> i.e. We do input validation of CLI paramters before anything else so > >> that later users (like align_ag_geometry()) can assume the > >> parameters they are using are valid. In this case, the assumption is > >> that either both dsunit and dswidth are zero or that both are > >> non-zero and dswidth an integer multple of dsunit. > > > > It's not coming from the CLI parameters. It's coming from the topology. > > The blkid topology stuff is returning 8k for minimal i/o and 0 for > > optimal. Without a CLI config, we have dunit=0 in calc_stripe_factors, > > which takes it from the device. We set cfg->dsunit=16 and > > cfg->dswidth=0, and then head down to align_ag_geometry. > > > > The topology on this system looks like: > > > > ft = {dsunit = 16, dswidth = 0, rtswidth = 0, lsectorsize = 512, > > > > psectorsize = 512} > > That matches with a few of the dm targets I see reported on this system. Exactly what kind of dm device does that - we've never had anyone report this before? Also, if it's a dm device, shouldn't it also be fixed to output a sane set of IO characteristics in /sys? > > Since minimal io size isn't sector size, we don't clear it. Talking > > with Eric, we should probably just extent that check in > > blkid_get_topology to cover that case since it's not like we can just > > reject what blkid gives us. > > > >>> When we hit the check to see if agsize > >>> is a multiple of stripe width: (cfg->agsize % cfg->dswidth), we crash > >>> on a divide by zero. > >> > >> What CLI config did you use to hit this? I'd like to reproduce it so > >> I can see where calc_stripe_factors() is going wrong.... > > > > It was just "mkfs.xfs <path-to-mpath-device>" > > yeah, so in blkid_get_topology we have: > > /* > * If the reported values are the same as the physical sector size > * do not bother to report anything. It will only cause warnings > * if people specify larger stripe units or widths manually. > */ > if (*sunit == *psectorsize || *swidth == *psectorsize) { > *sunit = 0; > *swidth = 0; > } > > as a sanity check. We should probably extend that (or add a similar test) > which sets both to zero if either is zero, for the same reason as the CLI > validation does it. Comments should explain why.... > > /* > * Ensure that if either sunit or stripe width is zero, the other > * is as well. Having only one set is not valid stripe geometry. > */ > if (*sunit == 0 || *swidth == 0) { > *sunit = 0; > *swidth = 0; > } Yup, we need to validate that input properly before using it. Silly me - I should have known better that to assume external code would give back sane results..... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html