On Tue, Aug 25, 2020 at 08:59:39AM -0500, Eric Sandeen wrote:
> On 8/24/20 5:55 PM, Dave Chinner wrote:
> > I agree that mkfs needs to be aware of DAX capability of the block
> > device, but that capability existing should not cause mkfs to fail.
> > If we want users to be able to direct mkfs to create a DAX
> > capable filesystem then adding a -d dax option would be a better
> > idea. This would direct mkfs to align/size all the data options to
> > use a DAX compatible topology if blkid supports reporting the DAX
> > topology. It would also do things like turn off reflink (until that
> > is supported w/ DAX), etc.
> >
> > i.e. if the user knows they are going to use DAX (and they will)
> > then they can tell mkfs to make a DAX compatible filesystem.
>
> FWIW, Darrick /just/ added a -d daxinherit option, though all it does
> now is set the inheritable dax flag on the root dir, it doesn't enforce
> things like page vs block size, etc.
>
> That change is currently staged in my local tree.
>
> I suppose we could condition that on other requirements, although we've
> always had the ability to mkfs a filesystem that can't necessarily be
> used on the current machine - i.e. you can make a 64k block size filesystem
> on a 4k page machine, etc. So I'm not sure we want to tie mkfs abilities
> to the current mkfs environment....
>
> Still, I wonder if I should hold off on "-d daxinherit" patch until we
> have thought through things like reflink conflicts, for now.
>
> (though again, mkfs is "perfectly capable" of making a consistent
> reflink+dax filesystem, it's just that no kernel can mount it today...)

No, please don't layer additional meanings onto daxinherit=1.

I actually /do/ want to have a -d dax=1 option for "set up this
filesystem for DAX" that will configure the geometry for that device to
play nicely with the things that (some) DAX users want.

IOWs, you say "-d dax=1" and that means that mkfs sniffs out the
DAXiness of the underlying device and the PMD size.
Then it turns off reflink by default, sets the daxinherit=1 hint, and
configures the extent size and su/sw hints to match the PMD size.

Or, you say "-r dax=1" for the realtime device, and now it sets the
allocation unit to the PMD size for people who run huge databases and
want only huge pages to back their table data<cough>.

Zooming out a bit, maybe we should instead introduce a new "tuning"
parameter for -d and -r so that administrators could tune the filesystem
for specific purposes:

-d tune=dax: Reject if device not dax, set daxinherit=1, set
    extsize/su/sw to match PMD

-d tune=ssd: Set agcount to match the number of CPUs if possible, make
    the log larger to support a large number of threads and iops.

-d tune=rotational: Probably does nothing. ;)

-d tune=auto: Query blkid to guess which of the above three profiles we
    should use.

-d tune=none: No tuning.

And then you'd do the same for the realtime device.

This would help us get rid of the seeeekret mkfs wrapper that we use to
make it easier for our internal customers to use DAX since mkfs.xfs
doesn't support config files.

--D

> -Eric
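
For illustration only, the tuning profiles proposed above could be
modelled roughly like this. It is a hypothetical Python sketch, not
mkfs.xfs code: the Device class, resolve_tuning(), the "large_log" key,
and the choice of sw=1 are all assumptions made up for the example, and
the "auto" branch stands in for a blkid query rather than calling blkid.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical summary of what mkfs might learn about a device."""
    is_dax: bool = False
    rotational: bool = False
    pmd_size: int = 2 * 1024 * 1024  # assumed PMD size in bytes (2 MiB)
    ncpus: int = 1


def resolve_tuning(profile: str, dev: Device) -> dict:
    """Map a proposed tune= profile to illustrative settings."""
    if profile == "auto":
        # Stand-in for "query blkid": guess one of the three profiles.
        if dev.is_dax:
            profile = "dax"
        elif dev.rotational:
            profile = "rotational"
        else:
            profile = "ssd"

    if profile in ("none", "rotational"):
        # "No tuning" and "probably does nothing" change no settings.
        return {"profile": profile}
    if profile == "dax":
        if not dev.is_dax:
            raise ValueError("tune=dax requested but device is not DAX")
        return {
            "profile": "dax",
            "daxinherit": 1,
            "extsize": dev.pmd_size,
            "su": dev.pmd_size,
            "sw": 1,  # assumption: a single stripe unit of PMD size
        }
    if profile == "ssd":
        # "large_log" is a placeholder; no concrete log size is proposed.
        return {"profile": "ssd", "agcount": dev.ncpus, "large_log": True}
    raise ValueError(f"unknown tuning profile: {profile}")
```

Keeping "auto" as a thin step that picks one of the named profiles means
the result is always something an administrator could have asked for
explicitly.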