On Tue, Jan 14, 2020 at 12:39:00PM -0800, Dan Williams wrote:
> On Tue, Jan 14, 2020 at 12:31 PM Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
> >
> > On Thu, Jan 09, 2020 at 12:03:01PM -0800, Dan Williams wrote:
> > > On Thu, Jan 9, 2020 at 3:27 AM Jan Kara <jack@xxxxxxx> wrote:
> > > >
> > > > On Tue 07-01-20 10:49:55, Dan Williams wrote:
> > > > > On Tue, Jan 7, 2020 at 10:33 AM Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
> > > > > > W.r.t partitioning, bdev_dax_pgoff() seems to be the pain point where
> > > > > > dax code refers back to block device to figure out partition offset in
> > > > > > dax device. If we create a dax object corresponding to "struct block_device"
> > > > > > and store sector offset in that, then we could pass that object to dax
> > > > > > code and not worry about referring back to bdev. I have written some
> > > > > > proof of concept code and called that object "dax_handle". I can post
> > > > > > that code if there is interest.
> > > > >
> > > > > I don't think it's worth it in the end especially considering
> > > > > filesystems are looking to operate on /dev/dax devices directly and
> > > > > remove block entanglements entirely.
> > > > >
> > > > > > IMHO, it feels useful to be able to partition and use a dax capable
> > > > > > block device in same way as non-dax block device. It will be really
> > > > > > odd to think that if filesystem is on /dev/pmem0p1, then dax can't
> > > > > > be enabled but if filesystem is on /dev/mapper/pmem0p1, then dax
> > > > > > will work.
> > > > >
> > > > > That can already happen today. If you do not properly align the
> > > > > partition then dax operations will be disabled. This proposal just
> > > > > extends that existing failure domain to make all partitions fail to
> > > > > support dax.
> > > >
> > > > Well, I have some sympathy with the sysadmin that has /dev/pmem0 device,
> > > > decides to create partitions on it for whatever (possibly misguided)
> > > > reason and then ponders why the hell DAX is not working? And PAGE_SIZE
> > > > partition alignment is so obvious and widespread that I don't count it as a
> > > > realistic error case sysadmins would be pondering about currently.
> > > >
> > > > So I'd find two options reasonably consistent:
> > > > 1) Keep status quo where partitions are created and support DAX.
> > > > 2) Stop partition creation altogether, if anyones wants to split pmem
> > > > device further, he can use dm-linear for that (i.e., kpartx).
> > > >
> > > > But I'm not sure if the ship hasn't already sailed for option 2) to be
> > > > feasible without angry users and Linus reverting the change.
> > >
> > > Christoph? I feel myself leaning more and more to the "keep pmem
> > > partitions" camp.
> > >
> > > I don't see "drop partition support" effort ending well given the long
> > > standing "ext4 fails to mount when dax is not available" precedent.
> > >
> > > I think the next least bad option is to have a dax_get_by_host()
> > > variant that passes an offset and length pair rather than requiring a
> > > later bdev_dax_pgoff() to recall the offset. This also prevents
> > > needing to add another dax-device object representation.
> >
> > I am wondering what's the conclusion on this. I want to this to make
> > progress in some direction so that I can make progress on virtiofs DAX
> > support.
>
> I think we should at least try to delete the partition support and see
> if anyone screams. Have a module option to revert the behavior so
> people are not stuck waiting for the revert to land, but if it stays
> quiet then we're in a better place with that support pushed out of the
> dax core.

Hi Dan,

So basically we keep the partition support code but disable it by
default, and it can be enabled by some knob, say a kernel command line
option or module option.

At what point will we remove that code completely? I mean, what if
people scream two kernel releases after we have removed the code?

Also, from a distribution's perspective, we might not hear from our
customers for a very long time (until we backport that code into
existing releases or ship this new code in the next major release).
From that viewpoint, I would not like to break existing user-visible
behavior.

How bad is it to keep partition support around? To me it feels
reasonably simple: we just have to store the offset into the dax device
in another dax object and pass that object around (instead of
dax_device). If that's the case, I am not sure why we should even
venture in a direction where some user's setup might be broken.

Also, from an application perspective, /dev/pmem is a block device, so
it should behave like a block device (including kernel partition table
support). From that view, dax looks like just an additional feature of
that device which can be enabled by passing the option "-o dax".

IOW, can we reconsider the idea of not supporting kernel partition
tables for dax capable block devices? I can only see downsides to
removing kernel partition table support, and the only upside seems to
be a little cleanup of dax core code.

Thanks
Vivek
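
For illustration only, here is a rough userspace sketch of the "store the
offset in another dax object" idea discussed above. It is not the proof of
concept code mentioned earlier in the thread and not kernel code. The
struct layout, the helper name dax_handle_pgoff() and the fixed 4096-byte
page size are all assumptions made for the example; the arithmetic is
loosely modelled on what bdev_dax_pgoff() is described as doing (add the
partition start, then require page alignment).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_SECTOR_SIZE 512ULL
#define SKETCH_PAGE_SIZE   4096ULL  /* assumption: 4K pages */

/* Opaque stand-in for the kernel's struct dax_device. */
struct dax_device;

/*
 * Hypothetical object, one per partition (or whole device). It carries
 * the partition start so dax code does not need to look at the bdev.
 */
struct dax_handle {
	struct dax_device *dax_dev;  /* underlying dax device */
	uint64_t start_sect;         /* partition start, in 512-byte sectors */
};

/*
 * Translate a partition-relative sector into a page offset within the
 * dax device. Returns -EINVAL if the resulting byte offset or the
 * requested size is not page aligned, 0 otherwise.
 */
int dax_handle_pgoff(const struct dax_handle *dh, uint64_t sector,
		     uint64_t size, uint64_t *pgoff)
{
	uint64_t phys_off;

	assert(dh && pgoff);

	phys_off = (dh->start_sect + sector) * SKETCH_SECTOR_SIZE;
	if (phys_off % SKETCH_PAGE_SIZE || size % SKETCH_PAGE_SIZE)
		return -EINVAL;

	*pgoff = phys_off / SKETCH_PAGE_SIZE;
	return 0;
}
```

With a partition starting at sector 2048, a request at relative sector 8
is page aligned and translates cleanly, while a partition starting at
sector 63 fails the alignment check, which corresponds to the
"misaligned partition disables dax" case mentioned earlier in the thread.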