Re: [-mm PATCH v2 00/25] get_user_pages() for dax pte and pmd mappings

On Thu, Dec 10, 2015 at 11:20 AM, Jeff Moyer <jmoyer@xxxxxxxxxx> wrote:
> Dan Williams <dan.j.williams@xxxxxxxxx> writes:
>
>> On Thu, Dec 10, 2015 at 10:08 AM, Jeff Moyer <jmoyer@xxxxxxxxxx> wrote:
>>> Dan Williams <dan.j.williams@xxxxxxxxx> writes:
>>>
>>>> Summary:
>>>>
>>>> To date, we have implemented two I/O usage models for persistent memory,
>>>> PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
>>>> userspace).  This series adds a third, DAX-GUP, that allows DAX mappings
>>>> to be the target of direct-i/o.  It allows userspace to coordinate
>>>> DMA/RDMA from/to persistent memory.
>>>>
>>>> The implementation leverages the ZONE_DEVICE mm-zone that went into
>>>> 4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
>>>> and dynamically mapped by a device driver.  The pmem driver, after
>>>> mapping a persistent memory range into the system memmap via
>>>> devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
>>>> page-backed pmem-pfns via flags in the new pfn_t type.
>>>
>>> So, this basically means that an admin has to decide whether or not DMA
>>> will be used on a given device before making a file system on it.  That
>>> seems like an odd requirement.  There's also a configuration option of
>>> whether to put those backing struct pages into DRAM or PMEM (which, of
>>> course, will be dictated by the size of pmem).  I really think we should
>>> reconsider this approach.
>>>
>>> First, the admin shouldn't have to choose whether or not DMA will be
>>> done on the file system.
>>
>> To be clear it's not "whether or not DMA will be done on the file
>> system", it's whether or not both DMA and DAX will be done
>> simultaneously on the filesystem.
>
> Fair point, but I'd view one of those configurations as not recommended.
> To be clear, if you're just going to use the device for block based
> access, using btt is the safer option.

Speaking of btt, the mechanism for setting up a btt is identical to
specifying a reserved area for the memmap.  I.e. write an info block
to the namespace to specify a new mode of operation.

>> DAX is already a capability that an admin can inadvertently disable by
>> mis-configuring the alignment of a partition [1].
>
> Heh, using my own commit against me? ;-) Anyway, the commit message
> suggests that dax *could* be supported on misaligned partitions.

All's fair in love, war, and code defense. :-)

>> Why not also disable it when DMA support is not configured and force
>> the fs back to page-cache?  Namespace creation tooling in userspace
>> can default to enabling DAX + DMA.
>
> Well, the only reason I can come up with is manufactured:  we've forced
> the admin to decide between having that extra space for storage and
> doing DMA, and he or she opted for more space.

Is this any worse than the "forcing" we're imposing in the btt /
no-btt decision that impacts DAX?  This additional configuration
flexibility for whether / where to store a memmap array is merely
incremental, not fatal.  It's also a configuration decision we can
stop asking an admin to make when / if we ever re-write the kernel to
reduce its dependency on struct page.
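
To make the trade-off concrete, here is a rough sketch of how the
namespace mode maps to what the admin gets, as discussed in this
thread.  The enum, struct, and function names are hypothetical and
exist only for illustration; they are not the kernel's or the
tooling's identifiers, and the "raw" case assumes a namespace with no
info block written.

```c
#include <stdbool.h>

/* Hypothetical mode names; chosen by writing (or not writing) an
 * info block to the namespace. */
enum ns_mode {
    NS_MODE_RAW,    /* assumed default: no info block, pfn-only pmem */
    NS_MODE_BTT,    /* btt info block: the safer block-based option */
    NS_MODE_MEMMAP, /* info block reserving space for the memmap */
};

/* What each mode offers, per the discussion above. */
struct ns_caps {
    bool dax;            /* mmap pmem directly into userspace */
    bool dma_to_dax;     /* DAX mappings as a direct-i/o / DMA target */
    bool atomic_sectors; /* btt-style safety for block access */
};

static struct ns_caps ns_mode_caps(enum ns_mode mode)
{
    struct ns_caps caps = { false, false, false };

    switch (mode) {
    case NS_MODE_RAW:
        /* pfn-only: DAX works, but there is no struct page to pin */
        caps.dax = true;
        break;
    case NS_MODE_BTT:
        /* block access only; DAX is given up */
        caps.atomic_sectors = true;
        break;
    case NS_MODE_MEMMAP:
        /* page-backed pfns: DAX and DMA together, at the cost of
         * the space reserved for the memmap array */
        caps.dax = true;
        caps.dma_to_dax = true;
        break;
    }
    return caps;
}
```

Under these assumptions, only the memmap mode gives both DAX and DMA
to DAX mappings, which is why userspace tooling could pick it as the
default.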

In the meantime, I expect some would say DAX is a toy as long as it
continues to fail at DMA.
