On Wed, May 6, 2015 at 3:10 PM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Wed, May 6, 2015 at 1:04 PM, Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
>>
>> The motivation for this change is persistent memory and the desire to
>> use it not only via the pmem driver, but also as a memory target for I/O
>> (DAX, O_DIRECT, DMA, RDMA, etc) in other parts of the kernel.
>
> I detest this approach.
>

Hmm, yes, I can't argue against "put the onus on odd behavior where it
belongs."...

> I'd much rather go exactly the other way around, and do the dynamic
> "struct page" instead.
>
> Add a flag to "struct page"

Ok, given I had already precluded 32-bit systems in this __pfn_t
approach, we should have flag space for this on 64-bit.

> to mark it as a fake entry and teach
> "page_to_pfn()" to look up the actual pfn some way (that union that
> contains "index" looks like a good target to also contain 'pfn', for
> example).
>
> Especially if this is mainly for persistent storage, we'll never have
> issues with worrying about writing it back under memory pressure, so
> allocating a "struct page" for these things shouldn't be a problem.
> There's likely only a few paths that actually generate IO for those
> things.
>
> In other words, I'd really like our basic infrastructure to be for the
> *normal* case, and the "struct page" is about so much more than just
> "what's the target for IO". For normal IO, "struct page" is also what
> serializes the IO so that you have a consistent view of the end
> result, and there's obviously the reference count there too. So I
> really *really* think that "struct page" is the better entity for
> describing the actual IO, because it's the common and the generic
> thing, while a "pfn" is not actually *enough* for IO in general, and
> you now end up having to look up the "struct page" for the locking and
> refcounting etc.
>
> If you go the other way, and instead generate a "struct page" from the
> pfn for the few cases that need it, you put the onus on odd behavior
> where it belongs.
>
> Yes, it might not be any simpler in the end, but I think it would be
> conceptually much better.

Conceptually better, but certainly more difficult to audit if the fake
struct page is initialized in a subtle way that breaks when/if it
leaks to some unwitting context. The one benefit I may need to
concede is a mechanism for the few paths that know what they are
doing to opt in to handling these fake pages. That was easy with
__pfn_t, but a struct page can go silently almost anywhere.

Certainly nothing is prepared for a given struct page pointer to
change the pfn it points to on the fly, which I think is what we would
end up doing for something like a raid cache: keep a pool of struct
pages around and point them at persistent memory pfns while I/O is in
flight.
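
For concreteness, here is a rough userspace sketch of the "fake struct
page" idea as I read it. This is a toy model, not kernel code: PG_FAKE,
page_is_fake(), fake_page_retarget() and the simplified struct page and
mem_map below are all hypothetical names and layouts made up for
illustration. It only shows the shape of "a flag plus a pfn sharing the
union with index".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of this matches the kernel's real definitions. */
#define PG_FAKE (1UL << 0)	/* hypothetical flag: page stands in for a pmem pfn */

struct page {
	unsigned long flags;
	int refcount;
	union {
		unsigned long index;	/* normal page: offset within its mapping */
		unsigned long pfn;	/* fake page: the pfn it currently targets */
	};
};

#define NR_PAGES 16
static struct page mem_map[NR_PAGES];	/* stand-in for the real memmap */

static bool page_is_fake(const struct page *page)
{
	return (page->flags & PG_FAKE) != 0;
}

/* Fake pages carry their pfn; normal pages derive it from their memmap slot. */
static unsigned long page_to_pfn(const struct page *page)
{
	if (page_is_fake(page))
		return page->pfn;
	return (unsigned long)(page - mem_map);
}

/* Point a pooled fake page at a new pfn, e.g. before submitting one I/O. */
static void fake_page_retarget(struct page *page, unsigned long pfn)
{
	/* Assumption: retargeting is only safe while nobody holds a reference. */
	assert(page->refcount == 0);
	page->flags |= PG_FAKE;
	page->pfn = pfn;
}
```

Note that nothing in the type stops a fake page from reaching code that
reads page->index and silently gets a pfn instead, which is the audit
concern above. The refcount assertion in fake_page_retarget() is my
guess at the minimum rule a retargeting pool would need.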