On Tue, Feb 02, 2016 at 01:46:06PM -0800, Jared Hulbert wrote:
> On Tue, Feb 2, 2016 at 8:51 AM, Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
> >> The filesystem I'm concerned with is AXFS
> >> (https://www.kernel.org/doc/ols/2008/ols2008v1-pages-211-218.pdf).
> >> Which I've been planning on trying to merge again due to a recent
> >> resurgence of interest.  The device model for AXFS is... weird.  It
> >> can use one or two devices at a time of any mix of NOR MTD, NAND MTD,
> >> block, and unmanaged physical memory.  It's a terribly useful model
> >> for embedded.  Anyway AXFS is readonly so hacking in a read only
> >> dax_fault_nodev() and dax_file_read() would work fine, looks easy
> >> enough.  But... it would be cool if similar small embedded focused RW
> >> filesystems were enabled.
> >
> > Are those also out of tree?
>
> Of course.  Merging embedded filesystems is like merging regular
> filesystems except 98% of you reviewers don't want it merged.

You should at least be able to get it into staging these days.  I mean,
look at some of the junk that's in staging ... and I don't think AXFS was
nearly as bad.

> IMO you're making DAX more complex by overly coupling to the bdev and
> I think it could bite you later.  I submit this rework of the radix
> tree and confusion about where to get the real bdev as evidence.  I'm
> guessing that it won't be the last time.  It's unnecessary to couple
> it like this, and in fact is not how the vfs has been layered in the
> past.

Huh?  The rework to use the radix tree for PFNs was done with one eye
firmly on your usage case.  Just because I had to thread the get_block
interface through it for the moment doesn't mean that I didn't have
the "how do we get rid of get_block entirely" question on my mind.

Using get_block seemed like the right idea three years ago.  I didn't
know just how fundamentally ext4 and XFS disagree on how it should be
used.

> To look at the downside consider dax_fault().  It's called on a
> fault to a user memory map, uses the filesystems get_block() to lookup
> a sector so you can ask a block device to convert it to an address on
> a DIMM.  Come on, that's awkward.  Everything around dax_fault() is
> dripping with memory semantic interfaces, the dax_fault() calls are
> fundamentally about memory, the pmem calls are memory, the hardware is
> memory, and yet it directly calls bdev_direct_access().  It's out of
> place.

What was out of place was the old 'get_xip_mem' in address_space
operations.  Returning a kernel virtual address and a PFN from a
filesystem operation?  That looks awful.  All the other operations deal
in struct pages, file offsets and occasionally sectors.  Of course, we
don't have a struct page, so a pfn makes sense, but the kernel virtual
address being returned was a gargantuan layering problem.

> The legacy vfs/mm code didn't have this layering problem either.  Even
> filemap_fault() that dax_fault() is modeled after doesn't call any
> bdev methods directly, when it needs something it asks the filesystem
> with a ->readpage().  The precedent is that you ask the filesystem
> for what you need.  Look at the get_bdev() thing you've concluded you
> need.  It _almost_ makes my point.  I just happen to be of the opinion
> that you don't actually want or need the bdev, you want the pfn/kaddr
> so you can flush or map or memcpy().

You want the pfn.  The device driver doesn't have enough information to
give you a (coherent with userspace) kaddr.  That's what (some future
arch-specific implementation of) dax_map_pfn() is for.  That's why it
takes 'index' as a parameter, so you can calculate where it'll be mapped
in userspace, and determine an appropriate kernel virtual address to use
for it.
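
The point about dax_map_pfn() taking 'index' can be sketched in plain
userspace C.  This is only a toy under loud assumptions, not the kernel's
code: it assumes an architecture with a virtually indexed cache where
shared mappings are placed so that the cache colour of the user address
is determined by the file page index.  Every sketch_* name, the number of
colours and the alias window base are hypothetical, and nothing is
actually mapped.

```c
#include <assert.h>
#include <stdint.h>

/* All names and constants below are hypothetical, for illustration only. */
#define SKETCH_PAGE_SHIFT  12u
#define SKETCH_PAGE_SIZE   (1ul << SKETCH_PAGE_SHIFT)
#define SKETCH_NR_COLOURS  4u            /* assumed: power of two */
#define SKETCH_ALIAS_BASE  0x40000000ul  /* assumed: colour-aligned window */

/* Colour userspace will see, assuming colour is derived from the page index. */
static unsigned int sketch_colour_of_index(unsigned long index)
{
	return (unsigned int)(index & (SKETCH_NR_COLOURS - 1));
}

/* Colour of an arbitrary virtual address. */
static unsigned int sketch_colour_of_addr(uintptr_t addr)
{
	return (unsigned int)((addr >> SKETCH_PAGE_SHIFT) &
			      (SKETCH_NR_COLOURS - 1));
}

/*
 * Pick a kernel virtual address for 'pfn' whose cache colour matches the
 * colour the page will have in userspace.  The driver only knows the pfn;
 * the colour comes from 'index', which is why the caller has to pass it.
 */
static uintptr_t sketch_map_pfn(unsigned long pfn, unsigned long index)
{
	unsigned int colour = sketch_colour_of_index(index);
	uintptr_t kaddr = SKETCH_ALIAS_BASE +
			  (uintptr_t)colour * SKETCH_PAGE_SIZE;

	(void)pfn;	/* a real implementation would install pfn at kaddr */

	/* The window must be colour-aligned for the arithmetic to hold. */
	assert((SKETCH_ALIAS_BASE &
		(SKETCH_NR_COLOURS * SKETCH_PAGE_SIZE - 1)) == 0);
	assert(sketch_colour_of_addr(kaddr) == colour);
	return kaddr;
}
```

With these made-up constants, sketch_map_pfn(pfn, 5) picks the slot for
colour 1 regardless of the pfn, which is the whole point: the pfn alone
cannot tell you which alias is coherent with the user mapping.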