On Fri, Nov 20, 2015 at 10:35:47AM +1100, Dave Chinner wrote:
> On Thu, Nov 19, 2015 at 10:55:25AM -0500, Brian Foster wrote:
> > On Wed, Nov 18, 2015 at 12:46:21AM +0200, Octavian Purdila wrote:
> > > Naive implementation for non-mmu architectures: allocate physically
> > > contiguous xfs buffers with alloc_pages. Terribly inefficient with
> > > memory and fragmentation on high I/O loads but it may be good enough
> > > for basic usage (which most non-mmu architectures will need).
> > >
> > > This patch was tested with lklfuse [1] and basic operations seems to
> > > work even with 16MB allocated for LKL.
> > >
> > > [1] https://github.com/lkl/linux
> > >
> > > Signed-off-by: Octavian Purdila <octavian.purdila@xxxxxxxxx>
> > > ---
> >
> > Interesting, though this makes me wonder why we couldn't have a new
> > _XBF_VMEM (for example) buffer type that uses vmalloc(). I'm not
> > familiar with mmu-less context, but I see that mm/nommu.c has a
> > __vmalloc() interface that looks like it ultimately translates into an
> > alloc_pages() call. Would that accomplish what this patch is currently
> > trying to do?
>
> vmalloc is always a last resort. vmalloc space on 32 bit systems is
> extremely limited and it is easy to exhaust with XFS.
>

Sure, but my impression is that a vmalloc() buffer is roughly equivalent
in this regard to a current !XBF_UNMAPPED && size > PAGE_SIZE buffer. We
just do the allocation and mapping separately (presumably for other
reasons).

> Also, vmalloc limits the control we have over allocation context
> (e.g. the hoops we jump through in kmem_alloc_large() to maintain
> GFP_NOFS contexts), so just using vmalloc doesn't make things much
> simpler from an XFS perspective.
>

The comment in kmem_zalloc_large() calls out some apparent hardcoded
allocation flags down in the depths of vmalloc(). It looks to me that
page allocation (__vmalloc_area_node()) actually uses the provided
flags, so I'm not following the "data page" part of that comment.
Indeed, I do see that this is not the case down in calls like
pmd_alloc_one(), pte_alloc_one_kernel(), etc., associated with page
table management.

Those latter calls are all from following down through the
map_vm_area()->vmap_page_range() codepath from __vmalloc_area_node(). We
call vm_map_ram() directly from _xfs_buf_map_pages(), which itself calls
down into the same code. Indeed, we already protect ourselves here via
the same memalloc_noio_save() mechanism that kmem_zalloc_large() uses.

I suspect there's more to it than that because it does look like
vm_map_ram() has a different mechanism for managing vmalloc space for
certain (smaller) allocations, neither of which I'm really familiar
with.

That aside, I don't see how vmalloc() introduces any new allocation
context issues for those buffers where we already set up a multi-page
mapping.

We still have the somewhat customized page allocation code in
xfs_buf_allocate_memory() to contend with. I actually think it would be
useful to have a DEBUG sysfs tunable to turn on vmalloc() buffers and
actually test how effective some of this code is.

> > I ask because it seems like that would help clean up the code a bit, for
> > one. It might also facilitate some degree of testing of the XFS bits
> > (even if utilized sparingly in DEBUG mode if it weren't suitable enough
> > for generic/mmu use). We currently allocate and map the buffer pages
> > separately and I'm not sure if there's any particular reasons for doing
> > that outside of some congestion handling in the allocation code and
> > XBF_UNMAPPED buffers, the latter probably being irrelevant for nommu.
> > Any other thoughts on that?
>
> We could probably clean the code up more (the allocation logic
> is now largely a historic relic) but I'm not convinced yet that we
> should be spending any time trying to specifically support mmu-less
> hardware.
>

Fair point, we'll see where the use case discussion goes.
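
As a reference point for the memalloc_noio_save() discussion above, here
is a rough userspace model of how that kind of scoped allocation context
behaves. It is only a sketch: every model_* / MODEL_* name and bit value
is made up for illustration and none of this is kernel code. The
assumption being modelled is that a callee which hardcodes
GFP_KERNEL-style flags (as the page table allocation paths appear to)
still ends up with the IO and FS bits masked off while the task-level
NOIO flag is set.

```c
#include <assert.h>

/* Hypothetical bit values for this userspace model; not the kernel's. */
#define MODEL_GFP_IO      0x1u
#define MODEL_GFP_FS      0x2u
#define MODEL_GFP_KERNEL  (MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_PF_NOIO     0x1u

/* Stand-in for current->flags; this model has a single task. */
static unsigned int model_task_flags;

/* Mark the task NOIO and return the previous state of the bit. */
static unsigned int model_noio_save(void)
{
	unsigned int old = model_task_flags & MODEL_PF_NOIO;

	model_task_flags |= MODEL_PF_NOIO;
	return old;
}

/* Put the NOIO bit back to what the matching save call observed. */
static void model_noio_restore(unsigned int old)
{
	assert((old & ~MODEL_PF_NOIO) == 0);
	model_task_flags = (model_task_flags & ~MODEL_PF_NOIO) | old;
}

/* Flags an allocation would effectively use given the task state. */
static unsigned int model_effective_gfp(unsigned int gfp)
{
	if (model_task_flags & MODEL_PF_NOIO)
		gfp &= ~(MODEL_GFP_IO | MODEL_GFP_FS);
	return gfp;
}

/* Stand-in for a callee that ignores the caller's flags entirely. */
static unsigned int model_pagetable_alloc(void)
{
	return model_effective_gfp(MODEL_GFP_KERNEL);
}

/*
 * Stand-in for a mapping path: when called from a NOFS context, wrap
 * the callee in a save/restore pair so its hardcoded flags are masked.
 */
static unsigned int model_map_pages(int nofs_context)
{
	unsigned int noio = 0;
	unsigned int used;

	if (nofs_context)
		noio = model_noio_save();

	used = model_pagetable_alloc();

	if (nofs_context)
		model_noio_restore(noio);

	return used;
}
```

In this model, model_map_pages(1) returns flags with both the IO and FS
bits cleared, model_map_pages(0) returns the callee's hardcoded flags
unchanged, and because restore only puts back the previous state of the
bit, a nested save/restore pair leaves an outer NOIO scope intact.
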
That said, I was a little surprised that this is all that was required
to enable nommu support. If that is indeed the case and we aren't in
for a series of subsequent nommu specific changes (Octavian?) by letting
this through, what's the big deal? This seems fairly harmless to me as
is, particularly if it can be semi-tested via DEBUG mode and has
potential generic use down the road.

Brian

> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx