On 9/25/18 1:49 AM, Dave Chinner wrote:
> On Mon, Sep 24, 2018 at 12:09:37PM -0600, Jens Axboe wrote:
>> On 9/24/18 12:00 PM, Christopher Lameter wrote:
>>> On Mon, 24 Sep 2018, Jens Axboe wrote:
>>>
>>>> The situation is making me a little uncomfortable, though. If we export
>>>> such a setting, we really should be honoring it...
>
> That's what I said up front, but you replied to this with:
>
> | I think this is all crazy talk. We've never done this, [...]
>
> Now I'm not sure what you are saying we should do....
>
>>> Various subsystems create custom slab arrays with their particular
>>> alignment requirement for these allocations.
>>
>> Oh yeah, I think the solution is basic enough for XFS, for instance.
>> They just have to error on the side of being cautious, by going full
>> sector alignment for memory...
>
> How does the filesystem find out about hardware alignment
> requirements? Isn't probing through the block device to find out
> about the request queue configurations considered a layering
> violation?

Right now it isn't a stacked property, so answering the question isn't
even possible beyond "what does the top device require".

> What if sector alignment is not sufficient? And how would this work
> if we start supporting sector sizes larger than page size? (which the
> XFS buffer cache supports just fine, even if nothing else in
> Linux does).

If sector alignment isn't sufficient, then we'd need to bounce 512b
formats... But I don't want to over-design something that isn't
relevant to real-life setups. I'm not aware of anything that needs
memory aligned to that degree.

> But even ignoring sector size > page size, implementing this
> requires a bunch of new slab caches, especially for 64k page
> machines because XFS supports sector sizes up to 32k. And every
> other filesystem that uses sector sized buffers (e.g. HFS) would
> have to do the same thing. Seems somewhat wasteful to require
> everyone to implement their own aligned sector slab cache...
>
> Perhaps we should take the filesystem out of this completely - maybe
> the block layer could provide a generic "sector heap" and have all
> filesystems that use sector sized buffers allocate from it. e.g.
> something like
>
> mem = bdev_alloc_sector_buffer(bdev, sector_size)
>
> That way we don't have to rely on filesystems knowing anything about
> the alignment limitations of the devices or assumptions about DMA
> to work correctly...

I like that idea. It would probably also need a mempool backing for
certain cases.

-- 
Jens Axboe
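A minimal userspace C sketch of the generic "sector heap" proposed above, combined with the mempool-style reserve mentioned in the reply. Everything here is hypothetical: `struct sector_dev_limits`, `struct sector_heap`, `sector_heap_init()`, `sector_heap_alloc()`, `sector_heap_free()`, `sector_heap_destroy()` and `SECTOR_POOL_MIN` are illustrative names, not kernel APIs, and `bdev_alloc_sector_buffer()` in the thread is only a proposal. The sketch assumes the device reports its DMA alignment as a mask (alignment minus one), loosely mirroring how the block layer stores it, and it conservatively aligns every buffer to the larger of that alignment and the sector size, in the spirit of the "full sector alignment" suggestion. `posix_memalign()` stands in for the kernel allocator.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SECTOR_POOL_MIN 4 /* hypothetical reserve size */

/* Hypothetical per-device limits a block layer could expose. */
struct sector_dev_limits {
    size_t dma_align_mask;     /* required DMA alignment minus one, e.g. 511 */
    size_t logical_block_size; /* sector size in bytes, e.g. 512 or 4096 */
};

/* Hypothetical "sector heap": one per (device, sector size) pair. */
struct sector_heap {
    size_t buf_size;                /* bytes per buffer (one sector) */
    size_t align;                   /* alignment every buffer honours */
    void *reserve[SECTOR_POOL_MIN]; /* mempool-style emergency reserve */
    size_t reserve_nr;              /* buffers currently held in the reserve */
};

static bool is_pow2(size_t x) { return x != 0 && (x & (x - 1)) == 0; }

/* Conservative choice: the stricter of DMA alignment and natural sector alignment. */
static size_t sector_heap_alignment(const struct sector_dev_limits *lim)
{
    size_t dma_align = lim->dma_align_mask + 1;
    return dma_align > lim->logical_block_size ? dma_align : lim->logical_block_size;
}

/* posix_memalign() needs a power-of-two alignment that is a multiple of sizeof(void *). */
static void *raw_alloc(size_t align, size_t size)
{
    void *p = NULL;
    if (posix_memalign(&p, align, size) != 0)
        return NULL;
    return p;
}

/* Release every buffer still held in the reserve. */
void sector_heap_destroy(struct sector_heap *h)
{
    while (h->reserve_nr > 0)
        free(h->reserve[--h->reserve_nr]);
}

/* Returns 0 on success, -1 on invalid limits or allocation failure. */
int sector_heap_init(struct sector_heap *h, const struct sector_dev_limits *lim)
{
    h->buf_size = lim->logical_block_size;
    h->align = sector_heap_alignment(lim);
    h->reserve_nr = 0;
    if (!is_pow2(h->buf_size) || !is_pow2(h->align) || h->align < sizeof(void *))
        return -1;
    /* Pre-fill the reserve so allocations can still succeed under memory pressure. */
    while (h->reserve_nr < SECTOR_POOL_MIN) {
        void *p = raw_alloc(h->align, h->buf_size);
        if (!p) {
            sector_heap_destroy(h);
            return -1;
        }
        h->reserve[h->reserve_nr++] = p;
    }
    return 0;
}

void *sector_heap_alloc(struct sector_heap *h)
{
    void *p = raw_alloc(h->align, h->buf_size);
    /* Underlying allocator failed: fall back to the reserve, as a mempool does. */
    if (!p && h->reserve_nr > 0)
        p = h->reserve[--h->reserve_nr];
    if (p)
        assert(((uintptr_t)p & (h->align - 1)) == 0); /* alignment sanity check */
    return p;
}

void sector_heap_free(struct sector_heap *h, void *p)
{
    if (!p)
        return;
    /* Refill the reserve first; only release memory once the reserve is full. */
    if (h->reserve_nr < SECTOR_POOL_MIN)
        h->reserve[h->reserve_nr++] = p;
    else
        free(p);
}
```

The reserve only helps forward progress: unlike the kernel's `mempool_alloc()`, this sketch does not sleep waiting for a buffer to be returned when both the allocator and the reserve are exhausted; it simply returns `NULL`.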