On Thursday 13 September 2007 09:17, Christoph Lameter wrote:
> On Wed, 12 Sep 2007, Nick Piggin wrote:
> > I will still argue that my approach is the better technical solution for
> > large block support than yours, I don't think we made progress on that.
> > And I'm quite sure we agreed at the VM summit not to rely on your patches
> > for VM or IO scalability.
>
> The approach has already been tried (see the XFS layer) and found lacking.

It is lacking because our vmap algorithms are simplistic to the point
of being utterly inadequate for the new requirements. There has not been
any fundamental problem revealed (like the fragmentation approach has).

However fsblock can do everything that higher order pagecache can
do in terms of avoiding vmap and giving contiguous memory to block
devices by opportunistically allocating higher orders of pages, and falling
back to vmap if they cannot be satisfied.

So if you argue that vmap is a downside, then please tell me how you
consider the -ENOMEM of your approach to be better?

> Having a fake linear block through vmalloc means that a special software
> layer must be introduced and we may face special casing in the block / fs
> layer to check if we have one of these strange vmalloc blocks.

I guess you're a little confused. There is nothing fake about the linear
address. Filesystems of course need changes (the block layer needs none --
why would it?). But changing APIs to accommodate a better solution is what
Linux is about.

If, by special software layer, you mean the vmap/vunmap support in
fsblock, let's see... that's probably all of a hundred or two lines.
Contrast that with anti-fragmentation, lumpy reclaim, higher order
pagecache and its new special mmap layer... Hmm, seems like a no
brainer to me. You really still want to pursue the "extra layer"
argument as a point against fsblock here?

> > But you just showed in two emails that you don't understand what the
> > problem is.
> > To reiterate: lumpy reclaim does *not* invalidate my
> > formulae; and antifrag does *not* isolate the issue.
>
> I do understand what the problem is. I just do not get what your problem
> with this is and why you have this drive to demand perfection. We are

Oh. I don't think I could explain that if you still don't understand by
now. But that's not the main issue: all that I ask is you consider
fsblock on technical grounds.

> working a variety of approaches on the (potential) issue but you
> categorically state that it cannot be solved.

Of course I wouldn't state that. On the contrary, I categorically state that
I have already solved it :)

> > But what do you say about viable alternatives that do not have to
> > worry about these "unlikely scenarios", full stop? So, why should we
> > not use fs block for higher order page support?
>
> Because it has already been rejected in another form and adds more

You have rejected it. But they are bogus reasons, as I showed above.

You also describe some other real (if lesser) issues like number of page
structs to manage in the pagecache. But this is hardly enough to reject
my patch now... for every downside you can point out in my approach, I
can point out one in yours.

- fsblock doesn't require changes to virtual memory layer
- fsblock can retain cache of just 4K in a > 4K block size file

How about those? I know very well how Linus feels about both of them.

> Maybe we could get to something like a hybrid that avoids some of these
> issues? Add support so something like a virtual compound page can be
> handled transparently in the filesystem layer with special casing if
> such a beast reaches the block layer?

That's conceptually much worse, IMO.

And practically worse as well: vmap space is limited on 32-bit; fsblock
approach can avoid vmap completely in many cases; for two reasons.

The fsblock data accessor APIs aren't _that_ bad changes.
They change zero conceptually in the filesystem, are arguably cleaner, and
can be essentially nooped if we wanted to stay with a b_data type approach
(but they give you that flexibility to replace it with any implementation).
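
To make the fallback argument from earlier in this message concrete, here
is a toy userspace model of "try a higher order allocation first, fall back
to vmap if that cannot be satisfied". It is only a sketch under stated
assumptions: it is not fsblock code and not a kernel API. The names
find_contiguous and alloc_block, and the use of a list of booleans as the
free-frame map, are hypothetical and exist purely for illustration.

```python
from typing import List, Optional, Tuple


def find_contiguous(free: List[bool], count: int) -> Optional[int]:
    """Return the start of the first naturally aligned run of `count`
    free frames, or None if fragmentation leaves no such run."""
    for start in range(0, len(free) - count + 1, count):
        if all(free[start:start + count]):
            return start
    return None


def alloc_block(free: List[bool], order: int) -> Tuple[str, List[int]]:
    """Allocate 2**order frames for one block (hypothetical helper).

    Returns ("contiguous", frames) when the higher order attempt succeeds,
    or ("vmap", frames) when scattered frames are used and would need a
    virtual mapping. Raises MemoryError only when there are not enough
    free frames in total.
    """
    count = 1 << order
    start = find_contiguous(free, count)
    if start is not None:
        # Opportunistic path: physically contiguous, no mapping needed.
        frames = list(range(start, start + count))
        kind = "contiguous"
    else:
        # Fallback path: take any free frames, to be stitched by a mapping.
        frames = [i for i, is_free in enumerate(free) if is_free][:count]
        if len(frames) < count:
            raise MemoryError("not enough free frames")
        kind = "vmap"
    for i in frames:
        free[i] = False
    return kind, frames
```

In this model, a fragmented map such as [True, False] * 4 still satisfies an
order-2 request through the "vmap" path, whereas a scheme that only accepts
the contiguous path would have to fail the same request.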