On Tue, Sep 13, 2016 at 8:39 PM, Nicholas Piggin <npiggin@xxxxxxxxx> wrote:
>
> But even for those, at 16 entries, the bulk of the cost *should* be hitting
> struct page cachelines and refcounting. The rest should mostly stay in cache.

Yes. And those costs will be exactly the same whether we do 16 entries
at a time or 4 loops of 4 entries.

There's something to be said for small temp buffers. They often have
better cache behavior, thanks to re-use, than larger arrays do.

But I still think the biggest win could come from just trying to cut
down on code, if we can simply say "we'll limit splice to N entries"
(where "N" is small enough that we really can do everything in a simple
stack allocation - I suspect 16 is already too big, and we should really
look at 4 or 8).

And if we actually get a report of a performance regression, we'd at
least hear from someone who actually *uses* splice and notices.

I'm (sadly) still not at all convinced that "splice()" was ever a good
idea. I think it was a clever idea, and it is definitely much more
powerful conceptually than sendfile(), but I also suspect that it's
simply not used enough to be really worth the pain.

You can get great benchmark numbers with it. But whether it actually
matters in real life? I really don't know.

But if we screw it up, and make the buffers too small, and people
actually complain and tell us what they are doing, that in itself
would be a good datapoint.

So I wouldn't be too worried about just trying things out. We certainly
don't want to *break* anything, but at the same time I really don't
think we should be too nervous about it either. Which is why I'd be
more than happy to say "just try limiting things to a pretty small
buffer and see if anybody even notices!"

                 Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html