On Tue 23-04-19 05:41:48, Matthew Wilcox wrote:
> On Tue, Apr 23, 2019 at 12:47:07PM +0200, Michal Hocko wrote:
> > On Mon 22-04-19 14:29:16, Michel Lespinasse wrote:
> > [...]
> > > I want to add a note about mmap_sem. In the past there has been
> > > discussions about replacing it with an interval lock, but these never
> > > went anywhere because, mostly, of the fact that such mechanisms were
> > > too expensive to use in the page fault path. I think adding the spf
> > > mechanism would invite us to revisit this issue - interval locks may
> > > be a great way to avoid blocking between unrelated mmap_sem writers
> > > (for example, do not delay stack creation for new threads while a
> > > large mmap or munmap may be going on), and probably also to handle
> > > mmap_sem readers that can't easily use the spf mechanism (for example,
> > > gup callers which make use of the returned vmas). But again that is a
> > > separate topic to explore which doesn't have to get resolved before
> > > spf goes in.
> >
> > Well, I believe we should _really_ re-evaluate the range locking sooner
> > rather than later. Why? Because it looks like the most straightforward
> > approach to the mmap_sem contention for most usecases I have heard of
> > (mostly a mm{unm}ap, mremap standing in the way of page faults).
> > On a plus side it also makes us think about the current mmap (ab)users
> > which should lead to an overall code improvements and maintainability.
>
> Dave Chinner recently did evaluate the range lock for solving a problem
> in XFS and didn't like what he saw:
>
> https://lore.kernel.org/linux-fsdevel/20190418031013.GX29573@xxxxxxxxxxxxxxxxxxx/T/#md981b32c12a2557a2dd0f79ad41d6c8df1f6f27c

Thank you, will have a look.

> I think scaling the lock needs to be tied to the actual data structure
> and not have a second tree on-the-side to fake-scale the locking. Anyway,
> we're going to have a session on this at LSFMM, right?

I thought we had something for the mmap_sem scaling but I do not see
this in the list of proposed topics. But we can certainly add it there.

> > SPF sounds like a good idea but it is a really big and intrusive surgery
> > to the #PF path. And more importantly without any real world usecase
> > numbers which would justify this. That being said I am not opposed to
> > this change I just think it is a large hammer while we haven't seen
> > attempts to tackle problems in a simpler way.
>
> I don't think the "no real world usecase numbers" is fair. Laurent quoted:
>
> > Ebizzy:
> > -------
> > The test is counting the number of records per second it can manage, the
> > higher is the best. I run it like this 'ebizzy -mTt <nrcpus>'. To get
> > consistent result I repeated the test 100 times and measure the average
> > result. The number is the record processes per second, the higher is the best.
> >
> >                     BASE        SPF         delta
> > 24 CPUs x86         5492.69     9383.07     70.83%
> > 1024 CPUS P8 VM     8476.74     17144.38    102%
>
> and cited 30% improvement for you-know-what product from an earlier
> version of the patch.

Well, we are talking about
45 files changed, 1277 insertions(+), 196 deletions(-)

which is a _major_ surgery in my book. Asking for real-life workload
numbers is not unfair IMHO. And let me remind you that I am not really
opposing SPF in general. I would just like to see a simpler approach
tried before we go for such a large change. If range locking is not
really a scalable approach then all right, but from what I've seen it
should help a lot with most of the bottlenecks I have seen.
-- 
Michal Hocko
SUSE Labs