On Mon, Jun 22, 2020 at 10:20:40AM -0500, Eric W. Biederman wrote:
> Junxiao Bi <junxiao.bi@xxxxxxxxxx> writes:
> > On 6/20/20 9:27 AM, Matthew Wilcox wrote:
> >> On Fri, Jun 19, 2020 at 05:42:45PM -0500, Eric W. Biederman wrote:
> >>> Junxiao Bi <junxiao.bi@xxxxxxxxxx> writes:
> >>>> Still high lock contention.  Collected the following hot path.
> >>> A different location this time.
> >>>
> >>> I know of at least exit_signal and exit_notify that take thread-wide
> >>> locks, and it looks like exit_mm is another.  Those don't use the same
> >>> locks as flushing proc.
> >>>
> >>> So I think you are simply seeing the result of a thundering herd of
> >>> threads shutting down at once.  Given that thread shutdown is
> >>> fundamentally a slow path, there is only so much that can be done.
> >>>
> >>> If you are up for a project to work through this thundering herd, I
> >>> expect I can help some.  It will be a long process of cleaning up
> >>> the entire thread exit process with an eye to performance.
> >> Wengang had some tests which produced wall-clock values for this
> >> problem, which I agree is more informative.
> >>
> >> I'm not entirely sure what the customer workload is that requires a
> >> highly threaded workload to also shut down quickly.  To my mind, an
> >> overall workload is normally composed of highly-threaded tasks that
> >> run for a long time and only shut down rarely (thus performance of
> >> shutdown is not important) and single-threaded tasks that run for a
> >> short time.
> >
> > The real workload is a Java application working in server-agent mode;
> > the issue happened on the agent side.  All the agent does is wait for
> > work dispatched from the server and execute it.  To execute one piece
> > of work, the agent starts lots of short-lived threads, so a lot of
> > threads can exit at the same time when there is a lot of work to
> > execute.  The contention on the exit path caused high %sys time,
> > which impacted other workloads.
>
> If I understand correctly, the Java VM is not exiting.  Just some of
> its threads.
>
> That is a very different problem to deal with.  There are many
> optimizations that are possible when _all_ of the threads are exiting
> that are not possible when only _many_ threads are exiting.

Ah!  Now I get it.  This explains why the dput() lock contention was so
important.  A new thread starting would block on that lock as it tried
to create its new directory in /proc/$pid/task/.

Terminating thousands of threads but not the entire process isn't going
to hit many of the locks (eg exit_signal() and exit_mm() aren't going
to be called).

So we need a more sophisticated micro-benchmark that is continually
starting threads and asking dozens-to-thousands of them to stop at the
same time (see the sketch at the end of this mail).  Otherwise we'll
try to fix lots of scalability problems that our customer doesn't care
about.

> Do you know if it is simply the cpu time or if it is the lock contention
> that is the problem?  If it is simply the cpu time we should consider if
> some of the locks that can be highly contended should become mutexes.
> Or perhaps something like Matthew's cpu pinning idea.

If we're not trying to optimise for the entire process going down, then
we definitely don't want my CPU pinning idea.
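
A minimal sketch of the kind of micro-benchmark I mean (purely
illustrative; the default wave size and count are made-up parameters,
not measurements from Junxiao's workload).  The main thread launches
waves of threads that park on a barrier and then all exit together,
while the process itself stays alive, so we exercise the
many-threads-exit path rather than whole-process exit:

/* exit_storm.c: gcc -O2 -pthread exit_storm.c -o exit_storm */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static pthread_barrier_t barrier;

static void *worker(void *arg)
{
	(void)arg;
	/* Wait until every thread in this wave has been created... */
	pthread_barrier_wait(&barrier);
	/* ...then all exit at once: many threads exiting while the
	 * process survives, like the Java agent's short-lived threads. */
	return NULL;
}

int main(int argc, char **argv)
{
	int threads = argc > 1 ? atoi(argv[1]) : 1000;	/* per wave */
	int waves = argc > 2 ? atoi(argv[2]) : 100;
	pthread_t *tids = calloc(threads, sizeof(*tids));
	struct timespec t0, t1;
	int i, w;

	if (!tids)
		return 1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (w = 0; w < waves; w++) {
		pthread_barrier_init(&barrier, NULL, threads);
		/* Creating later threads overlaps with earlier ones
		 * already waiting, and joining overlaps with the herd
		 * of exits, so starts and stops are interleaved. */
		for (i = 0; i < threads; i++)
			if (pthread_create(&tids[i], NULL, worker, NULL))
				abort();
		for (i = 0; i < threads; i++)
			pthread_join(tids[i], NULL);
		pthread_barrier_destroy(&barrier);
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	printf("%d waves x %d threads: %.2fs wall\n", waves, threads,
	       (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
	free(tids);
	return 0;
}

Running it under perf (or just watching %sys in top) while varying the
wave size should show whether the exit-path contention scales with the
number of simultaneously exiting threads, without ever taking the whole
process down.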