On 08/14/2017 08:28 PM, Linus Torvalds wrote:
> On Mon, Aug 14, 2017 at 8:15 PM, Andi Kleen <ak@xxxxxxxxxxxxxxx> wrote:
>> But what should we do when some other (non page) wait queue runs into the
>> same problem?
>
> Hopefully the same: root-cause it.
>
> Once you have a test-case, it should generally be fairly simple to do
> with profiles, just seeing who the caller is when ttwu() (or whatever
> it is that ends up being the most noticeable part of the wakeup chain)
> shows up very heavily.

We have a test case but it is a customer workload. We'll try to get
a bit more info.

> And I think that ends up being true whether the "break up long chains"
> patch goes in or not. Even if we end up allowing interrupts in the
> middle, a long wait-queue is a problem.
>
> I think the "break up long chains" thing may be the right thing
> against actual malicious attacks, but not for any actual real
> benchmark or load.

This is a concern from our customer as we could trigger the watchdog
timer by running user space workloads.

> I don't think we normally have cases of long wait-queues, though. At
> least not the kinds that cause problems. The real (and valid)
> thundering herd cases should already be using exclusive waiters that
> only wake up one process at a time.
>
> The page bit-waiting is hopefully special. As mentioned, we used to
> have some _really_ special code for it for other reasons, and I
> suspect you see this problem with them because we over-simplified it
> from being a per-zone dynamically sized one (where the per-zone thing
> caused both performance problems and actual bugs) to being that
> "static small array".
>
> So I think/hope that just re-introducing some dynamic sizing will help
> sufficiently, and that this really is an odd and unusual case.

I agree that dynamic sizing makes a lot of sense. We'll check to see
if additional size to the hash table helps, assuming that the waiters
are distributed among different pages for our test case.

Thanks.
Tim
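
The assumption stated above, that a larger hash table only helps if the waiters are spread across different pages, can be illustrated with a small userspace simulation. This is a rough sketch, not kernel code: the names `bucket` and `longest_chain`, the multiplier, the base address, the 64-byte descriptor stride and the waiter count are all hypothetical choices made for illustration. They are not taken from the thread or from the customer workload.

```python
from collections import Counter

# Illustrative 64-bit multiplier for a multiplicative hash (an assumption
# for this sketch, not a statement about what the kernel uses).
GOLDEN_RATIO_64 = 0x61C8864680B583EB
MASK_64 = 0xFFFFFFFFFFFFFFFF


def bucket(page_addr: int, bits: int) -> int:
    """Hash a page address into a table of 2**bits wait queues (top bits of the product)."""
    return ((page_addr * GOLDEN_RATIO_64) & MASK_64) >> (64 - bits)


def longest_chain(page_addrs, bits: int) -> int:
    """Length of the longest wait queue if every waiter queues on its page's bucket."""
    counts = Counter(bucket(addr, bits) for addr in page_addrs)
    return max(counts.values(), default=0)


# Hypothetical workload A: 10,000 waiters, each sleeping on a different page.
BASE = 0xFFFFEA0000000000   # made-up base address for page descriptors
STRIDE = 64                 # made-up descriptor size in bytes
spread = [BASE + i * STRIDE for i in range(10_000)]

# Hypothetical workload B: 10,000 waiters all sleeping on the same page.
same = [BASE] * 10_000

if __name__ == "__main__":
    for bits in (8, 12, 16):
        print(f"bits={bits:2d}  spread={longest_chain(spread, bits):6d}  "
              f"same={longest_chain(same, bits):6d}")
```

In workload A the longest chain can only shrink or stay the same as the table grows, because each larger table subdivides the buckets of the smaller one. In workload B every waiter lands in the same bucket regardless of table size, so resizing would not shorten the chain. Checking which of the two shapes the real test case resembles is what decides whether a bigger table is enough.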