From: Jane Chu <jane.chu@xxxxxxxxxx>
Date: Tue, 11 Jul 2017 12:00:54 -0600

BTW, for sparc64 specific changes, please use the "sparc64: " subsystem
prefix in your Subject lines. I've fixed it up for you this time.

> But a busy system is not a broken system. In the above scenario, as long
> as the receiver is making forward progress processing mondo interrupts,
> the sender should continue to retry.

So I'm going to apply this patch, but I absolutely, fundamentally
disagree with this statement.

Making forward progress only processing mondo interrupts _IS_ broken.
A cpu stuck doing nothing but processing mondo interrupts is in an
error state.

I repeat, it is not valid for a cpu to be stuck doing mondo interrupt
processing. This is true, even if it is making "forward progress"
within that backlog of mondo interrupts.

A cpu must always, somehow, continually make forward progress in its
primary instruction stream.

In the kernel, for example, when we have so many software interrupts
that the cpu is not making forward progress on anything else, we defer
the software interrupt processing to a kernel thread instead of doing
it immediately.

This is absolutely required, so that the primary execution stream of
the cpu always makes forward progress.

We must do something similar here with MONDOs.

Either we find a way to decrease the cost of the individual mondos
(and this makes sense, mondos should be something that executes in an
extremely small, finite, amount of time) so that these backlogs can't
happen in the first place.

Or, we make some kind of deferral mechanism for the most expensive
kinds of mondos.

I'm still pretty sure that unmaps are taking an unreasonable amount of
time to execute.

Our current range flush implementation is incredibly stupid, and could
be improved by orders of magnitude. It allocates an entire kernel
stack frame, just so that it can call __flush_tlb_pending().

In fact, we can end up doing this full trap entry/exit just for
purging 2 or 3 pages.
So this means we need an in-assembler cross-call trap handler that can
do the TLB pending flush directly.

And, we also need a limiter that says "if the number of pages pending
to TLB purge is greater than X, do an MM context TLB flush instead".
X should probably be something on the order of the number of entries
in the hardware TLB CAM.

To me this all is a huge red flag, and probably causes all of the
mondo timeouts you've seen except for the PCI-E hotplug cases.
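
A rough sketch of the limiter described above, written as plain
user-space C rather than the in-assembler cross-call handler the
message asks for. Everything named here is hypothetical and is not
existing sparc64 code: TLB_CAM_ENTRIES_GUESS stands in for "X" (the
value 64 is an arbitrary placeholder, not a measured TLB size), and
demap_page() / demap_context() only count calls instead of touching a
real TLB.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for "X": roughly the number of entries in the
 * hardware TLB CAM. The value 64 is a placeholder assumption. */
#define TLB_CAM_ENTRIES_GUESS 64

enum flush_kind { FLUSH_NONE, FLUSH_PER_PAGE, FLUSH_CONTEXT };

/* Counters so the sketch can be exercised without real hardware. */
struct flush_stats {
	unsigned long page_flushes;
	unsigned long context_flushes;
};

/* Stand-in for a single-page TLB demap; only counts the call. */
static void demap_page(struct flush_stats *st, unsigned long vaddr)
{
	(void)vaddr;
	st->page_flushes++;
}

/* Stand-in for a whole-MM-context TLB flush; only counts the call. */
static void demap_context(struct flush_stats *st)
{
	st->context_flushes++;
}

/* Flush a batch of pending virtual addresses. If the batch is larger
 * than the assumed TLB size, one context flush replaces the per-page
 * loop, so the work done per cross call stays bounded. */
static enum flush_kind flush_pending(struct flush_stats *st,
				     const unsigned long *vaddrs, size_t nr)
{
	assert(st != NULL);
	assert(nr == 0 || vaddrs != NULL);

	if (nr == 0)
		return FLUSH_NONE;

	if (nr > TLB_CAM_ENTRIES_GUESS) {
		demap_context(st);
		return FLUSH_CONTEXT;
	}

	for (size_t i = 0; i < nr; i++)
		demap_page(st, vaddrs[i]);

	return FLUSH_PER_PAGE;
}
```

With this shape, a batch of 2 or 3 pages still takes the per-page
path, while any batch above the threshold costs a single context
flush no matter how many pages are pending. The right threshold would
have to come from the actual hardware TLB size.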