On Mon, Jul 17, 2017 at 01:24:23AM +0000, Liang, Kan wrote:
> Hi Don & Thomas,
>
> Sorry for the late response. We just finished the tests for all proposed patches.
>
> There are three proposed patches so far.
> Patch 1: The patch as above which speed up the hrtimer.
> Patch 2: Thomas's first proposal.
> https://patchwork.kernel.org/patch/9803033/
> https://patchwork.kernel.org/patch/9805903/
> Patch 3: my original proposal which increase the NMI watchdog timeout by 3X
> https://patchwork.kernel.org/patch/9802053/
>
> According to our test, only patch 3 works well.
> The other two patches will hang the system eventually.
> For patch 1, the system hang after running our test case for ~1 hour.
> For patch 2, the system hang in running the overnight test.
> There is no error message shown when the system hang. So I don't know the
> root cause yet.

Hi Kan,

Thanks for the feedback. It's odd that the different patches had different
results.

What is even more odd to me is the hang. I thought these were all false
lockups that prematurely panic'd and rebooted the box.

Is the machine configured to panic on hardlockup and reboot? Perhaps kdump
is enabled to store the console log for review upon reboot?

It almost implies that a hardlockup did happen but isn't being detected
until later??

>
> BTW: We set 1 to watchdog_thresh when we did the test.
> It's believed that can speed up the failure.

Sure, you/they look for 1-second hangs instead of 10-second ones. But with
patch 3, which triples the NMI watchdog timeout, it is more like 3 seconds-ish
vs 30 seconds-ish.

As Thomas asked, I would also be interested in how the test works. The hang
doesn't make sense.

Cheers,
Don