This one has been hard for me to bisect, partly because it takes so long to reboot the servers that have the problem, but also because there may be more than one thing going on. One thing that seems fairly certain is that things are OK prior to:

commit d83763f4a6adb2f417c3288ee903982985ae949c
Merge: 9aa3d651a919 0a5149ba02bd
Author: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Date:   Fri Nov 13 20:35:54 2015 -0800

    Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Attached is a serial log from a kernel built from a commit within that merge which shows the problem. Subsequent changes haven't fixed it; 4.4-rc8 "hangs" at the same point.

commit e0bd0874f2de21613e572669b2de1e4b0c3a97de
Author: sumit.saxena@xxxxxxxxxxxxx <sumit.saxena@xxxxxxxxxxxxx>
Date:   Mon Aug 31 17:23:01 2015 +0530

    megaraid_sas: Increase timeout to 60 secs for abort frames during shutdown

Stuff is obviously going wrong by this point:

[   16.100642] mpt2sas 0000:01:00.0: swiotlb buffer is full (sz: 398336 bytes)
[   16.100662] swiotlb: coherent allocation failed for device 0000:01:00.0 size=398336

followed by a stack dump (see attachment).

I've put "hangs" in quotes because the kernel isn't actually stuck: at the end of the attached serial log you can see that "random: nonblocking pool completed initialization" appears 46 seconds after the other boot messages stopped. So I think some application just failed to complete some I/O.

-Tony
Attachment: failmptsas.log