Re: Oops when completing request on the wrong queue

Jens Axboe <axboe@xxxxxxxxx> · Tue, 23 Aug 2016 15:11:54 -0600

On 08/23/2016 02:54 PM, Gabriel Krisman Bertazi wrote:
Gabriel Krisman Bertazi <krisman@xxxxxxxxxxxxxxxxxx> writes:

Can you share what you ran to online/offline CPUs? I can't reproduce
this here.

I was using the ppc64_cpu tool, which shouldn't do anything more than
write to sysfs, but I just reproduced it with the script below.

Note that this is ppc64le.  I don't have an x86 machine on hand to
attempt to reproduce right now, but I'll look for one and see how it
goes.

Hi,

Any luck reproducing it?  We were initially reproducing it with a
proprietary stress test, but I tried a generated fio jobfile together
with the SMT script I shared earlier, and I could reproduce the crash
consistently in less than 10 minutes of execution.  This was still
ppc64le, though.  I couldn't get my hands on an NVMe device on x86 yet.

Nope, I have not been able to reproduce it. How long do the CPU
offline/online actions take on ppc64? They're pretty slow on x86, which
may hide the issue. I took out the various printk's associated with
bringing a CPU off/online, as well as IRQ breaking parts, but that
didn't help in reproducing it.

The job file I used, as well as the smt.sh script, in case you want to
give it a try:

jobfile: http://krisman.be/k/nvmejob.fio
smt.sh:  http://krisman.be/k/smt.sh

Still, the trigger seems to be consistently a heavy IO load combined
with CPU addition/removal.
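
For readers without access to the links above: CPU addition/removal
here comes down to writing 0 or 1 to a CPU's "online" attribute in
sysfs. The following is only a minimal sketch of that mechanism, not the
smt.sh script itself. The function names, the sysfs_root parameter and
the delay are hypothetical and chosen for illustration; the sysfs path
/sys/devices/system/cpu/cpuN/online is the standard Linux hotplug
interface. Writing to it requires root, and some CPUs (commonly cpu0 on
x86) may not expose the file at all.

```python
import os
import time

DEFAULT_SYSFS_ROOT = "/sys/devices/system/cpu"


def cpu_online_path(cpu, sysfs_root=DEFAULT_SYSFS_ROOT):
    """Return the path of the 'online' attribute for logical CPU number `cpu`."""
    return os.path.join(sysfs_root, f"cpu{cpu}", "online")


def set_cpu_online(cpu, online, sysfs_root=DEFAULT_SYSFS_ROOT):
    """Write '1' (online) or '0' (offline) to the CPU's 'online' attribute."""
    with open(cpu_online_path(cpu, sysfs_root), "w") as f:
        f.write("1" if online else "0")


def hotplug_cycle(cpus, sysfs_root=DEFAULT_SYSFS_ROOT, delay=0.0):
    """Take every CPU in `cpus` offline, then bring each one back online.

    `delay` (seconds) is an assumed pause between writes; the real
    reproducer may use different timing or none at all.
    """
    for cpu in cpus:
        set_cpu_online(cpu, False, sysfs_root)
        time.sleep(delay)
    for cpu in cpus:
        set_cpu_online(cpu, True, sysfs_root)
        time.sleep(delay)
```

Calling hotplug_cycle() repeatedly as root on a set of secondary CPUs,
while a high-depth fio job is running against the device, would
approximate the kind of load described here. Whether that is enough to
hit the crash on a given machine is not something this sketch can
promise.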

My workload looks similar to yours, in that it's high depth and with a
lot of jobs to keep most CPUs loaded. My bash script is different from
yours; I'll try yours and see if it helps here.

--
Jens Axboe