> -----Original Message-----
> From: fio-owner@xxxxxxxxxxxxxxx [mailto:fio-owner@xxxxxxxxxxxxxxx] On Behalf Of Tobias Oberstein
> Sent: Tuesday, January 24, 2017 4:52 PM
> To: Andrey Kuzmin <andrey.v.kuzmin@xxxxxxxxx>
> Cc: fio@xxxxxxxxxxxxxxx; Jens Axboe <axboe@xxxxxxxxx>
> Subject: Re: 4x lower IOPS: Linux MD vs indiv. devices - why?
>
> However, during my tests, I get this in the kernel log:
>
> [459346.155564] NMI watchdog: BUG: soft lockup - CPU#46 stuck for 22s! [swapper/46:0]
> [461040.530959] NMI watchdog: BUG: soft lockup - CPU#26 stuck for 22s! [swapper/26:0]
> [461044.279081] NMI watchdog: BUG: soft lockup - CPU#23 stuck for 22s! [swapper/23:0]
>
> A wild guess: these lockups are actually deadlocks. AIO seems to be tricky for the kernel too.

Probably not deadlocks. One easy way to trigger those soft lockups is to submit IOs on one set of CPUs while expecting a different set of CPUs to handle the interrupts and completions; the latter CPUs can easily become overwhelmed.

The best remedy I've found is to require each CPU to handle its own IOs, which self-throttles it from submitting more IOs than it can handle. The storage device driver needs to set up its hardware interrupts that way. Then, rq_affinity=2 ensures the block layer completions are handled on the submitting CPU.

You can add this to the kernel command line (e.g., in /boot/grub/grub.conf) to squelch those checks:

    nosoftlockup

Those prints themselves can induce more soft lockups if you have a live serial port, because printing to the serial port is slow and blocking.
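For concreteness, here is a minimal sketch of how those knobs might be applied from a shell. The device name (nvme0n1), the IRQ number, and the GRUB file paths are placeholders chosen for illustration, not values from this thread:

    # Ask the block layer to run completions on the exact CPU that submitted the IO
    # ("nvme0n1" is a placeholder; apply to whichever device you're testing)
    echo 2 > /sys/block/nvme0n1/queue/rq_affinity

    # If the driver exposes per-queue IRQs, pin each one to its local CPU
    # (IRQ 123 and CPU 1 are placeholders; see /proc/interrupts for the real numbers)
    echo 1 > /proc/irq/123/smp_affinity_list

    # Squelch the watchdog by appending "nosoftlockup" to the kernel line,
    # e.g. in /boot/grub/grub.conf (legacy GRUB) or /etc/default/grub (GRUB 2):
    #   kernel /vmlinuz-... ro root=/dev/sda1 ... nosoftlockup

If you then also bind the fio jobs to the same CPUs (e.g., with cpus_allowed=), submission and completion stay on the same cores and the submitters throttle themselves instead of flooding remote CPUs.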