Re: 4x lower IOPS: Linux MD vs indiv. devices - why?

The root cause behind the high CPU utilization is the IRQ load your eight
NVMe drives generate, although context switching among your 2048 threads
also adds a lot.

Indeed, the ctx switches and interrupts are in the millions/sec.

With ioengine=sync and numjobs=2048, I get

ctx_sw: 8828446
inter:  5780374

It's astonishing that this is even possible.

To cope with the unsustainable interrupt rate, you might want to try the
pvsync2 engine with the hipri option set (RWF_HIPRI), which turns on polling
mode in the block layer (Jens has been very much behind it, so he's the one
who knows the details).

Polling avoids interrupts at the price of somewhat inflated latency, but
it reduces the CPU load noticeably, so it may turn out to be a good option
for your box specifically. Note that you'll need preadv2/pwritev2 syscall
support in your kernel.
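
For reference, a minimal sketch of what a read with RWF_HIPRI looks like
outside of fio, assuming Python 3.7+ on Linux, where os.preadv() accepts a
flags argument. The helper names read_polled() and demo() are made up for
illustration. The demo reads a buffered temp file, so it only shows the call
and the fallback; as far as I know, actual polling needs O_DIRECT on a device
whose driver supports it.

```python
import os
import tempfile


def read_polled(fd, length, offset):
    """Read `length` bytes at `offset`, asking for polled completion if possible.

    Returns (data, used_hipri). Falls back to a plain pread() when the flag
    is unavailable or the kernel/filesystem rejects it.
    """
    hipri = getattr(os, "RWF_HIPRI", 0)  # 0 if this Python/OS lacks the flag
    if hipri and hasattr(os, "preadv"):
        buf = bytearray(length)
        try:
            n = os.preadv(fd, [buf], offset, hipri)
            return bytes(buf[:n]), True
        except OSError:
            pass  # flag not accepted here; use the plain path below
    return os.pread(fd, length, offset), False


def demo():
    """Hypothetical usage: read 4 bytes at offset 10 from a scratch file."""
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"0123456789abcdef")
        tmp.flush()
        fd = os.open(tmp.name, os.O_RDONLY)
        try:
            return read_polled(fd, 4, 10)
        finally:
            os.close(fd)


if __name__ == "__main__":
    print(demo())
```

The second element of the returned tuple only says whether the flag was
accepted, not whether the kernel actually polled for completion.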

I have run an exhaustive set of 30 tests using the different engines, including pvsync2 + hipri.

Please find everything here

https://github.com/oberstet/scratchbox/blob/master/cruncher/sync-engines/README.md

and in the containing folder there.

Using pvsync2 + hipri indeed changes the picture .. but not for the better =(

The machine completely bogs down and the IOPS don't get any higher.

Sidenote: it would be nice if FIO logged the total CPU and interrupt rates ..
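
Until then, a rough sketch of how such rates can be sampled alongside a run,
assuming Linux's /proc/stat, where the "ctxt" line holds the cumulative
context switch count and the first value on the "intr" line is the cumulative
interrupt total. The function names are made up, and the keys ctx_sw/inter
just mirror the labels used earlier in this mail; this is not necessarily how
those numbers were collected.

```python
import time


def parse_counters(stat_text):
    """Extract cumulative context-switch and interrupt totals from /proc/stat text."""
    counters = {}
    for line in stat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "ctxt":
            counters["ctx_sw"] = int(fields[1])
        elif fields[0] == "intr":
            counters["inter"] = int(fields[1])  # first value is the total
    return counters


def rates(before, after, seconds):
    """Convert two cumulative snapshots into per-second rates."""
    return {key: (after[key] - before[key]) / seconds for key in before}


def sample(interval=1.0, path="/proc/stat"):
    """Take two snapshots `interval` seconds apart and return the rates."""
    with open(path) as f:
        before = parse_counters(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_counters(f.read())
    return rates(before, after, interval)


if __name__ == "__main__":
    print(sample())
```

Running sample() in a loop next to fio and logging the result would give the
numbers per interval; the sleep-based interval is only approximate.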

Here is a screenshot while running pvsync2+hipri

http://picpaste.com/pics/Bildschirmfoto_vom_2017-01-23_23-52-10-55NJYHu2.1485215076.png

--

My current preliminary conclusions on this box / workload:

- running psync is much better than sync
- all engines "above" psync only bring minor performance gains
- Linux MD (pure striping, RAID-0) comes with roughly 45% overhead
- saturating the storage subsystem consumes nearly all CPU

Cheers,
/Tobias

PS: I have a small time window left (days) until this box goes into further setup for production (which means I can no longer scratch the storage) - if there is anything you want me to try, let me know. I'll do my best to get it tested. The hardware is probably not mainstream ..

