Hello,

These patches introduce a mechanism for limiting deep CPU idle states
during block IO. With certain workloads, it is possible for a CPU to
enter a deep idle state while waiting for IO completion, which adds a
large latency to handling the completion interrupt. See the example
below, where I used an Intel Icelake Xeon system to run a simple 'fio'
test with random reads, with the CPU C6 state enabled / disabled
(results from 2 * 2min runs):

C6 enabled:
    slat (nsec): min=1769, max=73247, avg=6960.96, stdev=2115.90
    clat (nsec): min=442, max=242706, avg=23767.06, stdev=13348.74
     lat (usec): min=12, max=250, avg=30.73, stdev=13.96

    slat (nsec): min=1849, max=58824, avg=6970.61, stdev=2134.38
    clat (nsec): min=1684, max=241880, avg=23545.68, stdev=13448.87
     lat (usec): min=12, max=249, avg=30.52, stdev=14.03

C6 disabled:
    slat (nsec): min=2110, max=57871, avg=6867.86, stdev=1711.55
    clat (nsec): min=486, max=98292, avg=22185.50, stdev=10473.34
     lat (usec): min=13, max=105, avg=29.05, stdev=10.99

    slat (nsec): min=2128, max=67730, avg=6913.52, stdev=1714.89
    clat (nsec): min=552, max=93409, avg=22582.50, stdev=10407.53
     lat (usec): min=13, max=108, avg=29.50, stdev=10.93

The maximum latency with C6 enabled is about 2.5x that seen with C6
disabled.

Now, the patches provided here introduce a mechanism for the block
layer to limit the maximum CPU latency, with user-configurable sysfs
knobs per block device. With the following configuration on my test
system:

    /sys/block/nvme0n1/cpu_lat_limit_us = 10
    /sys/block/nvme0n1/cpu_lat_timeout_ms = 3

the maximum CPU latency for the active CPUs doing block IO is limited
to 10us, and the limit is removed if there is no block IO for 3ms.

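To illustrate the intended behavior of the two knobs, here is a minimal
Python sketch. It is a toy model and not code from the patches: the
class and function names are hypothetical, the C-state exit latencies
are illustrative placeholders rather than measured values, and treating
the limit as expired exactly at the timeout is an assumption.

```python
class CpuLatencyLimiter:
    """Toy model of cpu_lat_limit_us / cpu_lat_timeout_ms (hypothetical names)."""

    def __init__(self, limit_us, timeout_ms):
        self.limit_us = limit_us
        self.timeout_ms = timeout_ms
        self.last_io_ms = {}  # cpu id -> timestamp (ms) of the last block IO

    def on_block_io(self, cpu, now_ms):
        # Block IO seen on this CPU: (re)arm the latency limit.
        self.last_io_ms[cpu] = now_ms

    def active_limit_us(self, cpu, now_ms):
        # Return the limit in effect, or None once the CPU has seen no
        # block IO for timeout_ms (assumption: expiry is inclusive).
        last = self.last_io_ms.get(cpu)
        if last is None or now_ms - last >= self.timeout_ms:
            return None
        return self.limit_us


def deepest_allowed_state(states, limit_us):
    """Pick the deepest idle state whose exit latency fits the limit.

    states: list of (name, exit_latency_us), ordered shallow to deep.
    Returns None if no state fits the limit.
    """
    allowed = [name for name, exit_lat in states
               if limit_us is None or exit_lat <= limit_us]
    return allowed[-1] if allowed else None


# Illustrative exit latencies in microseconds (placeholders only).
EXAMPLE_STATES = [("C1", 1), ("C1E", 4), ("C6", 170)]
```

With limit_us=10 and timeout_ms=3, a CPU that handled block IO 1ms ago
would be restricted to the deepest state with an exit latency of at
most 10us in this model, while a CPU idle for 3ms or more, or one that
never handled block IO, would again be free to enter C6.
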
Running the same fio test used above with C6 enabled, I get:

    slat (nsec): min=1887, max=71037, avg=7239.68, stdev=1850.67
    clat (nsec): min=438, max=103628, avg=22488.75, stdev=10457.86
     lat (usec): min=12, max=133, avg=29.73, stdev=11.04

    slat (nsec): min=1942, max=69159, avg=7194.01, stdev=1788.63
    clat (nsec): min=418, max=115739, avg=22239.51, stdev=10448.37
     lat (usec): min=12, max=123, avg=29.43, stdev=10.96

... so the maximum latencies are cut by approx 100us and are quite
close to the levels seen with C6 disabled completely system-wide.

Any thoughts about the patches and the approach taken?

-Tero