We have some reports of the interrupt hangcheck watchdog firing when completing I/Os that are millions of pages long. While it doesn't really make sense to construct single I/Os that big, it does flag that even, say, a 16MB writeback I/O is going to spend a lot of time clearing the writeback bit from pages.

With 4kB pages, 16MB is 4096 pages. At 2000 cycles per cache miss (the struct pages alone are 256kB of data, so they're no longer in L1 cache and probably not in L2 either), that's about 8 million cycles with interrupts disabled.

If we could guarantee that BIOs were always ended in softirq context, the page cache could use spin_lock_bh() instead of spin_lock_irq(). I haven't done any measurements to quantify the improvement, but I suspect it would be visible on some workloads.

(This is more of an attempt to induce conference-driven development than necessarily a discussion topic for lsfmm.)
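
For anyone who wants to replay the back-of-the-envelope numbers above with different inputs, here is a minimal sketch of the arithmetic. The I/O size, page size and per-miss cost come from the text; the 64-byte struct page size is an assumption (it is what the 256kB figure implies), and the function name is purely illustrative. It also assumes exactly one cache miss per page, which is a simplification.

```python
# Rough check of the figures quoted in the post; not a measurement.
# STRUCT_PAGE_BYTES is an assumption (64 bytes, implied by the 256kB figure);
# the other inputs are taken directly from the text.

IO_BYTES = 16 * 1024 * 1024   # 16MB writeback I/O
PAGE_BYTES = 4 * 1024         # 4kB pages
STRUCT_PAGE_BYTES = 64        # assumed size of one struct page
CYCLES_PER_MISS = 2000        # per-cache-miss cost used in the text


def writeback_completion_cost(io_bytes=IO_BYTES,
                              page_bytes=PAGE_BYTES,
                              struct_page_bytes=STRUCT_PAGE_BYTES,
                              cycles_per_miss=CYCLES_PER_MISS):
    """Return (pages, struct page footprint in bytes, estimated cycles)."""
    pages = io_bytes // page_bytes
    footprint = pages * struct_page_bytes
    # Simplification: one cache miss per page when clearing the writeback bit.
    cycles = pages * cycles_per_miss
    return pages, footprint, cycles


if __name__ == "__main__":
    pages, footprint, cycles = writeback_completion_cost()
    print(f"pages:             {pages}")
    print(f"struct page bytes: {footprint} ({footprint // 1024}kB)")
    print(f"estimated cycles:  {cycles}")
```

With the defaults this gives 4096 pages, 256kB of struct pages and 8,192,000 cycles. On a hypothetical 3GHz core that would be somewhere around 2.7ms with interrupts off, though the real figure depends heavily on the machine and on how many of those accesses actually miss.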