On Wed, Sep 12, 2018 at 11:07:17AM -0400, Theodore Y. Ts'o wrote:
> On Wed, Sep 12, 2018 at 02:11:30PM +0200, Jan Kara wrote:
> >
> > Yes, I guess you're speaking about the one Chris Mason mentioned [1].
> > Essentially it's a priority inversion where jbd2 thread gets blocked behind
> > writeback done on behalf of a heavily restricted process. It actually is
> > not related to dirty throttling or anything like that. And the solution for
> > this priority inversion is to use unwritten extents for writeback
> > unconditionally as I wrote in that thread. The core of this is implemented
> > and hidden behind dioread_nolock mount option but it needs some serious
> > polishing work and testing...
> >
> > [1] https://marc.info/?l=linux-fsdevel&m=151688776319077
>
> I've actually been considering making dioread_nolock the default when
> page_size == block_size.
>
> Arguments in favor:
>
> 1) Improves AIO latency in some circumstances
> 2) Improves parallel DIO read performance
> 3) Should address the block-cg throttling priority inversion problem
>
> Arguments against:
>
> 1) Hasn't seen much usage outside of Google (where it makes a big
>    difference for fast flash workloads; see (1) and (2) above)
> 2) Dioread_nolock only works when page_size == block_size; so this
>    implies we would be using a different codepath depending on
>    the block size.
> 3) generic/500 (dm-thin ENOSPC hitter with concurrent discards)
>    fails with dioread_nolock, but not in the 4k workload
>
> Liu, can you try out mount -o dioread_nolock and see if this addresses
> your problem? If so, maybe this is the development cycle where we
> finally change the default.
>

I've confirmed that "mount -o dioread_nolock" fixed the hang. I can do
further testing (maybe in a production environment) if that's needed.

Many thanks to both you and Jan.

thanks,
-liubo