On Tue 16-10-18 11:49:45, Adrian Hunter wrote: > On 11/10/18 3:38 PM, Adrian Hunter wrote: > > On 11/10/18 2:12 PM, Jan Kara wrote: > >> On Wed 10-10-18 13:49:34, Theodore Y. Ts'o wrote: > >>> On Wed, Oct 10, 2018 at 04:43:27PM +0300, Adrian Hunter wrote: > >>>> Hi > >>>> > >>>> I have a case on a v4.14 kernel where the EXT4 journal commit disables > >>>> preemption for 30ms due to jbd2_clear_buffer_revoked_flags(). That in turn > >>>> disables preemption on other CPUs as they come to spin waiting for the same > >>>> lock. The side-effect of that is that it periodically blocks high priority > >>>> tasks from running. > >>>> > >>>> I see jbd2_clear_buffer_revoked_flags() iterating 32768 times calling > >>>> __find_get_block(). > >>>> > >>>> Is there any way to make jbd2_clear_buffer_revoked_flags() take less time, > >>>> or move its work out from under write_lock(&journal->j_state_lock)? > >>> Hmm.... I'd have to look a bit more carefully and then run some tests, > >>> but I *think* we can drop the j_state_lock at the beginning of JBD2 > >>> commit phase 1, and then grab it again right before we set > >>> commit_transaction->t_state to T_FLUSH. > >>> > >>> That should be safe because while the transaction state is T_LOCKED, > >>> we can't start any new handles, so there can't be any new blocks added > >>> to the revoke table. > >>> > >>> Can you give that a try and see whether that solves your priority > >>> inversion problem? > >> Agreed. Something like attached patch (compile-tested only)? > > > > I have been testing a patch with the unlock/lock at slightly different > > positions, and it definitely helps. The incidence of my problem drops from > > nearly every writeback, to a few an hour. I haven't had time to find out > > what is causing the remaining cases yet - it may not be related to EXT4. I > > should be able to test this patch tomorrow. > > Thanks very much for the quick response and patch! > > Tested-by: Adrian Hunter <adrian.hunter@xxxxxxxxx> Thanks. I've officially posted the patch. > With more stress I also found move_expired_inodes() > (wb_writeback->queue_io->move_expired_inodes) to take up to 16ms using > 230,000 branches while under spin lock. AFAICT we weren't hitting that in > practice so I am not following it up at this stage. Interesting. I actually have a patch simplifying that area as well sitting in some branch in my tree. So I can dust it off if you are interested. Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR