On Thu, 23 Oct 2014, Chris Friesen wrote:
> On 10/17/2014 12:55 PM, Austin Schuh wrote:
> > Use the 121 patch. This sounds very similar to the issue that I helped
> > debug with XFS. There ended up being a deadlock due to a bug in the
> > kernel work queues. You can search the RT archives for more info.
>
> I can confirm that the problem still shows up with the rt121 patch. (And
> also with Paul Gortmaker's proposed 3.4.103-rt127 patch.)
>
> We added some instrumentation and it looks like we've tracked down the problem.
> Figuring out how to fix it is proving to be tricky.
>
> Basically it looks like we have a circular dependency involving the
> inode->i_data_sem rt_mutex, the PG_writeback bit, and the BJ_Shadow list. It
> goes something like this:
>
> jbd2_journal_commit_transaction:
> 1) set page for writeback (set PG_writeback bit)
> 2) put jbd2 journal head on BJ_Shadow list
> 3) sleep on PG_writeback bit waiting for page writeback complete
>
> ext4_da_writepages:
> 1) ext4_map_blocks() acquires inode->i_data_sem for writing
> 2) do_get_write_access() sleeps waiting for jbd2 journal head to come off
> the BJ_Shadow list
>
> At this point the flush code can't run because it can't acquire
> inode->i_data_sem for reading, so the page will never get written out.
> Deadlock.

Sorry, I really cannot map that sparse description to any code flow.

Proper callchains for the involved parts might help to actually
understand what you are looking for.

Thanks,

	tglx
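
For anyone trying to follow the quoted description, the claimed cycle can be written down as a small wait-for graph. This is only a toy model, not kernel code: the node names are labels lifted from the quoted text, the edges encode the reporter's claims (including the implied one that the journal head only leaves BJ_Shadow once the commit makes progress) rather than verified callchains, and `find_cycle` is a hypothetical helper written for this sketch.

```python
# Toy wait-for graph of the reported deadlock; NOT kernel code.
# An edge "A -> B" reads "A cannot make progress until B does".
# Edges reflect the reporter's description, not verified call chains.
WAITS_FOR = {
    # jbd2_journal_commit_transaction sleeps on the PG_writeback bit
    "jbd2 commit": ["page writeback (flush)"],
    # flush cannot acquire inode->i_data_sem for reading
    "page writeback (flush)": ["ext4_da_writepages (holds i_data_sem)"],
    # do_get_write_access() waits for the journal head to leave BJ_Shadow;
    # assumed (implied by the report) to require the commit to proceed
    "ext4_da_writepages (holds i_data_sem)": ["jbd2 commit"],
}


def find_cycle(graph):
    """Return one cycle as a list of nodes, or None if the graph is acyclic."""
    visiting = []   # current depth-first path
    done = set()    # nodes fully explored without finding a cycle

    def visit(node):
        if node in done:
            return None
        if node in visiting:
            # Back edge: the cycle is the path from the first visit to here.
            return visiting[visiting.index(node):] + [node]
        visiting.append(node)
        for nxt in graph.get(node, []):
            cycle = visit(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for start in graph:
        cycle = visit(start)
        if cycle:
            return cycle
    return None


if __name__ == "__main__":
    print(" -> ".join(find_cycle(WAITS_FOR)))
```

A graph like this only restates the claim; it says nothing about whether each edge is real. That still needs the callchains for each of the three waiters.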