On Mon, 2022-08-22 at 10:16 +0200, Igor Raits wrote:
> Hello everyone,
>
> Hopefully I'm sending this to the right place…
> We recently started to see the following stacktrace quite often on our
> VMs that are using NFS extensively (I think after upgrading to
> 5.18.11+, but not sure when exactly. For sure it happens on 5.18.15):
>
> INFO: task kworker/u36:10:377691 blocked for more than 122 seconds.
>       Tainted: G E 5.18.15-1.gdc.el8.x86_64 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:kworker/u36:10 state:D stack: 0 pid:377691 ppid: 2 flags:0x00004000
> Workqueue: writeback wb_workfn (flush-0:308)
> Call Trace:
>  <TASK>
>  __schedule+0x38c/0x7d0
>  schedule+0x41/0xb0
>  io_schedule+0x12/0x40
>  __folio_lock+0x110/0x260
>  ? filemap_alloc_folio+0x90/0x90
>  write_cache_pages+0x1e3/0x4d0
>  ? nfs_writepage_locked+0x1d0/0x1d0 [nfs]
>  nfs_writepages+0xe1/0x200 [nfs]
>  do_writepages+0xd2/0x1b0
>  ? check_preempt_curr+0x47/0x70
>  ? ttwu_do_wakeup+0x17/0x180
>  __writeback_single_inode+0x41/0x360
>  writeback_sb_inodes+0x1f0/0x460
>  __writeback_inodes_wb+0x5f/0xd0
>  wb_writeback+0x235/0x2d0
>  wb_workfn+0x348/0x4a0
>  ? put_prev_task_fair+0x1b/0x30
>  ? pick_next_task+0x84/0x940
>  ? __update_idle_core+0x1b/0xb0
>  process_one_work+0x1c5/0x390
>  worker_thread+0x30/0x360
>  ? process_one_work+0x390/0x390
>  kthread+0xd7/0x100
>  ? kthread_complete_and_exit+0x20/0x20
>  ret_from_fork+0x1f/0x30
>  </TASK>
>
> I see that something very similar was fixed in btrfs
> (https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.18.y&id=9535ec371d741fa037e37eddc0a5b25ba82d0027)
> but I could not find anything similar for NFS.
>
> Do you happen to know if this is already fixed? If so, would you mind
> sharing some commits? If not, could you help getting this addressed?
>

The stack trace you show above isn't particularly helpful for diagnosing what the problem is.

All it is saying is that 'thread A' is waiting to take a page lock that is being held by a different 'thread B'. Without information on what 'thread B' is doing, and why it isn't releasing the lock, there is nothing we can conclude.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx
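
One possible way to gather the missing information about 'thread B' (not something proposed in this thread, only a sketch) is to list every task currently in uninterruptible sleep (state D) and read its kernel stack from /proc/<pid>/stack. The lock holder is often, though not necessarily, among those tasks. The code below assumes a Linux system, root privileges, and a kernel that exposes /proc/<pid>/stack. The helper names parse_stat, blocked_tasks and kernel_stack are made up for illustration. It scans only thread-group leaders under /proc, which covers kernel threads such as kworkers but not every userspace thread.

```python
import os


def parse_stat(stat_line):
    """Return (pid, comm, state) parsed from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lparen = stat_line.index("(")
    rparen = stat_line.rindex(")")
    pid = int(stat_line[:lparen])
    comm = stat_line[lparen + 1:rparen]
    state = stat_line[rparen + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc="/proc"):
    """Yield (pid, comm) for tasks in uninterruptible sleep (state D)."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-task entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # task exited during the scan, or unparsable line
        if state == "D":
            yield pid, comm


def kernel_stack(pid, proc="/proc"):
    """Return the kernel stack text for pid, or a note if unreadable."""
    # Assumption: reading this file needs root and kernel stack-trace support.
    try:
        with open(os.path.join(proc, str(pid), "stack")) as f:
            return f.read()
    except OSError as exc:
        return f"<unavailable: {exc}>"
```

Run as root while the hang is in progress, a loop such as `for pid, comm in blocked_tasks(): print(comm, pid, kernel_stack(pid))` prints one stack per blocked task. Where sysrq is enabled, writing `w` to /proc/sysrq-trigger asks the kernel to dump blocked tasks to the kernel log instead.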