On Tue, Oct 13, 2020 at 01:13:48PM +0800, Hao_Xu wrote:
> On 2020/10/13 5:13 AM, Matthew Wilcox wrote:
> > This one's pretty unlikely, but there's a case in buffered reads where
> > an IOCB_WAITQ read can end up sleeping.
> >
> > generic_file_buffered_read():
> >         page = find_get_page(mapping, index);
> > ...
> >         if (!PageUptodate(page)) {
> > ...
> >                 if (iocb->ki_flags & IOCB_WAITQ) {
> > ...
> >                         error = wait_on_page_locked_async(page,
> >                                                         iocb->ki_waitq);
> > wait_on_page_locked_async():
> >         if (!PageLocked(page))
> >                 return 0;
> > (back to generic_file_buffered_read):
> >                 if (!mapping->a_ops->is_partially_uptodate(page,
> >                                                 offset, iter->count))
> >                         goto page_not_up_to_date_locked;
> >
> > page_not_up_to_date_locked:
> >         if (iocb->ki_flags & (IOCB_NOIO | IOCB_NOWAIT)) {
> >                 unlock_page(page);
> >                 put_page(page);
> >                 goto would_block;
> >         }
> > ...
> >         error = mapping->a_ops->readpage(filp, page);
> > (will unlock page on I/O completion)
> >         if (!PageUptodate(page)) {
> >                 error = lock_page_killable(page);
> >
> > So if we have IOCB_WAITQ set but IOCB_NOWAIT clear, we'll call ->readpage()
> > and wait for the I/O to complete.  I can't quite figure out if this is
> > intentional -- I think not; if I understand the semantics right, we
> > should be returning -EIOCBQUEUED and punting to an I/O thread to
> > kick off the I/O and wait.
> >
> > I think the right fix is to return -EIOCBQUEUED from
> > wait_on_page_locked_async() if the page isn't locked.  ie this:
> >
> > @@ -1258,7 +1258,7 @@ static int wait_on_page_locked_async(struct page *page,
> >                                      struct wait_page_queue *wait)
> >  {
> >         if (!PageLocked(page))
> > -               return 0;
> > +               return -EIOCBQUEUED;
> >         return __wait_on_page_locked_async(compound_head(page), wait, false);
> >  }
> >
> > But as I said, I'm not sure what the semantics are supposed to be.
> >
>
> Hi Matthew,
> which kernel version are you use, I believe I've fixed this case in the
> commit c8d317aa1887b40b188ec3aaa6e9e524333caed1

Ah, I don't have that commit in my tree.

Nevertheless, there is still a problem.  The ->readpage implementation
is not required to execute asynchronously.  For example, it may enter
page reclaim by using GFP_KERNEL.  Indeed, I feel it is better if it
works synchronously as it can then report the actual error from an I/O
instead of the almost-meaningless -EIO.

This patch series documents 12 filesystems which implement ->readpage
in a synchronous way today (for at least some cases) and converts iomap
to be synchronous (making two more filesystems synchronous).

https://lore.kernel.org/linux-fsdevel/20201009143104.22673-1-willy@xxxxxxxxxxxxx/
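
For anyone following along, here is a rough user-space model of the path
traced in the quoted mail.  It is only a sketch, not kernel code: the
MODEL_* flag values, the enum and both function names are made up for
illustration, and it assumes that a locked page always queues the waiter.
It covers only the case where the page is not uptodate and
->is_partially_uptodate() fails, and it does not model whether
->readpage() itself runs synchronously.

```c
#include <stdbool.h>

/* Hypothetical bit values for this model only; not the kernel's definitions. */
#define MODEL_IOCB_NOWAIT (1u << 0)
#define MODEL_IOCB_WAITQ  (1u << 1)
#define MODEL_IOCB_NOIO   (1u << 2)

enum model_outcome {
	MODEL_WOULD_BLOCK,	/* goto would_block */
	MODEL_QUEUED,		/* -EIOCBQUEUED goes back to the caller */
	MODEL_SLEEPS,		/* ->readpage(), then lock_page_killable() */
};

/* Stand-in for wait_on_page_locked_async(); true means -EIOCBQUEUED. */
static bool model_wait_queues(bool page_locked, bool proposed_fix)
{
	if (!page_locked)
		return proposed_fix;	/* quoted code: 0; quoted diff: -EIOCBQUEUED */
	return true;			/* assumption: a locked page always queues */
}

/* Page is not uptodate and is_partially_uptodate() has failed. */
static enum model_outcome model_read_not_uptodate(unsigned int flags,
						  bool page_locked,
						  bool proposed_fix)
{
	if ((flags & MODEL_IOCB_WAITQ) &&
	    model_wait_queues(page_locked, proposed_fix))
		return MODEL_QUEUED;

	/* page_not_up_to_date_locked: */
	if (flags & (MODEL_IOCB_NOIO | MODEL_IOCB_NOWAIT))
		return MODEL_WOULD_BLOCK;

	return MODEL_SLEEPS;
}
```

With MODEL_IOCB_WAITQ set, MODEL_IOCB_NOWAIT clear and an unlocked page,
the model returns MODEL_SLEEPS for the quoted code and MODEL_QUEUED once
the quoted one-line diff is applied.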