Hi Richard, On 2018/12/16 6:37, Richard Weinberger wrote: > On Tue, Jul 4, 2017 at 6:23 AM Kyeong Yoo > <kyeong.yoo@xxxxxxxxxxxxxxxxxxx> wrote: >> >> GC task can deadlock in read_cache_page() because it may attempt >> to release a page that is actually allocated by another task in >> jffs2_write_begin(). >> The reason is that in jffs2_write_begin() there is a small window >> a cache page is allocated for use but not set Uptodate yet. >> >> This ends up with a deadlock between two tasks: >> 1) A task (e.g. file copy) >> - jffs2_write_begin() locks a cache page >> - jffs2_write_end() tries to lock "alloc_sem" from >> jffs2_reserve_space() <-- STUCK >> 2) GC task (jffs2_gcd_mtd3) >> - jffs2_garbage_collect_pass() locks "alloc_sem" >> - try to lock the same cache page in read_cache_page() <-- STUCK >> >> So to avoid this deadlock, hold "alloc_sem" in jffs2_write_begin() >> while reading data in a cache page. >> Acquiring alloc_sem may be too heavy, can we use it as a slow path and using the following code as the fast path ? /* * Need to make sure this page is Uptodate before locking it, * else the write may dead-lock with the GC thread which tries * to wait for unlock a not-Uptodate page. */ gfp_mask = mapping_gfp_mask(mapping); if (flags & AOP_FLAG_NOFS) gfp_mask &= ~__GFP_FS; pg = read_cache_page_gfp(mapping, index, gfp_mask); if (IS_ERR(pg)) { ret = PTR_ERR(pg); goto out_err; } lock_page(pg); /* If page is reclaimed, fall back to slow path */ if (!pg->mapping) { unlock_page(pg); put_page(pg); /* * While getting a page and reading data in, lock c->alloc_sem until * the page is Uptodate. Otherwise GC task may attempt to read the same * page in read_cache_page(), which causes a deadlock. */ mutex_lock(&c->alloc_sem); pg = grab_cache_page_write_begin(mapping, index, flags); And will the fix go into v5.0 ? Regards, Tao >> Signed-off-by: Kyeong Yoo <kyeong.yoo@xxxxxxxxxxxxxxxxxxx> >> --- >> >> Note: I'm resending this patch to linux-mtd. >> >> fs/jffs2/file.c | 40 +++++++++++++++++++++++++--------------- >> 1 file changed, 25 insertions(+), 15 deletions(-) >> >> diff --git a/fs/jffs2/file.c b/fs/jffs2/file.c >> index c12476e309c6..eb4e4d784d26 100644 >> --- a/fs/jffs2/file.c >> +++ b/fs/jffs2/file.c >> @@ -135,20 +135,15 @@ static int jffs2_write_begin(struct file *filp, struct address_space *mapping, >> struct page *pg; >> struct inode *inode = mapping->host; >> struct jffs2_inode_info *f = JFFS2_INODE_INFO(inode); >> + struct jffs2_sb_info *c = JFFS2_SB_INFO(inode->i_sb); >> pgoff_t index = pos >> PAGE_SHIFT; >> uint32_t pageofs = index << PAGE_SHIFT; >> int ret = 0; >> >> - pg = grab_cache_page_write_begin(mapping, index, flags); >> - if (!pg) >> - return -ENOMEM; >> - *pagep = pg; >> - >> jffs2_dbg(1, "%s()\n", __func__); >> >> if (pageofs > inode->i_size) { >> /* Make new hole frag from old EOF to new page */ >> - struct jffs2_sb_info *c = JFFS2_SB_INFO(inode->i_sb); >> struct jffs2_raw_inode ri; >> struct jffs2_full_dnode *fn; >> uint32_t alloc_len; >> @@ -159,7 +154,7 @@ static int jffs2_write_begin(struct file *filp, struct address_space *mapping, >> ret = jffs2_reserve_space(c, sizeof(ri), &alloc_len, >> ALLOC_NORMAL, JFFS2_SUMMARY_INODE_SIZE); >> if (ret) >> - goto out_page; >> + goto out_err; >> >> mutex_lock(&f->sem); >> memset(&ri, 0, sizeof(ri)); >> @@ -189,7 +184,7 @@ static int jffs2_write_begin(struct file *filp, struct address_space *mapping, >> ret = PTR_ERR(fn); >> jffs2_complete_reservation(c); >> mutex_unlock(&f->sem); >> - goto out_page; >> + goto out_err; >> } >> ret = jffs2_add_full_dnode_to_inode(c, f, fn); >> if (f->metadata) { >> @@ -204,7 +199,7 @@ static int jffs2_write_begin(struct file *filp, struct address_space *mapping, >> jffs2_free_full_dnode(fn); >> jffs2_complete_reservation(c); >> mutex_unlock(&f->sem); >> - goto out_page; >> + goto out_err; >> } >> jffs2_complete_reservation(c); >> inode->i_size = pageofs; >> @@ -212,6 +207,19 @@ static int jffs2_write_begin(struct file *filp, struct address_space *mapping, >> } >> >> /* >> + * While getting a page and reading data in, lock c->alloc_sem until >> + * the page is Uptodate. Otherwise GC task may attempt to read the same >> + * page in read_cache_page(), which causes a deadlock. >> + */ >> + mutex_lock(&c->alloc_sem); >> + pg = grab_cache_page_write_begin(mapping, index, flags); >> + if (!pg) { >> + ret = -ENOMEM; >> + goto release_sem; >> + } >> + *pagep = pg; >> + >> + /* >> * Read in the page if it wasn't already present. Cannot optimize away >> * the whole page write case until jffs2_write_end can handle the >> * case of a short-copy. >> @@ -220,15 +228,17 @@ static int jffs2_write_begin(struct file *filp, struct address_space *mapping, >> mutex_lock(&f->sem); >> ret = jffs2_do_readpage_nolock(inode, pg); >> mutex_unlock(&f->sem); >> - if (ret) >> - goto out_page; >> + if (ret) { >> + unlock_page(pg); >> + put_page(pg); >> + goto release_sem; >> + } >> } >> jffs2_dbg(1, "end write_begin(). pg->flags %lx\n", pg->flags); >> - return ret; >> >> -out_page: >> - unlock_page(pg); >> - put_page(pg); >> +release_sem: >> + mutex_unlock(&c->alloc_sem); >> +out_err: >> return ret; >> } > > Kyeong, > > first of all, sorry for the massive delay! > > Right now I'm trying to get jffs2 back in shape and stumbled across your patch. > My test suite can actually trigger this deadlock and I think your > patch is likely the > right solution. I'm still reviewing and try to make very sure that it > works as expected. > > David, unless you have objections I'd carry this patch via the MTD tree. > ______________________________________________________ Linux MTD discussion mailing list http://lists.infradead.org/mailman/listinfo/linux-mtd/