Hi,

On 2020/2/27 14:49, 李傲傲 (Carson Li1/9542) wrote:
> Hi,
>
>> @@ -1482,12 +1489,6 @@ again:
>>  	gc_seq1 = c->gc_seq;
>>  	mutex_unlock(&c->tnc_mutex);
>>
>> -	if (ubifs_get_wbuf(c, zbr.lnum)) {
>> -		/* We do not GC journal heads */
>> -		err = ubifs_tnc_read_node(c, &zbr, node);
>> -		return err;
>> -	}
>> -
>>  	err = fallible_read_node(c, key, &zbr, node);
>>  	if (err <= 0 || maybe_leb_gced(c, zbr.lnum, gc_seq1)) {
>
> That is what I do now. And can you please have a look at what I posted before:
> ------------------------------------temporary solution--------------------------------
> --- a/fs/ubifs/tnc.c
> +++ b/fs/ubifs/tnc.c
> @@ -1482,12 +1482,6 @@ again:
> gc_seq1 = c->gc_seq;
> mutex_unlock(&c->tnc_mutex);
>
> -if (ubifs_get_wbuf(c, zbr.lnum)) {
> -/* We do not GC journal heads */
> -err = ubifs_tnc_read_node(c, &zbr, node);
> -return err;
> -}
> -

Sorry for missing that.

> Actually, compared to the solution described above, I would rather suggest
> modifying ubifs_get_wbuf. ubifs_get_wbuf checks whether the LEB is on a jhead,
> but ubifs_tnc_read_wbuf only reads the node from wbuf when lnum is equal to
> wbuf.lnum, and all other cases still need to read from flash. It seems better
> to just make ubifs_get_wbuf check whether the LEB is equal to wbuf.lnum; then
> there is no need for a double check in ubifs_tnc_read_wbuf.

How about the following patch:

Subject: [PATCH v2 2/2] ubifs: read node from wbuf when it fully sits in wbuf

Carson Li reports the following error:

  UBIFS error: ubifs_read_node_wbuf: expected node type 0
  Not a node, first 24 bytes:
  Kernel panic - not syncing
  CPU: 1 PID: 943 Comm: http-thread 4.4.83 #1
    panic+0x70/0x1e4
    ubifs_dump_node+0x6c/0x9a0
    ubifs_read_node_wbuf+0x350/0x384
    ubifs_tnc_read_node+0x54/0x214
    ubifs_tnc_locate+0x118/0x1b4
    ubifs_iget+0xb8/0x68c
    ubifs_lookup+0x1b4/0x258
    lookup_real+0x30/0x4c
    __lookup_hash+0x34/0x3c
    walk_component+0xec/0x2a0
    path_lookupat+0x80/0xfc
    filename_lookup+0x5c/0xfc
    vfs_fstatat+0x4c/0x9c
    SyS_stat64+0x14/0x30
    ret_fast_syscall+0x0/0x34

At first glance it seems that the LEB used as the DATA journal head was
GC'ed and ubifs_tnc_locate() read an invalid node. However, the properties
of a journal head LEB have the LPROPS_TAKEN flag set, so GC skips these
LEBs. What actually happens is that the LEB is GC'ed, freed and then
reused as a journal head, and finally ubifs_tnc_locate() reads an invalid
node. It can be reproduced by the following steps:

* create 128 empty files
* overwrite 8 files in the background repeatedly to trigger GC
* drop the inode cache and stat these 128 empty files repeatedly

We could simply fix the problem by removing the optimization of reading
from wbuf when possible. But because taking a spin lock and memcpying
from wbuf is much less time-consuming than reading from the MTD device,
we fix the logic of wbuf reading instead.

If the node is not fully contained in the write buffer, we read the
remainder of the node from MTD without holding any lock; meanwhile the
journal head may be switched and the LEB GC'ed, so we may get invalid
node data. So we only read from wbuf if the node fully sits in the write
buffer. We also need to check whether the current LEB has been GC'ed and
reused as a journal head.

Fixes: 601c0bc46753 ("UBIFS: allow for racing between GC and TNC")
Reported-and-analyzed-by: 李傲傲 (Carson Li1/9542) <Carson.Li1@xxxxxxxxxx>
Signed-off-by: Hou Tao <houtao1@xxxxxxxxxx>
---
 fs/ubifs/tnc.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 77 insertions(+), 3 deletions(-)

diff --git a/fs/ubifs/tnc.c b/fs/ubifs/tnc.c
index e8e7b0e9532e..82185bb68c51 100644
--- a/fs/ubifs/tnc.c
+++ b/fs/ubifs/tnc.c
@@ -1425,6 +1425,73 @@ static int maybe_leb_gced(struct ubifs_info *c, int lnum, int gc_seq1)
 	return 0;
 }
 
+/**
+ * ubifs_check_and_read_wbuf - read node from write-buffer if possible
+ * @c: UBIFS file-system description object
+ * @zbr: the zbranch describing the node to read
+ * @gc_seq: the saved GC sequence used for GC checking
+ * @buf: buffer to read to
+ * @retry: whether try to lookup TNC again
+ *
+ * The function checks whether the node fully sits in the write-buffer
+ * and whether the LEB used by write-buffer is not GCed recently,
+ * then it will read the node, checks it and stores in @buf.
+ *
+ * Returns 1 in case of success, 0 in case of not found, and a negative
+ * error code in case of failure.
+ *
+ * If the node is not in write-buffer and the LEB used by write-buffer
+ * may be GCed recently, @retry will be true, else false.
+ */
+static int ubifs_check_and_read_wbuf(struct ubifs_info *c,
+				     const struct ubifs_zbranch *zbr,
+				     int gc_seq, void *buf, bool *retry)
+{
+	bool found = false;
+	int lnum = zbr->lnum;
+	int offs = zbr->offs;
+	int len = zbr->len;
+	int type;
+	int i;
+	int err;
+
+	*retry = false;
+	for (i = 0; i < c->jhead_cnt; i++) {
+		struct ubifs_wbuf *wbuf = &c->jheads[i].wbuf;
+
+		/* Check whether the node is fully included in wbuf */
+		spin_lock(&wbuf->lock);
+		if (wbuf->lnum == lnum && wbuf->offs <= offs &&
+		    offs + len <= wbuf->offs + wbuf->used) {
+			/*
+			 * lnum is GC'ed and reused as journal head,
+			 * we need to lookup TNC again.
+			 */
+			if (maybe_leb_gced(c, lnum, gc_seq)) {
+				spin_unlock(&wbuf->lock);
+				*retry = true;
+				break;
+			}
+
+			memcpy(buf, wbuf->buf + offs - wbuf->offs, len);
+			spin_unlock(&wbuf->lock);
+			found = true;
+			break;
+		}
+		spin_unlock(&wbuf->lock);
+	}
+
+	if (!found)
+		return 0;
+
+	type = key_type(c, &zbr->key);
+	err = ubifs_check_node_buf(c, buf, type, len, lnum, offs);
+	if (err)
+		return err;
+
+	return 1;
+}
+
 /**
  * ubifs_tnc_locate - look up a file-system node and return it and its location.
  * @c: UBIFS file-system description object
@@ -1444,6 +1511,7 @@ int ubifs_tnc_locate(struct ubifs_info *c, const union ubifs_key *key,
 	int found, n, err, safely = 0, gc_seq1;
 	struct ubifs_znode *znode;
 	struct ubifs_zbranch zbr, *zt;
+	bool retry;
 
 again:
 	mutex_lock(&c->tnc_mutex);
@@ -1477,10 +1545,16 @@ int ubifs_tnc_locate(struct ubifs_info *c, const union ubifs_key *key,
 	gc_seq1 = c->gc_seq;
 	mutex_unlock(&c->tnc_mutex);
 
-	if (ubifs_get_wbuf(c, zbr.lnum)) {
-		/* We do not GC journal heads */
-		err = ubifs_tnc_read_node(c, &zbr, node);
+	err = ubifs_check_and_read_wbuf(c, &zbr, gc_seq1, node, &retry);
+	if (err < 0)
 		return err;
+	/* find a valid node */
+	if (err > 0)
+		return 0;
+	/* The node is GC'ed, so lookup it again */
+	if (retry) {
+		safely = 1;
+		goto again;
 	}
 
 	err = fallible_read_node(c, key, &zbr, node);
-- 
2.25.0.4.g0ad7144999

> -----------------------------more suggested but not tested solution---------------------
> --- a/fs/ubifs/log.c
> +++ b/fs/ubifs/log.c
> @@ -70,28 +70,16 @@ struct ubifs_bud *ubifs_search_bud(struct ubifs_info *c, int lnum)
> */
> struct ubifs_wbuf *ubifs_get_wbuf(struct ubifs_info *c, int lnum) {
> -struct rb_node *p;
> -struct ubifs_bud *bud;
> int jhead;
>
> if (!c->jheads)
> return NULL;
>
> -spin_lock(&c->buds_lock);
> -p = c->buds.rb_node;
> -while (p) {
> -bud = rb_entry(p, struct ubifs_bud, rb);
> -if (lnum < bud->lnum)
> -p = p->rb_left;
> -else if (lnum > bud->lnum)
> -p = p->rb_right;
> -else {
> -jhead = bud->jhead;
> -spin_unlock(&c->buds_lock);
> +for(jhead = 0; jhead < c->jhead_cnt; jhead++){
> +if(lnum == c->jheads[jhead].wbuf.lnum)
> return &c->jheads[jhead].wbuf;
> -}
> }
> -spin_unlock(&c->buds_lock);
> +
> return NULL;
> }
>
> --- a/fs/ubifs/io.c
> +++ b/fs/ubifs/io.c
> @@ -906,9 +906,10 @@ int ubifs_read_node_wbuf(struct ubifs_wbuf *wbuf, void *buf, int type, int len,
> ubifs_assert(wbuf && lnum >= 0 && lnum < c->leb_cnt && offs >= 0);
> ubifs_assert(!(offs & 7) && offs < c->leb_size);
> ubifs_assert(type >= 0 && type < UBIFS_NODE_TYPES_CNT);
> +ubifs_assert(wbuf->lnum == lnum);
>
> spin_lock(&wbuf->lock);
> -overlap = (lnum == wbuf->lnum && offs + len > wbuf->offs);
> +overlap = (offs + len > wbuf->offs);
> if (!overlap) {
> /* We may safely unlock the write-buffer and read the data */
> spin_unlock(&wbuf->lock);
>
> After the modification, the LEB containing the node will not be GC'ed, since
> even if there is a commit, wbuf.lnum is still on the bud rbtree as a journal
> head LEB.
>
> Thanks.
> Carson

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/
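
For anyone who wants to reason about the "fully sits in wbuf" condition
outside the kernel, here is a minimal userspace sketch of just that range
check. It is not the patch itself: struct wbuf_model, node_fully_in_wbuf()
and read_if_fully_buffered() are hypothetical names made up for this
illustration, there is no locking, and the maybe_leb_gced() re-check from
the patch is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace stand-in for the few struct ubifs_wbuf fields
 * the check looks at. Not the kernel structure.
 */
struct wbuf_model {
	int lnum;			/* LEB the write-buffer belongs to */
	int offs;			/* LEB offset where buffered data starts */
	int used;			/* number of bytes currently buffered */
	const unsigned char *buf;	/* buffered bytes */
};

/* True only if [offs, offs + len) lies entirely inside the buffered range. */
static bool node_fully_in_wbuf(const struct wbuf_model *w,
			       int lnum, int offs, int len)
{
	return w->lnum == lnum && w->offs <= offs &&
	       offs + len <= w->offs + w->used;
}

/*
 * Copy the node out of the buffer when it is fully contained.
 * Returns 1 if copied, 0 if the caller has to use another read path.
 */
static int read_if_fully_buffered(const struct wbuf_model *w,
				  int lnum, int offs, int len, void *out)
{
	assert(out != NULL && len >= 0);

	if (!node_fully_in_wbuf(w, lnum, offs, len))
		return 0;

	memcpy(out, w->buf + (offs - w->offs), (size_t)len);
	return 1;
}
```

A node that starts before the buffered range, or runs past its end, makes
read_if_fully_buffered() return 0 instead of mixing buffered bytes with an
unlocked read of the rest, which is the case the commit message describes
as unsafe.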