Hi Liao, liaoweixiong <liaoweixiong@xxxxxxxxxxxxxxxxx> wrote on Fri, 7 Feb 2020 12:13:08 +0800: > hi Miquel Raynal, > > On 2020/2/6 PM 11:45, Miquel Raynal wrote: > > Hi liao, > > > > liaoweixiong <liaoweixiong@xxxxxxxxxxxxxxxxx> wrote on Thu, 6 Feb 2020 > > 21:10:47 +0800: > > > >> hi Miquel Raynal, > >> > >> On 2020/1/23 AM 1:41, Miquel Raynal wrote: > >>> Hello, > >>> > >>> >>>>>>>> +/* > >>>>>>>> + * All zones will be read as pstore/blk will read zone one by one when do > >>>>>>>> + * recover. > >>>>>>>> + */ > >>>>>>>> +static ssize_t mtdpstore_read(char *buf, size_t size, loff_t off) > >>>>>>>> +{ > >>>>>>>> + struct mtdpstore_context *cxt = &oops_cxt; > >>>>>>>> + size_t retlen; > >>>>>>>> + int ret; > >>>>>>>> + > >>>>>>>> + if (mtdpstore_block_isbad(cxt, off)) > >>>>>>>> + return -ENEXT; > >>>>>>>> + > >>>>>>>> + pr_debug("try to read off 0x%llx size %zu\n", off, size); > >>>>>>>> + ret = mtd_read(cxt->mtd, off, size, &retlen, (u_char *)buf); > >>>>>>>> + if ((ret < 0 && !mtd_is_bitflip(ret)) || size != retlen) { > >>>>>>> > >>>>>>> IIRC size != retlen does not mean it failed, but that you should > >>>>>>> continue reading after retlen bytes, no? > >>>>>>> >> > >>>>>> Yes, you are right. I will fix it. Thanks. > >>>>>> >>>>> Also, mtd_is_bitflip() does not mean that you are reading a false > >>>>>>> buffer, but that the data has been corrected as it contained bitflips. > >>>>>>> mtd_is_eccerr() however, would be meaningful. > >>>>>>> >> > >>>>>> Sure I know mtd_is_bitflip() does not mean failure, but I do not think > >>>>>> mtd_is_eccerr() should be here since the codes are ret < 0 and NOT > >>>>>> mtd_is_bitflip(). > >>>>> > >>>>> Yes, just drop this check, only keep ret < 0. > >>>>> >> > >>>> If I don't get it wrong, it should not be dropped here. Like your words, > >>>> "mtd_is_bitflip() does not mean that you are reading a false buffer, > >>>> but that the data has been corrected as it contained bitflips.", the > >>>> data I get are valid even if mtd_is_bitflip() return true. It's correct > >>>> data and it's no need to go to handle error. To me, the codes > >>>> should be: > >>>> if (ret < 0 && !mit_is_bitflip()) > >>>> [error handling] > >>> > >>> Please check the implementation of mtd_is_bitflip(). You'll probably > >>> figure out what I am saying. > >>> > >>> https://elixir.bootlin.com/linux/latest/source/include/linux/mtd/mtd.h#L585 > >>> >> > >> How about the codes as follows: > >> > >> for (done = 0, retlen = 0; done < size; done += retlen) { > >> ret = mtd_read(..., &retlen, ...); > >> if (!ret) > >> continue; > >> /* > >> * do nothing if bitflip and ecc error occurs because whether > >> * it's bitflip or ECC error, just a small number of bits flip > >> * and the impact on log data is so small. The mtdpstore just > >> * hands over what it gets and user can judge whether the data > >> * is valid or not. > >> */ > >> if (mtd_is_bitflip(ret)) { > >> dev_warn("bitflip at...."); > >> continue; > > > I don't understand why do you check for bitflips. Bitflips have been > > corrected at this stage, you just get the information that there > > has been bitflips, but the data integrity is fine. > > > > Both of bitflip and eccerror are not real wrong in this > case. So we must check them. > > > I am not against ignoring ECC errors in this case though. I would > > propose: > > > > for (...) { > > if (ret < 0) { > > complain; > > return; > > } > > > > -117 (-EUCLEAN) means bitflip but be corrected. > -74 (-EBADMSG) means ecc error that uncorrectable > All of them are negative number that smaller than 0. If it just keeps > "ret < 0", it can never make a difference between bitflip/eccerror > and others. I forgot about these, your're right, so what about: static bool mtdpstore_is_io_error(int ret) { return ret < 0 && !mtd_is_eccerr(ret)> && !mtd_is_bitflip(ret); } for (...) { if (mtdpstore_is_io_error(ret)) return ret; /* * Comment explaining why we still return valid data * despite ECC errors. */ if (mtd_is_eccerr(ret)) just-complain(); } This snippet makes a difference between any "controller/bus timeout/error : data could not be retrieved" and "ECC error, maybe we can still read it and try to understand (risky, must be warned)". > > > if (mtd_is_eccerr()) > > complain; > > } > > > >> } else if (mtd_is_eccerr(ret)) { > >> dev_warn("eccerr at...."); > >> retlen = retlen == 0 ? size : retlen; > >> continue; > >> } else { > >> dev_err("read failure at..."); > >> /* this zone is broken, try next one */ > >> return -ENEXT; > >> } > >> } > >> > > > > > > Thanks, > > Miquèl > > Thanks, Miquèl