Re: [PATCH v1 11/11] mtd: new support oops logger based on pstore/blk

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



hi Miquel Raynal,

On 2020/2/6 PM 11:45, Miquel Raynal wrote:
Hi liao,

liaoweixiong <liaoweixiong@xxxxxxxxxxxxxxxxx> wrote on Thu, 6 Feb 2020
21:10:47 +0800:

hi Miquel Raynal,

On 2020/1/23 AM 1:41, Miquel Raynal wrote:
Hello,

+/*
+ * All zones will be read as pstore/blk will read zone one by one when do
+ * recover.
+ */
+static ssize_t mtdpstore_read(char *buf, size_t size, loff_t off)
+{
+	struct mtdpstore_context *cxt = &oops_cxt;
+	size_t retlen;
+	int ret;
+
+	if (mtdpstore_block_isbad(cxt, off))
+		return -ENEXT;
+
+	pr_debug("try to read off 0x%llx size %zu\n", off, size);
+	ret = mtd_read(cxt->mtd, off, size, &retlen, (u_char *)buf);
+	if ((ret < 0 && !mtd_is_bitflip(ret)) || size != retlen)  {

IIRC size != retlen does not mean it failed, but that you should
continue reading after retlen bytes, no?
     >>
Yes, you are right. I will fix it. Thanks.
   >>>>> Also, mtd_is_bitflip() does not mean that you are reading a false
buffer, but that the data has been corrected as it contained bitflips.
mtd_is_eccerr() however, would be meaningful.
     >>
Sure I know mtd_is_bitflip() does not mean failure, but I do not think
mtd_is_eccerr() should be here since the codes are ret < 0 and NOT
mtd_is_bitflip().

Yes, just drop this check, only keep ret < 0.
    >>
If I don't get it wrong, it should not	 be dropped here. Like your words,
"mtd_is_bitflip() does not mean that you are reading a false buffer,
but that the data has been corrected as it contained bitflips.", the
data I get are valid even if mtd_is_bitflip() return true. It's correct
data and it's no need to go to handle error. To me, the codes
should be:
	if (ret < 0 && !mit_is_bitflip())
		[error handling]

Please check the implementation of mtd_is_bitflip(). You'll probably
figure out what I am saying.

https://elixir.bootlin.com/linux/latest/source/include/linux/mtd/mtd.h#L585

How about the codes as follows:

for (done = 0, retlen = 0; done < size; done += retlen) {
	ret = mtd_read(..., &retlen, ...);
	if (!ret)
		continue;
	/*
	 * do nothing if bitflip and ecc error occurs because whether
	 * it's bitflip or ECC error, just a small number of bits flip
	 * and the impact on log data is so small. The mtdpstore just
	 * hands over what it gets and user can judge whether the data
	 * is valid or not.
	 */
	if (mtd_is_bitflip(ret)) {
		dev_warn("bitflip at....");
		continue;

I don't understand why do you check for bitflips. Bitflips have been
corrected at this stage, you just get the information that there
has been bitflips, but the data integrity is fine.


Both of bitflip and eccerror are not real wrong in this
case. So we must check them.

I am not against ignoring ECC errors in this case though. I would
propose:

	for (...) {
		if (ret < 0) {
			complain;
			return;
		}


-117 (-EUCLEAN) means bitflip but be corrected.
-74 (-EBADMSG) means ecc error that uncorrectable
All of them are negative number that smaller than 0. If it just keeps
"ret < 0", it can never make a difference between bitflip/eccerror
and others.

		if (mtd_is_eccerr())
			complain;
	}
		
	} else if (mtd_is_eccerr(ret)) {
		dev_warn("eccerr at....");
		retlen = retlen == 0 ? size : retlen;
		continue;
	} else {
		dev_err("read failure at...");
		/* this zone is broken, try next one */
		return -ENEXT;
	}
}



Thanks,
Miquèl




[Index of Archives]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]

  Powered by Linux