Hello, > >>>> +/* > >>>> + * All zones will be read as pstore/blk will read zone one by one when do > >>>> + * recover. > >>>> + */ > >>>> +static ssize_t mtdpstore_read(char *buf, size_t size, loff_t off) > >>>> +{ > >>>> + struct mtdpstore_context *cxt = &oops_cxt; > >>>> + size_t retlen; > >>>> + int ret; > >>>> + > >>>> + if (mtdpstore_block_isbad(cxt, off)) > >>>> + return -ENEXT; > >>>> + > >>>> + pr_debug("try to read off 0x%llx size %zu\n", off, size); > >>>> + ret = mtd_read(cxt->mtd, off, size, &retlen, (u_char *)buf); > >>>> + if ((ret < 0 && !mtd_is_bitflip(ret)) || size != retlen) { > >>> > >>> IIRC size != retlen does not mean it failed, but that you should > >>> continue reading after retlen bytes, no? > >>> >> > >> Yes, you are right. I will fix it. Thanks. > >> > >>> Also, mtd_is_bitflip() does not mean that you are reading a false > >>> buffer, but that the data has been corrected as it contained bitflips. > >>> mtd_is_eccerr() however, would be meaningful. > >>> >> > >> Sure I know mtd_is_bitflip() does not mean failure, but I do not think > >> mtd_is_eccerr() should be here since the codes are ret < 0 and NOT > >> mtd_is_bitflip(). > > > > Yes, just drop this check, only keep ret < 0. > > > > If I don't get it wrong, it should not be dropped here. Like your words, > "mtd_is_bitflip() does not mean that you are reading a false buffer, > but that the data has been corrected as it contained bitflips.", the > data I get are valid even if mtd_is_bitflip() return true. It's correct > data and it's no need to go to handle error. To me, the codes > should be: > if (ret < 0 && !mit_is_bitflip()) > [error handling] Please check the implementation of mtd_is_bitflip(). You'll probably figure out what I am saying. https://elixir.bootlin.com/linux/latest/source/include/linux/mtd/mtd.h#L585 |...] > >>>> + return; > >>>> + } > >>>> + if (unlikely(info->dmesg_size % mtd->writesize)) { > >>>> + pr_err("record size %lu KB must align to write size %d KB\n", > >>>> + info->dmesg_size / 1024, > >>>> + mtd->writesize / 1024); > >>> > >>> This condition is weird, why would you check this? > >>> >> > >> pstore/blk will write 'record_size' dmesg log at one time. > >> Since each write data must be aligned to 'writesize' for flash, I am not > >> sure > >> all flash drivers are compatible with misaligned data, that's why i > >> check this. > > > > I think you should enforce this alignment instead of checking it. > > > > Do you mean that mtdpstore should enforce this alignment while running? > The way I can think of is to handle a buffer aligned to writesize and > write to flash with this aligned buffer. > > That causes some error. The MTD device will be divided into mutil > chunks accroding to dmesg_size. Each chunk stores a individual > OOPS/Panic log. If dmesg_size is misaligned to writesize, the last > write results in next write failure because the page of flash can only > be programed once before next erase and the page shared by two chunks > has been used by the last write. Besides, we can not read to buffer, > ersae and write back as there is no read/erase for panic case. I mean: what is the usual size of dmesg? I don't get why you need it to be ie. a multiple of 2k. It probably is actually, I don't know if there is a standard. But if dmesg_size is eg 3k, just skip the end of the partially written page and start writing at the next page? > > >> > >>>> + return; > >>>> + } > >>>> + if (unlikely(mtd->size > MTDPSTORE_MAX_MTD_SIZE)) { > >>>> + pr_err("mtd%d is too large (limit is %d MiB)\n", > >>>> + mtd->index, > >>>> + MTDPSTORE_MAX_MTD_SIZE / 1024 / 1024); > >>> > >>> Same question? I could understand that it is easier to manage blocks > >>> knowing their maximum number though. > >>> >> > >> It refers to mtdoops. > > > > What do you mean? > > > > To me, it's unnecessary to check at all, however it is really there > on codes of mtdoops. I refer to module mtdoops when I design mtdpstore. > It may be helpfull for some cases out of my think, that's why I keep it. Why not. [...] > >> > >> In case of repeated erase when users remove several log files, mtdpstore > >> do remove jobs when exit. > >> > >> Besides, mtdpstore do not check the return code to ensure write back valid > >> log as much as possible. > > > > You are not in a critical path, I don't understand why you don't check > > it? If it returns an error, it means the data is not written. IMHO it > > is best to alert the user than to silently fail. > > > > This function will be called only when mtd device is removing. It's > useless to alert the user but try to flush the other valid data to It is useful to alert the user! It means something went wrong while everything seems fine. > flash as mush as possible by which reduce losses. If it's just > because of busy, what happens next time? Just because of busy? I don't get it. I'm okay with the idea of trying to write the other chunks though: while (remaining_chunk) { ret = mtd_write() if (ret) { alert-user; continue; } } > > >> > >>>> +. >>>> + off += zonesize; > >>>> + size -= min_t(unsigned int, zonesize, size); > >>>> + } > >>>> + > >>>> +free: > >>>> + kfree(buf); > >>>> + return ret; > >>>> +} > >>>> + > > > > > > [...] > > > >>> > >>> Thanks, > >>> Miquèl > >>> >> > >> I will collect more suggestions and submit the new version at one time. > >> > > > > Sure, no hurry. > > > > I am on holiday, please forgive me for my slow response. Take your time, as I said, no hurry. > > > > > Thanks, > > Miquèl > > Thanks, Miquèl