Hi Boris & Miquel,

> -----Original Message-----
> From: Miquel Raynal [mailto:miquel.raynal@xxxxxxxxxxx]
> Sent: Tuesday, November 20, 2018 6:06 PM
> To: Boris Brezillon <boris.brezillon@xxxxxxxxxxx>
> Cc: Naga Sureshkumar Relli <nagasure@xxxxxxxxxx>; richard@xxxxxx;
> dwmw2@xxxxxxxxxxxxx; computersforpeace@xxxxxxxxx; marek.vasut@xxxxxxxxx;
> linux-mtd@xxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; nagasuresh12@xxxxxxxxx;
> robh@xxxxxxxxxx; Michal Simek <michals@xxxxxxxxxx>
> Subject: Re: [LINUX PATCH v12 3/3] mtd: rawnand: arasan: Add support for Arasan
> NAND Flash Controller
>
> Hi Naga,
>
> Boris Brezillon <boris.brezillon@xxxxxxxxxxx> wrote on Tue, 20 Nov 2018
> 12:02:44 +0100:
>
> > On Tue, 20 Nov 2018 07:02:08 +0000
> > Naga Sureshkumar Relli <nagasure@xxxxxxxxxx> wrote:
> >
> > >
> > >
> > > > Can you please run nandbiterrs (available in mtd-utils). I fear your
> > > > device won't pass the test.
> > > Yes, nandbiterror test is passing till 24bit, after that it is failing.
> >
> > Can you paste the output of nandbiterrs please?
>
> Apparently 'nandbiterrs -i' just crashes the kernel because of a segmentation fault. Please run
> this test (from the mtd-utils package) and fix this issue. Then we would like to see the output.
Here is the output of mtd_nandbiterrs:

[ 1830.546807] mtd_nandbiterrs: verify_page
[ 1830.551924] mtd_nandbiterrs: Successfully corrected 8 bit errors per subpage
[ 1830.558961] mtd_nandbiterrs: Inserted biterror @ 2/5
[ 1830.563917] mtd_nandbiterrs: rewrite page
[ 1830.568216] mtd_nandbiterrs: read_page
[ 1830.572155] mtd_nandbiterrs: verify_page
[ 1830.576531] mtd_nandbiterrs: Successfully corrected 9 bit errors per subpage
[ 1830.583568] mtd_nandbiterrs: Inserted biterror @ 2/2
[ 1830.588527] mtd_nandbiterrs: rewrite page
[ 1830.592881] mtd_nandbiterrs: read_page
[ 1830.596825] mtd_nandbiterrs: verify_page
[ 1830.601197] mtd_nandbiterrs: Successfully corrected 10 bit errors per subpage
[ 1830.608326] mtd_nandbiterrs: Inserted biterror @ 2/0
[ 1830.613279] mtd_nandbiterrs: rewrite page
[ 1830.617585] mtd_nandbiterrs: read_page
[ 1830.621524] mtd_nandbiterrs: verify_page
[ 1830.625900] mtd_nandbiterrs: Successfully corrected 11 bit errors per subpage
[ 1830.633027] mtd_nandbiterrs: Inserted biterror @ 3/7
[ 1830.637984] mtd_nandbiterrs: rewrite page
[ 1830.642281] mtd_nandbiterrs: read_page
[ 1830.646223] mtd_nandbiterrs: verify_page
[ 1830.650595] mtd_nandbiterrs: Successfully corrected 12 bit errors per subpage
[ 1830.657724] mtd_nandbiterrs: Inserted biterror @ 3/6
[ 1830.662677] mtd_nandbiterrs: rewrite page
[ 1830.666983] mtd_nandbiterrs: read_page
[ 1830.670922] mtd_nandbiterrs: verify_page
[ 1830.675296] mtd_nandbiterrs: Successfully corrected 13 bit errors per subpage
[ 1830.682417] mtd_nandbiterrs: Inserted biterror @ 3/5
[ 1830.687373] mtd_nandbiterrs: rewrite page
[ 1830.691671] mtd_nandbiterrs: read_page
[ 1830.695610] mtd_nandbiterrs: verify_page
[ 1830.699983] mtd_nandbiterrs: Successfully corrected 14 bit errors per subpage
[ 1830.707113] mtd_nandbiterrs: Inserted biterror @ 3/2
[ 1830.712067] mtd_nandbiterrs: rewrite page
[ 1830.716494] mtd_nandbiterrs: read_page
[ 1830.720459] mtd_nandbiterrs: verify_page
[ 1830.724841] mtd_nandbiterrs: Successfully corrected 15 bit errors per subpage
[ 1830.731963] mtd_nandbiterrs: Inserted biterror @ 3/0
[ 1830.736920] mtd_nandbiterrs: rewrite page
[ 1830.741161] mtd_nandbiterrs: read_page
[ 1830.745107] mtd_nandbiterrs: verify_page
[ 1830.749478] mtd_nandbiterrs: Successfully corrected 16 bit errors per subpage
[ 1830.756607] mtd_nandbiterrs: Inserted biterror @ 4/2
[ 1830.761564] mtd_nandbiterrs: rewrite page
[ 1830.765924] mtd_nandbiterrs: read_page
[ 1830.769858] mtd_nandbiterrs: verify_page
[ 1830.774232] mtd_nandbiterrs: Successfully corrected 17 bit errors per subpage
[ 1830.781362] mtd_nandbiterrs: Inserted biterror @ 4/0
[ 1830.786318] mtd_nandbiterrs: rewrite page
[ 1830.790558] mtd_nandbiterrs: read_page
[ 1830.794496] mtd_nandbiterrs: verify_page
[ 1830.798867] mtd_nandbiterrs: Successfully corrected 18 bit errors per subpage
[ 1830.805997] mtd_nandbiterrs: Inserted biterror @ 5/7
[ 1830.810949] mtd_nandbiterrs: rewrite page
[ 1830.815249] mtd_nandbiterrs: read_page
[ 1830.819189] mtd_nandbiterrs: verify_page
[ 1830.823561] mtd_nandbiterrs: Successfully corrected 19 bit errors per subpage
[ 1830.830690] mtd_nandbiterrs: Inserted biterror @ 5/2
[ 1830.835646] mtd_nandbiterrs: rewrite page
[ 1830.839943] mtd_nandbiterrs: read_page
[ 1830.843886] mtd_nandbiterrs: verify_page
[ 1830.848252] mtd_nandbiterrs: Successfully corrected 20 bit errors per subpage
[ 1830.855378] mtd_nandbiterrs: Inserted biterror @ 5/0
[ 1830.860331] mtd_nandbiterrs: rewrite page
[ 1830.864580] mtd_nandbiterrs: read_page
[ 1830.868522] mtd_nandbiterrs: verify_page
[ 1830.872890] mtd_nandbiterrs: Successfully corrected 21 bit errors per subpage
[ 1830.880023] mtd_nandbiterrs: Inserted biterror @ 6/6
[ 1830.884975] mtd_nandbiterrs: rewrite page
[ 1830.889224] mtd_nandbiterrs: read_page
[ 1830.893158] mtd_nandbiterrs: verify_page
[ 1830.897536] mtd_nandbiterrs: Successfully corrected 22 bit errors per subpage
[ 1830.904663] mtd_nandbiterrs: Inserted biterror @ 6/2
[ 1830.909619] mtd_nandbiterrs: rewrite page
[ 1830.913950] mtd_nandbiterrs: read_page
[ 1830.917893] mtd_nandbiterrs: verify_page
[ 1830.922261] mtd_nandbiterrs: Successfully corrected 23 bit errors per subpage
[ 1830.929384] mtd_nandbiterrs: Inserted biterror @ 6/0
[ 1830.934340] mtd_nandbiterrs: rewrite page
[ 1830.938579] mtd_nandbiterrs: read_page
[ 1830.942519] mtd_nandbiterrs: verify_page
[ 1830.946884] mtd_nandbiterrs: Successfully corrected 24 bit errors per subpage
[ 1830.954010] mtd_nandbiterrs: Inserted biterror @ 7/7
[ 1830.958963] mtd_nandbiterrs: rewrite page
[ 1830.963264] mtd_nandbiterrs: read_page
[ 1830.967143] mtd_nandbiterrs: verify_page
[ 1830.971061] mtd_nandbiterrs: Error: page offset 0, expected 25, got 00
[ 1830.977584] mtd_nandbiterrs: Error: page offset 1, expected a5, got 00
[ 1830.984103] mtd_nandbiterrs: Error: page offset 2, expected 65, got 00
[ 1830.990621] mtd_nandbiterrs: Error: page offset 3, expected e5, got 00
[ 1830.997141] mtd_nandbiterrs: Error: page offset 4, expected 05, got 00
[ 1831.003659] mtd_nandbiterrs: Error: page offset 5, expected 85, got 00
[ 1831.010179] mtd_nandbiterrs: Error: page offset 6, expected 45, got 00
[ 1831.016695] mtd_nandbiterrs: Error: page offset 7, expected c5, got 45
[ 1831.023665] mtd_nandbiterrs: ECC failure, read data is incorrect despite read success
modprobe: can't load module mtd_nandbiterrs (kernel/drivers/mtd/tests/mtd_nandbiterrs.ko): Input/output error
---> Test fail, unable to start nand_mtd_nandbiterrs client on the target

I ran this on the v12 series, but it didn't work straight away; I had to change the code to make this test work.
I found one problem: the driver works only if the page program sequence is 0x80 followed by 0x10, i.e.
[1]: nand_prog_page_op(chip, page, 0, buf, mtd->writesize) -> this op sequence works with this driver.
[2]: nand_prog_page_begin_op(chip, page, 0, NULL, 0) -> this op sequence does not work with this driver.
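
To make the difference between [1] and [2] concrete, here is a minimal userspace sketch. It is only a toy model, not driver code: struct toy_op and toy_ctrl_can_program() are hypothetical names, and the rule they encode (both opcodes must be known before the controller can start a program operation) is the assumption described in this mail, not something taken from the Arasan documentation. The opcode values are the usual NAND_CMD_SEQIN and NAND_CMD_PAGEPROG.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NAND_CMD_SEQIN    0x80  /* first page program opcode */
#define NAND_CMD_PAGEPROG 0x10  /* second (confirm) page program opcode */

/* Hypothetical, simplified view of the opcodes in one controller operation. */
struct toy_op {
    const uint8_t *opcodes;  /* opcodes in the order the core issues them */
    size_t nopcodes;
};

/*
 * Assumption from this mail: a program operation can only be started when
 * 0x80 and 0x10 are both present in the same operation.
 */
static bool toy_ctrl_can_program(const struct toy_op *op)
{
    return op->nopcodes == 2 &&
           op->opcodes[0] == NAND_CMD_SEQIN &&
           op->opcodes[1] == NAND_CMD_PAGEPROG;
}

/* Sequence [1]: nand_prog_page_op() issues 0x80, address, data and 0x10 together. */
static const uint8_t full_prog[] = { NAND_CMD_SEQIN, NAND_CMD_PAGEPROG };

/* Sequence [2]: nand_prog_page_begin_op() stops after 0x80 and the address cycles. */
static const uint8_t begin_prog[] = { NAND_CMD_SEQIN };
```

In this model an operation built from full_prog is accepted, while one built from begin_prog is rejected because the 0x10 opcode only arrives later, with nand_prog_page_end_op().
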
The Arasan NAND controller expects 0x80 as the first opcode and 0x10 as the second opcode in the command register (off: 0xFF10000C).
Hence in the v11 series I added a check so that if the command is 0x80, the second command is hardcoded to 0x10.
But as per Boris's comments I removed that hardcoding, and now it works only if the write sequence is [1] as mentioned above.

> > > > >
> > > > > > > But we are hitting this because of erased page reading (needed in case of ubifs).
> > > > >
> > > > > >
> > > > > > Don't you have a bit (or several bits) reporting when the ECC engine was not able to correct
> > > > > > data? If you do, you should base the "detect bitflips in erase pages" logic on this information.
> > > > > Bit reporting for several bit errors is there only for Hamming (1bit correction and 2bit
> > > > > detection) but not in BCH.
> > > > >
> > > >
> > > > Then I tend to agree with Miquel: your ECC engine is broken, and I'm
> > > > not even sure how to deal with that yet.
> > > So as per Miquel's suggestion, can I proceed to add the below one?
> > > "you should re-read the page in raw mode and check for the number of bitflips manually
> > > (thanks to the helpers in the core). Again, if the number of BF is above 16, we can assume the
> > > page is bad and increment ->ecc.failed accordingly."
> >
> > But that's just partially fixing the problem. And you didn't answer my
> > previous question: what happens when you configure the ECC engine in,
> > say 12bit/1024 and you end up with uncorrectable errors (more than 12
> > bitflips in a 1k block). What's the number reported in ECC_ERR_CNT? Is it
> > set to 13?
>
> Please dump this register, and eventually what's the value of the Packet_bound_Err_count
> field ([0:7]) for each iteration of nandbiterrs -i.
> If there is no way, when the status bit is set, to discriminate if the data is reliable or was not
> corrected at all, it is gonna be a real issue and I don't think we want to support such engine.
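
For reference, the raw re-read check quoted above ("re-read the page in raw mode and check for the number of bitflips manually") could look roughly like the sketch below. This is a userspace stand-in written only to show the idea: count_zero_bits() and check_erased_chunk() are hypothetical names, and a driver would more likely call the core helper nand_check_erased_ecc_chunk() on the raw data and ECC bytes instead. The threshold is whatever limit the driver picks (16 in the quoted suggestion).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Count the bits that read back as 0 in a chunk that should be all 0xff. */
static unsigned int count_zero_bits(const uint8_t *buf, size_t len)
{
    unsigned int zeros = 0;

    for (size_t i = 0; i < len; i++) {
        uint8_t flipped = (uint8_t)~buf[i];

        while (flipped) {
            zeros += flipped & 1;
            flipped >>= 1;
        }
    }
    return zeros;
}

/*
 * Hypothetical stand-in for the erased-page check on raw data.
 * Returns the number of bitflips if the chunk can be treated as erased
 * (and resets it to 0xff), or -1 if there are more than 'threshold' flips.
 */
static int check_erased_chunk(uint8_t *raw, size_t len, unsigned int threshold)
{
    unsigned int flips = count_zero_bits(raw, len);

    if (flips > threshold)
        return -1;      /* caller would account this as an ECC failure */

    memset(raw, 0xff, len);
    return (int)flips;  /* caller would account these as corrected bitflips */
}
```

As Boris points out, this only covers erased pages; it does not help to decide whether a programmed page with more bitflips than the ECC strength was really corrected.
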
On each iteration of this test, the error count value I got is equal to the number of error bits introduced, i.e.
for a 1-bit error, the error count is 1
.......
for 24-bit errors, the error count is 24
But after that the error count is 0.

Thanks,
Naga Sureshkumar Relli

>
>
> Thanks,
> Miquèl

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/