>On Thu, Jan 03, 2014 at 19:24, Eric Sandeen wrote
>>
>> On 1/3/14, 10:29 AM, Juergens Dirk (CM-AI/ECO2) wrote:
>> > So, I think there _might_ be a kernel bug, but it could be also a problem
>> > related to the particular type of eMMC. We did not observe the same issue
>> > in previous tests with another type of eMMC from another supplier, but this
>> > was with an older kernel patch level and with another HW design.
>> >
>> > Regarding a possible kernel bug: Is there any chance that the invalid
>> > ee_len or ee_start are returned by, e.g., the block allocator ?
>> > If so, can we try to instrument the code to get suitable traces ?
>> > Just to see or to exclude that the corrupted inode is really written
>> > to the eMMC ?
>>
>> From your description it does sound possible that it's a kernel bug.
>> Adding testcases to the code to catch it before it hits the journal
>> might be helpful - but then maybe this is something getting overwritten
>> after the fact - hard to say.
>>
>> Can you share more details of the test you are running? Or maybe even
>> the test itself?
>
>Yes, for sure, we can. Weller, please provide additional details
>or corrections.
>
>In short:
>Basically we use an automated cyclic test writing many small
> (some kBytes) files with CRC checksums for easy consistency check
>into a separate test partition. Files also contain meta information
>like filename, sequence number and a random number to allow to identify
>from block device image dumps, if we just see a fragment of an old
>deleted file or a still valid one.
>
>Each test loop looks like this:
>1) Boot the device after power on or reset
>2) Do fsck -n BEFORE mounting
>2 a) (optional) binary dump of the journal
>3) Mount test partition
>4) File content check for all files from prev. loop
>5) erase all files from previous loop
>6) start writing hundreds/thousands of test files
> in multiple directories with several threads
>7) after random time cut the power or do soft reset
>
>If 2), 3), 4) or 5) fails, stop test.
>
>We are running the test usually with kind of transaction
>safe handling, i.e. use fsync/rename, to avoid zero length files
>or file fragments.

Yes, Dirk's description is right.

You can also find the details of my test in the package code_out.tar.gz,
which I sent in another mail. It contains a document introducing my test
tool and test cases, as well as the test scripts.

Thanks.
Huang weller
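
For readers following the thread without the code_out.tar.gz package: the write pattern described above (files carrying filename, sequence number, a random number and a CRC checksum, written with fsync/rename) could look roughly like the sketch below. This is only an illustration under assumptions, not the actual test tool. The record layout and the names build_record, check_record and write_atomically are hypothetical, and CRC-32 via zlib is assumed since the mail does not say which CRC is used.

```python
import os
import struct
import tempfile
import zlib

# Hypothetical on-disk layout (not taken from the real test tool):
#   u16 name length | name | u32 sequence number | u32 random tag | payload | u32 CRC-32


def build_record(name: str, seq: int, tag: int, payload: bytes) -> bytes:
    """Serialize the meta information and payload, then append a CRC-32."""
    name_bytes = name.encode("utf-8")
    body = (
        struct.pack("<H", len(name_bytes))
        + name_bytes
        + struct.pack("<II", seq, tag)
        + payload
    )
    return body + struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF)


def check_record(blob: bytes) -> bool:
    """Return True if the trailing CRC-32 matches the rest of the record."""
    if len(blob) < 4:
        return False
    body = blob[:-4]
    (stored,) = struct.unpack("<I", blob[-4:])
    return (zlib.crc32(body) & 0xFFFFFFFF) == stored


def write_atomically(path: str, blob: bytes) -> None:
    """Write to a temporary name, fsync, rename, then fsync the directory."""
    tmp = path + ".tmp"
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(blob)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # data must be on stable storage before the rename
    finally:
        os.close(fd)
    os.rename(tmp, path)  # atomic replacement on POSIX filesystems
    dir_fd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
    try:
        os.fsync(dir_fd)  # persist the new directory entry as well
    finally:
        os.close(dir_fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as test_dir:
        target = os.path.join(test_dir, "file_000001.dat")
        tag = int.from_bytes(os.urandom(4), "little")
        write_atomically(target, build_record("file_000001.dat", 1, tag, b"\xa5" * 4096))
        with open(target, "rb") as fh:
            print("record valid:", check_record(fh.read()))
```

With this ordering, a power cut should leave either the old state or the complete new file under the final name, never a zero-length file or a fragment. The directory fsync is the step most often omitted; without it the rename itself may not have reached the journal when power is lost.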