On 1/3/14, 12:45 PM, Juergens Dirk (CM-AI/ECO2) wrote:
>
> On Thu, Jan 03, 2014 at 19:24, Eric Sandeen wrote
>>
>> On 1/3/14, 10:29 AM, Juergens Dirk (CM-AI/ECO2) wrote:
>>> So, I think there _might_ be a kernel bug, but it could be also a
>>> problem related to the particular type of eMMC. We did not observe
>>> the same issue in previous tests with another type of eMMC from
>>> another supplier, but this was with an older kernel patch level and
>>> with another HW design.
>>>
>>> Regarding a possible kernel bug: Is there any chance that the invalid
>>> ee_len or ee_start are returned by, e.g., the block allocator ?
>>> If so, can we try to instrument the code to get suitable traces ?
>>> Just to see or to exclude that the corrupted inode is really written
>>> to the eMMC ?
>>
>> From your description it does sound possible that it's a kernel bug.
>> Adding testcases to the code to catch it before it hits the journal
>> might be helpful - but then maybe this is something getting overwritten
>> after the fact - hard to say.
>>
>> Can you share more details of the test you are running? Or maybe even
>> the test itself?
>
> Yes, for sure, we can. Weller, please provide additional details
> or corrections.
>
> In short:
> Basically we use an automated cyclic test writing many small
> (some kBytes) files with CRC checksums for easy consistency check
> into a separate test partition. Files also contain meta information
> like filename, sequence number and a random number to allow to identify
> from block device image dumps, if we just see a fragment of an old
> deleted file or a still valid one.
>
> Each test loop looks like this:

0) mkfs the filesystem - with what options? How big?

> 1) Boot the device after power on or reset
> 2) Do fsck -n BEFORE mounting
> 2 a) (optional) binary dump of the journal
> 3) Mount test partition

Again with what options, if any?

> 4) File content check for all files from prev. loop
> 5) erase all files from previous loop
> 6) start writing hundreds/thousands of test files
>    in multiple directories with several threads

I guess this is where we might need more details in order to try
to recreate the failure, but perhaps this is not a case where you can
simply share the IO generation utility...?

Thanks,
-Eric

> 7) after random time cut the power or do soft reset
>
> If 2), 3), 4) or 5) fails, stop test.
>
> We are running the test usually with kind of transaction
> safe handling, i.e. use fsync/rename, to avoid zero length files
> or file fragments.
>
>>
>> I've used a test framework in the past to simulate resets w/o needing
>> to reset the box, and do many journal replays very quickly. It'd be
>> interesting to run it using your testcase.
>>
>> Thanks,
>> -Eric
>
> Mit freundlichen Grüßen / Best regards
>
> Dirk Juergens
>
> Robert Bosch Car Multimedia GmbH
>
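
For anyone trying to recreate step 6) without the original IO generation utility, the following is a rough sketch of what a single "transaction safe" file write and the later content check might look like. It is not the test used in this thread. The record layout, the names write_test_file and check_test_file, and the ".tmp" suffix are all assumptions made up for illustration. Only the ingredients come from the description above: filename, sequence number, a random number, a CRC, and fsync/rename. It assumes Linux, where a directory can be opened and fsync'ed.

```python
import os
import struct
import zlib

# Hypothetical record layout (not from the thread):
#   u32 sequence number | u32 random tag | u16 name length | name | payload | u32 CRC32
HEADER = struct.Struct("<IIH")


def write_test_file(directory, name, seq, payload):
    """Write one test file via temp file + fsync + rename + directory fsync."""
    tag = int.from_bytes(os.urandom(4), "little")  # random number to tell old from new copies
    name_bytes = name.encode()
    body = HEADER.pack(seq, tag, len(name_bytes)) + name_bytes + payload
    record = body + struct.pack("<I", zlib.crc32(body))

    final_path = os.path.join(directory, name)
    tmp_path = final_path + ".tmp"  # hypothetical temp-file naming

    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(record)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # flush file data before the rename makes it visible
    finally:
        os.close(fd)

    os.rename(tmp_path, final_path)  # atomic replacement on POSIX filesystems

    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)  # persist the directory entry itself
    finally:
        os.close(dir_fd)
    return tag


def check_test_file(path):
    """Return (seq, tag, name, payload) if the CRC matches, else None."""
    with open(path, "rb") as f:
        record = f.read()
    if len(record) < HEADER.size + 4:
        return None  # too short to hold header and CRC: zero-length file or fragment
    body = record[:-4]
    (stored_crc,) = struct.unpack("<I", record[-4:])
    if zlib.crc32(body) != stored_crc:
        return None
    seq, tag, name_len = HEADER.unpack_from(body, 0)
    start = HEADER.size
    if start + name_len > len(body):
        return None
    name = body[start:start + name_len].decode()
    return seq, tag, name, body[start + name_len:]
```

With this pattern, a power cut should leave either the old file, the complete new file, or a stray ".tmp" file. A file under its final name that fails the check would then be worth a closer look in step 4).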