On Fri, Jul 04, 2014 at 02:45:39PM -0400, Theodore Ts'o wrote:
> On Fri, Jul 04, 2014 at 03:45:59PM +0200, David Jander wrote:
> > > 1) Some kind of eMMC driver bug, which is possibly causing the CACHE
> > > FLUSH command not to be sent.
> >
> > How can I investigate this? According to the fio tests I ran and the
> > explanation Dmitry gave, I conclude that incorrect sending of CACHE-FLUSH
> > commands is the only thing left to be discarded on the eMMC driver front,
> > right?
>
> Can you try using an older kernel? The report that I quoted from
> John Stultz (https://lkml.org/lkml/2014/6/12/19) indicated that it was
> a problem that showed up in "recent kernels", and a bisection search
> seemed to point towards an unknown problem in the eMMC driver.
> Quoting from https://lkml.org/lkml/2014/6/12/762:
>
>     "However, despite many many reboots the last good commit in my
>     branch - bb5cba40dc7f079ea7ee3ae760b7c388b6eb5fc3 (mmc: block:
>     Fixup busy detection while...) doesn't ever show the issue. While
>     the immediately following commit which bisect found -
>     e7f3d22289e4307b3071cc18b1d8ecc6598c0be4 (mmc: mmci: Handle CMD
>     irq before DATA irq) always does.
>
>     The immensely frustrating part is while backing that single change off
>     from its commit sha always makes the issue go away, reverting that
>     change from on top of v3.15 doesn't. The issue persists....."
>
> > > 2) Some kind of hardware problem involving flash translation layers
> > > not having durable transactions of their flash metadata across power
> > > failures.
> >
> > That would be like blaming Micron (the eMMC part manufacturer) for faulty
> > firmware... could be, but how can we test this?
>
> The problem is that people who write these programs end up doing
> one-offs, as opposed to something that is well packaged and stands the
> test of time.
> But basically what we want is a program that writes to
> sequential blocks in a block device with the following information:
>
>     *) a timestamp (seconds and microseconds from gettimeofday)
>     *) a 64-bit generation number (which is randomly
>        generated and the same for each run of the program)
>     *) a 32-bit sequence number (starts at zero and
>        increments once per block)
>     *) a 32-bit "sync" number which is written after each time
>        fsync(2) is called while writing to the disk
>     *) the sector number where the data was written
>     *) a CRC of the above information
>     *) some random pattern to fill the rest of the 512 or 4k block,
>        depending on the physical sector size

genstream + checkstream.

http://oss.sgi.com/projects/nfs/testtools/

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
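
For illustration, the per-block record listed in the quoted text could be laid out roughly as below. This is only a sketch under stated assumptions: the field order, the little-endian packing, the choice of CRC-32 and the names make_block and parse_block are all hypothetical, and none of it is taken from genstream/checkstream or any other existing tool.

```python
import os
import struct
import time
import zlib

# Hypothetical on-disk layout (little-endian, no padding), chosen for this sketch:
#   u64 seconds, u32 microseconds, u64 generation, u32 sequence, u32 sync, u64 sector
HEADER = struct.Struct("<QIQIIQ")
CRC = struct.Struct("<I")  # CRC-32 of the packed header, stored right after it


def make_block(generation, sequence, sync, sector, block_size=512, now=None):
    """Build one self-describing block of exactly block_size bytes."""
    now = time.time() if now is None else now
    sec = int(now)
    usec = int((now - sec) * 1_000_000)
    header = HEADER.pack(sec, usec, generation, sequence, sync, sector)
    crc = CRC.pack(zlib.crc32(header))
    # Random filler for the rest of the block, as in the quoted list.
    filler = os.urandom(block_size - len(header) - len(crc))
    return header + crc + filler


def parse_block(block):
    """Return the header fields as a dict, or None if the block is torn or stale."""
    if len(block) < HEADER.size + CRC.size:
        return None
    header = block[:HEADER.size]
    (stored_crc,) = CRC.unpack_from(block, HEADER.size)
    if zlib.crc32(header) != stored_crc:
        return None
    sec, usec, generation, sequence, sync, sector = HEADER.unpack(header)
    return {
        "sec": sec,
        "usec": usec,
        "generation": generation,
        "sequence": sequence,
        "sync": sync,
        "sector": sector,
    }
```

A writer built on this would presumably bump the sync value after each successful fsync(2), and a checker run after a power cut would call parse_block on every block and report any block from the current generation whose sync number is below the last one known to have completed but which is missing or fails its CRC.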