On Thu, Jan 03, 2014 at 17:30, Theodore Ts'o [mailto:tytso@xxxxxxx] wrote:
>
> On Fri, Jan 03, 2014 at 11:16:02AM +0800, Huang Weller (CM/ESW12-CN) wrote:
> >
> > It sounds like the barrier test. We wrote such kind test tool before,
> > the test program used ioctl(fd, BLKFLSBUF, 0) to set a barrier before
> > next write operation. Do you think this ioctl is enough? Because I saw
> > the ext4 use it. I will do the test with that tool and then let you
> > know the result.
>
> The BLKFLSBUF ioctl does __not__ send a CACHE FLUSH command to the
> hardware device. It forces all of the dirty buffers in memory to the
> storage device, and then it invalidates all the buffer cache, but it
> does not send a CACHE FLUSH command to the hardware. Hence, the
> hardware is free to write it to its on-disk cache, and not necessarily
> guarantee that the data is written to stable store. (For an example
> use case of BLKFLSBUF, we use it in e2fsck to drop the buffer cache
> for benchmarking purposes.)
>
> If you want to force a CACHE FLUSH (or barrier, depending on the
> underlying transport different names may be given to this operation),
> you need to call fsync() on the file descriptor open to the block
> device.
>
> > More information about journal block which caused the bad extents
> > error: We enabled the mount option journal_checksum in our test. We
> > reproduced the same problem and the journal checksum is correct
> > because the journal block will not be replayed if checksum is error.
>
> How did you enable the journal_checksum option? Note that this is not
> safe in general, which is why we don't enable it or the async_commit
> mount option by default. The problem is that currently the journal
> replay stops when it hits a bad checksum, and this can leave the file
> system in a worse case than it currently is in. There is a way we
> could fix it, by adding per-block checksums to the journal, so we can
> skip just the bad block, and then force an efsck afterwards, but that
> isn't something we've implemented yet.
>
> That being said, if the journal checksum was valid, and so the
> corrupted block was replayed, it does seem to argue against
> hardware-induced corruption.

Yes, this was also our feeling. Please see my other mail sent a few
minutes ago. We know about the possible problems with journal_checksum,
but we thought it a good option in our case to help identify whether
this is a HW- or SW-induced issue.

> Hmm.... I'm stumped, for the moment. The journal layer is quite
> stable, and we haven't had any problems like this reported in many,
> many years.
>
> Let's take this back to first principles. How reliably can you
> reproduce the problem? How often does it fail?

With kernel 3.5.7.23, about once per overnight long-term test.

> Is it something where
> you can characterize the workload leading to this failure? Secondly,
> is a power drop involved in the reproduction at all, or is this
> something that can be reproduced by running some kind of workload, and
> then doing a soft reset (i.e., force a kernel reboot, but _not_ do it
> via a power drop)?

As I stated in my other mail, it is also reproducible with soft resets.
Weller can give more details about the test setup.

> The other thing to ask is when did this problem first start appearing?
> With a kernel upgrade? A compiler/toolchain upgrade? Or has it
> always been there?
>
> Regards,
>
> - Ted
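For reference, here is a minimal sketch of the difference described
above (BLKFLSBUF flushing/invalidating the buffer cache versus fsync()
on the block device fd forcing a CACHE FLUSH to the hardware). The
device node /dev/mmcblk0 is only an illustrative placeholder for the
device under test:

/* Minimal sketch: BLKFLSBUF vs. fsync() on a block device fd.
 * "/dev/mmcblk0" is only a placeholder for the device under test.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* BLKFLSBUF */

int main(void)
{
	int fd = open("/dev/mmcblk0", O_RDWR);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* Writes back dirty buffers and invalidates the buffer cache,
	 * but does NOT send a CACHE FLUSH command to the device. */
	if (ioctl(fd, BLKFLSBUF, 0) < 0)
		perror("ioctl(BLKFLSBUF)");

	/* fsync() on the block device fd is what actually forces the
	 * CACHE FLUSH down to the hardware. */
	if (fsync(fd) < 0)
		perror("fsync");

	close(fd);
	return 0;
}

If I understand your explanation correctly, our barrier test tool
should issue the fsync() call at each barrier point rather than relying
on the ioctl alone.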
Mit freundlichen Grüßen / Best regards

Dr. rer. nat. Dirk Juergens
Robert Bosch Car Multimedia GmbH