Re: Corruption on shutdown outside the current partition

On 2011-05-30, Andreas Dilger <adilger@xxxxxxxxx> wrote:

>> If I use a hardware reset method instead of the kernel syscall, by
>> triggering a watchdog with interrupts locked, or doing a power cycle
>> with a testing machine, the problem does not happen. This led me to
>> think it could be a software failure, rather than the hardware failure I
>> was expecting. After activating the traces in the mmc subsystem, I
>> finally managed to catch write commands to an area outside the partition
>> being tested, which means that the problem is really due to software.
>
> Why don't you dump a stack at that point to see what is causing the
> write? Also, blktrace might be helpful to determine what caused the block
> to be written.

I tried that. Unfortunately, because of the asynchronous I/O framework, I
only got the stack of the mmc worker thread, not the stack of the
request originator. But it was a good first step: it gave me an
error marker, and made me notice that the problem is much more common
than I thought. It had only been hidden because the writes
fell in unused areas of my boot partition.

Since blktrace lives in userspace, it is liable to be killed during
the reboot process, and to give me only partial information. But I finally
found what I wanted: by writing 1 to /proc/sys/vm/block_dump, I am able
to see in the system log the original requests that led to the commands.
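
As a rough illustration of how such log entries could be sifted once
they are captured, here is a minimal Python sketch. It assumes the
block_dump lines look like "comm(pid): WRITE block N on dev (M sectors)";
that shape, the function names, and the device names in the usage note
are assumptions for the example, not something taken from my logs.

```python
import re

# Assumed line shape: "comm(pid): WRITE block N on dev (M sectors)".
# Any log prefix (priority, timestamp) before the task name is skipped.
BLOCK_DUMP_RE = re.compile(
    r"(?P<comm>[^\s()\[\]]+)\((?P<pid>\d+)\): "
    r"(?P<op>READ|WRITE) block (?P<block>\d+) "
    r"on (?P<dev>\S+) \((?P<count>\d+) sectors\)"
)


def parse_block_dump(lines):
    """Return one dict per log line that matches the assumed format."""
    records = []
    for line in lines:
        m = BLOCK_DUMP_RE.search(line)
        if m is None:
            continue  # not a block_dump entry
        records.append({
            "comm": m.group("comm"),
            "pid": int(m.group("pid")),
            "op": m.group("op"),
            "block": int(m.group("block")),
            "dev": m.group("dev"),
            "count": int(m.group("count")),
        })
    return records


def writes_on(records, dev):
    """Keep only the WRITE requests issued against the given device name."""
    return [r for r in records if r["op"] == "WRITE" and r["dev"] == dev]
```

For example, writes_on(parse_block_dump(log_lines), "mmcblk0") would list
the writes logged against a (hypothetical) whole-disk device rather than
one of its partitions, along with the task that issued each of them.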

From what I see now, it seems that the problem comes from a race
condition on shutdown between pending file system operations on one
side, and partition removal on the other side. It seems that the
partition can be removed, and yet some pending requests are still valid,
and are handled with the partition offset equal to 0. This leads to
the corruptions I am observing.
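
To make the effect of a zero offset concrete, here is a small Python
sketch of the arithmetic. The partition layout (start sectors and sizes)
is entirely hypothetical and only chosen for illustration; the point is
that a sector number relative to a partition, once the start offset is
lost, is interpreted as an absolute sector on the disk.

```python
# Hypothetical layout, in 512-byte sectors (not my real partition table).
BOOT_START, BOOT_SECTORS = 2048, 65536
DATA_START, DATA_SECTORS = 67584, 1000000


def absolute_sector(partition_start, relative_sector):
    """Map a partition-relative sector to an absolute sector on the disk."""
    return partition_start + relative_sector


def lands_inside(partition_start, partition_sectors, absolute):
    """True if the absolute sector falls within the given partition."""
    return partition_start <= absolute < partition_start + partition_sectors


# A write to sector 4096 of the data partition:
good = absolute_sector(DATA_START, 4096)  # offset applied as intended
bad = absolute_sector(0, 4096)            # offset wrongly taken as 0
```

With these made-up numbers, good falls inside the data partition, while
bad falls inside the boot partition, which is the kind of stray write I
am seeing.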

I have yet to figure out the events leading to this, and to find a fix,
since all this is happening in a part I'm not familiar with.

> Another possibility (I'm not very familiar with MMC hardware, so could
> be bogus) is that the partitions don't align to the hardware/erase
> block size of the underlying device, and a "legitimate" write to one
> partition is causing a read-modify-write into a region of another
> partition, but this isn't being handled correctly?
>

I also had alignment problems, but they only affected performance, not
correctness.
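
For reference, an alignment check of this kind can be sketched in a few
lines of Python. The 4 MiB erase block size and the start sectors used
in the usage note are assumptions for the example; the real erase block
size depends on the card.

```python
def is_aligned(start_sector, erase_block_bytes, sector_size=512):
    """True if the partition's first byte sits on an erase block boundary."""
    return (start_sector * sector_size) % erase_block_bytes == 0


# Assumed erase block size for the example: 4 MiB.
ERASE_BLOCK = 4 * 1024 * 1024
```

With that assumption, is_aligned(63, ERASE_BLOCK) is false for the
traditional start at sector 63, while is_aligned(8192, ERASE_BLOCK) is
true.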


Thanks for your help,
-- 
Romain Izard
