puzzling error message

sct@redhat.com (Stephen C. Tweedie) · Wed, 9 Jan 2002 13:06:17 +0000

Hi,

On Tue, Jan 08, 2002 at 03:27:19PM -0800, Andrew Morton wrote:

> > What is weird is that in this recent case, and the one before, the
> > corruption has been in partitions that I would have thought wouldn't
> > be written to, e.g. I could understand if, while writing to /home
> > I got trashed data, but I didn't expect /usr to be corrupted simply
> > by reading it?  Unless the head in the HDD had crashed but I would
> > have expected the Win2000 partition to be affected as well and it
> > doesn't seem upset at the moment.
> 
> The journal contains, basically, a list of block numbers (4-byte
> integers) and then the block contents themselves.  On recovery,
> we read the block numbers, save them in memory, then read that
> many blocks and write them out into the fs.
> 
> So if the "list of block numbers" gets corrupted, the recovery
> data will be sprayed all over the disk :(

No, we should be safe from that.  The recovery is done on the
partition's own minor device, not on the whole-disk device, and the
block layer does bounds checking on that.

For recovery, we use ll_rw_block, which does the checks directly.  For
normal journal IO we use submit_bh(), which does the bounds checking
inside generic_make_request.  In both cases, the block layer prevents
IO to our partition from leaking out onto a different partition.  If
such leaks are occurring, they have to be due to a fault below the
level of those bounds checks.
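
To make the idea concrete, here is a rough user-space model of that
kind of check.  It is only a sketch, not the kernel code: the struct
and the function name remap_request() are made up for illustration,
and it assumes a partition is described simply by a start sector and
a length in sectors.

```c
#include <stdint.h>

/* Hypothetical partition descriptor: its extent on the whole disk. */
struct partition {
    uint64_t start_sect;  /* first sector of the partition on the disk */
    uint64_t nr_sects;    /* length of the partition in sectors */
};

/*
 * Map a partition-relative request onto the whole disk.
 *
 * Returns 0 and stores the absolute start sector in *out on success,
 * or -1 if any part of the request falls outside the partition.
 * The comparison is arranged so that sector + count cannot overflow.
 */
int remap_request(const struct partition *p, uint64_t sector,
                  uint64_t count, uint64_t *out)
{
    if (count == 0 || count > p->nr_sects ||
        sector > p->nr_sects - count)
        return -1;

    /* Only now is the partition offset applied. */
    *out = p->start_sect + sector;
    return 0;
}
```

With a check of this shape, a corrupted block number read from the
journal during recovery can only produce a rejected request or a write
somewhere inside the same partition.  It cannot be turned into a write
to a neighbouring partition, because the offset is added only after
the range test has passed.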

Cheers,
 Stephen