Re: Questions answered by Neil Brown

"Peter T. Breuer" <ptb@it.uc3m.es> · Wed, 26 Feb 2003 09:09:28 +0100 (MET)

"A month of sundays ago Paul Clements wrote:"
> "Peter T. Breuer" wrote:
> > "A month of sundays ago Neil Brown wrote:"
> > > On Monday February 24, Paul.Clements@SteelEye.com wrote:
> 
> > So it might be enough to chain all the mirror bh's through
> > bh->b_this_page.
> 

> What if the user is waiting on a page and not a buffer (not sure if that
> can/will happen). In that case, we'd be artificially causing him to wait

I'm not the person to answer that kind of question. If I understood the
VMS I wouldn't be asking questions myself!

> when it wasn't necessary. Suppose all the I/O for a page really was
> complete, but we kept the user waiting until all the mirror I/Os
> (including ones to backup devices) for that page had completed. 

You're one up in the abstraction hierarchy from me .. I just am seeing
the concrete b_end_io codes and what they might do. I can offer no
sensible comment.

> Another thing I'm not sure about is whether it's safe for raid1 to
> modify the b_this_page field (for a buffer that was passed in from
> above)...we'd at least have to insert our values into the existing list.
> Is it safe to modify the list without any locks held?

I'm sure it is. We modify it anyway, to write the 1s. And we do it
before submitting any of the bh's involved.

> > I believe that currently this field is just set to "1" in
> > raid1_make_request().
> 
> Yeah, I sure wish I knew who did that and why. I wonder if someone had a

A check of the buffer.c code shows b_this_page is always tested against
real bh addresses in any conditionals in which it appears.  It appears
to potentially be a circular list and they're testing for having got
back to the beginning (or having a trivial circle). What I want to know
is how "1" ever worked. It can't! Not in the buffer.c code, enayway.
That's end_buffer_io_async and discard_bh_page and
__block_write_full_page and __block_prepare_write and
__block_commit_write and  block_read_full_page and ... well, lots.
All of these go chasing through that field trusting it to be the
address of a bh.

> clever plan to use that field at some point, but never got around to it.
> Setting that field to something besides a real address sure does seem
> odd...and I can't see that it's ever used anywhere.

What is also puzzling me is that despite the horrible potential for
what might happen from doing the original users end_io early, I
can't see any consequences in actual tests!

I am writing stuff to a file on a raid1 mirror mount, then dismantling
the mount, the device, and rmmoding the driver. Then I put it all back
again. On remount the data in the file is all perfect. Yet it was built
with plenty of async writes! Surely the buffers in some of those writes
should have been pointing nowhere and been full of rubbish?

   ...
   raid1: async end i/o on sectors 494-495
   raid1: async end i/o on sectors 496-497
   raid1: async end i/o on sectors 498-499
   raid1: async end i/o on sectors 500-501
   raid1: async end i/o on sectors 502-503
   raid1: async end i/o on sectors 504-505
   raid1: async end i/o on sectors 506-507
   raid1: async end i/o on sectors 508-509
   raid1: async end i/o on sectors 510-511
   raid1: async end i/o on sectors 512-513
   ...

Got an idea for a test that might show up corruption accruing from the
suspect mechanism?

Peter
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html