Re: [PATCH md 0 of 4] Introduction

Neil Brown <neilb@xxxxxxxxxxxxxxx> wrote:
> On Tuesday March 8, ptb@xxxxxxxxxxxxxx wrote:
> > Have you remodelled the md/raid1 make_request() fn?
> 
> Somewhat.  Write requests are queued, and raid1d submits them when
> it is happy that all bitmap updates have been done.

OK - so a slight modification of the kernel's generic_make_request (I
haven't looked).  Mind you, I think Paul said that just before bitmap
entries were cleared, incoming requests were checked to see whether a
bitmap entry should be marked again...

Perhaps both things happen: bitmap pages in memory are marked clean
after pending writes have finished, then marked dirty again as
necessary, then flushed; and when the flush finishes, the newly
accumulated requests are started.

> There is no '1/100th' second or anything like that.

I was trying to give a concrete picture of what happens, rather than
speak abstractly. I'm sure that the ordinary kernel mechanism for
plugging and unplugging is used, as much as is possible. If you
unplug when the request-struct reservoir is exhausted, then that will
be at 1K requests. If they are each 4KB, that will be every 4MB. At
say 64MB/s, that will be every 1/16 s. And unplugging may happen more
frequently because of other kernel magic mumble mumble ...
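
Spelling that arithmetic out (the pool size, request size and
throughput are the back-of-envelope figures above, not measured
values):

#include <stdio.h>

int main(void)
{
    const long requests   = 1024;              /* request-struct pool */
    const long req_bytes  = 4L * 1024;         /* 4KB per request     */
    const long throughput = 64L * 1024 * 1024; /* 64MB/s              */

    long batch = requests * req_bytes;         /* bytes per forced unplug */
    printf("forced unplug every %ldMB = every 1/%ld s\n",
           batch >> 20, throughput / batch);   /* 4MB, 1/16 s */
    return 0;
}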

> When a write request arrives, the queue is 'plugged', requests are
> queued, and bits in the in-memory bitmap are set.

OK.

> When the queue is unplugged (by the filesystem or timeout) the bitmap
> changes (if any) are flushed to disk, then the queued requests are
> submitted. 

That batches the bitmap markings into the minimum number of extra
disk transactions.  It does impose extra latency, however.

I'm intrigued by exactly how you exert the memory pressure required to
force just the dirty bitmap pages out. I'll have to look it up.

> Bits on disk are cleaned lazily.

OK - so the disk bitmap state is always pessimistic: a bit may be set
for a region that is actually in sync, but never clear for one that
isn't, so a crash costs at most some redundant resyncing. That's
fine. Very good.

> Note that for many applications, the bitmap does not need to be huge.
> 4K is enough for 1 bit per 2-3 megabytes on many large drives.
> Having to sync 3 meg when just one block might be out-of-sync may seem
> like a waste, but it is heaps better than syncing 100Gig!!

Yes - I used 1 bit per 1K, falling back to 1 bit per 2MB under memory
pressure.
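
For concreteness (the 100GB device size is just an assumed example),
a 4K bitmap holds 32768 bits, which gives roughly Neil's figure:

#include <stdio.h>

int main(void)
{
    const long long device = 100LL << 30;      /* 100GB device            */
    const long bitmap_bits = 4L * 1024 * 8;    /* 4KB bitmap = 32768 bits */

    long long chunk = device / bitmap_bits;    /* region covered per bit  */
    printf("1 bit per %lldMB\n", chunk >> 20); /* ~3MB */
    return 0;
}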

> > And if so, do you also aggregate them? And what steps are taken to
> > preserve write ordering constraints (do some overlying file systems
> > still require these)?
> 
> filesystems have never had any write ordering constraints, except that
> IO must not be processed before it is requested, nor after it has been
> acknowledged.  md continues to obey these constraints.

Out of curiosity, is aggregation done on the queued requests? Or are
they all kept at 4KB? (or whatever - 1KB).

Thanks!

Peter

