Re: [patch] fs: improved handling of page and buffer IO errors

steve@xxxxxxxxxxx · Tue, 21 Oct 2008 14:38:14 +0100

Hi,

On Tue, Oct 21, 2008 at 03:14:48PM +0200, Miklos Szeredi wrote:
> On Tue, 21 Oct 2008, steve@xxxxxxxxxxx
> > > Is there a case where retrying in case of !PageUptodate() makes any
> > > sense?
> > >
> > Yes... cluster filesystems. It's very important in case a readpage
> > races with a lock demotion. Since the introduction of page_mkwrite
> > that hasn't worked quite right, but retrying when the page is
> > not uptodate should fix the problem.
> 
> I see.
> 
> Could you please give some more details?  In particular I don't know
> what lock demotion is in this context.  And how does page_mkwrite() come
> into the picture?
> 
> Thanks,
> Miklos

page_mkwrite is really only in the picture because its introduction
was the last time that code was changed. At that point GFS2 adopted
->filemap_fault() rather than using its own page fault
routine.

So here are the basics of locking, so far as GFS2 goes, although
other cluster filesystems are similar. Let's suppose that on
node A, an inode is in cache and it's being read/written, and
on node B, another process wants to perform some operation
(read/write/etc) on the same inode.

Node B requests a lock via the dlm which causes Node A to
receive a callback. The callback sets a flag in the glock[*]
state on Node A corresponding to the inode in question. This
results in all future requests for that particular lock on
Node A blocking. Also at that time, we unmap any mapped pages
relating to that inode, flush any dirty data back onto the
disk (and if the request was for an exclusive lock, invalidate
the pages as well). So that's what I was referring to above as
lock demotion.

Once that's done, the dlm/glock is dropped (again notification is via
the dlm) and if Node A has outstanding requests queued up, it
re-requests the glock. This is a slightly simplified explanation,
but I hope it gives the general drift.
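
As a rough illustration of the demotion sequence above, here is a
minimal userspace sketch in C. It is not GFS2 code: the toy_* names,
the struct layouts and the "queue instead of block" behaviour are all
assumptions made for the example, and it follows the simplified
description (the lock is dropped entirely, whatever mode was asked for).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model; none of these are GFS2 types. */
enum toy_mode { TOY_UNLOCKED, TOY_SHARED, TOY_EXCLUSIVE };

struct toy_page {
    bool mapped;    /* mapped into a process address space */
    bool dirty;     /* modified since the last write-back */
    bool uptodate;  /* cached contents are valid */
};

struct toy_glock {
    enum toy_mode held;   /* mode currently held by this node */
    bool demote_pending;  /* set when the dlm callback arrives */
    int queued_requests;  /* local requests waiting for the lock */
};

/* dlm callback on node A: another node wants the lock. */
static void toy_demote_callback(struct toy_glock *gl)
{
    gl->demote_pending = true;
}

/*
 * Local request for the lock. A real request would block; here it is
 * queued and false is returned so the effect is visible to the caller.
 */
static bool toy_glock_request(struct toy_glock *gl)
{
    if (gl->demote_pending || gl->held == TOY_UNLOCKED) {
        gl->queued_requests++;
        return false;
    }
    return true;
}

/*
 * Carry out the demotion: unmap, flush, invalidate if the remote
 * request was exclusive, then drop the lock. Returns true if this
 * node has queued requests and so needs to re-request the lock.
 */
static bool toy_demote(struct toy_glock *gl, struct toy_page *pages,
                       size_t npages, enum toy_mode remote_wants)
{
    assert(gl->demote_pending);  /* only runs after the callback */

    for (size_t i = 0; i < npages; i++) {
        pages[i].mapped = false;         /* unmap */
        pages[i].dirty = false;          /* stands in for write-back */
        if (remote_wants == TOY_EXCLUSIVE)
            pages[i].uptodate = false;   /* invalidate */
    }
    gl->held = TOY_UNLOCKED;
    gl->demote_pending = false;
    return gl->queued_requests > 0;
}
```

The point to notice is that a page can go from uptodate to not
uptodate purely because of what another node did, with no I/O error
involved.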

So to return to the original subject, in order to allow all
this locking to occur with no lock ordering problems, we have
to define a suitable ordering of page locks vs. glocks, and the
ordering that we use is that glocks must come before page locks. The
full ordering of locks in GFS2 is in Documentation/filesystems/gfs2-glocks.txt.
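
To make the ordering rule concrete, here is a small userspace sketch
in C of one way a path that is entered with the page already locked
could honour "glock before page lock". The toy_* names and the
bookkeeping struct are hypothetical and exist only for this example;
the real code paths are more involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical record of which locks a task holds; not a kernel type. */
struct toy_lock_ctx {
    bool glock_held;
    bool page_locked;
};

/* Returns 0 on success, -1 if taking the glock now would invert the order. */
static int toy_take_glock(struct toy_lock_ctx *c)
{
    if (c->page_locked)
        return -1;  /* page lock already held: glock must come first */
    c->glock_held = true;
    return 0;
}

static void toy_lock_page(struct toy_lock_ctx *c)   { c->page_locked = true; }
static void toy_unlock_page(struct toy_lock_ctx *c) { c->page_locked = false; }

/*
 * Entered with the page locked. If the glock is not held, the page
 * lock is dropped, the glock is taken, and the page is locked again.
 * Returns true if the page lock was dropped, in which case the page
 * may have been invalidated in the meantime and must be rechecked.
 */
static bool toy_get_glock_for_locked_page(struct toy_lock_ctx *c)
{
    assert(c->page_locked);

    if (c->glock_held)
        return false;

    toy_unlock_page(c);          /* window where invalidation can happen */
    int rc = toy_take_glock(c);  /* cannot fail: no page lock is held */
    assert(rc == 0);
    (void)rc;
    toy_lock_page(c);
    return true;
}
```

The true return value is the interesting case: the caller now holds
both locks in the right order, but can no longer assume anything about
the state of the page it started with.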

As a result of that, the VFS needs reads (and page_mkwrite) to
retry when !PageUptodate(), in case the returned page was
invalidated at some point while the page lock was dropped.
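
The retry behaviour being asked for can be sketched in userspace C
roughly as follows. This is an illustration only, not the VFS code:
toy_readpage() is a hypothetical stand-in for a filesystem's read
routine, the races_left counter simulates demotions that invalidate
the page, and the max_tries bound is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; not kernel types. */
struct toy_cached_page {
    bool uptodate;
};

struct toy_file {
    int races_left;    /* simulated demotions still to come */
    int reads_issued;  /* how many times the read routine ran */
};

/*
 * Fill the page. While races_left is positive, a simulated lock
 * demotion invalidates the page again before the caller sees it.
 */
static void toy_readpage(struct toy_file *f, struct toy_cached_page *pg)
{
    f->reads_issued++;
    pg->uptodate = true;
    if (f->races_left > 0) {
        f->races_left--;
        pg->uptodate = false;  /* invalidated while the page lock was dropped */
    }
}

/*
 * Treat a page that is not uptodate as "invalidated, try again"
 * rather than as an I/O error. Returns 0 on success, -1 if the
 * page was still not uptodate after max_tries attempts.
 */
static int toy_read_with_retry(struct toy_file *f,
                               struct toy_cached_page *pg, int max_tries)
{
    assert(max_tries > 0);

    for (int i = 0; i < max_tries; i++) {
        toy_readpage(f, pg);
        if (pg->uptodate)
            return 0;
    }
    return -1;
}
```

With two simulated races and a limit of five tries, the read succeeds
on the third attempt; failing on the first !uptodate result would have
reported an error that never actually happened.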

Obviously we hope that this doesn't happen too often since it's
very inefficient (and we have a system to try and reduce the
frequency of such events) but it can and does happen at more
or less any time, so the VFS needs to take that into account.

I hope that makes some kind of sense... let me know if it's
not clear.

Steve.

[*] The glock layer is a state machine which is associated with each
dlm lock and performs the required actions in response to dlm messages
and filesystem requests to keep the page cache coherent.
