Re: Questions about mds locks

Gregory Farnum <greg@xxxxxxxxxxx> · Thu, 29 Aug 2013 10:17:49 -0700

On Wed, Aug 28, 2013 at 4:41 PM, 袁冬 <yuandong1222@xxxxxxxxx> wrote:
> Hello, everyone.
>
> I have some questions about MDS locks. I searched Google and read almost
> all of Sage's papers, but I found no details about MDS locks.  :(

Unfortunately these encompass some of the most complicated and least
documented code in the project. :( But let's see how far I can take
you. :)

> 1, There are three lock classes in the MDS: SimpleLock, ScatterLock
> and LocalLock, which are used for different lock items such as
> CDentry.lock and CInode.authlock. What is the difference among the three
> classes, or which situations are they used for?

The purpose of the locks is of course to protect the state of the
metadata, and we have different locks covering different portions of
the Inode, Dentry, etc. We have different types of locks because we
need different behavior for different kinds of data in different
situations. SimpleLock is the base class (both implementation and
typing) and specifies most of the lock behavior necessary for handling
distributed locks; LocalLock is used for data that doesn't need
distributed locking across the MDS cluster (you'll notice the
LocalLocks are all versionlocks; IIRC this is because versions can only
be updated by the MDS which is master for the data in question); and
ScatterLock handles locking for more complicated situations than
SimpleLock. If memory serves (Sage can correct me), ScatterLock is
used in situations where we can delegate some authority to MDS
replicas of the authoritative data (e.g., replica MDSes can generate
read capabilities for clients, and that requires updating the state
protected by filelock).

In particular, you'll want to go through the scatter-gather mechanisms;
that's the big difference between SimpleLock and ScatterLock.
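
To make the scatter-gather idea concrete, here is a toy Python model. It is not Ceph's code (the real classes are C++ and far richer): the class names mirror the ones discussed above, but every attribute and method, and the single integer "value" being gathered, are hypothetical simplifications. While a toy ScatterLock is in LOCK_MIX, each replica accumulates its own delta; gathering folds those deltas back into the authoritative copy.

```python
class SimpleLock:
    """Toy base class: one lock state that the auth MDS and replicas agree on."""
    distributed = True  # state changes must be coordinated across MDSes

    def __init__(self, name, state="LOCK_SYNC"):
        self.name = name
        self.state = state


class LocalLock(SimpleLock):
    """Toy lock for data only the authoritative MDS updates (e.g. a version),
    so no cross-MDS coordination is modeled."""
    distributed = False


class ScatterLock(SimpleLock):
    """Toy lock that can be scattered (LOCK_MIX) so replicas update locally,
    then gathered back into the authoritative value."""

    def __init__(self, name, replicas):
        super().__init__(name)
        self.value = 0                           # authoritative value on the auth MDS
        self.pending = {r: 0 for r in replicas}  # deltas accumulated on each replica

    def scatter(self):
        self.state = "LOCK_MIX"

    def replica_update(self, replica, delta):
        # In this toy, replicas may only write while the lock is scattered.
        if self.state != "LOCK_MIX":
            raise RuntimeError(f"{self.name}: replica writes need LOCK_MIX")
        self.pending[replica] += delta

    def gather(self):
        """Fold every replica's delta into the auth value and leave LOCK_MIX."""
        self.value += sum(self.pending.values())
        self.pending = {r: 0 for r in self.pending}
        self.state = "LOCK_LOCK"  # toy choice of where to settle after gathering
        return self.value


# Usage: two replicas each record new directory entries, then the auth gathers.
dirstat = ScatterLock("dirstat", replicas=["mds.1", "mds.2"])
dirstat.scatter()
dirstat.replica_update("mds.1", 3)
dirstat.replica_update("mds.2", 2)
print(dirstat.gather())  # 5
```

The point of the design is that replicas never need a round trip to the auth MDS for each update while scattered; the cost is paid once, at gather time.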

> 2, There are 13 kinds of locks defined in ceph_fs.h,
> CEPH_LOCK_DVERSION through CEPH_LOCK_IPOLICY; correspondingly there are
> 13 kinds of lock items: two in CDentry and 11 in CInode. I think they
> are used to lock different regions of their parent (CDentry or CInode).
> Is that right? And which region does each one lock?

Right, each of these locks covers a different piece of state in the
metadata object. Unfortunately I can't give you an enumeration of
exactly what they cover, but for any given piece of data it should be
pretty apparent which lock covers it if you look at the locks.
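
The idea that each lock covers its own slice of the parent object can be sketched as a toy inode whose fields are each guarded by one named lock. The lock-to-field pairings below (authlock for mode and ownership, linklock for the link count, filelock for size and mtime, xattrlock for xattrs) are illustrative assumptions to be checked against the real CInode definition; GUARDS, guard_for() and ToyInode are hypothetical.

```python
# Hypothetical pairings of toy locks to the inode fields they guard;
# illustrative only, not an authoritative map of the real MDS locks.
GUARDS = {
    "authlock": {"mode", "uid", "gid"},
    "linklock": {"nlink"},
    "filelock": {"size", "mtime"},
    "xattrlock": {"xattrs"},
}


def guard_for(field):
    """Return the name of the toy lock that covers `field`."""
    for lock, covered in GUARDS.items():
        if field in covered:
            return lock
    raise KeyError(field)


class ToyInode:
    def __init__(self):
        self.fields = {"mode": 0o644, "uid": 0, "gid": 0, "nlink": 1,
                       "size": 0, "mtime": 0, "xattrs": {}}

    def set_field(self, field, value, held):
        """Update `field` only if the caller holds the lock that covers it."""
        lock = guard_for(field)
        if lock not in held:
            raise PermissionError(f"{field} needs a write lock on {lock}")
        self.fields[field] = value


ino = ToyInode()
ino.set_field("size", 4096, held={"filelock"})
try:
    ino.set_field("nlink", 2, held={"filelock"})
except PermissionError as err:
    print(err)  # nlink needs a write lock on linklock
```

Splitting state this way means, for example, that a request changing a file's size need not contend with one changing its permissions.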

> 2, Each lock item has 38 states, which are defined in locks.h and
> organized into 4 state machines. Are there any documents describing these
> states and state machines? Many states look the same, such as
> LOCK_LOCK and LOCK_EXCL. What is their difference? Or under what
> conditions do the states change?

There isn't any very useful documentation on this. You'll want to look
at the states more carefully, as their meaning depends on the exact
type of lock they belong to; but LOCK_LOCK and LOCK_EXCL don't look the
same to me?

In general, each grouping of lock states is semantically meaningful and
you can expect "automatic" transitions between the grouped states,
while a transition from one group of states to another is going to be
prompted by some request from a client or a big change the MDS is
making. For example, the "stable" value of each lock state is the state
the lock will go to as soon as some action completes and it gets
poked. And each lock state specifies different things that the
authoritative MDS and the replica MDSes are allowed to do to that lock
and its data. For instance, the ScatterLocks are the only ones which
can go into the LOCK_MIX state, and you'll see that that state (unlike
all the others) says that ANY(body) can take a write lock on it.

The format of the lock names is generally either LOCK_<BIG STATE> or
LOCK_<BIG STATE I WAS IN>_<BIG STATE I'M GOING TO>.
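
A toy state table can show how per-state permissions, the stable state a transitional state settles into, and the LOCK_<FROM>_<TO> naming fit together. Apart from LOCK_MIX allowing ANY MDS to take a write lock (stated above), the rows are invented for illustration and do not reproduce the real tables behind locks.h; StateInfo, settle() and may_wrlock() are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import Optional

AUTH, ANY = "AUTH", "ANY"  # who may take a lock: only the auth MDS, or any MDS


@dataclass(frozen=True)
class StateInfo:
    next: Optional[str]        # stable state this one settles into; None if stable
    can_rdlock: Optional[str]  # AUTH, ANY, or None (nobody)
    can_wrlock: Optional[str]


# Illustrative rows only; apart from LOCK_MIX allowing ANY writer, these
# values are invented and do not reproduce the real MDS tables.
TOY_SCATTER_SM = {
    "LOCK_SYNC":      StateInfo(None, ANY, None),
    "LOCK_LOCK":      StateInfo(None, AUTH, AUTH),
    "LOCK_MIX":       StateInfo(None, None, ANY),
    "LOCK_SYNC_LOCK": StateInfo("LOCK_LOCK", ANY, None),
    "LOCK_MIX_SYNC":  StateInfo("LOCK_SYNC", None, None),
}


def parse_state(name):
    """Split LOCK_<FROM>_<TO> into (FROM, TO); LOCK_<STATE> gives (STATE, None)."""
    body = name[len("LOCK_"):] if name.startswith("LOCK_") else name
    parts = body.split("_")
    if len(parts) == 1:
        return parts[0], None
    if len(parts) == 2:
        return parts[0], parts[1]
    raise ValueError(f"unexpected lock state name: {name}")


def settle(state, table):
    """Follow `next` links until reaching a stable state."""
    while table[state].next is not None:
        state = table[state].next
    return state


def may_wrlock(state, is_auth, table):
    """True if this MDS (auth or replica) may take a write lock in `state`."""
    who = table[state].can_wrlock
    return who == ANY or (who == AUTH and is_auth)


print(parse_state("LOCK_SYNC_LOCK"))                   # ('SYNC', 'LOCK')
print(settle("LOCK_MIX_SYNC", TOY_SCATTER_SM))         # LOCK_SYNC
print(may_wrlock("LOCK_MIX", False, TOY_SCATTER_SM))   # True
print(may_wrlock("LOCK_LOCK", False, TOY_SCATTER_SM))  # False
```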

> 3, Each lock item can get rdlocks, wrlocks, xlocks and maybe
> remote_wrlocks. It seems that the life cycle of rdlocks, wrlocks and
> xlocks is the same as that of an MDRequest; is that right? What is the
> difference between these kinds of locks and the states (LOCK_SYNC,
> LOCK_LOCK, ...)?

I assume when you say rdlocks, wrlocks, and xlocks you mean the data
structures associated with an MDRequest? If so, yes: these are the
collections of locks on which the MDS needs to take the specified kind
of lock in order to perform the client's request. There are a whole
bunch of lock states because actually getting a write lock, a read
lock, or an exclusive lock on a distributed lock can be very
complicated for the MDS. So there are a bunch of different states to
try and let the MDSes get those locks as efficiently as possible.
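
The relationship between a request's lock sets and the lock objects can be sketched with a toy request that holds rdlock, wrlock and xlock sets for exactly as long as it lives. ToyLock and ToyRequest are hypothetical; the all-or-nothing acquisition in a fixed (name-sorted) order is a common deadlock-avoidance pattern used here for illustration, not a description of what the MDS's locking code does, and none of the state transitions a real distributed lock may need before granting are modeled.

```python
class ToyLock:
    """Toy lock: shared rdlocks, shared wrlocks, or one exclusive xlock."""

    def __init__(self, name):
        self.name = name
        self.count = {"rd": 0, "wr": 0, "x": 0}

    def try_get(self, kind):
        # Any holder of a different kind conflicts; xlocks also exclude each other.
        others = sum(n for k, n in self.count.items() if k != kind)
        if others or (kind == "x" and self.count["x"]):
            return False
        self.count[kind] += 1
        return True

    def put(self, kind):
        self.count[kind] -= 1


class ToyRequest:
    """Toy stand-in for an MDRequest: its lock sets live as long as it does."""

    def __init__(self, rdlocks=(), wrlocks=(), xlocks=()):
        self.wanted = ([(lk, "rd") for lk in rdlocks] +
                       [(lk, "wr") for lk in wrlocks] +
                       [(lk, "x") for lk in xlocks])
        self.held = []

    def acquire_all(self):
        """All-or-nothing: take every lock in name order, or roll back."""
        for lock, kind in sorted(self.wanted, key=lambda p: p[0].name):
            if not lock.try_get(kind):
                self.release_all()
                return False
            self.held.append((lock, kind))
        return True

    def release_all(self):
        for lock, kind in reversed(self.held):
            lock.put(kind)
        self.held.clear()


# Loosely modeled on a create racing with a lookup of the same name.
dentry, parent = ToyLock("dentry"), ToyLock("parent_dir")
create = ToyRequest(xlocks=[dentry], wrlocks=[parent])
lookup = ToyRequest(rdlocks=[dentry])
print(create.acquire_all())  # True
print(lookup.acquire_all())  # False: dentry is xlocked by create
create.release_all()         # create finished, so its locks are dropped
print(lookup.acquire_all())  # True
```

The request's sets only say which kind of lock it wants on which object; whether that can be granted right now is decided by the lock itself, which in the real MDS is where the many states come in.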

> I have been reading the MDS lock code for almost a week, but I think
> I have missed some key designs or ideas, so the code is quite hard for
> me to understand.

They were (and are) hard for me too, so you are not alone. Feel free
to ask more specific questions!
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com