On 7/2/21 7:46 AM, Patrick Donnelly wrote:
On Wed, Jun 30, 2021 at 11:18 PM Xiubo Li <xiubli@xxxxxxxxxx> wrote:
I just ran it again with timestamps added:
fd = open("/path", O_RDONLY | O_DIRECTORY);
ffd = openat(fd, "foo", O_CREAT | O_WRONLY, 0644);
close(ffd);
renameat(fd, "foo", fd, "bar");
fstat(fd, &st);
fsync(fd);
lxb ----- before renameat ---> Current time is Thu Jul 1 13:28:52 2021
lxb ----- after renameat ---> Current time is Thu Jul 1 13:28:52 2021
lxb ----- before fstat ---> Current time is Thu Jul 1 13:28:52 2021
lxb ----- after fstat ---> Current time is Thu Jul 1 13:28:52 2021
lxb ----- before fsync ---> Current time is Thu Jul 1 13:28:52 2021
lxb ----- after fsync ---> Current time is Thu Jul 1 13:28:56 2021
We can see that even after 'fstat(fd)', the 'fsync(fd)' still waits about 4s.
Your test probably worked because the MDS's tick thread and the 'fstat(fd)' sometimes ran almost simultaneously; I also occasionally saw the 'fsync(fd)' finish very quickly:
lxb ----- before renameat ---> Current time is Thu Jul 1 13:29:51 2021
lxb ----- after renameat ---> Current time is Thu Jul 1 13:29:51 2021
lxb ----- before fstat ---> Current time is Thu Jul 1 13:29:51 2021
lxb ----- after fstat ---> Current time is Thu Jul 1 13:29:51 2021
lxb ----- before fsync ---> Current time is Thu Jul 1 13:29:51 2021
lxb ----- after fsync ---> Current time is Thu Jul 1 13:29:51 2021
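The open/openat/renameat/fstat/fsync sequence above can be turned into a small C reproducer that measures how long the final fsync(fd) blocks. This is only a sketch, not the test program used in this thread: `repro_rename_fsync()` and `elapsed()` are hypothetical names, it uses CLOCK_MONOTONIC instead of wall-clock timestamps, and the target directory is supplied by the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Seconds elapsed between two CLOCK_MONOTONIC samples. */
static double elapsed(const struct timespec *a, const struct timespec *b)
{
    return (double)(b->tv_sec - a->tv_sec) + (b->tv_nsec - a->tv_nsec) / 1e9;
}

/*
 * Hypothetical helper: run open/openat/renameat/fstat/fsync on `dir`
 * and store how long the final fsync(dirfd) took in *fsync_secs.
 * Returns 0 on success, -1 if any syscall fails.
 */
static int repro_rename_fsync(const char *dir, double *fsync_secs)
{
    struct timespec t0, t1;
    struct stat st;
    int ret = -1;

    int dfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dfd < 0)
        return -1;

    /* Create "foo" inside the directory, then close it right away. */
    int ffd = openat(dfd, "foo", O_CREAT | O_WRONLY, 0644);
    if (ffd < 0)
        goto out;
    close(ffd);

    if (renameat(dfd, "foo", dfd, "bar") < 0)
        goto out;
    if (fstat(dfd, &st) < 0)
        goto out;

    /* Time only the directory fsync, which is what stalls in the thread. */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (fsync(dfd) < 0)
        goto out;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *fsync_secs = elapsed(&t0, &t1);
    ret = 0;
out:
    close(dfd);
    return ret;
}
```

For example, call `repro_rename_fsync(dir, &secs)` from `main` with `dir` set to a directory on a CephFS mount and print `secs`; per the timestamps above, the fsync either waited about 4s or returned almost immediately.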
Actually, I did a lot more testing on this. It's behavior unique to the
directory being /. You will see a getattr force a flush of the
journal:
2021-07-01T23:42:18.095+0000 7fcc7741c700 7 mds.0.server
dispatch_client_request client_request(client.4257:74 getattr
pAsLsXsFs #0x1 2021-07-01T23:42:18.095884+0000 caller_uid=1147,
caller_gid=1147{1000,1147,}) v5
...
2021-07-01T23:42:18.096+0000 7fcc7741c700 10 mds.0.locker nudge_log
(ifile mix->sync w=2) on [inode 0x1 [...2,head] / auth v34 pv39 ap=6
snaprealm=0x564734479600 DIRTYPARENT f(v0
m2021-07-01T23:38:00.418466+0000 3=1+2) n(v6
rc2021-07-01T23:38:15.692076+0000 b65536 7=2+5)/n(v0
rc2021-07-01T19:31:40.924877+0000 1=0+1) (iauth sync r=1) (isnap sync
r=4) (inest mix w=3) (ipolicy sync r=2) (ifile mix->sync w=2)
(iversion lock w=3) caps={4257=pAsLsXs/-@32} | dirtyscattered=0
request=1 lock=6 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=1
authpin=1 0x56473913a580]
You don't see that getattr for directories other than root. That's
probably because the client has been issued more caps than the MDS
is normally willing to hand out for root.
For the root dir, during the 'rename' the wrlock_start('ifile lock')
changes the lock state 'SYNC' --> 'MIX'. The inode 0x1 then issues
only 'pAsLsXs' to clients. So when the client sends a 'getattr' request
wanting caps 'AsXsFs', the MDS tries to switch the 'ifile lock' state
back to 'SYNC' to grant the 'Fs' cap. Because the rdlock_start('ifile
lock') must perform this lock state transition, it waits and triggers
the 'nudge_log'.
The reason wrlock_start('ifile lock') changes the lock state
'SYNC' --> 'MIX' above is that the inode '0x1' has a subtree. If my
understanding is correct, the root dir is very likely shared by
multiple MDSes, so the MDS chooses to switch to MIX.
This is why sending a 'getattr' request works for the root dir.
For non-root directories, the inode instead switches to loner mode and
the 'ifile/iauth/ixattr locks' go to EXCL; in this lock state the MDS
issues the 'pAsxLsXsxFsx' caps. So when the client does
'getattr(AsXsFs)', it does nothing because it has already been issued
the caps it needs. This is why we couldn't see the getattr request
being sent out.
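The client-side half of this can be sketched as a capability subset check: the client only needs to send the getattr if the wanted caps are not already issued. The parser below handles just cap strings shaped like the ones quoted in this thread ('p', then groups A/L/X/F followed by suffix letters such as s, x, r, w, b). The bit layout and the names `parse_caps()` and `caps_cover()` are hypothetical and do not match Ceph's CEPH_CAP_* values.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bit layout (NOT Ceph's CEPH_CAP_* values): bit 0 is 'p'
 * (pin); each group A, L, X, F owns 8 bits for its suffix letters. */
static const char CAP_GROUPS[] = "ALXF";
static const char CAP_SUFFIXES[] = "sxcrwbl";

/* Parse a cap string such as "pAsxLsXsxFsx" into a bitmask.
 * Characters this sketch does not recognize are ignored. */
static uint64_t parse_caps(const char *s)
{
    uint64_t mask = 0;
    int group = -1;                 /* current group index, -1 = none yet */

    for (; *s != '\0'; s++) {
        const char *g = strchr(CAP_GROUPS, *s);
        const char *f = strchr(CAP_SUFFIXES, *s);

        if (*s == 'p') {
            mask |= 1ULL;           /* pin */
        } else if (g != NULL) {
            group = (int)(g - CAP_GROUPS);      /* start a new cap group */
        } else if (f != NULL && group >= 0) {
            mask |= 1ULL << (1 + group * 8 + (int)(f - CAP_SUFFIXES));
        }
    }
    return mask;
}

/* True if every cap in `wanted` is already in `issued`, i.e. the client
 * has no reason to send a getattr to the MDS. */
static bool caps_cover(const char *issued, const char *wanted)
{
    uint64_t want = parse_caps(wanted);
    return (parse_caps(issued) & want) == want;
}
```

With this model, `caps_cover("pAsLsXs", "AsXsFs")` is false (the root dir after the rename lacks Fs), while `caps_cover("pAsxLsXsxFsx", "AsXsFs")` is true, matching the observation that no getattr goes out for non-root directories.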
Even if we force the getattr call, it gets the rdlock immediately with
no need to gather or do a lock state transition, so 'nudge_log' is
never called. When non-root directories are in loner mode, the locks
are in 'EXCL' state and allow 'pAsxLsXsxFsxrwb' by default, so even if
we force a getattr('pAsxLsXsxFsxrwb') in fsync, the MDS side still
won't do any lock state transition.
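To make the root vs. non-root difference concrete, here is a toy C model of the MDS-side decision described above. It is not Ceph's Locker code: the enum and the functions `state_after_rename()`, `rdlock_immediate()` and `getattr_nudges_log()` are hypothetical, and it covers only the two cases discussed in this thread (root in MIX, non-root loner in EXCL).

```c
#include <stdbool.h>

/* Simplified 'ifile' lock states discussed in this thread. IFILE_EXCL
 * here assumes the lock is held EXCL for the requesting (loner) client. */
enum ifile_state { IFILE_SYNC, IFILE_MIX, IFILE_EXCL };

/* Per the analysis above: the root inode (which has a subtree) goes
 * SYNC -> MIX on wrlock_start during rename; a non-root directory
 * switches to loner mode and goes to EXCL instead. */
static enum ifile_state state_after_rename(bool is_root_with_subtree)
{
    return is_root_with_subtree ? IFILE_MIX : IFILE_EXCL;
}

/* Toy rule: can the getattr's rdlock on 'ifile' be taken without a
 * lock state transition? */
static bool rdlock_immediate(enum ifile_state st)
{
    switch (st) {
    case IFILE_SYNC:
    case IFILE_EXCL:      /* loner client reads under its own EXCL lock */
        return true;
    case IFILE_MIX:       /* must go back to SYNC before Fs can be issued */
        return false;
    }
    return false;
}

/* A getattr that has to wait for the MIX -> SYNC transition is the
 * case that, per the MDS log above, ends up calling nudge_log(). */
static bool getattr_nudges_log(enum ifile_state st)
{
    return !rdlock_immediate(st);
}
```

In this model `getattr_nudges_log(state_after_rename(true))` is true for the root dir and false for a non-root dir, which is why a redundant getattr does not flush the journal in the non-root case.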
I'm not really sure why there is a difference. I even experimented
with redundant getattr ("forced") calls to cause a journal flush on
non-root directories but didn't get anywhere. Maybe you can
investigate further? It'd be optimal if we could nudge the log just by
doing a getattr.
So for the above case, from my tests and from reading the Locker code,
I haven't yet figured out how a getattr could work for this issue.
Patrick,
Did I miss something about the Lockers?