On Fri, Jul 2, 2021 at 6:17 AM Xiubo Li <xiubli@xxxxxxxxxx> wrote:
>
>
> On 7/2/21 7:46 AM, Patrick Donnelly wrote:
> > On Wed, Jun 30, 2021 at 11:18 PM Xiubo Li <xiubli@xxxxxxxxxx> wrote:
> >> And just now I ran it again with timestamps added:
> >>
> >>> fd = open("/path")
> >>> fopenat(fd, "foo")
> >>> renameat(fd, "foo", fd, "bar")
> >>> fstat(fd)
> >>> fsync(fd)
> >> lxb ----- before renameat ---> Current time is Thu Jul 1 13:28:52 2021
> >> lxb ----- after renameat ---> Current time is Thu Jul 1 13:28:52 2021
> >> lxb ----- before fstat ---> Current time is Thu Jul 1 13:28:52 2021
> >> lxb ----- after fstat ---> Current time is Thu Jul 1 13:28:52 2021
> >> lxb ----- before fsync ---> Current time is Thu Jul 1 13:28:52 2021
> >> lxb ----- after fsync ---> Current time is Thu Jul 1 13:28:56 2021
> >>
> >> We can see that even after 'fstat(fd)', the 'fsync(fd)' still waits around 4s.
> >>
> >> Your test probably worked because the MDS's tick thread and the 'fstat(fd)' sometimes ran almost simultaneously. I could also see the 'fsync(fd)' finish very fast sometimes:
> >>
> >> lxb ----- before renameat ---> Current time is Thu Jul 1 13:29:51 2021
> >> lxb ----- after renameat ---> Current time is Thu Jul 1 13:29:51 2021
> >> lxb ----- before fstat ---> Current time is Thu Jul 1 13:29:51 2021
> >> lxb ----- after fstat ---> Current time is Thu Jul 1 13:29:51 2021
> >> lxb ----- before fsync ---> Current time is Thu Jul 1 13:29:51 2021
> >> lxb ----- after fsync ---> Current time is Thu Jul 1 13:29:51 2021
> > Actually, I did a lot more testing on this. It's a behavior unique to
> > the case where the directory is /. You will see a getattr force a
> > flush of the journal:
> >
> > 2021-07-01T23:42:18.095+0000 7fcc7741c700 7 mds.0.server
> > dispatch_client_request client_request(client.4257:74 getattr
> > pAsLsXsFs #0x1 2021-07-01T23:42:18.095884+0000 caller_uid=1147,
> > caller_gid=1147{1000,1147,}) v5
> > ...
> > 2021-07-01T23:42:18.096+0000 7fcc7741c700 10 mds.0.locker nudge_log
> > (ifile mix->sync w=2) on [inode 0x1 [...2,head] / auth v34 pv39 ap=6
> > snaprealm=0x564734479600 DIRTYPARENT f(v0
> > m2021-07-01T23:38:00.418466+0000 3=1+2) n(v6
> > rc2021-07-01T23:38:15.692076+0000 b65536 7=2+5)/n(v0
> > rc2021-07-01T19:31:40.924877+0000 1=0+1) (iauth sync r=1) (isnap sync
> > r=4) (inest mix w=3) (ipolicy sync r=2) (ifile mix->sync w=2)
> > (iversion lock w=3) caps={4257=pAsLsXs/-@32} | dirtyscattered=0
> > request=1 lock=6 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=1
> > authpin=1 0x56473913a580]
> >
> > You don't see that getattr for directories other than root. That's
> > probably because the client has been issued more caps than what the
> > MDS is willing to normally hand out for root.
>
> For the root dir, when doing the 'rename' the wrlock_start('ifile lock')
> will change the lock state 'SYNC' --> 'MIX'. Then the inode 0x1 will
> issue 'pAsLsXs' to clients. So when the client sends a 'getattr' request
> with caps 'AsXsFs' wanted, the MDS will try to switch the 'ifile lock'
> state back to 'SYNC' to get the 'Fs' cap. Since the rdlock_start('ifile
> lock') needs to do the lock state transition, it will wait and trigger
> the 'nudge_log'.
>
> The reason wrlock_start('ifile lock') changes the lock state
> 'SYNC' --> 'MIX' above is that the inode '0x1' has a subtree. If my
> understanding is correct, the root dir is very likely shared by
> multiple MDSes, so it chooses to switch to MIX.
>
> This is why the root dir will work when we send a 'getattr' request.
>
>
> For non-root directories, it will bump to loner and then the
> 'ifile/iauth/ixattr' locks switch to the EXCL state instead; in this
> lock state it will issue the 'pAsxLsXsxFsx' caps. So when doing the
> 'getattr(AsXsFs)' in the client, it will do nothing since the caps
> needed have already been issued. This is why we couldn't see the
> getattr request being sent out.
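The cap comparison described above can be sketched as a toy model. This is only an illustration of the reasoning in this thread, not the Ceph Locker code: the helper names (`parse_caps`, `missing_caps`) and the `ISSUED_BY_IFILE_STATE` table are hypothetical, and the cap strings are simply the ones quoted in this discussion.

```python
# Toy model only: names and structure are hypothetical, not taken from Ceph.
# Cap strings are the ones quoted in this thread.
ISSUED_BY_IFILE_STATE = {
    "MIX": "pAsLsXs",        # root dir after the rename (SYNC --> MIX)
    "EXCL": "pAsxLsXsxFsx",  # non-root dir with a loner client
}


def parse_caps(caps):
    """Split a cap string like 'pAsxLs' into the set {'p', 'As', 'Ax', 'Ls'}."""
    bits = set()
    group = None
    for ch in caps:
        if ch.isupper():
            group = ch              # start of a new cap group (A, L, X, F)
        elif group is None:
            bits.add(ch)            # leading flag with no group, e.g. 'p'
        else:
            bits.add(group + ch)    # one bit of the current group
    return bits


def missing_caps(issued, wanted):
    """Return the wanted cap bits that are not already issued, sorted."""
    return sorted(parse_caps(wanted) - parse_caps(issued))


def getattr_needs_mds(ifile_state, wanted="AsXsFs"):
    """True if the client lacks a wanted cap and so must ask the MDS."""
    return bool(missing_caps(ISSUED_BY_IFILE_STATE[ifile_state], wanted))
```

In this model, `getattr_needs_mds("MIX")` is true because 'Fs' is missing, which corresponds to the root-dir case where the MDS has to move the ifile lock back to SYNC and nudges the log. `getattr_needs_mds("EXCL")` is false, which corresponds to the non-root case where no getattr is sent at all.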
>
> Even if we 'forced' a getattr call, it can get the rdlock immediately
> with no need to gather or do a lock state transition, so no 'nudge_log'
> is called. If the non-root directories are in loner mode, the locks
> will be in the 'EXCL' state, which allows 'pAsxLsXsxFsxrwb' by default,
> so even if we 'forced' a getattr('pAsxLsXsxFsxrwb') call in fsync, the
> MDS side still won't do the lock state transition.
>
> >
> > I'm not really sure why there is a difference. I even experimented
> > with redundant getattr ("forced") calls to cause a journal flush on
> > non-root directories but didn't get anywhere. Maybe you can
> > investigate further? It'd be optimal if we could nudge the log just by
> > doing a getattr.
>
> So in the above case, from my tests and reading the Locker code, I
> haven't figured out how the getattr could work for this issue yet.
>
> Patrick,
>
> Did I miss something about the Locker?

No, your analysis looks right. Thanks. I suppose this flush_mdlog
message is the best tool we have to fix this.

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
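For anyone who wants to repeat the timing test quoted at the top of this thread, here is a minimal sketch in Python. It is not the program either of us ran. The function name `time_rename_then_fsync` is hypothetical, and it assumes the "fopenat" step in the pseudocode means creating "foo" relative to the directory fd. It also assumes Linux, where `os.O_DIRECTORY` and fsync on a directory fd are available.

```python
import os
import time


def time_rename_then_fsync(dir_path, do_fstat=True):
    """Create foo, rename it to bar, optionally fstat, then fsync the directory.

    Returns a dict mapping each step name to its elapsed time in seconds.
    """
    timings = {}
    dir_fd = os.open(dir_path, os.O_RDONLY | os.O_DIRECTORY)
    try:
        # Create "foo" relative to the directory fd (openat semantics).
        file_fd = os.open("foo", os.O_CREAT | os.O_WRONLY, 0o644, dir_fd=dir_fd)
        os.close(file_fd)

        start = time.monotonic()
        os.rename("foo", "bar", src_dir_fd=dir_fd, dst_dir_fd=dir_fd)
        timings["renameat"] = time.monotonic() - start

        if do_fstat:
            start = time.monotonic()
            os.fstat(dir_fd)
            timings["fstat"] = time.monotonic() - start

        # This is the call that was seen to wait around 4s on CephFS.
        start = time.monotonic()
        os.fsync(dir_fd)
        timings["fsync"] = time.monotonic() - start
    finally:
        os.close(dir_fd)
    return timings
```

Calling it once on the root of a CephFS mount and once on a subdirectory, and comparing the "fsync" entries, should show whether the difference between root and non-root directories reproduces. On a local filesystem all three steps will normally be fast, so the numbers only mean something on a CephFS mount.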