Re: [PATCH 4/5] ceph: flush the mdlog before waiting on unsafe reqs

On Fri, Jul 2, 2021 at 6:17 AM Xiubo Li <xiubli@xxxxxxxxxx> wrote:
>
>
> On 7/2/21 7:46 AM, Patrick Donnelly wrote:
> > On Wed, Jun 30, 2021 at 11:18 PM Xiubo Li <xiubli@xxxxxxxxxx> wrote:
> >> I just re-ran the test with timestamps added:
> >>
> >>> fd = open("/path")
> >>> fopenat(fd, "foo")
> >>> renameat(fd, "foo", fd, "bar")
> >>> fstat(fd)
> >>> fsync(fd)
> >> lxb ----- before renameat ---> Current time is Thu Jul  1 13:28:52 2021
> >> lxb ----- after renameat ---> Current time is Thu Jul  1 13:28:52 2021
> >> lxb ----- before fstat ---> Current time is Thu Jul  1 13:28:52 2021
> >> lxb ----- after fstat ---> Current time is Thu Jul  1 13:28:52 2021
> >> lxb ----- before fsync ---> Current time is Thu Jul  1 13:28:52 2021
> >> lxb ----- after fsync ---> Current time is Thu Jul  1 13:28:56 2021
> >>
> >> We can see that even after 'fstat(fd)', the 'fsync(fd)' still waits around 4s.
> >>
> >> Your test probably worked because the MDS's tick thread and the 'fstat(fd)' sometimes ran almost simultaneously. I could also see the 'fsync(fd)' finish very fast sometimes:
> >>
> >> lxb ----- before renameat ---> Current time is Thu Jul  1 13:29:51 2021
> >> lxb ----- after renameat ---> Current time is Thu Jul  1 13:29:51 2021
> >> lxb ----- before fstat ---> Current time is Thu Jul  1 13:29:51 2021
> >> lxb ----- after fstat ---> Current time is Thu Jul  1 13:29:51 2021
> >> lxb ----- before fsync ---> Current time is Thu Jul  1 13:29:51 2021
> >> lxb ----- after fsync ---> Current time is Thu Jul  1 13:29:51 2021
> > Actually, I did a lot more testing on this. The behavior is unique
> > to the case where the directory is /. You will see a getattr force
> > a flush of the journal:
> >
> > 2021-07-01T23:42:18.095+0000 7fcc7741c700  7 mds.0.server
> > dispatch_client_request client_request(client.4257:74 getattr
> > pAsLsXsFs #0x1 2021-07-01T23:42:18.095884+0000 caller_uid=1147,
> > caller_gid=1147{1000,1147,}) v5
> > ...
> > 2021-07-01T23:42:18.096+0000 7fcc7741c700 10 mds.0.locker nudge_log
> > (ifile mix->sync w=2) on [inode 0x1 [...2,head] / auth v34 pv39 ap=6
> > snaprealm=0x564734479600 DIRTYPARENT f(v0
> > m2021-07-01T23:38:00.418466+0000 3=1+2) n(v6
> > rc2021-07-01T23:38:15.692076+0000 b65536 7=2+5)/n(v0
> > rc2021-07-01T19:31:40.924877+0000 1=0+1) (iauth sync r=1) (isnap sync
> > r=4) (inest mix w=3) (ipolicy sync r=2) (ifile mix->sync w=2)
> > (iversion lock w=3) caps={4257=pAsLsXs/-@32} | dirtyscattered=0
> > request=1 lock=6 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=1
> > authpin=1 0x56473913a580]
> >
> > You don't see that getattr for directories other than root. That's
> > probably because the client has been issued more caps than what the
> > MDS is willing to normally hand out for root.
>
> For the root dir, when doing the 'rename' the wrlock_start('ifile lock')
> will change the lock state 'SYNC' --> 'MIX'. Then the inode 0x1 will
> issue 'pAsLsXs' to clients. So when the client sends a 'getattr' request
> with caps 'AsXsFs' wanted, the mds will try to switch the 'ifile lock'
> state back to 'SYNC' to get the 'Fs' cap. Since the rdlock_start('ifile
> lock') needs to do the lock state transition, it will wait and trigger
> the 'nudge_log'.
>
> The reason wrlock_start('ifile lock') changes the lock state
> 'SYNC' --> 'MIX' above is that the inode '0x1' has a subtree. If my
> understanding is correct, the root dir is very likely shared by
> multiple MDSes, so it chooses to switch to MIX.
>
> This is why sending a 'getattr' request works for the root dir.
>
>
> For non-root directories, the client is bumped to loner and the
> 'ifile/iauth/ixattr locks' switch to the EXCL state instead. In this
> lock state the MDS issues the 'pAsxLsXsxFsx' caps. So when the client
> does 'getattr(AsXsFs)', nothing is sent, since the needed caps have
> already been issued. This is why we couldn't see a getattr request
> being sent out.
>
> Even if we 'force' the getattr, it can get the rdlock immediately,
> with no need to gather or do a lock state transition, so 'nudge_log' is
> never called. When a non-root directory is in loner mode its locks
> are in the 'EXCL' state, which allows 'pAsxLsXsxFsxrwb' by default,
> so even a 'forced' getattr('pAsxLsXsxFsxrwb') in fsync won't trigger
> a lock state transition on the MDS side.
>
>
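
The reasoning quoted above can be condensed into a toy model. This is only a sketch of the argument, not Locker code: the cap bits, the enum, and every function name below are made up for illustration, and the caps allowed in SYNC are an assumption based on the remark that switching back to SYNC is what yields 'Fs'.

```c
#include <stdbool.h>

/* Toy cap bits named after the cap strings in the thread.
 * Hypothetical values, not Ceph's real encoding; the 'p' pin cap is ignored. */
enum {
    CAP_As = 1 << 0, CAP_Ax = 1 << 1,
    CAP_Ls = 1 << 2,
    CAP_Xs = 1 << 3, CAP_Xx = 1 << 4,
    CAP_Fs = 1 << 5, CAP_Fx = 1 << 6,
};

enum ifile_state { IFILE_SYNC, IFILE_MIX, IFILE_EXCL };

/* ifile lock state after the rename's wrlock_start(), per the thread:
 * root goes SYNC -> MIX, other directories go to loner/EXCL. */
static enum ifile_state state_after_rename(bool is_root)
{
    return is_root ? IFILE_MIX : IFILE_EXCL;
}

/* Caps the MDS has issued to the client in each state. */
static int caps_issued(enum ifile_state s)
{
    switch (s) {
    case IFILE_MIX:   /* pAsLsXs */
        return CAP_As | CAP_Ls | CAP_Xs;
    case IFILE_EXCL:  /* pAsxLsXsxFsx */
        return CAP_As | CAP_Ax | CAP_Ls | CAP_Xs | CAP_Xx | CAP_Fs | CAP_Fx;
    default:          /* SYNC: Fs is available (assumption) */
        return CAP_As | CAP_Ls | CAP_Xs | CAP_Fs;
    }
}

/* The client only sends a getattr if some wanted cap is not yet issued. */
static bool client_sends_getattr(int issued, int wanted)
{
    return (issued & wanted) != wanted;
}

/* In this model only MIX needs a state transition before the rdlock. */
static bool rdlock_needs_transition(enum ifile_state s)
{
    return s == IFILE_MIX;
}

/* Does a getattr(AsXsFs) after the rename end up calling nudge_log? */
static bool getattr_nudges_log(bool is_root, bool forced)
{
    enum ifile_state s = state_after_rename(is_root);
    int wanted = CAP_As | CAP_Xs | CAP_Fs;

    if (!forced && !client_sends_getattr(caps_issued(s), wanted))
        return false;   /* caps already held, no request goes out */
    return rdlock_needs_transition(s);
}
```

In this model getattr_nudges_log(true, false) is true for the root dir, while both getattr_nudges_log(false, false) and getattr_nudges_log(false, true) are false, which matches what the tests showed for non-root directories.
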
> >
> > I'm not really sure why there is a difference. I even experimented
> > with redundant getattr ("forced") calls to cause a journal flush on
> > non-root directories but didn't get anywhere. Maybe you can
> > investigate further? It'd be optimal if we could nudge the log just by
> > doing a getattr.
>
> So in the above case, from my tests and from reading the Locker code,
> I haven't yet figured out how the getattr could work for this issue.
>
> Patrick,
>
> Did I miss something about the Locker?

No, your analysis looks right. Thanks.

I suppose this flush_mdlog message is the best tool we have to fix this.
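
For anyone who wants to check the fsync delay before and after the change, the sequence quoted at the top of the thread can be timed with something like the sketch below. The helper name is hypothetical, and openat() with O_CREAT is assumed where the quoted pseudocode says fopenat().

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Print a label plus wall-clock time, in the style of the quoted log. */
static void stamp(const char *label)
{
    char buf[64];
    time_t now = time(NULL);
    struct tm *tm = localtime(&now);

    if (tm && strftime(buf, sizeof(buf), "%a %b %e %H:%M:%S %Y", tm))
        printf("----- %s ---> Current time is %s\n", label, buf);
}

/*
 * Hypothetical helper: create "foo" in dir, rename it to "bar", then
 * fstat and fsync the directory fd. Returns the seconds spent inside
 * fsync(), or -1.0 if any step fails.
 */
static double time_fsync_after_rename(const char *dir)
{
    struct timespec t0, t1;
    struct stat st;
    double secs = -1.0;
    int fd, ffd;

    fd = open(dir, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -1.0;

    /* Assumed stand-in for the fopenat(fd, "foo") step in the thread. */
    ffd = openat(fd, "foo", O_CREAT | O_WRONLY, 0644);
    if (ffd < 0)
        goto out;
    close(ffd);

    stamp("before renameat");
    if (renameat(fd, "foo", fd, "bar") < 0)
        goto out;
    stamp("after renameat");

    stamp("before fstat");
    if (fstat(fd, &st) < 0)
        goto out_unlink;
    stamp("after fstat");

    stamp("before fsync");
    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (fsync(fd) < 0)
        goto out_unlink;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    stamp("after fsync");

    secs = (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
out_unlink:
    unlinkat(fd, "bar", 0);     /* remove the test file again */
out:
    close(fd);
    return secs;
}
```

Calling time_fsync_after_rename() from a small main() with a CephFS directory as the argument, once on the mount root and once on a subdirectory, should make the difference described above visible. The monotonic clock is used for the returned duration so that wall-clock adjustments don't skew it.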

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D



