Hi Stefan,

On Tue, Jul 2, 2024 at 4:16 PM Stefan Kooman <stefan@xxxxxx> wrote:
>
> Hi Venky,
>
> On 02-07-2024 09:45, Venky Shankar wrote:
> > Hi Stefan,
> >
> > On Mon, Jul 1, 2024 at 2:30 PM Stefan Kooman <stefan@xxxxxx> wrote:
> >>
> >> Hi Dietmar,
> >>
> >> On 29-06-2024 10:50, Dietmar Rieder wrote:
> >>> Hi all,
> >>>
> >>> finally we were able to repair the filesystem and it seems that we did
> >>> not lose any data. Thanks for all suggestions and comments.
> >>>
> >>> Here is a short summary of our journey:
> >>
> >> Thanks for writing this up. This might be useful for someone in the future.
> >>
> >> --- snip ---
> >>
> >>> X. Conclusion:
> >>>
> >>> If we had been aware of the bug and its mitigation we would have
> >>> saved a lot of downtime and some nerves.
> >>>
> >>> Is there an obvious place that I missed where such known issues are
> >>> prominently made public? (The bug tracker maybe, but I think it is easy
> >>> to miss the important ones among all the others)
> >>
> >>
> >> Not that I know of. But changes in behavior of Ceph (daemons) and/or
> >> Ceph kernels would be good to know about indeed. I follow the
> >> ceph-kernel mailing list to see what is going on with the development of
> >> kernel CephFS. And there is a thread about reverting the PR that Enrico
> >> linked to [1], here is the last mail in that thread from Venky to Ilya [2]:
> >>
> >> "Hi Ilya,
> >>
> >> After some digging and talking to Jeff, I figured that it's possible
> >> to disable async dirops from the mds side by setting
> >> `mds_client_delegate_inos_pct` config to 0:
> >>
> >> - name: mds_client_delegate_inos_pct
> >>   type: uint
> >>   level: advanced
> >>   desc: percentage of preallocated inos to delegate to client
> >>   default: 50
> >>   services:
> >>   - mds
> >>
> >> So, I guess this patch is really not required. We can suggest this
> >> config update to users and document it for now.
> >> We lack tests with this config disabled, so I'll be adding the same
> >> before recommending it. Will keep you posted."
> >>
> >> However, I have not seen any update after this. So apparently it is
> >> possible to disable this preallocate behavior globally by disabling it
> >> on the MDS. But there are (were) no MDS tests with this option disabled
> >> (I guess a percentage of "0" would disable it). So I'm not sure it's
> >> safe to disable it, and what would happen if you disable this on the MDS
> >> when there are clients actually using preallocated inodes. I have added
> >> Venky in the CC so I hope he can give us an update about the recommended
> >> way(s) of disabling preallocated inodes
> >
> > It's safe to disable preallocation by setting
> > `mds_client_delegate_inos_pct = 0'. Once disabled, the MDS will not
> > delegate (preallocated) inode ranges to clients, effectively disabling
> > async dirops. We have seen users running with this config (and using
> > the `wsync' mount option in the kernel driver - although setting both
> > isn't really required IMO, Xiubo?) and reporting stable file system
> > operations.
>
> Can this be done live, as in via `ceph config set mds
> mds_client_delegate_inos_pct = 0` and / or `ceph daemon mds.$hostname
> config set mds_client_delegate_inos_pct = 0`?

Yes - use `config set`.

> Has that been tested?

Somewhat yes - it's safe to disable the config.

> Or
> is it safer to do this by restarting the MDS? I wonder how the MDS
> handles cases where inodes are already delegated to the client and has
> to transition to full sync behavior again.

The MDS would use the updated config on subsequent client requests.
The already preallocated inodes would continue to be used by the
client until exhausted.

> >
> >
> > As far as tests are concerned, the way forward is to have a shot at
> > reproducing this in our test lab.
> > We have a tracker for this:
> >
> > https://tracker.ceph.com/issues/66250
> >
> > It's likely that a combination of having the MDS preallocate inodes
> > and delegate a percentage of those frequently to clients is causing
> > this bug. Furthermore, an enhancement is proposed (for the shorter
> > term) to not crash the MDS, but to blocklist the client that's holding
> > the "problematic" preallocated inode range. That way, the file system
> > isn't totally unavailable when such a problem occurs (the client would
> > have to be remounted though, but that's a lesser pain than going
> > through the disaster recovery steps).
>
> Gr. Stefan
>

--
Cheers,
Venky
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx