On Mar 26, 2020, at 1:49 PM, harshad shirwadkar <harshadshirwadkar@xxxxxxxxx> wrote:
>
> On Wed, Mar 25, 2020 at 3:06 AM Andreas Dilger <adilger@xxxxxxxxx> wrote:
>>
>> On Mar 25, 2020, at 3:37 AM, Harshad Shirwadkar <harshadshirwadkar@xxxxxxxxx> wrote:
>>> But note that most of the shrinking happens during last 1-2% deletions
>>> in an average case. Therefore, the next step here is to merge dx nodes
>>> when possible. That can be achieved by storing the fullness index in
>>> htree nodes. But that's an on-disk format change. We can instead build
>>> on tooling added by this patch to perform reverse lookup on a dx
>>> node and then reading adjacent nodes to check their fullness.
>>
>> Thank you for updating these patches again. I haven't had a chance to look
>> at them yet, but I hope to review the patches in the near future.
>>
>> As for storing the fullness on disk changing the on-disk format... That is
>> true, but the original htree implementation anticipated this and reserved
>> space in the htree index to store the fullness, so it would not break the
>> ability of older kernels to access directories with the fullness information.
>>
> Yeah, you are right, good to know that we have bits reserved already
> and that wouldn't break older kernels if we use these in future.
>> I think if you used just a few bits (maybe just 2) to store:
>> 0 = unset (every directory today)
>> 1 = under 20% full
>> 2 = under 40% full
>> 3 = under 60% full
>>
>> or similar. It doesn't matter if they are more full since they won't be
>> candidates for merging, and then lazily update the htree index fullness
>> as entries are removed, this will simplify the shrinking process, and will
>> avoid the need to repeatedly scan the leaf blocks to see if they are empty
>> enough for merging. It wouldn't be any worse *not* to store these values
>> on disk after the first time a "0 = unset" entry was found and not merged,
>> or setting the fullness on the merged block if it is merged, and running
>> "e2fsck -D" can easily update the fullness values.
>>
>> The benefit of using 20%, 40%, and 60% as the fullness markers is that it
>> is possible to either merge adjacent 60% and 40% blocks or alternately a
>> 60% and two adjacent 20% blocks. Also, since these values are very coarse
>> they would not need to be updated frequently. If the values are slightly
>> outdated, then it is again not worse than the "always scan" model (one scan
>> and the fullness would be updated), but more efficient than repeat scanning.
>>
>> Using only two bits for fullness also leaves two bits free for future use.
>
> Thanks Andreas, that makes sense. This kind of merging will require
> lot of tooling provided in this patch - for example swapping out freed
> block with last block to not leave any holes. So, my hope is that we
> get this patch in first and thereby get a step closer to coalescing
> solution.

Definitely I *do not* want to block the landing of these initial patches
until a "full featured" directory shrinking is complete. These patches
at least provide some basic functionality, and will at least shrink a
large directory if it becomes totally empty so I'm in favour of that.

Cheers, Andreas
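
As a rough illustration of the coarse 2-bit fullness scheme described above,
here is a minimal sketch in C. It is not ext4 code: the names dx_fullness,
dx_fullness_encode(), dx_fullness_upper_pct() and dx_can_merge() are all
hypothetical. It also assumes that a block at or above 60% full simply stays
at 0 (indistinguishable from "unset"), which the discussion above leaves open,
and it ignores directory entry alignment and per-block overhead.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical 2-bit fullness codes, using the values proposed above. */
enum dx_fullness {
	DX_FULL_UNSET = 0,	/* unknown (assumed here: also 60% full or more) */
	DX_FULL_LT20  = 1,	/* under 20% full */
	DX_FULL_LT40  = 2,	/* under 40% full */
	DX_FULL_LT60  = 3,	/* under 60% full */
};

/* Map the bytes in use in a leaf block to a coarse fullness code. */
static enum dx_fullness dx_fullness_encode(unsigned int used,
					   unsigned int blocksize)
{
	/* Compare used/blocksize against each threshold without division. */
	unsigned long long scaled = (unsigned long long)used * 100;

	if (scaled < 20ULL * blocksize)
		return DX_FULL_LT20;
	if (scaled < 40ULL * blocksize)
		return DX_FULL_LT40;
	if (scaled < 60ULL * blocksize)
		return DX_FULL_LT60;
	return DX_FULL_UNSET;	/* assumption: too full to be a merge candidate */
}

/* Upper bound (in percent) implied by a code; 0 means no bound is known. */
static unsigned int dx_fullness_upper_pct(enum dx_fullness f)
{
	switch (f) {
	case DX_FULL_LT20:
		return 20;
	case DX_FULL_LT40:
		return 40;
	case DX_FULL_LT60:
		return 60;
	default:
		return 0;
	}
}

/*
 * Decide from the stored codes alone whether n adjacent leaf blocks could
 * fit into one block. Any unset code means the blocks must be scanned.
 */
static bool dx_can_merge(const enum dx_fullness *codes, size_t n)
{
	unsigned int total = 0;

	if (n < 2)
		return false;
	for (size_t i = 0; i < n; i++) {
		unsigned int bound = dx_fullness_upper_pct(codes[i]);

		if (bound == 0)
			return false;
		total += bound;
	}
	/* Each block is strictly under its bound, so a sum of 100 still fits. */
	return total <= 100;
}
```

With these bounds, a "under 60%" block next to a "under 40%" block passes the
check, as does a "under 60%" block with two adjacent "under 20%" blocks, while
two "under 60%" blocks do not. A stale code only costs one extra scan, which
matches the "not worse than always scan" argument above.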