Re: [QUESTION] about the freelist allocator in XFS

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Jul 11, 2016 at 7:22 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> [please don't top post]
>
> On Fri, Jul 08, 2016 at 01:48:43PM +0800, Kaho Ng wrote:
>> On Fri, Jul 8, 2016 at 6:28 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>> > On Thu, Jul 07, 2016 at 07:01:35PM +0800, Kaho Ng wrote:
>> >> I am trying to investigate how freelist allocator in xfs interacts
>> >> with freespace B+Tree allocator.
>> >> First I prepared a patch
>> >> <https://gist.github.com/22ffca35929e67c08759b057779b7566> on
>> >> linux-source/fs/xfs/libxfs/xfs_alloc.c to print debugging messages
>> >> (The kernel version used is linux-3.10.0-327.22.2.el7).
>> > ......
>> >> When reading the log output
>> >> <https://gist.github.com/890076405e1c13c0a952a579e25e6afe> , I
>> >> realised that there is no B+Tree split
>> >> triggered by xfs_alloc_fix_freelist() when calling xfs_free_extent().
>> >> Isn't B+Tree split possible in by-size B+Tree even when truncating a
>> >> longer freespace record to shorter one? But what I found in the log is
>> >> only a few tree shrinks... And when reading the source code of
>> >> freespace allocator I found that a B+Tree growth in this case is
>> >> impossible at least...
>> >
>> > args->isfl doesn't mean what you think it means.
>> >
>> > args->isfl is only set when moving blocks from the freespace btree
>> > to the AGFL, which only occurs when a previous operation allocated a
>> > new freespace btree block and depleted the current freelist. i.e.
>> > "AG Free List" != "AG freespace btree" - they are different
>> > structures on disk...
>> >
>> > And when you consider that a freelist refill can only remove records
>> > from the the freespace btree, it's should be clear that a btree
>> > split won't occur during a freelist refill...
>>
>> Hmm, wouldn't xfs_alloc_ag_vextent_size() first remove the free extent
>> record, and insert a new extent record into the freespace by-size
>> btree if the found free extent record is longer than args->maxlen?
>
> Good, you went and looked to verify what I said(*).  Indeed, when
> you read xfs_alloc_fixup_trees() where the two trees are modified
> after an extent has been selected for allocation:
>
> ...
>         * Delete the entry from the by-size btree.
> ...
>         * Add new by-size btree entry(s).
> ...
>         * Fix up the by-block btree entry(s)
>
> It's pretty clear what the answer is. IOWs, yes, you're correct - we
> only ever modify or delete records from the by-bno freespace tree,
> but the by-size tree does do a delete and insert and so can
> split.(**)
>
> So, while it is possible for a split to occur, you still didn't see
> one in your test. That's because a split is rather unlikely in your
> scenario because once we have a large enough freespace btree records
> stored for a split to occur, we almost always have exact matches in
> the by-count tree for freelist fills and so record deletes are the
> usual operationon the by-count tree.
>
> And even if we are doing a by-count insert, the chances the target
> block for the insert is full is quite small, as are the chances the
> target block for the insert is different to the block the original
> record was deleted from (consider ordering, what record index the
> initial lookup returns and that freelist fills usually only require
> a block or two to be allocated).
>
> Cheers,
>
> Dave.
>
> (*) That was what my answer was aimed at getting you to do as I get
> a lot of timewasters asking me via direct email to answer their
> homework questions for them. Anyone who is trying to understand the
> basics of the freespace btrees should have realised the by-count
> tree needs a delete/insert operation pair if we modify the length of
> a freespace record.
>
> (**) The lesson here is this: trust what people say, but always
> verify it when you can. e.g. I now know you'll look at the
> code to try to understand answers you are given, so my time
> answering your questions is not going to be wasted.
>
> --
> Dave Chinner
> david@xxxxxxxxxxxxx

Thanks for the detailed explaination!

Just wonders why we prefer failing the request of refilling freelist
with XFS_WANT_CORRUPTED_RETURN(mp, i == 1) in some rare case, rather
than returning NULLAGBLOCK and allowing the loop in
xfs_alloc_ag_vextent_size() to try xfs_alloc_ag_vextent_small()... In
such corner case there will always be a lot of small extents at the
front of the by-count tree, and any truncation changes to the first
entry in the tree will not result in tree splits and triggering
assertion failure.

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs



[Index of Archives]     [Linux XFS Devel]     [Linux Filesystem Development]     [Filesystem Testing]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux