Re: Regarding random grouop search start for allocation of inode.

lokesh jaliminche <lokesh.jaliminche@xxxxxxxxx> · Tue, 15 Dec 2015 16:03:37 +0530

"No, this isn't correct.  This loop is looking for the *best* group it
can find, and your "break" would have it exit the loop as soon as the
*first* group that matches the conditions is found."

But as we are checking  all the groups with the same conditions then
how it guarantees better group selection ? As per my understanding  we
are just wasting time in looping.

On Sat, Dec 5, 2015 at 3:24 AM, Andreas Dilger <adilger@xxxxxxxxx> wrote:
>
>> On Dec 3, 2015, at 12:13 PM, lokesh jaliminche <lokesh.jaliminche@xxxxxxxxx> wrote:
>>
>> Ohh thanks for the clarification. There is one more thing I would like
>> to point out here.
>> In the code there  is a loop to scan the groups for inode
>> alllocation(Inside find_group_orlov function).
>> There are some policies for group selection . while scanning the
>> groups, it checks for these
>> policies to be satisfied.
>> If a particular group satisfy these properties it should get selected
>> for inode allocation but instead
>> it does further lookup in next groups.
>> I think there is missing breaking condition. I have added break over
>> there and here is the
>> patch for that. Any reason for not having break condition over here ?
>>
>> diff -Nur linux-2.6.32-431.17.1.el6.x86_64/fs/ext4/ialloc.c
>> linux-2.6.32-431.17.1.el6.x86_64/fs/ext4/ialloc.c
>> --- linux-2.6.32-431.17.1.el6.x86_64/fs/ext4/ialloc.c    2014-04-12
>> 01:20:31.000000000 +0530
>> +++ linux-2.6.32-431.17.1.el6.x86_64/fs/ext4/ialloc.c    2015-11-29
>> 21:36:51.805542209 +0530
>> @@ -529,6 +529,7 @@
>>             grp = g;
>>             ret = 0;
>>             best_ndir = stats.used_dirs;
>> +            break;
>>         }
>>         if (ret)
>>         goto fallback;
>
> No, this isn't correct.  This loop is looking for the *best* group it can find,
> and your "break" would have it exit the loop as soon as the *first* directory
> that matches the conditions is found.  Since those conditions are fairly weak,
> for example that the group actually has free inodes, and it has better than
> average free inodes and blocks, it makes sense to search beyond just the first
> matching group.
>
> That said, it also doesn't make sense to search beyond a "perfect" group that
> has no allocated inodes and no allocated blocks, so a break condition could be
> added to this loop and make it more efficient, especially for very large
> filesystems that have 128k+ groups.
>
> It should be noted that this part of the algorithm only applies to "top level"
> directories (those below the root inode, or with the EXT4_INODE_TOPDIR flag
> set, so searching a bit longer for a good group is not a bad idea in this case.
>
> Cheers, Andreas.
>
>> Thanks & Regards,
>>   Lokesh
>>
>>
>>
>> On Thu, Dec 3, 2015 at 11:28 PM, Andreas Dilger <adilger@xxxxxxxxx> wrote:
>>> On Dec 3, 2015, at 01:07, lokesh jaliminche <lokesh.jaliminche@xxxxxxxxx> wrote:
>>>>
>>>> Thought of giving more clarification on my question
>>>> why group search start is random ? because we can also start  search
>>>> for valid groups for inode allocation from the start. As this group
>>>> search is random  inode selection might go to end of groups which
>>>> might affect IO performance
>>>
>>> Starting the inode search at the beginning of the disk each time
>>> means that inode allocation will be inefficient because it will search
>>> over groups that are mostly or entirely full already.
>>>
>>> Allocating the new directory in a semi-random group, one that is
>>> relatively unused, ensures that new
>>> inode and block allocations are relatively efficient afterward.
>>>
>>> Cheers, Andreas
>>>
>>>> On Thu, Dec 3, 2015 at 1:14 PM, lokesh jaliminche
>>>> <lokesh.jaliminche@xxxxxxxxx> wrote:
>>>>> hello folks,
>>>>>               I am new to ext4 code. I was going through the
>>>>> ext4-source for allocation of inode.
>>>>> There is one thing that I did not understand while selection of groups
>>>>> for inode allocation . I came across this code snippet which is part
>>>>> of find_group_orlov function. question is, why group search start is
>>>>> random ?
>>>>>
>>>>> Code snippet:
>>>>> ==========
>>>>> В·В·В·if (qstr) {
>>>>> В»В·В·В·В·В·В·В·В»В·В·В·В·В·В·В·В»В·В·В·В·В·В·В·hinfo.hash_version = LDISKFS_DX_HASH_HALF_MD4;
>>>>> В»В·В·В·В·В·В·В·В»В·В·В·В·В·В·В·В»В·В·В·В·В·В·В·hinfo.seed = sbi->s_hash_seed;
>>>>> В»В·В·В·В·В·В·В·В»В·В·В·В·В·В·В·В»В·В·В·В·В·В·В·ldiskfsfs_dirhash(qstr->name, qstr->len, &hinfo);
>>>>> В»В·В·В·В·В·В·В·В»В·В·В·В·В·В·В·В»В·В·В·В·В·В·В·grp = hinfo.hash;
>>>>> В»В·В·В·В·В·В·В·В»В·В·В·В·В·В·В·} else
>>>>> В»В·В·В·В·В·В·В·В»В·В·В·В·В·В·В·В»В·В·В·В·В·В·В·get_random_bytes(&grp, sizeof(grp));
>>>>> В»В·В·В·В·В·В·В·В»В·В·В·В·В·В·В·parent_group = (unsigned)grp % ngroups;
>>>>> В»В·В·В·В·В·В·В·В»В·В·В·В·В·В·В·for (i = 0; i < ngroups; i++) {
>>>>> В»В·В·В·В·В·В·В·В»В·В·В·В·В·В·В·В»В·В·В·В·В·В·В·g = (parent_group + i) % ngroups;
>>>>> В»В·В·В·В·В·В·В·В»В·В·В·В·В·В·В·В»В·В·В·В·В·В·В·get_orlov_stats(sb, g, flex_size, &stats);
>>>>> В»В·В·В·В·В·В·В·В»В·В·В·В·В·В·В·В»В·В·В·В·В·В·В·if (!stats.free_inodes)
>>>>> В»В·В·В·В·В·В·В·В»В·В·В·В·В·В·В·В»В·В·В·В·В·В·В·В»В·В·В·В·В·В·В·continue;
>>>>> В»В·В·В·В·В·В·В·В»В·В·В·В·В·В·В·В»В·В·В·В·В·В·В·if (stats.used_dirs >= best_ndir)
>>>>> В»В·В·В·В·В·В·В·В»В·В·В·В·В·В·В·В»В·В·В·В·В·В·В·В»В·В·В·В·В·В·В·continue;
>>>>> В»В·В·В·В·В·В·В·В»В·В·В·В·В·В·В·В»В·В·В·В·В·В·В·if (stats.free_inodes < avefreei)
>>>>> В»В·В·В·В·В·В·В·В»В·В·В·В·В·В·В·В»В·В·В·В·В·В·В·В»В·В·В·В·В·В·В·continue;
>>>>> В»В·В·В·В·В·В·В·В»В·В·В·В·В·В·В·В»В·В·В·В·В·В·В·if (stats.free_blocks < avefreeb)
>>>>> В»В·В·В·В·В·В·В·В»В·В·В·В·В·В·В·В»В·В·В·В·В·В·В·В»В·В·В·В·В·В·В·continue;
>>>>> В»В·В·В·В·В·В·В·В»В·В·В·В·В·В·В·В»В·В·В·В·В·В·В·grp = g;
>>>>> В»В·В·В·В·В·В·В·В»В·В·В·В·В·В·В·В»В·В·В·В·В·В·В·ret = 0;
>>>>> В»В·В·В·В·В·В·В·В»В·В·В·В·В·В·В·В»В·В·В·В·В·В·В·best_ndir = stats.used_dirs;
>>>>> В»В·В·В·В·В·В·В·В»В·В·В·В·В·В·В·}
>>>>>
>>>>> Thanks & Regards,
>>>>> Lokesh
>>>> N‹§Іжмrё›yъиљШbІX¬¶З§vШ^–)Ює{.nЗ+‰·ҐЉ{±{ xЉ{ayє К‡Ъ™л,j ўfЈў·hљ‹аz№ ®wҐўё ў·¦j:+v‰ЁЉwиjШm¶џяѕ «‘кзzZ+ѓщљЋЉЭўj"ќъ!¶i
>
>
> Cheers, Andreas
>
>
>
>
>
��.n��������+%������w��{.n�����{�����ܨ}���Ơz�j:+v�����w����ޙ��&�)ߡ�a����z�ޗ���ݢj��w�f