Hey thanks for that, I will push a patch for this ASAP :-) Regards, Lokesh On Wed, Dec 16, 2015 at 3:02 AM, Andreas Dilger <adilger@xxxxxxxxx> wrote: > On Dec 15, 2015, at 3:33 AM, lokesh jaliminche <lokesh.jaliminche@xxxxxxxxx> wrote: >> >> "No, this isn't correct. This loop is looking for the *best* group it >> can find, and your "break" would have it exit the loop as soon as the >> *first* group that matches the conditions is found." >> >> But as we are checking all the groups with the same conditions then >> how it guarantees better group selection ? As per my understanding we >> are just wasting time in looping. > > The important part of the loop is ensuring that the selected group is always > improving over the previous one: > > if (le16_to_cpu(desc->bg_used_dirs_count) >= best_ndir) > continue; > > The "best_ndir" value tracks for the best group found so far the number of > used directories in the group, and if the new directory has fewer directories > than the previous "best" directory and still meets all the other criteria > (fewer than average inodes allocated, etc) then the new group will be chosen. > > That said, you are correct that the loop can spend a lot of time searching > needlessly. It would be trivial to add a check at the start of the loop: > > /* the group can't get any better than empty */ > if (desc->bg_free_inodes_count == inodes_per_group && > desc->bg_free_blocks_count == > EXT4_BLOCKS_PER_GROUP(sb)) > goto found; > > This jumps directly to "found" with "desc" and "group" already set, so > there is no need to go through the extra steps of setting "best_desc" and > "best_group" and then break out of the loop just to set "desc" and "group" > again. > > Since you are the one to find this optimization, could you please submit a > patch to this effect so you get the credit. > > Cheers, Andreas > >> On Sat, Dec 5, 2015 at 3:24 AM, Andreas Dilger <adilger@xxxxxxxxx> wrote: >>> >>>> On Dec 3, 2015, at 12:13 PM, lokesh jaliminche <lokesh.jaliminche@xxxxxxxxx> wrote: >>>> >>>> Ohh thanks for the clarification. There is one more thing I would like >>>> to point out here. >>>> In the code there is a loop to scan the groups for inode >>>> alllocation(Inside find_group_orlov function). >>>> There are some policies for group selection . while scanning the >>>> groups, it checks for these >>>> policies to be satisfied. >>>> If a particular group satisfy these properties it should get selected >>>> for inode allocation but instead >>>> it does further lookup in next groups. >>>> I think there is missing breaking condition. I have added break over >>>> there and here is the >>>> patch for that. Any reason for not having break condition over here ? >>>> >>>> diff -Nur linux-2.6.32-431.17.1.el6.x86_64/fs/ext4/ialloc.c >>>> linux-2.6.32-431.17.1.el6.x86_64/fs/ext4/ialloc.c >>>> --- linux-2.6.32-431.17.1.el6.x86_64/fs/ext4/ialloc.c 2014-04-12 >>>> 01:20:31.000000000 +0530 >>>> +++ linux-2.6.32-431.17.1.el6.x86_64/fs/ext4/ialloc.c 2015-11-29 >>>> 21:36:51.805542209 +0530 >>>> @@ -529,6 +529,7 @@ >>>> grp = g; >>>> ret = 0; >>>> best_ndir = stats.used_dirs; >>>> + break; >>>> } >>>> if (ret) >>>> goto fallback; >>> >>> No, this isn't correct. This loop is looking for the *best* group it can find, >>> and your "break" would have it exit the loop as soon as the *first* directory >>> that matches the conditions is found. Since those conditions are fairly weak, >>> for example that the group actually has free inodes, and it has better than >>> average free inodes and blocks, it makes sense to search beyond just the first >>> matching group. >>> >>> That said, it also doesn't make sense to search beyond a "perfect" group that >>> has no allocated inodes and no allocated blocks, so a break condition could be >>> added to this loop and make it more efficient, especially for very large >>> filesystems that have 128k+ groups. >>> >>> It should be noted that this part of the algorithm only applies to "top level" >>> directories (those below the root inode, or with the EXT4_INODE_TOPDIR flag >>> set, so searching a bit longer for a good group is not a bad idea in this case. >>> >>> Cheers, Andreas. >>> >>>> Thanks & Regards, >>>> Lokesh >>>> >>>> >>>> >>>> On Thu, Dec 3, 2015 at 11:28 PM, Andreas Dilger <adilger@xxxxxxxxx> wrote: >>>>> On Dec 3, 2015, at 01:07, lokesh jaliminche <lokesh.jaliminche@xxxxxxxxx> wrote: >>>>>> >>>>>> Thought of giving more clarification on my question >>>>>> why group search start is random ? because we can also start search >>>>>> for valid groups for inode allocation from the start. As this group >>>>>> search is random inode selection might go to end of groups which >>>>>> might affect IO performance >>>>> >>>>> Starting the inode search at the beginning of the disk each time >>>>> means that inode allocation will be inefficient because it will search >>>>> over groups that are mostly or entirely full already. >>>>> >>>>> Allocating the new directory in a semi-random group, one that is >>>>> relatively unused, ensures that new >>>>> inode and block allocations are relatively efficient afterward. >>>>> >>>>> Cheers, Andreas >>>>> >>>>>> On Thu, Dec 3, 2015 at 1:14 PM, lokesh jaliminche >>>>>> <lokesh.jaliminche@xxxxxxxxx> wrote: >>>>>>> hello folks, >>>>>>> I am new to ext4 code. I was going through the >>>>>>> ext4-source for allocation of inode. >>>>>>> There is one thing that I did not understand while selection of groups >>>>>>> for inode allocation . I came across this code snippet which is part >>>>>>> of find_group_orlov function. question is, why group search start is >>>>>>> random ? >>>>>>> >>>>>>> Code snippet: >>>>>>> ========== >>>>>>> В·В·В·if (qstr) { >>>>>>> »·······»·······»·······hinfo.hash_version = LDISKFS_DX_HASH_HALF_MD4; >>>>>>> »·······»·······»·······hinfo.seed = sbi->s_hash_seed; >>>>>>> »·······»·······»·······ldiskfsfs_dirhash(qstr->name, qstr->len, &hinfo); >>>>>>> »·······»·······»·······grp = hinfo.hash; >>>>>>> »·······»·······} else >>>>>>> »·······»·······»·······get_random_bytes(&grp, sizeof(grp)); >>>>>>> »·······»·······parent_group = (unsigned)grp % ngroups; >>>>>>> »·······»·······for (i = 0; i < ngroups; i++) { >>>>>>> »·······»·······»·······g = (parent_group + i) % ngroups; >>>>>>> »·······»·······»·······get_orlov_stats(sb, g, flex_size, &stats); >>>>>>> »·······»·······»·······if (!stats.free_inodes) >>>>>>> »·······»·······»·······»·······continue; >>>>>>> »·······»·······»·······if (stats.used_dirs >= best_ndir) >>>>>>> »·······»·······»·······»·······continue; >>>>>>> »·······»·······»·······if (stats.free_inodes < avefreei) >>>>>>> »·······»·······»·······»·······continue; >>>>>>> »·······»·······»·······if (stats.free_blocks < avefreeb) >>>>>>> »·······»·······»·······»·······continue; >>>>>>> »·······»·······»·······grp = g; >>>>>>> »·······»·······»·······ret = 0; >>>>>>> »·······»·······»·······best_ndir = stats.used_dirs; >>>>>>> »·······»·······} >>>>>>> >>>>>>> Thanks & Regards, >>>>>>> Lokesh >>>>>> N‹§Іжмrё›yъиљШbІX¬¶З§vШ^–)Ює{.nЗ+‰·ҐЉ{±{ xЉ{ayє К‡Ъ™л,j ўfЈў·hљ‹аz№ ®wҐўё ў·¦j:+v‰ЁЉwиjШm¶џяѕ «‘кзzZ+ѓщљЋЉЭўj"ќъ!¶i >>> >>> >>> Cheers, Andreas >>> >>> >>> >>> >>> > > > Cheers, Andreas > > > > > ��.n��������+%������w��{.n�����{�����ܨ}���Ơz�j:+v�����w����ޙ��&�)ߡ�a����z�ޗ���ݢj��w�f