"No, this isn't correct. This loop is looking for the *best* group it can find, and your "break" would have it exit the loop as soon as the *first* group that matches the conditions is found." But as we are checking all the groups with the same conditions then how it guarantees better group selection ? As per my understanding we are just wasting time in looping. On Sat, Dec 5, 2015 at 3:24 AM, Andreas Dilger <adilger@xxxxxxxxx> wrote: > >> On Dec 3, 2015, at 12:13 PM, lokesh jaliminche <lokesh.jaliminche@xxxxxxxxx> wrote: >> >> Ohh thanks for the clarification. There is one more thing I would like >> to point out here. >> In the code there is a loop to scan the groups for inode >> alllocation(Inside find_group_orlov function). >> There are some policies for group selection . while scanning the >> groups, it checks for these >> policies to be satisfied. >> If a particular group satisfy these properties it should get selected >> for inode allocation but instead >> it does further lookup in next groups. >> I think there is missing breaking condition. I have added break over >> there and here is the >> patch for that. Any reason for not having break condition over here ? >> >> diff -Nur linux-2.6.32-431.17.1.el6.x86_64/fs/ext4/ialloc.c >> linux-2.6.32-431.17.1.el6.x86_64/fs/ext4/ialloc.c >> --- linux-2.6.32-431.17.1.el6.x86_64/fs/ext4/ialloc.c 2014-04-12 >> 01:20:31.000000000 +0530 >> +++ linux-2.6.32-431.17.1.el6.x86_64/fs/ext4/ialloc.c 2015-11-29 >> 21:36:51.805542209 +0530 >> @@ -529,6 +529,7 @@ >> grp = g; >> ret = 0; >> best_ndir = stats.used_dirs; >> + break; >> } >> if (ret) >> goto fallback; > > No, this isn't correct. This loop is looking for the *best* group it can find, > and your "break" would have it exit the loop as soon as the *first* directory > that matches the conditions is found. Since those conditions are fairly weak, > for example that the group actually has free inodes, and it has better than > average free inodes and blocks, it makes sense to search beyond just the first > matching group. > > That said, it also doesn't make sense to search beyond a "perfect" group that > has no allocated inodes and no allocated blocks, so a break condition could be > added to this loop and make it more efficient, especially for very large > filesystems that have 128k+ groups. > > It should be noted that this part of the algorithm only applies to "top level" > directories (those below the root inode, or with the EXT4_INODE_TOPDIR flag > set, so searching a bit longer for a good group is not a bad idea in this case. > > Cheers, Andreas. > >> Thanks & Regards, >> Lokesh >> >> >> >> On Thu, Dec 3, 2015 at 11:28 PM, Andreas Dilger <adilger@xxxxxxxxx> wrote: >>> On Dec 3, 2015, at 01:07, lokesh jaliminche <lokesh.jaliminche@xxxxxxxxx> wrote: >>>> >>>> Thought of giving more clarification on my question >>>> why group search start is random ? because we can also start search >>>> for valid groups for inode allocation from the start. As this group >>>> search is random inode selection might go to end of groups which >>>> might affect IO performance >>> >>> Starting the inode search at the beginning of the disk each time >>> means that inode allocation will be inefficient because it will search >>> over groups that are mostly or entirely full already. >>> >>> Allocating the new directory in a semi-random group, one that is >>> relatively unused, ensures that new >>> inode and block allocations are relatively efficient afterward. >>> >>> Cheers, Andreas >>> >>>> On Thu, Dec 3, 2015 at 1:14 PM, lokesh jaliminche >>>> <lokesh.jaliminche@xxxxxxxxx> wrote: >>>>> hello folks, >>>>> I am new to ext4 code. I was going through the >>>>> ext4-source for allocation of inode. >>>>> There is one thing that I did not understand while selection of groups >>>>> for inode allocation . I came across this code snippet which is part >>>>> of find_group_orlov function. question is, why group search start is >>>>> random ? >>>>> >>>>> Code snippet: >>>>> ========== >>>>> В·В·В·if (qstr) { >>>>> »·······»·······»·······hinfo.hash_version = LDISKFS_DX_HASH_HALF_MD4; >>>>> »·······»·······»·······hinfo.seed = sbi->s_hash_seed; >>>>> »·······»·······»·······ldiskfsfs_dirhash(qstr->name, qstr->len, &hinfo); >>>>> »·······»·······»·······grp = hinfo.hash; >>>>> »·······»·······} else >>>>> »·······»·······»·······get_random_bytes(&grp, sizeof(grp)); >>>>> »·······»·······parent_group = (unsigned)grp % ngroups; >>>>> »·······»·······for (i = 0; i < ngroups; i++) { >>>>> »·······»·······»·······g = (parent_group + i) % ngroups; >>>>> »·······»·······»·······get_orlov_stats(sb, g, flex_size, &stats); >>>>> »·······»·······»·······if (!stats.free_inodes) >>>>> »·······»·······»·······»·······continue; >>>>> »·······»·······»·······if (stats.used_dirs >= best_ndir) >>>>> »·······»·······»·······»·······continue; >>>>> »·······»·······»·······if (stats.free_inodes < avefreei) >>>>> »·······»·······»·······»·······continue; >>>>> »·······»·······»·······if (stats.free_blocks < avefreeb) >>>>> »·······»·······»·······»·······continue; >>>>> »·······»·······»·······grp = g; >>>>> »·······»·······»·······ret = 0; >>>>> »·······»·······»·······best_ndir = stats.used_dirs; >>>>> »·······»·······} >>>>> >>>>> Thanks & Regards, >>>>> Lokesh >>>> N‹§Іжмrё›yъиљШbІX¬¶З§vШ^–)Ює{.nЗ+‰·ҐЉ{±{ xЉ{ayє К‡Ъ™л,j ўfЈў·hљ‹аz№ ®wҐўё ў·¦j:+v‰ЁЉwиjШm¶џяѕ «‘кзzZ+ѓщљЋЉЭўj"ќъ!¶i > > > Cheers, Andreas > > > > > ��.n��������+%������w��{.n�����{�����ܨ}���Ơz�j:+v�����w����ޙ��&�)ߡ�a����z�ޗ���ݢj��w�f