On 2023-12-22 17:46, Paul Moore wrote: > On Wed, Dec 20, 2023 at 7:21 AM Juraj Marcin <juraj@xxxxxxxxxxxxxxx> wrote: > > > > Hi Paul > > > > thank you for your valuable feedback. I will incorporate your > > suggestions to my code. I will then report back with benchmark results > > with included optimizations. I suspect, kstrdup() in > > filename_compute_type() might also be part of the reason, the patched > > kernel is few tenths of microsecond slower, so I will also look into > > that. > > Thank you. I did look at the dup'd string for a bit, but didn't see a > relatively quick way to do away with the additional string copy; if > you can come up with something that isn't too ugly I think that would > be great. Hi, I am sorry for a late reply, I did not have much time during university exam period and among other duties. I have managed to get rid of kstrdup() in filename_compute_type() by saving the length of the name in the filename transition key structure. This memory allocation for copy of the string was the reason for the worse osbench results. Reference kernel (case 1) is always the same setup, I just benchmarked it again as part of the automation I have for the comparison. Modified kernel refers to prefix/suffix implementation with optimizations mentioned above each table. 1: Reference kernel (a1fc79343abb), Fedora policy (format v33) 2: Modified kernel, Fedora policy (format v33) 3: Modified kernel, Fedora policy without prefix/suffix rules (format v34) 4: Modified kernel, Fedora policy with generated prefix rules (format v34) With iteration limits based on min/max prefix/suffix length | Mem | Binary | Policy | Create | osbench [2] | Usage | policy | load | tty [1] | create | | size | time | | files | (MiB) | (KiB) | (ms) | (ms/file) | (us/file) ---+-------+--------+---------+-----------+----------- 1 | 407 | 3682 | 47.368 | 2.4376 | 43.9488 | sd=20 | | sd=0.39 | sd=0.050 | sd=0.486 2 | 417 | 3682 | 46.979 | 2.4655 | 44.0856 | sd=17 | | sd=0.32 | sd=0.080 | sd=0.476 3 | 430 | 3682 | 46.892 | 2.4605 | 43.6223 | sd=21 | | sd=0.28 | sd=0.042 | sd=0.374 4 | 417 | 2585 | 42.131 | 2.5331 | 48.7980 | sd=21 | | sd=0.29 | sd=0.041 | sd=0.456 Here, cases 2 and 3 are improved compared to the last patch, as the kstrdup() is also skipped with no prefix rules loaded. Saving the min/max prefix/suffix length does not slow down policy load, as cases 2 and 3 are pretty much the same as case 1 (reference kernel). With iteration limits based on min/max prefix/suffix length and kdstrup() removal | Mem | Binary | Policy | Create | osbench [2] | Usage | policy | load | tty [1] | create | | size | time | | files | (MiB) | (KiB) | (ms) | (ms/file) | (us/file) ---+-------+--------+---------+-----------+----------- 1 | 430 | 3682 | 46.930 | 2.4569 | 43.4431 | sd= 8 | | sd=0.25 | sd=0.043 | sd=0.434 2 | 410 | 3682 | 46.829 | 2.4623 | 43.4112 | sd=20 | | sd=0.27 | sd=0.053 | sd=0.442 3 | 419 | 3682 | 46.823 | 2.4257 | 43.0575 | sd=14 | | sd=0.18 | sd=0.057 | sd=0.416 4 | 420 | 2585 | 42.044 | 2.5028 | 43.7907 | sd=10 | | sd=0.19 | sd=0.067 | sd=0.382 >From these results, it is apparent that removing the string duplication is effective optimization, as in all four cases the osbench and tty create results are close together. Out of curiosity, I also tested the kstrdup() removal alone, without other optimizations that were proposed. | Mem | Binary | Policy | Create | osbench [2] | Usage | policy | load | tty [1] | create | | size | time | | files | (MiB) | (KiB) | (ms) | (ms/file) | (us/file) ---+-------+--------+---------+-----------+----------- 1 | 421 | 3682 | 46.941 | 2.4702 | 43.6063 | sd=21 | | sd=0.23 | sd=0.052 | sd=0.388 2 | 422 | 3682 | 46.736 | 2.4655 | 43.7345 | sd=16 | | sd=0.22 | sd=0.059 | sd=0.516 3 | 425 | 3682 | 46.950 | 2.4445 | 43.6280 | sd=15 | | sd=0.25 | sd=0.045 | sd=0.378 4 | 418 | 2585 | 41.582 | 2.4456 | 43.3526 | sd=18 | | sd=0.14 | sd=0.056 | sd=0.365 It seems that string duplication was the culprit, and additional searches have no measurable impact in the tested cases. However, as saving min/max prefix/suffix length has no impact on the policy load times, it might be a good idea to keep the limiting optimization as it might help with really long filenames. Best regards Juraj Marcin > > -- > paul-moore.com