Re: [PATCH] cifs: perf improvement - use faster macros ALIGN() and round_up()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Aurelien,

On 09/07, Aurélien Aptel wrote:
Hi,

Changes are good from a readability stand point but like the others
I'm very skeptical about the perf improvement claims.

I also am/was. That's why I ran 5 times each test after sending the
patch. For each run, I made sure to have the mount point clean, both
server and client freshly booted, and aside from the patch, the build
env/compile options were exactly the same.

The server had no parallel workload at all as it's only a test server.

Many of these compile to the same division-less instructions
especially if any of the values are known at compile time.

<snip>

@@ -2826,9 +2819,7 @@ SMB2_open_init(struct cifs_tcon *tcon, struct TCP_Server_Info *server,
-               copy_size = uni_path_len;
-               if (copy_size % 8 != 0)
-                       copy_size = roundup(copy_size, 8);
+               copy_size = round_up(uni_path_len, 8);
@@ -4090,7 +4081,7 @@ smb2_new_read_req(void **buf, unsigned int *total_len,
-                       *total_len = DIV_ROUND_UP(*total_len, 8) * 8;
+                       *total_len = ALIGN(*total_len, 8);

These 2 are also hot paths, but skeptical about the optimizations.

As you point out, SMB2_open_init() was indeed the function with greater
improvement, and as per my measurements, the one that actually impacted
the general latency.

I've looked at those macros in Compiler Explorer and sure enough they
compile to the same thing on x86_64.
Even worse, the optimized versions compile with extra instructions for some:

https://godbolt.org/z/z1xhhW9sj

I did the same comparison on a userspace program and got similar
results, but I didn't bother to check the kernel objects as testing it
was quicker to me. But I'll do it today.

I suspect the improvements are more likely to be related to caches,
system load, server load, ...

See above.

You can try to use perf to make a flamegraph and compare.

Cheers,

Not really used to perf, but will check it out, thanks!


Cheers,

Enzo




[Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux