Hi,

I tried running oprofile to identify performance bottlenecks in the Linux cifs client. I was doing a large read from a Samba server to the client machine over a gigabit Ethernet connection. The oprofile results indicated that nearly 70% of the samples were attributed to the cifs_readpages() function:

of 0x00 (Unhalted core cycles) count 60000
warning: could not check that the binary file /cifs has not been modified since the profile was taken. Results may be inaccurate.
samples  %        symbol name
17690    68.5951  cifs_readpages
1111      4.3080  cifs_demultiplex_thread
850       3.2960  cifs_writepages
768       2.9780  is_valid_oplock_break
747       2.8966  cifs_closedir
464       1.7992  SendReceive2
338       1.3106  sesInfoFree
255       0.9888  DeleteMidQEntry
255       0.9888  allocate_mid
237       0.9190  decode_negTokenInit
212       0.8221  SendReceive
212       0.8221  wait_for_response
184       0.7135  cifs_fsync
180       0.6980  header_assemble
168       0.6514  CIFSSMBRead
161       0.6243  cifs_close
139       0.5390  cifs_NTtimeToUnix
...

Looking further into the "opannotate --assembly" output, I noticed that virtually all of the sample hits were attributed to the "rep movsl %ds:(%esi),%es:(%edi)" instruction:

                :   129d9:  mov    0x40(%esp),%esi
17187  66.6447  :   129dd:  rep movsl %ds:(%esi),%es:(%edi)
    4   0.0155  :   129df:  subl   $0x1000,0x44(%esp)

which corresponds to the "memcpy(target, data, PAGE_CACHE_SIZE);" line of the cifs_copy_cache_pages() function, which gcc inlined.

What this seemed to mean was that cifs spends most of its running time in memcpy, copying data from the temporary buffer allocated in cifs_demultiplex_thread into the page cache. (A simplified sketch of the per-page copy pattern I mean is appended after my signature.) My first thought was that if we managed to avoid this unnecessary copy and read directly from the socket into the page cache, reads would get a performance boost.

I tried modifying the cifs source to eliminate that memcpy -- in fact, I only commented out the memcpy line in cifs_copy_cache_pages(), just to get an idea of how big the win could be if we eliminated the unnecessary copy, without actually filling in the pages...

Surprisingly enough, I didn't notice a big difference between the unmodified and modified versions (the latter was about 3-5% faster) when copying a big file on the same setup.

Are the oprofile results inaccurate in some way, or am I missing something important? A single instruction (the memcpy) accounting for most of cifs's running time seems really odd...

Thanks,
Kirill
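
P.S. For reference, here is a minimal userspace sketch of the copy pattern I am talking about. It is not the actual cifs code, only an illustration: apart from memcpy and the page-sized chunking (PAGE_CACHE_SIZE in the real code), all names here (copy_to_pages, recv_buf, SKETCH_PAGE_SIZE) are made up for the example:

/*
 * Illustrative sketch only -- not the real cifs_copy_cache_pages().
 * The SMB read response sits in one contiguous receive buffer, and the
 * data is then copied out one page-sized chunk at a time; this per-page
 * memcpy is what opannotate shows as the hot "rep movsl".
 */
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096  /* stands in for PAGE_CACHE_SIZE */

/* Copy 'len' bytes of response data into an array of page-sized buffers. */
static void copy_to_pages(char **pages, const char *recv_buf, size_t len)
{
	size_t off = 0;
	size_t i = 0;

	while (off < len) {
		size_t chunk = len - off;

		if (chunk > SKETCH_PAGE_SIZE)
			chunk = SKETCH_PAGE_SIZE;
		memcpy(pages[i], recv_buf + off, chunk); /* the hot spot */
		off += chunk;
		i++;
	}
}

A zero-copy variant would instead point the socket receive path at the page-cache pages themselves, so the per-page memcpy disappears entirely; that is the saving I was trying to estimate by commenting the memcpy out.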