RE: queue_transaction interface + unique_ptr + performance

> -----Original Message-----
> From: ceph-devel-owner@xxxxxxxxxxxxxxx [mailto:ceph-devel-
> owner@xxxxxxxxxxxxxxx] On Behalf Of Somnath Roy
> Sent: Thursday, December 03, 2015 3:13 AM
> 
> *Also*, in this way we are unnecessarily adding another smart pointer
> overhead in the Ceph IO path.
> As I communicated some time back (probably 2 years now :-) ) in the
> community, the profiler is showing these smart pointers (shared_ptr) as one of
> the hot spots. Now, I decided to actually measure this. Here are my findings
> from a sample application using jemalloc.
>
> 1.  First, I measured the performance difference of just creation and deletion
> of various pointers. Here are the results:
> 
> [..]
> int main()
> {
>    struct timeval start, end;
>    long secs_used,micros_used;
>     printf("##### Test conventional ptr ######\n");
>     gettimeofday(&start, NULL);

Don't use gettimeofday, because it returns current wall time, which may be adjusted any second by ntpd. Use clock_gettime with CLOCK_MONOTONIC or CLOCK_MONOTONIC_RAW.
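For illustration, a minimal sketch of the same timing done with CLOCK_MONOTONIC (not tested against your harness, variable names are just illustrative):

    #include <time.h>
    #include <stdio.h>

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    /* ... the loop under test ... */
    clock_gettime(CLOCK_MONOTONIC, &end);
    long long nanos_used = (end.tv_sec - start.tv_sec) * 1000000000LL
                         + (end.tv_nsec - start.tv_nsec);
    printf("nanoseconds used: %lld\n", nanos_used);

On older glibc you may also need to link with -lrt for clock_gettime.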

>     for (uint64_t i = 0; i < 1000000000; i++) {
>       Foo* f = new Foo();
>       int xxx = f->xx;
>       long yyy = f->yy;
>       delete f;
>     }
>     gettimeofday(&end, NULL);

>     printf("start: %d secs, %d usecs\n",start.tv_sec,start.tv_usec);
>     printf("end: %d secs, %d usecs\n",end.tv_sec,end.tv_usec);
>     secs_used=(end.tv_sec - start.tv_sec); //avoid overflow by subtracting
> first
>     micros_used= ((secs_used*1000000) + end.tv_usec) - (start.tv_usec);
> 
>     printf("micros_used for conventional ptr: %d\n",micros_used);

You calculate *total* time, which is subject to noise from context switches, interrupts, and other stuff getting in the way. Divide that by 1000000000 (the number of times you ran your loop) to get the average time per iteration, which is way more convincing. Then count the number of actual uses of the particular thing and multiply by the average time to get reliable info on whether the perf gain outweighs the reasons why shared_ptrs and unique_ptrs are used in the first place.
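Something along these lines (a rough sketch, reusing the loop count and micros_used from your program):

    /* micros_used measured over 1000000000 iterations, as above */
    double ns_per_iter = (double)micros_used * 1000.0 / 1000000000.0;
    printf("nanoseconds per new/delete pair: %.2f\n", ns_per_iter);

That kind of per-iteration number is what I report below.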

I actually redid your testing and I can confirm that creation and freeing of shared_ptrs is >3x slower than plain pointers or unique_ptrs with g++ -std=c++11 -O2:

##### Test conventional ptr ######         
nanoseconds used by conventional ptr: 66.33
##### Test Unique Smart ptr ######         
nanoseconds used by unique_ptr: 66.94      
##### Test Shared Smart ptr ######         
nanoseconds used by shared_ptr: 202.22     

Also, accessing data through those pointers costs basically the same.
But are we creating and destroying that many shared_ptrs for this to become an actual problem? We may gain performance, but at the (high) cost of fighting memory leaks and dangling pointers. I'm not sure it's worth it in the long run.


With best regards / Pozdrawiam
Piotr Dałek


--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


