Re: Probable memory leak in Hammer write path ?

>>So, this could be another step toward removing tcmalloc as Ceph's default allocator and moving to jemalloc.

+1 for this.

I'm starting to get worried by all these tcmalloc problems.


----- Original Message -----
From: "Somnath Roy" <Somnath.Roy@xxxxxxxxxxx>
To: "Gregory Farnum" <greg@xxxxxxxxxxx>
Cc: "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>
Sent: Thursday, July 2, 2015 05:38:24
Subject: RE: Probable memory leak in Hammer write path ?

Hi, 
I think I have finally figured out what is happening. First, here are the steps to reproduce it on any kernel (it is not specific to 3.16/3.18 as I said earlier):

1. On an SSD, create a big journal partition, say > 20 GB or so.

2. Leave all the filestore/journal parameters at their defaults other than the following, and set these values in your conf file. This is just to make sure journal writes are not throttled and can run far ahead of the backend writes.

filestore_queue_max_ops = 5000000 
filestore_queue_max_bytes = 1000000000000 
filestore_queue_committing_max_ops = 5000000 
filestore_queue_committing_max_bytes = 1000000000000 

3. Run any release, say Firefly/Giant/Hammer, and create a single-OSD cluster, giving the rest of the SSD as the data partition.

4. Run, say, an fio rbd random-write workload with 16K block size, QD 64, num_jobs=8 (a sample job file is sketched after these steps).

5. Run 'dstat -m' and watch how the used memory keeps rising!
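
For reference, a minimal fio job file for step 4 could look something like this (the pool name, image name and cephx user below are assumptions; point them at an image you have created beforehand):

; sketch of the step-4 workload: 16K random writes, QD 64, 8 jobs
; assumed names: pool "rbd", image "testimg", cephx user "admin"
[global]
ioengine=rbd
clientname=admin
pool=rbd
rbdname=testimg
rw=randwrite
bs=16k
iodepth=64
numjobs=8
direct=1
time_based=1
runtime=300

[rbd-randwrite-16k]

Run it with 'fio <jobfile>' while watching 'dstat -m' (step 5) in another terminal.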

Now, I had seen this rising-memory behavior with glibc malloc/tcmalloc/jemalloc, as I communicated earlier. But I didn't wait long enough earlier to see whether the memory comes down once IO stops :-)
This time I saw that with tcmalloc it does not come down, *but* with jemalloc it *does come down* to the old level, in the following two ways. I didn't go back and retest glibc malloc, but it should be releasing as well.

1. If I stop IO, journal writes stop, but the backend flash catches up and the memory comes down accordingly. This is expected: all the transactions pile up in the workQ while the journal runs way ahead, but the moment they are processed the transactions are deleted and memory usage comes down.

2. Once the journal is full, it starts throttling and the overall IO rate comes down. The backend flash now has the opportunity to catch up, and the memory is released accordingly.

None of the above happens with tcmalloc; there the memory is not released at all. Digging through some of the documentation I found that this can happen and that there is a flag to control the release rate, but I had no luck after changing that either (I didn't invest much time on it, though).
What I saw the next time I ran IO is that with tcmalloc the memory does not rise at the beginning (presumably it is reusing that memory) and then starts rising again after some time. But I doubt this behavior is good.
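
For what it's worth, the release-rate flag mentioned above is (assuming gperftools tcmalloc) the TCMALLOC_RELEASE_RATE environment variable; higher values make tcmalloc hand free pages back to the OS more aggressively, and Ceph's heap admin command can also force a release on a running OSD. Roughly:

# assuming gperftools tcmalloc; the default release rate is 1, higher is more aggressive
TCMALLOC_RELEASE_RATE=10 ceph-osd -i 0 -c /etc/ceph/ceph.conf
# ask a running OSD (built against tcmalloc) to return its free heap pages to the OS
ceph tell osd.0 heap release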

So, this could be another step toward removing tcmalloc as Ceph's default allocator and moving to jemalloc.

Thanks, Greg, for asking me to take another look at tcmalloc; otherwise I was kind of out of options :-)

Regards 
Somnath 

-----Original Message----- 
From: Somnath Roy 
Sent: Wednesday, July 01, 2015 4:58 PM 
To: 'Gregory Farnum' 
Cc: ceph-devel@xxxxxxxxxxxxxxx 
Subject: RE: Probable memory leak in Hammer write path ? 

Thanks Greg! 
Yeah, I will double-check. But I built the code without tcmalloc (with glibc) and it was showing the same kind of behavior as well.
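
(For reference, this is roughly what such a build looks like with the Hammer-era autotools build; the --without-tcmalloc switch is from memory, so double-check './configure --help':)

./autogen.sh
./configure --without-tcmalloc
make -j8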

Thanks & Regards 
Somnath 

-----Original Message----- 
From: Gregory Farnum [mailto:greg@xxxxxxxxxxx] 
Sent: Wednesday, July 01, 2015 9:07 AM 
To: Somnath Roy 
Cc: ceph-devel@xxxxxxxxxxxxxxx 
Subject: Re: Probable memory leak in Hammer write path ? 

On Mon, Jun 29, 2015 at 4:39 PM, Somnath Roy <Somnath.Roy@xxxxxxxxxxx> wrote: 
> Greg, 
> Updating to the new kernel updates the gcc version too. A recent kernel changes the tcmalloc version too, but 3.16 has the old tcmalloc and still exhibits the issue. 
> Yes, the behavior is very confusing, and the compiler is the main variable I could think of from the application perspective. 
> If you have a 3.16/3.19 kernel, you could reproduce this by following these steps. 
> 
> 1. Build the ceph-hammer code base. 
> 
> 2. Run with a single OSD. 
> 
> 3. Create an image and run an fio rbd workload from the client (say 16K bs, 8 num_jobs). 
> 
> 4. Run 'dstat -m' and observe the memory usage. 
> 
> What I am thinking of doing is to install ceph from ceph.com and see whether the behavior is the same. 

In addition to that, I'd look for any known bugs in the tcmalloc version you're using on the leaky systems, and check the tcmalloc stats to see if they show a bunch of free memory which hasn't been released to the OS yet. 
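
One way to pull those stats from a running OSD (assuming it is linked against tcmalloc) is the heap admin command; the freelist/unmapped byte counters in its output show how much memory tcmalloc is holding on to rather than returning to the OS:

ceph tell osd.0 heap stats
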
-Greg 





