That's a really huge leak. From my experience, 'gluster volume status <volname> fd/inode' commands can lead to huge RPC responses when run on a volume with many files being accessed. I also vaguely remember memory problems caused by rpc saved_frames: IIRC the saved_frames kept growing because of some connection problems (or something similar), which led to a large number of concurrent RPC requests staying active. cc'ing krishnan to see if he remembers anything more.

On Wed, Nov 12, 2014 at 6:56 PM, Jaden Liang <jaden1q84@xxxxxxxxx> wrote:
> Hi all,
>
> I am running gluster-3.4.5 on 2 servers. Each of them has 7 2TB HDDs used to
> build a 7 * 2 distributed + replicated volume. I just noticed that glusterd
> consumed about 120GB of memory and dumped core today. I read the mempool code
> to try to identify which mempool ate the memory. Unfortunately, glusterd was
> not run with --mem-accounting, so now I only have a coredump file to debug...
> Anyway, I read some of the mem_pool code and tried to work out which mem_pool
> consumes such a large amount of memory. Here is the result.
>
> I wrote a gdb script to print out glusterfsd_ctx->mempool_list:
>
> # gdb script to print out every mem_pool with a non-zero object count
> set $head = &glusterfsd_ctx->mempool_list
> set $offset = (unsigned long)(&((struct mem_pool*)0)->global_list)
> set $pos = (struct mem_pool*)((unsigned long)($head->next) - $offset)
> set $memsum = 0
> while (&$pos->global_list != $head)
>   if ($pos->hot_count + $pos->curr_stdalloc)
>     p *$pos
>     # memory held by this single mem_pool
>     set $thismempoolsize = ($pos->hot_count + $pos->curr_stdalloc) * $pos->padded_sizeof_type
>     p $pos->name
>     p $thismempoolsize
>     set $memsum += $thismempoolsize
>   end
>   set $pos = (struct mem_pool*)((unsigned long)($pos->global_list.next) - $offset)
> end
> echo "Total mem used\n"
> p $memsum
>
> Then I got this output:
>
> (gdb) source gdb_show_mempool_list.gdb
> $459 = {list = {next = 0x1625a50, prev = 0x1625a50}, hot_count = 64,
>   cold_count = 0, lock = 1, padded_sizeof_type = 6116, pool = 0x7ff2c9f94010,
>   pool_end = 0x7ff2c9ff3910, real_sizeof_type = 6088,
>   alloc_count = 16919588, pool_misses = 16919096, max_alloc = 64,
>   curr_stdalloc = 16824653, max_stdalloc = 16824655,
>   name = 0x1625ad0 "management:rpcsvc_request_t",
>   global_list = {next = 0x16211f8, prev = 0x1639368}}
> $460 = 0x1625ad0 "management:rpcsvc_request_t"
> $461 = 102899969172
> $462 = {list = {next = 0x7ff2cc0bf374, prev = 0x7ff2cc0bc2b4}, hot_count = 16352,
>   cold_count = 32, lock = 1, padded_sizeof_type = 52, pool = 0x7ff2cc0bc010,
>   pool_end = 0x7ff2cc18c010, real_sizeof_type = 24,
>   alloc_count = 169845909, pool_misses = 168448980, max_alloc = 16384,
>   curr_stdalloc = 168231365, max_stdalloc = 168231560,
>   name = 0x1621210 "glusterfs:data_t",
>   global_list = {next = 0x1621158, prev = 0x1625ab8}}
> $463 = 0x1621210 "glusterfs:data_t"
> $464 = 8748881284
> $465 = {list = {next = 0x7ff2cc18e770, prev = 0x7ff2cc18d2fc}, hot_count = 16350,
>   cold_count = 34, lock = 1, padded_sizeof_type = 68, pool = 0x7ff2cc18d010,
>   pool_end = 0x7ff2cc29d010, real_sizeof_type = 40,
>   alloc_count = 152853817, pool_misses = 151477891, max_alloc = 16384,
>   curr_stdalloc = 151406417, max_stdalloc = 151406601,
>   name = 0x1621170 "glusterfs:data_pair_t",
>   global_list = {next = 0x16210b8, prev = 0x16211f8}}
> $466 = 0x1621170 "glusterfs:data_pair_t"
> $467 = 10296748156
> $468 = {list = {next = 0x1621050, prev = 0x1621050}, hot_count = 4096,
>   cold_count = 0, lock = 1, padded_sizeof_type = 140, pool = 0x7ff2cc29e010,
>   pool_end = 0x7ff2cc32a010, real_sizeof_type = 112,
>   alloc_count = 16995288, pool_misses = 16986651, max_alloc = 4096,
>   curr_stdalloc = 16820855, max_stdalloc = 16820882,
>   name = 0x16210d0 "glusterfs:dict_t",
>   global_list = {next = 0x1621018, prev = 0x1621158}}
> $469 = 0x16210d0 "glusterfs:dict_t"
> $470 = 2355493140
> "Total mem used
> "$471 = 124301091752
>
> --------------------------------------------------------------------------------------
> "management:rpcsvc_request_t" used ~100GB
> "glusterfs:data_t" used ~8.7GB
> "glusterfs:data_pair_t" used ~10GB
> "glusterfs:dict_t" used ~2.3GB
> Total: ~124GB of memory
> --------------------------------------------------------------------------------------
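A quick cross-check of those totals outside gdb: the script charges each pool (hot_count + curr_stdalloc) * padded_sizeof_type bytes. The minimal standalone C sketch below, with the counter values copied from the dump, reproduces the per-pool figures and the 124,301,091,752-byte (~124GB) total:

#include <inttypes.h>
#include <stdio.h>

int main(void) {
    /* counter values copied from the gdb dump above */
    struct { const char *name; uint64_t hot, stdalloc, padded; } pools[] = {
        { "management:rpcsvc_request_t",    64,  16824653, 6116 },
        { "glusterfs:data_t",            16352, 168231365,   52 },
        { "glusterfs:data_pair_t",       16350, 151406417,   68 },
        { "glusterfs:dict_t",             4096,  16820855,  140 },
    };
    uint64_t total = 0;

    for (size_t i = 0; i < sizeof(pools) / sizeof(pools[0]); i++) {
        /* same formula as the gdb script:
         * (hot_count + curr_stdalloc) * padded_sizeof_type */
        uint64_t bytes = (pools[i].hot + pools[i].stdalloc) * pools[i].padded;
        printf("%-30s %15" PRIu64 " bytes\n", pools[i].name, bytes);
        total += bytes;
    }
    printf("%-30s %15" PRIu64 " bytes\n", "total", total);
    return 0;
}

The rpcsvc_request_t pool alone accounts for roughly 103GB of that; each padded request object is about 6KB, so this is on the order of 16.8 million unreleased RPC requests.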
> I assume this might be caused by a large number of RPC requests that were never
> freed. This happened several days ago, and I am still trying to figure out what
> went wrong on my servers at that time. Hope someone here has encountered this
> issue before; any advice would be greatly appreciated!
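One more observation on the management:rpcsvc_request_t numbers: hot_count is pinned at max_alloc (64), pool_misses is nearly equal to alloc_count, and curr_stdalloc sits around 16.8 million. That is the pattern you would expect if requests keep being taken from the pool and are almost never returned. Below is a simplified model of just those counters (a sketch of the accounting only, not the libglusterfs mem-pool code; the get/put counts are chosen to reproduce the dumped values, assuming alloc_count counts total gets):

#include <inttypes.h>
#include <stdio.h>

/* Toy model of the mem_pool counters seen in the dump: a "get" that finds
 * a free preallocated slot bumps hot_count; once the pool is exhausted,
 * further gets fall back to a standard allocation and bump curr_stdalloc.
 * A matching "put" releases one object again. (Simplified: the real code
 * decides per-pointer whether it came from the pool or from malloc.) */
typedef struct {
    uint64_t pool_size;          /* preallocated objects (max_alloc = 64)   */
    uint64_t hot_count;          /* preallocated objects currently in use   */
    uint64_t curr_stdalloc;      /* objects malloc'd after the pool ran dry */
    uint64_t padded_sizeof_type; /* per-object size, 6116 bytes in the dump */
} toy_pool;

static void toy_get(toy_pool *p) {
    if (p->hot_count < p->pool_size)
        p->hot_count++;          /* served from the preallocated region */
    else
        p->curr_stdalloc++;      /* pool miss: standard allocation      */
}

static void toy_put(toy_pool *p) {
    if (p->curr_stdalloc > 0)
        p->curr_stdalloc--;      /* release a standard allocation       */
    else if (p->hot_count > 0)
        p->hot_count--;          /* return a slot to the pool           */
}

int main(void) {
    toy_pool req = { .pool_size = 64, .padded_sizeof_type = 6116 };
    uint64_t gets = 16919588;    /* alloc_count in the dump                 */
    uint64_t puts = 94871;       /* gets minus the 16,824,717 outstanding   */

    for (uint64_t i = 0; i < gets; i++)
        toy_get(&req);
    for (uint64_t i = 0; i < puts; i++)
        toy_put(&req);

    /* prints hot_count=64 curr_stdalloc=16824653 -> 102899969172 bytes,
     * matching the dumped rpcsvc_request_t pool */
    printf("hot_count=%" PRIu64 " curr_stdalloc=%" PRIu64
           " -> %" PRIu64 " bytes held\n",
           req.hot_count, req.curr_stdalloc,
           (req.hot_count + req.curr_stdalloc) * req.padded_sizeof_type);
    return 0;
}

If that reading is right, something was creating rpcsvc_request_t objects far faster than it released them, which fits the theory of a flood of RPC requests that were never answered or freed.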