On Thursday 19 March 2009 11:06:23 Anand Avati wrote:
> On Thu, Mar 19, 2009 at 2:05 PM, Andrew McGill <list2008 at lunch.za.net> wrote:
> > Hi Anand,
> >
> > I found two core dumps in /, and did a backtrace. In both cases, the
> > error position is the abort() call in fuse-bridge.c:2583 --
> >
> > 2583                                    ERR_ABORT (buf->data);
>
> Do you know if the system was low on memory? Have you tuned the
> overcommit ratio in /proc by any chance? calloc returns NULL only
> under dire circumstances.
>
> Avati

The system has 1 GB of RAM and 2 GB of swap. The core dumps are about as
big as the available memory:

[root at backup5 /]# ls -la core.* -h
-rw------- 1 root root 3.0G Mar 14 13:35 core.23239
-rw------- 1 root root 2.9G Mar 14 22:24 core.28543

The core dumps tell me that it was allocating roughly another 126 KB of
memory at the time (128924 bytes in one dump, 129708 in the other):

Smaller coredump:

(gdb) p res
$3 = 128924
(gdb) p buf
$4 = (data_t *) 0xa816308
(gdb) p *buf
$5 = {is_static = 0 '\0', is_const = 0 '\0', len = 0, vec = 0x0, data = 0x0,
  refcount = 1, lock = 1}

Larger coredump:

(gdb) p buf
$1 = (data_t *) 0xb45fc828
(gdb) p res
$2 = 129708
(gdb) p *buf
$3 = {is_static = 0 '\0', is_const = 0 '\0', len = 0, vec = 0x0, data = 0x0,
  refcount = 1, lock = 1}

It appears there is no matching free outside of the loop
"while (!fuse_session_exited (priv->se)) {" for either of the CALLOC calls:

        char *recvbuf = CALLOC (1, chan_size);
        ...
        buf->data = CALLOC (1, res);
        ...

I kicked it around a little more, but didn't find anything earth-shattering,
I think ...

(gdb) p *priv
$5 = {fd = 8, fuse = 0x0, se = 0x8b65b18, ch = 0x8b65c60, volfile = 0x0,
  volfile_size = 0, mount_point = 0x8b65b00 "/mnt/glusterfs",
  buf = 0xb45fc828, fuse_thread = 3076221840,
  fuse_thread_started = 1 '\001', direct_io_mode = 1, entry_timeout = 1,
  attribute_timeout = 1, first_call_cond = {__data = {__lock = 0,
      __futex = 2, __total_seq = 1, __wakeup_seq = 1, __woken_seq = 1,
      __mutex = 0x8b65adc, __nwaiters = 0, __broadcast_seq = 1},
    __size = "\000\000\000\000\002\000\000\000\001\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000?Z?\b\000\000\000\000\001\000\000\000\000\000\000",
    __align = 8589934592}, first_call_mutex = {__data = {__lock = 0,
      __count = 0, __owner = 0, __kind = 0, __nusers = 0, {__spins = 0,
        __list = {__next = 0x0}}}, __size = '\0' <repeats 23 times>,
    __align = 0}, first_call = 0 '\0', strict_volfile_check = _gf_true}
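
To make the allocation pattern in question concrete, here is a minimal
standalone sketch. It is not the glusterfs code: demo_buf, demo_calloc,
demo_free and demo_loop are hypothetical names, and the real fuse-bridge.c
may release buf->data elsewhere (for example through the refcount on
data_t), which this sketch does not model. It only shows a receive-style
loop in which each calloc has a matching free and an allocation failure is
reported to the caller instead of aborting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the data_t seen in the gdb output. */
struct demo_buf {
    char  *data;
    size_t len;
};

/* Number of allocations not yet freed (bookkeeping for the demo only). */
static long live_allocs = 0;

static void *demo_calloc(size_t n, size_t size)
{
    void *p = calloc(n, size);
    if (p != NULL)
        live_allocs++;
    return p;
}

static void demo_free(void *p)
{
    if (p != NULL) {
        live_allocs--;
        free(p);
    }
}

/*
 * Run `iterations` rounds of a receive-style loop.
 * Returns 0 on success, -1 if an allocation failed.
 * Every exit path releases whatever it allocated.
 */
static int demo_loop(int iterations, size_t chan_size, size_t res)
{
    assert(chan_size > 0 && res > 0);

    /* Allocated once before the loop, freed once after it. */
    char *recvbuf = demo_calloc(1, chan_size);
    if (recvbuf == NULL)
        return -1;

    int ret = 0;
    for (int i = 0; i < iterations; i++) {
        struct demo_buf buf = { NULL, 0 };

        /* Allocated per iteration, so it must be freed per iteration. */
        buf.data = demo_calloc(1, res);
        if (buf.data == NULL) {
            ret = -1;   /* report the failure rather than abort() */
            break;
        }
        buf.len = res;

        /* ... the received message would be processed here ... */

        demo_free(buf.data);
    }

    demo_free(recvbuf);
    return ret;
}
```

Calling demo_loop (100, 4096, 128924) and then checking that live_allocs is
back to 0 confirms that nothing allocated in the loop outlives it. The same
kind of counter around CALLOC/FREE would be one way to check whether the
real loop accumulates buffers.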