syslog problems

Sam Lang <samlang@xxxxxxxxx> · Wed, 15 Jun 2011 09:26:42 -0500

In my Ceph setup, logs were being written to the default location 
(/var/log/ceph/), and eventually the monitor or OSD daemons would crash 
because the disk filled up with logs.  So I switched to logging via 
syslog.  The local disk no longer fills up, but I still see errors 
similar to the earlier ones.  For example:

Jun 15 08:58:24 lut-ceph02 mon.beta[6739]: *** Caught signal (Aborted) 
**#012 in thread 0x7f79b1b62700
Jun 15 08:58:24 lut-ceph02 mon.beta[6739]:  ceph version  (commit:)#012 
1: /usr/ceph/bin/cmon() [0x5a1d69]#012 2: (()+0xfc60) 
[0x7f79b461bc60]#012 3: (gsignal()+0x35) [0x7f79b3b0ad05]#012 4: 
(abort()+0x186) [0x7f79b3b0eab6]#012 5: 
(__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f79b43c16dd]#012 6: 
(()+0xb9926) [0x7f79b43bf926]#012 7: (()+0xb9953) [0x7f79b43bf953]#012 
8: (()+0xb9a5e) [0x7f79b43bfa5e]#012 9: (ceph::__ceph_assert_fail(char 
const*, char const*, int, char const*)+0x362) [0x57d252]#012 10: 
(MonitorStore::write_bl_ss(ceph::buffer::list&, char const*, char 
const*, bool, bool)+0x2bd) [0x510dcd]#012 11: 
(LogMonitor::update_from_paxos()+0x2547) [0x4f26f7]#012 12: 
(Monitor::_ms_dispatch(Message*)+0xd8c) [0x480afc]#012 13: 
(Monitor::ms_dispatch(Message*)+0x79) [0x48a2f9]#012 14: 
(SimpleMessenger::dispatch_entry()+0x667) [0x46b157]#012 15: 
(SimpleMessenger::DispatchThread::entry()+0x1c) [0x456c5c]#012 16: 
(()+0x6d8c) [0x7f79b4612d8c]#012 17: (clone()+0x6d) [0x7f79b3bbd04d]
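
The "#012" sequences in the trace above are the octal escape that syslog 
daemons such as rsyslog substitute for an embedded newline, so each one 
marks the start of a new backtrace frame.  A minimal sketch for splitting 
such a message back into one frame per line, assuming that escaping (the 
helper name split_escaped_frames is hypothetical, not part of Ceph or 
syslog):

```python
# Hypothetical helper: assumes the syslog daemon escaped each embedded
# newline as the literal text "#012" (octal 012 is LF).
def split_escaped_frames(message):
    """Return the non-empty pieces of a syslog message split on '#012'."""
    return [part.strip() for part in message.split("#012") if part.strip()]


# A shortened copy of the second log message above (message body only).
sample = (
    "ceph version  (commit:)#012 1: /usr/ceph/bin/cmon() [0x5a1d69]#012 "
    "2: (()+0xfc60) [0x7f79b461bc60]#012 3: (gsignal()+0x35) [0x7f79b3b0ad05]"
)

frames = split_escaped_frames(sample)
for frame in frames:
    print(frame)
```

This only undoes the newline escaping; it does not resolve the raw 
addresses to symbols.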

Also, I've seen a monitor process get killed by the OOM killer (see 
below).  Are these known issues?  In practice, do folks just disable all 
logging right now and hope for the best?

Thanks,
-sam

OOM killer messages:

[364540.080818] Node 0 DMA free:7992kB min:348kB low:432kB high:520kB active_anon:3632kB inactive_anon:3744kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15664kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:56kB slab_unreclaimable:384kB kernel_stack:64kB pagetables:16kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[364540.080831] lowmem_reserve[]: 0 1963 1963 1963
[364540.080837] Node 0 DMA32 free:44672kB min:44704kB low:55880kB high:67056kB active_anon:1387044kB inactive_anon:462364kB active_file:520kB inactive_file:2780kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2010592kB mlocked:0kB dirty:4kB writeback:8kB mapped:396kB shmem:5920kB slab_reclaimable:7928kB slab_unreclaimable:18600kB kernel_stack:7560kB pagetables:11992kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:2860 all_unreclaimable? yes
[364540.080850] lowmem_reserve[]: 0 0 0 0
[364540.080856] Node 0 DMA: 14*4kB 14*8kB 7*16kB 3*32kB 3*64kB 2*128kB 4*256kB 4*512kB 2*1024kB 1*2048kB 0*4096kB = 7992kB
[364540.080869] Node 0 DMA32: 9632*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 1*4096kB = 44672kB
[364540.080882] 4870 total pagecache pages
[364540.080884] 2596 pages in swap cache
[364540.080887] Swap cache stats: add 501808, delete 499212, find 6734/8216
[364540.080889] Free swap  = 0kB
[364540.080891] Total swap = 1953040kB
[364540.087324] 513696 pages RAM
[364540.087327] 10085 pages reserved
[364540.087329] 1907 pages shared
[364540.087330] 487503 pages non-shared
...
[364540.087475] [ 5998]     0  5975   851179   444043   0       0             0 cmon
[364540.087480] [ 6188]     0  6188    49938      145   1       0             0 cmds
[364540.087484] [ 6396]     0  6396   150647     1359   1       0             0 cosd
[364540.087489] [ 6485]     0  6485   176420     7324   0       0             0 cosd
[364540.087494] [ 7076]     0  7076   168333     1561   1       0             0 cosd
[364540.087499] [ 7660]     0  7660   167456     1571   1       0             0 cosd
[364540.087503] [ 7747]     0  7747   149214     1497   0       0             0 cosd
[364540.087515] Out of memory: Kill process 5998 (cmon) score 776 or sacrifice child
[364540.087570] Killed process 5998 (cmon) total-vm:3404716kB, anon-rss:1776172kB, file-rss:0kB
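
The task dump reports memory in pages while the final "Killed process" 
line reports kB, so the two can be cross-checked.  A rough sketch of that 
arithmetic, assuming 4 kB pages and the usual column order for this dump 
(pid, uid, tgid, total_vm, rss, ...); the column header itself is elided 
above, so the column meaning is an assumption, and the function name is 
hypothetical:

```python
PAGE_KB = 4  # assumption: 4 kB pages (the x86-64 default)


def pages_to_kb(pages, page_kb=PAGE_KB):
    """Convert a page count from the OOM task dump into kB."""
    return pages * page_kb


# Page counts copied from the cmon row and the "pages RAM" line above.
cmon_total_vm_pages = 851179
cmon_rss_pages = 444043
ram_pages = 513696

cmon_total_vm_kb = pages_to_kb(cmon_total_vm_pages)
cmon_rss_kb = pages_to_kb(cmon_rss_pages)
ram_kb = pages_to_kb(ram_pages)

print(cmon_total_vm_kb)  # 3404716, matches total-vm:3404716kB
print(cmon_rss_kb)       # 1776172, matches anon-rss:1776172kB
print(ram_kb)            # 2054784

# Fraction of physical RAM held resident by cmon when it was killed.
rss_fraction = cmon_rss_kb / ram_kb
print(round(rss_fraction, 2))
```

Under those assumptions, cmon alone was holding most of the machine's 
roughly 2 GB of RAM, with swap already exhausted (Free swap = 0kB).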
