On May 15, 2014, at 6:06 PM, Cao, Buddy <buddy.cao at intel.com> wrote:

> Hi,
>
> One of the OSDs in my cluster goes down for no apparent reason. I saw the error message below in the log. I restarted the OSD, but after several hours the problem came back. Could you help?
>
> Too many open files not handled on operation 24 (541468.0.1, or op 1, counting from 0)

It looks like you are running out of file descriptors (FDs), judging by the error message above. You can check the limit with "bash-$: ulimit -a" and how many are in use with "bash-$: cat /proc/sys/fs/file-nr". If the two numbers are close, you are likely at risk of running out of FDs under load (or during other cluster-wide activity).

> -96> 2014-05-14 22:12:24.281185 7f617b33e700 5 -- op tracker -- , seq: 788808, time: 2014-05-14 22:12:24.281164, event: reached_pg, request: osd_op(client.21276.0:3884815 rb.0.31c7.238e1f29.000000003c15 [write 2273280~65536] 4.110fcf4 e12271) v4
> -95> 2014-05-14 22:12:24.281192 7f618556d700 0 filestore(/var/lib/ceph/osd/ceph-3) unexpected error code
> -94> 2014-05-14 22:12:24.281197 7f6181b4b700 5 -- op tracker -- , seq: 788843, time: 2014-05-14 22:12:24.281011, event: header_read, request: osd_op(client.21276.0:3884929 rb.0.31c7.238e1f29.000000005614 [write 3137536~65536] 4.63e147e e12271) v4
>
> 2014-05-14 22:12:24.289987 7f6185d6e700 -1 os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int, ThreadPool::TPHandle*)' thread 7f6185d6e700 time 2014-05-14 22:12:24.282488
> os/FileStore.cc: 2448: FAILED assert(0 == "unexpected error")
> ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
> 1: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0x11c3) [0x723a43]
> 2: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x74) [0x72a4d4]
> 3: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x29a) [0x72a78a]
> 4: (ThreadPool::worker(ThreadPool::WorkThread*)+0x551) [0x988f21]
> 5: (ThreadPool::WorkThread::entry()+0x10) [0x98bf50]
> 6: /lib64/libpthread.so.0() [0x3a7ce079d1]
> 7: (clone()+0x6d) [0x3a7cae8b6d]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> #iostat
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            0.44    0.00    0.14    0.41    0.00   99.01
>
> Device:      tps   Blk_read/s   Blk_wrtn/s    Blk_read     Blk_wrtn
> sdb         1.23         0.10        35.72       12738      4762008
> sdc         5.25       214.25      1288.81    28564314    171824232
> sdd         4.16       139.98      1021.69    18662490    136211888
> sde         4.61       207.50      1039.20    27663258    138545960
> sdf         7.94       203.24      2530.63    27095930    337383704
> sdg         4.77         0.57      1459.29       75330    194553064
> sdh         4.38         0.37      1287.42       48954    171638304
> sdi        85.80       132.13      8157.53    17616004   1087562272
> sdj         8.77        10.99      1701.90     1465844    226897024
> sda         4.55         0.60      1331.50       80010    177516216
>
> OSD log attached.
>
> Wei Cao (Buddy)
>
> <ceph-osd.21_short.log>
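
The FD check suggested in the reply can also be scripted. Below is a rough Python sketch, not a Ceph tool: the function names are made up for illustration, and it assumes a Linux /proc filesystem. One thing worth keeping in mind is that the "open files" value from ulimit is a per-process limit, while /proc/sys/fs/file-nr reports system-wide file handles, so the sketch looks at both: the system-wide counters, and the soft limit and open-FD count of a single process (for example the ceph-osd daemon). Reading /proc/<pid>/fd for a process owned by another user normally requires root.

```python
import os


def parse_file_nr(text):
    """Parse /proc/sys/fs/file-nr: allocated handles, unused handles, system maximum."""
    allocated, unused, maximum = (int(field) for field in text.split())
    return allocated, unused, maximum


def parse_soft_nofile(limits_text):
    """Return the soft 'Max open files' limit from /proc/<pid>/limits.

    Returns None if the line is missing or the limit is 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft = line.split()[3]  # fields: Max, open, files, <soft>, <hard>, files
            return None if soft == "unlimited" else int(soft)
    return None


def usage_ratio(used, limit):
    """Return used/limit as a float, or 0.0 when the limit is not positive."""
    return used / limit if limit > 0 else 0.0


def report(pid="self"):
    """Build a short text report for one process (hypothetical helper, Linux only)."""
    with open("/proc/sys/fs/file-nr") as f:
        allocated, _unused, maximum = parse_file_nr(f.read())
    with open(f"/proc/{pid}/limits") as f:
        soft = parse_soft_nofile(f.read())
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))  # one entry per open descriptor

    lines = [
        f"system-wide: {allocated}/{maximum} handles "
        f"({usage_ratio(allocated, maximum):.1%})"
    ]
    if soft is None:
        lines.append(f"pid {pid}: {open_fds} open FDs, no soft limit found")
    else:
        lines.append(
            f"pid {pid}: {open_fds}/{soft} open FDs "
            f"({usage_ratio(open_fds, soft):.1%})"
        )
    return "\n".join(lines)
```

Calling print(report(pid)) with the PID of the affected OSD process prints both ratios; a per-process ratio that climbs toward 100% before the crash would be consistent with the "Too many open files" error in the log above.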