osd down/autoout problem

On May 15, 2014, at 6:06 PM, Cao, Buddy <buddy.cao at intel.com> wrote:

> Hi,
>  
> One of the OSDs in my cluster went down for no apparent reason. I saw the error message below in the log. I restarted the OSD, but after several hours the problem came back. Could you help?
>  
> Too many open files not handled on operation 24 (541468.0.1, or op 1, counting from 0)
From the error message above, it looks like you are running out of file descriptors (FDs).
You can check the limit with "bash-$: ulimit -a" and see how many are in use with "bash-$: cat /proc/sys/fs/file-nr". If the two numbers are close, you are at risk of running out of FDs under load (or during other cluster-wide activity).
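Since "ulimit -a" reports the limit of the shell it is run in, while /proc/sys/fs/file-nr counts handles for the whole system, it may also help to look at one process directly. The following is only a rough sketch, assuming a Linux /proc filesystem; the function names fd_soft_limit, fd_open_count and fd_report are made up for this example and are not part of Ceph.

```shell
#!/bin/sh
# Sketch: compare the open file descriptors of one process with its soft limit.
# Assumes Linux /proc. The helper names below are hypothetical.

# Print the soft "Max open files" limit of the given PID.
# The line in /proc/<pid>/limits reads: Max open files <soft> <hard> files
fd_soft_limit() {
    awk '/^Max open files/ {print $4}' "/proc/$1/limits"
}

# Print how many descriptors the given PID currently holds.
fd_open_count() {
    ls "/proc/$1/fd" | wc -l
}

# Print a one-line summary for the given PID.
fd_report() {
    echo "pid $1: $(fd_open_count "$1") of $(fd_soft_limit "$1") descriptors in use"
}
```

For example, "for pid in $(pidof ceph-osd); do fd_report "$pid"; done" would print one line per running OSD daemon. Reading /proc/<pid>/fd of a process owned by another user normally requires root.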
>    -96> 2014-05-14 22:12:24.281185 7f617b33e700  5 -- op tracker -- , seq: 788808, time: 2014-05-14 22:12:24.281164, event: reached_pg, request: osd_op(client.21276.0:3884815 rb.0.31c7.238e1f29.000000003c15 [write 2273280~65536] 4.110fcf4 e12271) v4
>    -95> 2014-05-14 22:12:24.281192 7f618556d700  0 filestore(/var/lib/ceph/osd/ceph-3) unexpected error code
>    -94> 2014-05-14 22:12:24.281197 7f6181b4b700  5 -- op tracker -- , seq: 788843, time: 2014-05-14 22:12:24.281011, event: header_read, request: osd_op(client.21276.0:3884929 rb.0.31c7.238e1f29.000000005614 [write 3137536~65536] 4.63e147e e12271) v4
> > 2014-05-14 22:12:24.289987 7f6185d6e700 -1 os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int, ThreadPool::TPHandle*)' thread 7f6185d6e700 time 2014-05-14 22:12:24.282488
> os/FileStore.cc: 2448: FAILED assert(0 == "unexpected error")
>  ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
> 1: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0x11c3) [0x723a43]
> 2: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x74) [0x72a4d4]
> 3: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x29a) [0x72a78a]
> 4: (ThreadPool::worker(ThreadPool::WorkThread*)+0x551) [0x988f21]
> 5: (ThreadPool::WorkThread::entry()+0x10) [0x98bf50]
> 6: /lib64/libpthread.so.0() [0x3a7ce079d1]
> 7: (clone()+0x6d) [0x3a7cae8b6d]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>  
>  
> #iostat
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            0.44    0.00    0.14    0.41    0.00   99.01
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sdb               1.23         0.10        35.72      12738    4762008
> sdc               5.25       214.25      1288.81   28564314  171824232
> sdd               4.16       139.98      1021.69   18662490  136211888
> sde               4.61       207.50      1039.20   27663258  138545960
> sdf               7.94       203.24      2530.63   27095930  337383704
> sdg               4.77         0.57      1459.29      75330  194553064
> sdh               4.38         0.37      1287.42      48954  171638304
> sdi              85.80       132.13      8157.53   17616004 1087562272
> sdj               8.77        10.99      1701.90    1465844  226897024
> sda               4.55         0.60      1331.50      80010  177516216
>  
>  
> osd log attached.
>  
> Wei Cao (Buddy)
>  
> <ceph-osd.21_short.log>
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
