Re: osd stops

Ah. It looks like you're running btrfs on a very full disk. Unfortunately, btrfs doesn't handle low-space situations well (anything above ~80% utilization; yes, it's annoying), so it's failing to perform pretty basic tasks and propagating those failures up to the OSD. If you really need to run that close to full utilization, you'll need to use a different underlying filesystem or add more disks/nodes to spread the data across.
Sorry. :(
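
If it helps to keep an eye on this, here is a minimal sketch of a utilization check, assuming the ~80% figure above is only a rough rule of thumb and not a documented btrfs limit. The function names and the threshold constant are made up for illustration, and the numbers come from the generic statvfs interface, which may not reflect how btrfs actually allocates space.

```python
import shutil

# Assumption: ~80% is the rough trouble point mentioned above,
# not an official btrfs or Ceph limit.
WARN_FRACTION = 0.80


def used_fraction(path):
    """Return the used fraction (0.0 to 1.0) of the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo-filesystems report a total size of zero.
        return 0.0
    return usage.used / usage.total


def is_too_full(path, limit=WARN_FRACTION):
    """Return True if the filesystem holding `path` is at or above `limit`."""
    return used_fraction(path) >= limit


if __name__ == "__main__":
    # "." is a placeholder; point this at the OSD data directory instead.
    path = "."
    state = "WARN" if is_too_full(path) else "ok"
    print(f"{path}: {used_fraction(path):.0%} used [{state}]")
```

Running something like this periodically against each OSD data directory would give a warning before the disks get into the range where the failures below started.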

-Greg
On Tuesday, April 12, 2011 at 9:26 AM, Martin Wilderoth wrote:
> I have been running some tests, and it seems I always hit the same problem.
> While transferring data, I suddenly get an I/O error and a superblock problem.
> This occurs when the filesystem is filled to approx. 80%.
> 
> ceph health reports no error. I restart the system (-a stop, -a start);
> after that the system is degraded and the osd stops.
> 
> The log of the first failing osd shows:
> 
> 2011-04-12 17:51:07.716513 7f02365b8700 -- 0.0.0.0:6802/20180 >> 10.0.6.12:6802/13633 pipe(0x2e1da00 sd=22 pgs=0 cs=0 l=0).fault first fault
> 2011-04-12 17:51:07.716868 7f02365b8700 -- 0.0.0.0:6802/20180 >> 10.0.6.12:6802/13633 pipe(0x2e1da00 sd=22 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6802/15976 not 10.0.6.12:6802/13633 - wrong node!
> os/FileStore.cc: In function 'void FileStore::sync_entry()', in thread '0x7f023f9ce700'
> os/FileStore.cc: 2674: FAILED assert(r == 0)
>  ceph version 0.26 (commit:9981ff90968398da43c63106694d661f5e3d07d5)
>  1: (FileStore::sync_entry()+0x1975) [0x59f165]
>  2: (FileStore::SyncThread::entry()+0xd) [0x5a8a7d]
>  3: (()+0x68ba) [0x7f024602b8ba]
>  4: (clone()+0x6d) [0x7f0244cc002d]
>  ceph version 0.26 (commit:9981ff90968398da43c63106694d661f5e3d07d5)
>  1: (FileStore::sync_entry()+0x1975) [0x59f165]
>  2: (FileStore::SyncThread::entry()+0xd) [0x5a8a7d]
>  3: (()+0x68ba) [0x7f024602b8ba]
>  4: (clone()+0x6d) [0x7f0244cc002d]
> *** Caught signal (Aborted) **
>  in thread 0x7f023f9ce700
>  ceph version 0.26 (commit:9981ff90968398da43c63106694d661f5e3d07d5)
>  1: /usr/bin/cosd() [0x61e42c]
>  2: (()+0xef60) [0x7f0246033f60]
>  3: (gsignal()+0x35) [0x7f0244c23165]
>  4: (abort()+0x180) [0x7f0244c25f70]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f02454b6dc5]
>  6: (()+0xcb166) [0x7f02454b5166]
>  7: (()+0xcb193) [0x7f02454b5193]
>  8: (()+0xcb28e) [0x7f02454b528e]
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x373) [0x6061e3]
>  10: (FileStore::sync_entry()+0x1975) [0x59f165]
>  11: (FileStore::SyncThread::entry()+0xd) [0x5a8a7d]
>  12: (()+0x68ba) [0x7f024602b8ba]
>  13: (clone()+0x6d) [0x7f0244cc002d]
> 
> The log of the second failing osd:
> 
> 2011-04-12 18:03:36.036420 7f39c6ce7700 FileStore: sync_entry timed out after 600 seconds.
>  ceph version 0.26 (commit:9981ff90968398da43c63106694d661f5e3d07d5)
> 2011-04-12 18:03:36.036494 1: (SafeTimer::timer_thread()+0x36b) [0x601afb]
> 2011-04-12 18:03:36.036509 2: (SafeTimerThread::entry()+0xd) [0x6042cd]
> 2011-04-12 18:03:36.036528 3: (()+0x68ba) [0x7f39d034a8ba]
> 2011-04-12 18:03:36.036541 4: (clone()+0x6d) [0x7f39cefdf02d]
> 2011-04-12 18:03:36.036551 os/FileStore.cc: In function 'virtual void SyncEntryTimeout::finish(int)', in thread '0x7f39c6ce7700'
> os/FileStore.cc: 2573: FAILED assert(0)
>  ceph version 0.26 (commit:9981ff90968398da43c63106694d661f5e3d07d5)
>  1: (SyncEntryTimeout::finish(int)+0xf4) [0x5a0b34]
>  2: (SafeTimer::timer_thread()+0x36b) [0x601afb]
>  3: (SafeTimerThread::entry()+0xd) [0x6042cd]
>  4: (()+0x68ba) [0x7f39d034a8ba]
>  5: (clone()+0x6d) [0x7f39cefdf02d]
>  ceph version 0.26 (commit:9981ff90968398da43c63106694d661f5e3d07d5)
>  1: (SyncEntryTimeout::finish(int)+0xf4) [0x5a0b34]
>  2: (SafeTimer::timer_thread()+0x36b) [0x601afb]
>  3: (SafeTimerThread::entry()+0xd) [0x6042cd]
>  4: (()+0x68ba) [0x7f39d034a8ba]
>  5: (clone()+0x6d) [0x7f39cefdf02d]
> *** Caught signal (Aborted) **
>  in thread 0x7f39c6ce7700
>  ceph version 0.26 (commit:9981ff90968398da43c63106694d661f5e3d07d5)
>  1: /usr/bin/cosd() [0x61e42c]
>  2: (()+0xef60) [0x7f39d0352f60]
>  3: (gsignal()+0x35) [0x7f39cef42165]
>  4: (abort()+0x180) [0x7f39cef44f70]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f39cf7d5dc5]
>  6: (()+0xcb166) [0x7f39cf7d4166]
>  7: (()+0xcb193) [0x7f39cf7d4193]
>  8: (()+0xcb28e) [0x7f39cf7d428e]
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x373) [0x6061e3]
>  10: (SyncEntryTimeout::finish(int)+0xf4) [0x5a0b34]
>  11: (SafeTimer::timer_thread()+0x36b) [0x601afb]
>  12: (SafeTimerThread::entry()+0xd) [0x6042cd]
>  13: (()+0x68ba) [0x7f39d034a8ba]
>  14: (clone()+0x6d) [0x7f39cefdf02d]
> 
> regards Martin
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> 


