Re: MDS not start. Timeout??

On Fri, Aug 31, 2018 at 6:11 AM morfair@xxxxxxxxx <morfair@xxxxxxxxx> wrote:
>
> Hello all!
>
> I had an electrical power problem. Since then I have 2 incomplete PGs, but all RBD volumes are working.
>
> But my CephFS does not work. The MDS stops at the "replay" state and MDS-related commands hang:
>
> cephfs-journal-tool journal export backup.bin - hangs;
>
> cephfs-journal-tool event recover_dentries summary - hangs (no activity in strace);
>
> cephfs-journal-tool journal reset - hangs;
>
>

As you have noticed, you have two incomplete PGs.  They are presumably
metadata PGs, and CephFS can't read or write some parts of its
metadata, so its IOs are blocking.
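A minimal Python sketch for checking the "presumably metadata PGs" guess. It relies on the fact that a PG ID has the form `<pool id>.<hex seed>`, so the pool a PG belongs to can be read straight from its ID. The input format it parses (lines like `pg 2.1f is incomplete, acting [3,1]`, as printed by `ceph health detail`) is an assumption and may differ between Ceph releases, and the helper names are hypothetical. The metadata pool's numeric ID can be looked up by matching the pool name from `ceph fs ls` against `ceph osd lspools`; once an affected PG is found, `ceph pg <pgid> query` shows why it is incomplete.

```python
import re
from typing import Dict, List

# Assumed line format from `ceph health detail`, e.g.
#   "pg 2.1f is incomplete, acting [3,1]"
# A PG ID is "<pool id>.<hex seed>", so group 1 is the pool ID.
INCOMPLETE_PG_RE = re.compile(r"\bpg (\d+)\.([0-9a-f]+) is incomplete\b")


def incomplete_pgs_by_pool(health_detail: str) -> Dict[int, List[str]]:
    """Hypothetical helper: group incomplete PG IDs found in the text by pool ID."""
    pools: Dict[int, List[str]] = {}
    for pool, seed in INCOMPLETE_PG_RE.findall(health_detail):
        pools.setdefault(int(pool), []).append(f"{pool}.{seed}")
    return pools


def metadata_pgs(health_detail: str, metadata_pool_id: int) -> List[str]:
    """Hypothetical helper: incomplete PGs that belong to the CephFS metadata pool."""
    return incomplete_pgs_by_pool(health_detail).get(metadata_pool_id, [])


if __name__ == "__main__":
    # Illustrative input only (not taken from the cluster in this thread).
    sample = (
        "pg 2.1f is incomplete, acting [3,1]\n"
        "pg 1.7 is incomplete, acting [0,2]\n"
    )
    print(incomplete_pgs_by_pool(sample))
    print(metadata_pgs(sample, 2))
```

If the metadata pool's ID shows up in the result, the MDS replay hang is consistent with the explanation above; an empty list would point elsewhere.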

You need to investigate what's going on with those PGs, and longer
term work out what about your configuration allowed an electrical
problem to damage the cluster -- look into your drive controller
configuration (do you have e.g. writeback caches without battery
backup?) etc.

John

> strace out:
>
>  <unfinished ...>
> [pid  6314] <... futex resumed> )       = -1 ETIMEDOUT (Connection timed out)
> [pid  6314] futex(0x55d342eea928, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid  6314] futex(0x55d342eea97c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 31, {1535692208, 64139948}, ffffffff <unfinished ...>
> [pid  6318] <... futex resumed> )       = -1 ETIMEDOUT (Connection timed out)
> [pid  6318] futex(0x55d3430b6958, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid  6318] futex(0x55d3430b6984, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 33, {1535692208, 80954445}, ffffffff <unfinished ...>
> [pid  6324] <... futex resumed> )       = -1 ETIMEDOUT (Connection timed out)
> [pid  6324] futex(0x55d3430b7758, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid  6324] write(12, "c", 1)           = 1
> [pid  6317] <... epoll_wait resumed> {{EPOLLIN, {u32=11, u64=11}}}, 5000, 30000) = 1
> [pid  6317] read(11, "c", 256)          = 1
> [pid  6317] read(11, 0x7f1558c32300, 256) = -1 EAGAIN (Resource temporarily unavailable)
> [pid  6317] futex(0x55d3432269e0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid  6324] futex(0x55d3432269e0, FUTEX_WAKE_PRIVATE, 1) = 1
> [pid  6317] <... futex resumed> )       = 0
> [pid  6317] futex(0x55d3432269e0, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid  6317] sendmsg(17, {msg_name(0)=NULL, msg_iov(1)=[{"\7\25\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\2\0\177\0\1\0\0\0\0\0\0\0\0\0\0"..., 75}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL <unfinished ...>
> [pid  6324] futex(0x55d3430b7784, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 33, {1535692208, 169222622}, ffffffff <unfinished ...>
> [pid  6317] <... sendmsg resumed> )     = 75
> [pid  6317] epoll_wait(10,
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com