On Fri, Aug 31, 2018 at 6:11 AM morfair@xxxxxxxxx <morfair@xxxxxxxxx> wrote:
>
> Hello all!
>
> I had an electrical power problem. After it I have 2 incomplete PGs, but all RBD volumes still work.
>
> My CephFS does not work, though. The MDS stops loading at the "replay" state and MDS-related commands hang:
>
> cephfs-journal-tool journal export backup.bin - freezes;
>
> cephfs-journal-tool event recover_dentries summary - freezes (no action in strace);
>
> cephfs-journal-tool journal reset - freezes;
>

As you have noticed, you have two incomplete PGs. They are presumably metadata PGs, so CephFS can't read or write some parts of its metadata, and its I/Os are blocking.

You need to investigate what's going on with those PGs, and longer term work out what about your configuration allowed an electrical problem to damage the cluster -- look into your drive controller configuration (do you have, e.g., writeback caches without battery backup?) etc.

John

> strace out:
>
> <unfinished ...>
> [pid 6314] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
> [pid 6314] futex(0x55d342eea928, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid 6314] futex(0x55d342eea97c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 31, {1535692208, 64139948}, ffffffff <unfinished ...>
> [pid 6318] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
> [pid 6318] futex(0x55d3430b6958, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid 6318] futex(0x55d3430b6984, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 33, {1535692208, 80954445}, ffffffff <unfinished ...>
> [pid 6324] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
> [pid 6324] futex(0x55d3430b7758, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid 6324] write(12, "c", 1) = 1
> [pid 6317] <... epoll_wait resumed> {{EPOLLIN, {u32=11, u64=11}}}, 5000, 30000) = 1
> [pid 6317] read(11, "c", 256) = 1
> [pid 6317] read(11, 0x7f1558c32300, 256) = -1 EAGAIN (Resource temporarily unavailable)
> [pid 6317] futex(0x55d3432269e0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid 6324] futex(0x55d3432269e0, FUTEX_WAKE_PRIVATE, 1) = 1
> [pid 6317] <... futex resumed> ) = 0
> [pid 6317] futex(0x55d3432269e0, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid 6317] sendmsg(17, {msg_name(0)=NULL, msg_iov(1)=[{"\7\25\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\2\0\177\0\1\0\0\0\0\0\0\0\0\0\0"..., 75}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL <unfinished ...>
> [pid 6324] futex(0x55d3430b7784, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 33, {1535692208, 169222622}, ffffffff <unfinished ...>
> [pid 6317] <... sendmsg resumed> ) = 75
> [pid 6317] epoll_wait(10,
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
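
The reply presumes that the two incomplete PGs are metadata PGs. A rough way to check that is to list the incomplete PG IDs and see which pool each one belongs to. The sketch below is illustrative only: the helper names `pool_of_pg` and `incomplete_pgs` are hypothetical, and the parser assumes `ceph health detail` reports affected PGs on lines shaped like `pg 2.1f is incomplete, acting [3,7]`, which may differ between releases. It relies on Ceph PG IDs having the form `<pool-id>.<hex-seed>`.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Ceph.

# Print the pool ID encoded in a PG ID (the part before the first dot),
# e.g. PG 2.1f belongs to pool 2.
pool_of_pg() {
    printf '%s\n' "${1%%.*}"
}

# Read `ceph health detail` text on stdin and print the unique IDs of PGs
# whose line mentions "incomplete".
# Assumed line shape: "pg <pgid> is incomplete, acting [...]".
incomplete_pgs() {
    awk '$1 == "pg" && /incomplete/ { print $2 }' | sort -u
}

# Possible usage on a cluster node (commented out, needs a live cluster):
#   ceph health detail | incomplete_pgs | while read -r pg; do
#       echo "$pg belongs to pool $(pool_of_pg "$pg")"
#   done
#   ceph osd lspools        # map pool IDs to names, e.g. the CephFS metadata pool
#   ceph pg 2.1f query      # 2.1f is a placeholder PG ID; shows peering state
```

If the pool IDs printed match the CephFS metadata pool in `ceph osd lspools`, that would be consistent with the MDS blocking in "replay" while RBD volumes in other pools keep working.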