CephFS MDS issues

I did a simple OS update and reboot. Now the MDS is stuck in replay. I'm
running Octopus.

Setting debug mds = 20 doesn't show anything useful in the logs:

# tail -f ceph-mds.bridge.log
2021-05-11T18:24:04.859-0700 7f41314a1700 20 mds.0.cache upkeep thread waiting interval 1s
2021-05-11T18:24:05.860-0700 7f41314a1700 10 mds.0.cache cache not ready for trimming
2021-05-11T18:24:05.860-0700 7f41314a1700 20 mds.0.cache upkeep thread waiting interval 1s
2021-05-11T18:24:06.859-0700 7f4133ca6700 20 mds.0.2898629 get_task_status
2021-05-11T18:24:06.859-0700 7f4133ca6700 20 mds.0.2898629 send_task_status: updating 1 status keys
2021-05-11T18:24:06.859-0700 7f4133ca6700 20 mds.0.2898629 schedule_update_timer_task
2021-05-11T18:24:06.859-0700 7f41314a1700 10 mds.0.cache cache not ready for trimming
2021-05-11T18:24:06.859-0700 7f41314a1700 20 mds.0.cache upkeep thread waiting interval 1s
2021-05-11T18:24:07.859-0700 7f41314a1700 10 mds.0.cache cache not ready for trimming
2021-05-11T18:24:07.859-0700 7f41314a1700 20 mds.0.cache upkeep thread waiting interval 1s
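
For anyone reading output like this: the same few messages repeat about once per second, which may indicate the daemon is idling rather than making replay progress. A minimal Python sketch, assuming log entries may be wrapped across lines by a mail client, that rejoins them and counts distinct messages. The function names and the sample text are illustrative only and are not part of any Ceph tool.

```python
import re
from collections import Counter

# A log entry starts with a timestamp such as 2021-05-11T18:24:04.859-0700;
# any other non-blank line is treated as a wrapped continuation.
ENTRY_START = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+[+-]\d{4}\s")


def rejoin(lines):
    """Merge wrapped continuation lines back into whole log entries."""
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if ENTRY_START.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line
    return entries


def summarize(entries):
    """Count entries by message text (everything after timestamp, thread, level)."""
    counts = Counter()
    for entry in entries:
        parts = entry.split(None, 3)
        if len(parts) == 4:
            counts[parts[3]] += 1
    return counts


# Sample taken from the log excerpt above, still wrapped.
sample = """\
2021-05-11T18:24:05.860-0700 7f41314a1700 10 mds.0.cache cache not ready
for trimming
2021-05-11T18:24:05.860-0700 7f41314a1700 20 mds.0.cache upkeep thread
waiting interval 1s
2021-05-11T18:24:06.859-0700 7f41314a1700 10 mds.0.cache cache not ready
for trimming
"""

counts = summarize(rejoin(sample.splitlines()))
for message, n in counts.most_common():
    print(n, message)
```

Run against the full log file, a summary like this makes it easier to spot any message that is not part of the once-per-second loop.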


# cephfs-journal-tool event recover_dentries summary
gets stuck on an object and never gets past it.  I tried running rados -p
cephfs_metadata_pool rmomapkey per https://tracker.ceph.com/issues/38452,
but the command ran for hours and never completed.


# cephfs-journal-tool --rank cephfs:0 journal reset
2021-05-11T18:31:26.860-0700 7f2e9c2a9700 -1 NetHandler create_socket couldn't create socket (97) Address family not supported by protocol
2021-05-11T18:31:26.860-0700 7f2f2989ba80  4 waiting for MDS map...
2021-05-11T18:31:26.860-0700 7f2f2989ba80  4 Got MDS map 2898629
2021-05-11T18:31:26.861-0700 7f2f2989ba80 10 main: JournalTool::main
2021-05-11T18:31:26.861-0700 7f2f2989ba80  4 main: JournalTool: connecting to RADOS...
2021-05-11T18:31:26.863-0700 7f2f2989ba80  4 main: JournalTool: resolving pool 1
2021-05-11T18:31:26.863-0700 7f2f2989ba80  4 main: JournalTool: creating IoCtx..
2021-05-11T18:31:26.863-0700 7f2f2989ba80  4 main: Executing for rank 0
2021-05-11T18:31:26.864-0700 7f2edc2aa700 -1 NetHandler create_socket couldn't create socket (97) Address family not supported by protocol
2021-05-11T18:31:26.864-0700 7f2f2989ba80  4 waiting for MDS map...
2021-05-11T18:31:26.865-0700 7f2f2989ba80  4 Got MDS map 2898629
2021-05-11T18:31:26.865-0700 7f2f2989ba80  4 client.2024650.journalpointer Reading journal pointer '400.00000000'
2021-05-11T18:31:26.865-0700 7f2f2989ba80  1 client.2024650.journaler.resetter(ro) recover start
2021-05-11T18:31:26.865-0700 7f2f2989ba80  1 client.2024650.journaler.resetter(ro) read_head
2021-05-11T18:31:26.865-0700 7f291c293700  1 client.2024650.journaler.resetter(ro) _finish_read_head loghead(trim 14172553216, expire 14174788378, write 14400838791, stream_format 1).  probing for end of log (from 14400838791)...
2021-05-11T18:31:26.865-0700 7f291c293700  1 client.2024650.journaler.resetter(ro) probing for end of the log
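
The loghead values in the output above can be turned into something easier to reason about. A rough Python sketch that computes how much journal lies between the expire and write positions, and which RADOS objects those offsets would fall in. It assumes the default 4 MiB journal object size and that rank 0's journal objects are named 200.<8-digit hex index>. Both are assumptions about this cluster, not something the output confirms, so verify them before acting on the result.

```python
import re

# Assumption: default journal layout with 4 MiB objects.
OBJECT_SIZE = 4 * 1024 * 1024
# Assumption: the journal of rank N uses inode 0x200 + N as its object prefix.
JOURNAL_INO_BASE = 0x200

LOGHEAD_RE = re.compile(
    r"loghead\(trim\s+(\d+),\s*expire\s+(\d+),\s*write\s+(\d+)"
)


def parse_loghead(text):
    """Extract the trim, expire and write byte offsets from a loghead(...) line."""
    match = LOGHEAD_RE.search(text)
    if match is None:
        raise ValueError("no loghead(...) found in text")
    trim, expire, write = (int(group) for group in match.groups())
    return {"trim": trim, "expire": expire, "write": write}


def journal_object(offset, rank=0):
    """Name of the journal object that would hold the given byte offset."""
    return "%x.%08x" % (JOURNAL_INO_BASE + rank, offset // OBJECT_SIZE)


line = (
    "_finish_read_head loghead(trim 14172553216, expire 14174788378, "
    "write 14400838791, stream_format 1)."
)

head = parse_loghead(line)
unexpired_bytes = head["write"] - head["expire"]

print("unexpired journal: %.1f MiB" % (unexpired_bytes / 2**20))
print("expire position is in object", journal_object(head["expire"]))
print("write position is in object ", journal_object(head["write"]))
```

For the values in this log, that is about 216 MiB of journal between expire and write. If the assumptions hold, the resulting object names could be checked individually with rados -p cephfs_metadata_pool stat <object> to see whether reads against the end of the journal are what is hanging.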

It has been stuck here for hours.


# strace -f -p 10357
[pid 10360] <... sendmsg resumed>)      = 9
[pid 10361] read(14,  <unfinished ...>
[pid 10360] epoll_wait(7,  <unfinished ...>
[pid 10361] <... read resumed>0x55e95d982000, 4096) = -1 EAGAIN (Resource temporarily unavailable)
[pid 10360] <... epoll_wait resumed>[{EPOLLIN, {u32=16, u64=16}}, {EPOLLIN, {u32=18, u64=18}}], 5000, 30000) = 2
[pid 10361] epoll_wait(10,  <unfinished ...>
[pid 10360] read(16, "\23\1\10\0\0\0\10\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\354^\340;"..., 4096) = 57
[pid 10360] read(16, 0x55e95d9a8000, 4096) = -1 EAGAIN (Resource temporarily unavailable)
[pid 10360] read(18, "\17\264R\233`\327\275\222+", 4096) = 9
[pid 10360] read(18, 0x55e95d9f4000, 4096) = -1 EAGAIN (Resource temporarily unavailable)
[pid 10360] epoll_wait(7, ^X <unfinished ...>
[pid 10370] <... futex resumed>)        = -1 ETIMEDOUT (Connection timed out)
[pid 10381] <... futex resumed>)        = -1 ETIMEDOUT (Connection timed out)
[pid 10370] clock_gettime(CLOCK_REALTIME,  <unfinished ...>
[pid 10389] <... futex resumed>)        = -1 ETIMEDOUT (Connection timed out)
[pid 10381] clock_gettime(CLOCK_REALTIME,  <unfinished ...>
[pid 10370] <... clock_gettime resumed>{tv_sec=1620791989, tv_nsec=731038214}) = 0
[pid 10389] clock_gettime(CLOCK_REALTIME,  <unfinished ...>
[pid 10381] <... clock_gettime resumed>{tv_sec=1620791989, tv_nsec=731105584}) = 0
[pid 10389] <... clock_gettime resumed>{tv_sec=1620791989, tv_nsec=731125991}) = 0
[pid 10370] clock_gettime(CLOCK_REALTIME,  <unfinished ...>
[pid 10381] clock_gettime(CLOCK_REALTIME,  <unfinished ...>
[pid 10389] clock_gettime(CLOCK_REALTIME,  <unfinished ...>
[pid 10370] <... clock_gettime resumed>{tv_sec=1620791989, tv_nsec=731162065}) = 0
[pid 10389] <... clock_gettime resumed>{tv_sec=1620791989, tv_nsec=731184311}) = 0
[pid 10381] <... clock_gettime resumed>{tv_sec=1620791989, tv_nsec=731174345}) = 0
[pid 10370] futex(0x55e95d97c2d8, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 10381] futex(0x55e95d8a5320, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 10370] <... futex resumed>)        = 0
[pid 10389] futex(0x55e95d97fad8, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 10381] <... futex resumed>)        = 0
[pid 10370] futex(0x55e95d97c31c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 17805, {tv_sec=1620791990, tv_nsec=731161399}, 0xffffffff <unfinished ...>
[pid 10389] <... futex resumed>)        = 0
[pid 10381] futex(0x55e95d8a5364, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 17805, {tv_sec=1620791990, tv_nsec=731173986}, 0xffffffff <unfinished ...>
[pid 10389] futex(0x55e95d97fb1c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 17805, {tv_sec=1620791990, tv_nsec=731183618}, 0xffffffff^Cstrace: Process 10357 detached


Any help would be great.

Thanks,
/C