Failed to read JournalPointer - MDS error (mds rank 0 is damaged)

Hi,

We're using ceph 10.2.5 and cephfs.

We had a weird incident on one node (mon0r0), which runs a monitor and was the currently active MDS; the node had some sort of meltdown.

The monitor kept calling elections on and off for about an hour, sometimes with 5-10 minutes in between.

Each time, the MDS on that node also went through replay, reconnect, rejoin => active again (it never failed over to a standby MDS).
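
For anyone wanting to correlate timelines, something like this should show the election churn from the cluster log (the log path and the exact message wording are assumptions based on our jewel defaults, so adjust as needed):

# grep 'calling new monitor election' /var/log/ceph/ceph.log | grep mon0r0 | wc -l
# grep 'calling new monitor election' /var/log/ceph/ceph.log | tail

The first counts how often the flapping mon called an election, the second shows when the last ones happened.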

Then, after about an hour of it mostly working, it gave up with:

[ ... ]
2017-04-29 07:30:24.444980 7fe6d7e9c700  0 mds.beacon.mon0r0 handle_mds_beacon no longer laggy
2017-04-29 07:30:46.783817 7fe6d7e9c700  0 monclient: hunting for new mon
< bunch of errors like this >
2017-04-29 07:31:11.782049 7fe6d7e9c700  1 mds.mon0r0 handle_mds_map i (172.16.130.10:6811/8235) dne in the mdsmap, respawning myself
2017-04-29 07:31:11.782054 7fe6d7e9c700  1 mds.mon0r0 respawn
2017-04-29 07:31:11.782056 7fe6d7e9c700  1 mds.mon0r0  e: '/usr/bin/ceph-mds'
2017-04-29 07:31:11.782058 7fe6d7e9c700  1 mds.mon0r0  0: '/usr/bin/ceph-mds'
2017-04-29 07:31:11.782060 7fe6d7e9c700  1 mds.mon0r0  1: '--cluster=ceph'
2017-04-29 07:31:11.782071 7fe6d7e9c700  1 mds.mon0r0  2: '-i'
2017-04-29 07:31:11.782072 7fe6d7e9c700  1 mds.mon0r0  3: 'mon0r0'
2017-04-29 07:31:11.782073 7fe6d7e9c700  1 mds.mon0r0  4: '-f'
2017-04-29 07:31:11.782074 7fe6d7e9c700  1 mds.mon0r0  5: '--setuser'
2017-04-29 07:31:11.782075 7fe6d7e9c700  1 mds.mon0r0  6: 'ceph'
2017-04-29 07:31:11.782076 7fe6d7e9c700  1 mds.mon0r0  7: '--setgroup'
2017-04-29 07:31:11.782077 7fe6d7e9c700  1 mds.mon0r0  8: 'ceph'
2017-04-29 07:31:11.782106 7fe6d7e9c700  1 mds.mon0r0  exe_path /usr/bin/ceph-mds
2017-04-29 07:31:11.799625 7f5487a92180  0 ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367), process ceph-mds, pid 8235
2017-04-29 07:31:11.800097 7f5487a92180  0 pidfile_write: ignore empty --pid-file
2017-04-29 07:31:12.746033 7f5481a40700  1 mds.mon0r0 handle_mds_map standby
2017-04-29 07:32:01.941948 7f5481a40700  0 monclient: hunting for new mon
2017-04-29 07:32:48.186313 7f5481a40700  1 mds.mon0r0 handle_mds_map standby
2017-04-29 07:33:04.539413 7f5481a40700  0 monclient: hunting for new mon
2017-04-29 07:33:09.560848 7f5481a40700  1 mds.0.764 handle_mds_map i am now mds.0.764
2017-04-29 07:33:09.560857 7f5481a40700  1 mds.0.764 handle_mds_map state change up:boot --> up:replay
2017-04-29 07:33:09.560879 7f5481a40700  1 mds.0.764 replay_start
2017-04-29 07:33:09.560882 7f5481a40700  1 mds.0.764  recovery set is
2017-04-29 07:33:09.560890 7f5481a40700  1 mds.0.764  waiting for osdmap 17134 (which blacklists prior instance)
2017-04-29 07:33:09.571120 7f547c733700 -1 log_channel(cluster) log [ERR] : failed to read JournalPointer: -108 ((108) Cannot send after transport endpoint shutdown)
2017-04-29 07:33:09.575176 7f547c733700  1 mds.mon0r0 respawn
2017-04-29 07:33:09.575185 7f547c733700  1 mds.mon0r0  e: '/usr/bin/ceph-mds'
2017-04-29 07:33:09.575187 7f547c733700  1 mds.mon0r0  0: '/usr/bin/ceph-mds'
2017-04-29 07:33:09.575189 7f547c733700  1 mds.mon0r0  1: '--cluster=ceph'
2017-04-29 07:33:09.575191 7f547c733700  1 mds.mon0r0  2: '-i'
2017-04-29 07:33:09.575192 7f547c733700  1 mds.mon0r0  3: 'mon0r0'
2017-04-29 07:33:09.575193 7f547c733700  1 mds.mon0r0  4: '-f'
2017-04-29 07:33:09.575194 7f547c733700  1 mds.mon0r0  5: '--setuser'
2017-04-29 07:33:09.575195 7f547c733700  1 mds.mon0r0  6: 'ceph'
2017-04-29 07:33:09.575196 7f547c733700  1 mds.mon0r0  7: '--setgroup'
2017-04-29 07:33:09.575197 7f547c733700  1 mds.mon0r0  8: 'ceph'
2017-04-29 07:33:09.575221 7f547c733700  1 mds.mon0r0  exe_path /usr/bin/ceph-mds
2017-04-29 07:33:09.589993 7f9a9d0d1180  0 ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367), process ceph-mds, pid 8235
2017-04-29 07:33:09.590461 7f9a9d0d1180  0 pidfile_write: ignore empty --pid-file
2017-04-29 07:33:10.567466 7f9a9707f700  1 mds.mon0r0 handle_mds_map standby
2017-04-29 07:34:46.972551 7f9a9707f700  0 monclient: hunting for new mon
2017-04-29 07:34:50.583321 7f9a9707f700  1 mds.mon0r0 handle_mds_map standby
2017-04-29 07:35:24.575818 7f9a9707f700  0 monclient: hunting for new mon
2017-04-29 07:36:31.988193 7f9a9707f700  0 monclient: hunting for new mon
2017-04-29 07:38:06.999197 7f9a9707f700  0 monclient: hunting for new mon
2017-04-29 07:39:12.009821 7f9a9707f700  0 monclient: hunting for new mon
2017-04-29 07:39:21.855605 7f9a9707f700  1 mds.mon0r0 handle_mds_map standby
2017-04-29 07:41:39.994418 7f9a9707f700  0 monclient: hunting for new mon
< Continues like the above until mds was restarted ~1 h later
[ ... ]
2017-04-29 08:49:22.204803 7f9a93777700 -1 mds.mon0r0 *** got signal Terminated ***
2017-04-29 08:49:22.204821 7f9a93777700  1 mds.mon0r0 suicide.  wanted state up:standby
2017-04-29 09:00:31.510392 7ff9acd5e180  0 set uid:gid to 64045:64045 (ceph:ceph)
2017-04-29 09:00:31.510412 7ff9acd5e180  0 ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367), process ceph-mds, pid 23804
2017-04-29 09:00:31.510853 7ff9acd5e180  0 pidfile_write: ignore empty --pid-file
2017-04-29 09:00:31.973904 7ff9a64d1700  1 mds.mon0r0 handle_mds_map standby

All the remaining ceph-mds daemons would suicide as well, with logs like the above.

# ceph -s
health HEALTH_ERR
            mds rank 0 is damaged
            mds cluster is degraded
            1 mons down, quorum 1,2 mon1r0,mon2r0
     monmap e1: 3 mons at {mon0r0=172.16.130.10:6789/0,mon1r0=172.16.130.11:6789/0,mon2r0=172.16.130.12:6789/0}
            election epoch 2552, quorum 1,2 mon1r0,mon2r0
      fsmap e830: 0/1/1 up, 2 up:standby, 1 damaged

Looking over the logs, this single line was new and showed up in all of the MDS logs, and I guess this is where something happened (or did not happen):
2017-04-29 07:33:09.571120 7f547c733700 -1 log_channel(cluster) log [ERR] : failed to read JournalPointer: -108 ((108) Cannot send after transport endpoint shutdown)
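
As far as I can tell, -108/ESHUTDOWN is the same code ceph uses internally for a blacklisted client, so it looks like the read of the JournalPointer was aborted because the restarting MDS considered its own client blacklisted, rather than the pointer object itself being unreadable. To rule out a damaged pointer object, I believe it can be checked directly in the metadata pool; a sketch, assuming the metadata pool is called cephfs_metadata and the rank-0 pointer lives in object 400.00000000 (both names are assumptions for our setup):

# rados -p cephfs_metadata stat 400.00000000
# rados -p cephfs_metadata get 400.00000000 /tmp/journal_pointer.bin

The stat just confirms the object exists and is readable; the get pulls a copy for safekeeping.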

I eventually decided to follow: http://docs.ceph.com/docs/jewel/cephfs/disaster-recovery/

# cephfs-journal-tool journal export backup.bin
journal is 124172374543~44449141                                      
wrote 44449141 bytes at offset 124172374543 to backup.bin
NOTE: this is a _sparse_ file; you can
        $ tar cSzf backup.bin.tgz backup.bin
      to efficiently compress it while preserving sparseness.
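
As far as I can tell, the same tool can put that export back in place if a later step makes things worse, so I kept backup.bin around:

# cephfs-journal-tool journal import backup.bin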

# cephfs-journal-tool event recover_dentries summary
Events by type:
  OPEN: 2907
  SESSION: 22
  SUBTREEMAP: 30
  UPDATE: 19283
Errors: 0
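
For reference, the same events can also be inspected read-only before touching anything; as far as I understand the tool, the "get" effect only prints and does not write anything back to the metadata pool:

# cephfs-journal-tool event get summary
# cephfs-journal-tool event get list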

# cephfs-journal-tool journal inspect
Overall journal integrity: OK

# cephfs-journal-tool journal reset
old journal was 124172374543~44449141         
new journal start will be 124218507264 (1683580 bytes past old end)
writing journal head                          
writing EResetJournal entry   
done
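
For anyone else ending up here: after the journal reset, I believe the remaining pieces are resetting the client session table (also described on that page) and marking the damaged rank as repaired so an MDS is allowed to take it again. A sketch for a single-rank filesystem like ours; I'm not certain both steps are strictly required in every case:

# cephfs-table-tool all reset session
# ceph mds repaired 0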


After that the MDS rejoined and everything seems to work fine now.

We've been running Ceph for several years, and we also have another setup that is used purely for CephFS and sees peaks of 10 Gbit of reads/writes; we've never had this type of error before.
The cluster that hit this problem is only lightly used on the MDS side, mostly storing shared config files/sessions.

Any ideas what could have caused it, or anything we can troubleshoot or do to avoid it in the future?

Thanks in advance,
Cheers,
Martin

