I have experienced a crash today (in a sense of filesystem going offline) of a 25TB XFS filesystem. Tried searching the list and google, and not much specific info I can use, so very much appreciate any insight: System: Ubuntu 16.04, kernel 4.10.17-041017-generic Mount info: /dev/rbd0 on /srv/exports/sclun63 type xfs (rw,relatime,attr2,inode64,logbsize=256k,sunit=8192,swidth=8192,noquota) Failure trace: Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.769776] XFS (rbd1): Internal error XFS_WANT_CORRUPTED_GOTO at line 3505 of file /home/kernel/COD/linux/fs/xfs/libxfs/xfs_btree.c. Caller xfs_free_ag_extent+0x352/0x740 [xfs] Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.769890] CPU: 20 PID: 724783 Comm: kworker/20:4 Tainted: G OE 4.10.17-041017-generic #201705201051 Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.769963] Hardware name: Dell Inc. Precision Rack 7910/01J90F, BIOS 1.1.4 11/04/2014 Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770058] Workqueue: xfs-conv/rbd1 xfs_end_io [xfs] Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770098] Call Trace: Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770136] dump_stack+0x63/0x81 Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770198] xfs_error_report+0x3c/0x40 [xfs] Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770257] ? xfs_free_ag_extent+0x352/0x740 [xfs] Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770317] xfs_btree_insert+0x16d/0x1c0 [xfs] Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770375] xfs_free_ag_extent+0x352/0x740 [xfs] Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770433] xfs_free_extent+0xa8/0x140 [xfs] Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770501] xfs_trans_free_extent+0x43/0x110 [xfs] Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770568] xfs_extent_free_finish_item+0x26/0x40 [xfs] Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770630] xfs_defer_finish+0x151/0x430 [xfs] Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770694] xfs_iomap_write_unwritten+0xd3/0x2e0 [xfs] Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770737] ? propagate_entity_cfs_rq.isra.66+0x271/0xa30 Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770802] xfs_end_io+0xa0/0xb0 [xfs] Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770840] process_one_work+0x1fc/0x4b0 Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770877] worker_thread+0x4b/0x500 Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770914] kthread+0x109/0x140 Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770949] ? process_one_work+0x4b0/0x4b0 Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770987] ? kthread_create_on_node+0x60/0x60 Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.771028] ret_from_fork+0x2c/0x40 Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.771093] XFS (rbd1): xfs_do_force_shutdown(0x8) called from line 236 of file /home/kernel/COD/linux/fs/xfs/libxfs/xfs_defer.c. Return address = 0xffffffffc0f0d366 Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.772151] XFS (rbd1): Corruption of in-memory data detected. Shutting down filesystem Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.772219] XFS (rbd1): Please umount the filesystem and rectify the problem(s) Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.772292] Buffer I/O error on dev rbd1, logical block 5078005778, lost async page write Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.772362] Buffer I/O error on dev rbd1, logical block 5078005779, lost async page write Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.777755] Buffer I/O error on dev rbd1, logical block 5078005780, lost async page write xfs_repair (had to do -L): root@roc01r-scd224:~# xfs_repair -L /dev/rbd0 Phase 1 - find and verify superblock... - reporting progress in intervals of 15 minutes Phase 2 - using internal log - zero log... ALERT: The filesystem has valuable metadata changes in a log which is being destroyed because the -L option was used. - scan filesystem freespace and inode maps... freeblk count 3 != flcount 4 in ag 47 freeblk count 3 != flcount 4 in ag 45 freeblk count 3 != flcount 4 in ag 46 freeblk count 3 != flcount 4 in ag 53 freeblk count 3 != flcount 4 in ag 49 freeblk count 3 != flcount 4 in ag 52 freeblk count 3 != flcount 4 in ag 56 freeblk count 3 != flcount 4 in ag 51 freeblk count 3 != flcount 4 in ag 48 freeblk count 3 != flcount 4 in ag 57 freeblk count 3 != flcount 4 in ag 54 freeblk count 3 != flcount 4 in ag 50 freeblk count 3 != flcount 4 in ag 58 freeblk count 3 != flcount 4 in ag 55 freeblk count 3 != flcount 4 in ag 59 freeblk count 3 != flcount 4 in ag 61 freeblk count 3 != flcount 4 in ag 62 freeblk count 3 != flcount 4 in ag 63 freeblk count 3 != flcount 4 in ag 60 freeblk count 3 != flcount 4 in ag 70 freeblk count 3 != flcount 4 in ag 65 freeblk count 3 != flcount 4 in ag 66 freeblk count 3 != flcount 4 in ag 71 freeblk count 3 != flcount 4 in ag 68 freeblk count 3 != flcount 4 in ag 69 freeblk count 3 != flcount 4 in ag 64 freeblk count 3 != flcount 4 in ag 73 freeblk count 3 != flcount 4 in ag 67 freeblk count 3 != flcount 4 in ag 78 freeblk count 3 != flcount 4 in ag 76 freeblk count 3 != flcount 4 in ag 72 freeblk count 3 != flcount 4 in ag 80 freeblk count 3 != flcount 4 in ag 83 freeblk count 3 != flcount 4 in ag 84 freeblk count 3 != flcount 4 in ag 75 freeblk count 3 != flcount 4 in ag 74 freeblk count 3 != flcount 4 in ag 85 freeblk count 3 != flcount 4 in ag 79 freeblk count 3 != flcount 4 in ag 81 freeblk count 3 != flcount 4 in ag 77 freeblk count 3 != flcount 4 in ag 82 freeblk count 3 != flcount 4 in ag 88 freeblk count 3 != flcount 4 in ag 91 freeblk count 3 != flcount 4 in ag 86 freeblk count 3 != flcount 4 in ag 87 freeblk count 3 != flcount 4 in ag 93 freeblk count 3 != flcount 4 in ag 92 freeblk count 3 != flcount 4 in ag 90 freeblk count 3 != flcount 4 in ag 94 freeblk count 3 != flcount 4 in ag 89 freeblk count 3 != flcount 4 in ag 97 freeblk count 3 != flcount 4 in ag 95 freeblk count 3 != flcount 4 in ag 96 freeblk count 3 != flcount 4 in ag 98 freeblk count 3 != flcount 4 in ag 100 freeblk count 3 != flcount 4 in ag 99 freeblk count 3 != flcount 4 in ag 103 freeblk count 3 != flcount 4 in ag 101 freeblk count 3 != flcount 4 in ag 104 freeblk count 3 != flcount 4 in ag 102 freeblk count 3 != flcount 4 in ag 105 freeblk count 3 != flcount 4 in ag 110 freeblk count 3 != flcount 4 in ag 106 freeblk count 3 != flcount 4 in ag 113 freeblk count 3 != flcount 4 in ag 115 freeblk count 3 != flcount 4 in ag 111 freeblk count 3 != flcount 4 in ag 109 freeblk count 3 != flcount 4 in ag 107 freeblk count 3 != flcount 4 in ag 116 freeblk count 3 != flcount 4 in ag 114 freeblk count 3 != flcount 4 in ag 108 freeblk count 3 != flcount 4 in ag 117 freeblk count 3 != flcount 4 in ag 124 freeblk count 3 != flcount 4 in ag 119 freeblk count 3 != flcount 4 in ag 118 freeblk count 3 != flcount 4 in ag 120 freeblk count 3 != flcount 4 in ag 121 freeblk count 3 != flcount 4 in ag 122 freeblk count 3 != flcount 4 in ag 126 freeblk count 3 != flcount 4 in ag 125 freeblk count 3 != flcount 4 in ag 127 freeblk count 3 != flcount 4 in ag 131 freeblk count 3 != flcount 4 in ag 123 freeblk count 3 != flcount 4 in ag 130 freeblk count 3 != flcount 4 in ag 133 freeblk count 3 != flcount 4 in ag 128 freeblk count 3 != flcount 4 in ag 137 freeblk count 3 != flcount 4 in ag 136 freeblk count 3 != flcount 4 in ag 138 freeblk count 3 != flcount 4 in ag 135 freeblk count 3 != flcount 4 in ag 132 freeblk count 3 != flcount 4 in ag 139 freeblk count 3 != flcount 4 in ag 134 freeblk count 3 != flcount 4 in ag 129 freeblk count 3 != flcount 4 in ag 145 freeblk count 3 != flcount 4 in ag 147 freeblk count 3 != flcount 4 in ag 144 freeblk count 3 != flcount 4 in ag 140 freeblk count 3 != flcount 4 in ag 149 freeblk count 3 != flcount 4 in ag 143 freeblk count 3 != flcount 4 in ag 142 freeblk count 3 != flcount 4 in ag 141 freeblk count 3 != flcount 4 in ag 152 freeblk count 3 != flcount 4 in ag 148 freeblk count 3 != flcount 4 in ag 151 freeblk count 3 != flcount 4 in ag 146 freeblk count 3 != flcount 4 in ag 154 freeblk count 3 != flcount 4 in ag 150 freeblk count 3 != flcount 4 in ag 155 freeblk count 3 != flcount 4 in ag 157 freeblk count 3 != flcount 4 in ag 156 freeblk count 3 != flcount 4 in ag 160 freeblk count 3 != flcount 4 in ag 158 freeblk count 3 != flcount 4 in ag 153 freeblk count 3 != flcount 4 in ag 159 sb_ifree 667, counted 615 sb_fdblocks 3930698117, counted 1111093367 - 15:06:10: scanning filesystem freespace - 193 of 193 allocation groups done - found root inode chunk Phase 3 - for each AG... - scan and clear agi unlinked lists... - 15:06:10: scanning agi unlinked lists - 193 of 193 allocation groups done - process known inodes and perform inode discovery... - agno = 0 - agno = 30 - agno = 60 - agno = 15 - agno = 75 - agno = 45 - agno = 150 - agno = 31 - agno = 105 - agno = 61 - agno = 46 - agno = 135 - agno = 76 - agno = 120 - agno = 151 - agno = 90 - agno = 180 - agno = 165 - agno = 91 - agno = 32 - agno = 62 - agno = 106 - agno = 121 - agno = 166 - agno = 136 - agno = 47 - agno = 92 - agno = 16 - agno = 1 - agno = 181 - agno = 33 - agno = 167 - agno = 93 - agno = 122 - agno = 63 - agno = 152 - agno = 77 - agno = 2 - agno = 168 - agno = 48 - agno = 34 - agno = 182 - agno = 123 - agno = 107 - agno = 78 - agno = 153 - agno = 17 - agno = 3 - agno = 137 - agno = 35 - agno = 64 - agno = 79 - agno = 169 - agno = 138 - agno = 183 - agno = 108 - agno = 94 - agno = 65 - agno = 124 - agno = 36 - agno = 154 - agno = 80 - agno = 49 - agno = 184 - agno = 109 - agno = 37 - agno = 95 - agno = 125 - agno = 170 - agno = 50 - agno = 66 - agno = 81 - agno = 110 - agno = 139 - agno = 155 - agno = 185 - agno = 96 - agno = 126 - agno = 38 - agno = 171 - agno = 127 - agno = 140 - agno = 97 - agno = 51 - agno = 128 - agno = 67 - agno = 111 - agno = 82 - agno = 156 - agno = 186 - agno = 141 - agno = 172 - agno = 98 - agno = 129 - agno = 52 - agno = 130 - agno = 39 - agno = 142 - agno = 173 - agno = 53 - agno = 99 - agno = 157 - agno = 68 - agno = 112 - agno = 40 - agno = 83 - agno = 131 - agno = 54 - agno = 187 - agno = 113 - agno = 69 - agno = 143 - agno = 174 - agno = 84 - agno = 158 - agno = 41 - agno = 100 - agno = 159 - agno = 175 - agno = 144 - agno = 188 - agno = 42 - agno = 85 - agno = 70 - agno = 132 - agno = 101 - agno = 114 - agno = 55 - agno = 176 - agno = 160 - agno = 145 - agno = 86 - agno = 71 - agno = 189 - agno = 102 - agno = 43 - agno = 115 - agno = 161 - agno = 177 - agno = 56 - agno = 133 - agno = 72 - agno = 103 - agno = 44 - agno = 87 - agno = 190 - agno = 134 - agno = 73 - agno = 162 - agno = 116 - agno = 57 - agno = 146 - agno = 74 - agno = 191 - agno = 104 - agno = 117 - agno = 147 - agno = 163 - agno = 88 - agno = 178 - agno = 58 - agno = 192 - agno = 118 - agno = 148 - agno = 89 - agno = 164 - agno = 59 - agno = 179 - agno = 119 - agno = 149 - agno = 4 - agno = 5 - agno = 18 - agno = 6 - agno = 7 - agno = 8 - agno = 9 - agno = 10 - agno = 11 - agno = 12 - agno = 13 - agno = 14 - agno = 19 - agno = 20 - agno = 21 - agno = 22 - agno = 23 - agno = 24 - agno = 25 - agno = 26 - agno = 27 - agno = 28 - agno = 29 - 15:10:19: process known inodes and inode discovery - 896 of 896 inodes done - process newly discovered inodes... - 15:10:19: process newly discovered inodes - 193 of 193 allocation groups done Phase 4 - check for duplicate blocks... - setting up duplicate extent list... - 15:10:19: setting up duplicate extent list - 193 of 193 allocation groups done - check for inodes claiming duplicate blocks... - agno = 1 - agno = 0 - agno = 2 - agno = 3 - agno = 9 - agno = 12 - agno = 6 - agno = 8 - agno = 4 - agno = 10 - agno = 7 - agno = 13 - agno = 31 - agno = 29 - agno = 35 - agno = 38 - agno = 40 - agno = 42 - agno = 43 - agno = 45 - agno = 18 - agno = 19 - agno = 21 - agno = 22 - agno = 23 - agno = 24 - agno = 26 - agno = 51 - agno = 52 - agno = 25 - agno = 53 - agno = 54 - agno = 55 - agno = 32 - agno = 14 - agno = 33 - agno = 61 - agno = 62 - agno = 30 - agno = 5 - agno = 65 - agno = 66 - agno = 37 - agno = 16 - agno = 39 - agno = 68 - agno = 69 - agno = 70 - agno = 17 - agno = 44 - agno = 73 - agno = 74 - agno = 75 - agno = 76 - agno = 77 - agno = 78 - agno = 79 - agno = 50 - agno = 27 - agno = 81 - agno = 28 - agno = 11 - agno = 34 - agno = 85 - agno = 86 - agno = 59 - agno = 87 - agno = 60 - agno = 63 - agno = 90 - agno = 91 - agno = 64 - agno = 93 - agno = 94 - agno = 95 - agno = 41 - agno = 71 - agno = 72 - agno = 36 - agno = 20 - agno = 46 - agno = 47 - agno = 102 - agno = 48 - agno = 104 - agno = 105 - agno = 106 - agno = 108 - agno = 84 - agno = 110 - agno = 56 - agno = 112 - agno = 113 - agno = 115 - agno = 116 - agno = 117 - agno = 118 - agno = 96 - agno = 97 - agno = 98 - agno = 99 - agno = 100 - agno = 101 - agno = 103 - agno = 49 - agno = 126 - agno = 127 - agno = 107 - agno = 82 - agno = 83 - agno = 131 - agno = 109 - agno = 133 - agno = 111 - agno = 134 - agno = 57 - agno = 58 - agno = 89 - agno = 92 - agno = 67 - agno = 142 - agno = 143 - agno = 119 - agno = 120 - agno = 145 - agno = 146 - agno = 123 - agno = 148 - agno = 149 - agno = 150 - agno = 151 - agno = 152 - agno = 129 - agno = 154 - agno = 130 - agno = 157 - agno = 135 - agno = 136 - agno = 88 - agno = 114 - agno = 137 - agno = 162 - agno = 164 - agno = 139 - agno = 140 - agno = 166 - agno = 141 - agno = 168 - agno = 122 - agno = 171 - agno = 124 - agno = 173 - agno = 174 - agno = 128 - agno = 176 - agno = 177 - agno = 155 - agno = 156 - agno = 132 - agno = 180 - agno = 158 - agno = 159 - agno = 160 - agno = 161 - agno = 186 - agno = 138 - agno = 188 - agno = 189 - agno = 15 - agno = 144 - agno = 191 - agno = 192 - agno = 170 - agno = 147 - agno = 172 - agno = 125 - agno = 80 - agno = 175 - agno = 153 - agno = 178 - agno = 179 - agno = 181 - agno = 182 - agno = 183 - agno = 184 - agno = 185 - agno = 163 - agno = 187 - agno = 165 - agno = 167 - agno = 190 - agno = 169 - agno = 121 - 15:10:22: check for inodes claiming duplicate blocks - 896 of 896 inodes done Phase 5 - rebuild AG headers and trees... - 15:10:22: rebuild AG headers and trees - 193 of 193 allocation groups done - reset superblock... Phase 6 - check inode connectivity... - resetting contents of realtime bitmap and summary inodes - traversing filesystem ... - traversal finished ... - moving disconnected inodes to lost+found ... Phase 7 - verify and correct link counts... Maximum metadata LSN (792:1294080) is ahead of log (1:64). Format log to cycle 795. done root@roc01r-scd224:~# mount -t xfs /dev/rbd0 /mnt No other errors in logs, Ceph or hardware. Thank you in advance! -- Alex Gorbachev Storcium -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html