On Mon, Feb 29, 2016 at 11:47:54AM +0200, Alex Lyakas wrote: > Hello Dave, > I have tried the same scenario with the 4.5 kernel from about a week > ago, latest commit being [1]. > The same crash is happening, stack trace being [2]. > > I am not proficient with xfstests, unfortunately. I tried running > them several times, but I am not sure I was doing that properly. > > As for your question why the "block beyond end of the filesystem > fails". I tried to debug it further and added a print into > _xfs_buf_find. What happens is that at some point, the sb_dblocks > value is updated to the new value, but the in-memory pag object is > not created. So the test: > > eofs = XFS_FSB_TO_BB(btp->bt_mount, btp->bt_mount->m_sb.sb_dblocks); > if (blkno < 0 || blkno >= eofs) { > ... > > still holds, but the needed pag does not exist. > > Here are the results of the prints that I added: > Feb 29 11:47:09 vc-00-00-350-dev kernel: [ 91.546542] SGI XFS with > ACLs, security attributes, realtime, no debug enabled > Feb 29 11:47:09 vc-00-00-350-dev kernel: [ 91.551243] XFS (dm-0): > Mounting V4 Filesystem > Feb 29 11:47:09 vc-00-00-350-dev kernel: [ 91.576677] XFS (dm-0): > Starting recovery (logdev: internal) > Feb 29 11:47:09 vc-00-00-350-dev kernel: [ 91.577866] > _xfs_buf_find: blkno=0 eofs=200704 >m_sb.sb_dblocks=25088 > Feb 29 11:47:09 vc-00-00-350-dev kernel: [ 91.577872] > _xfs_buf_find: blkno=0 eofs=200704 >m_sb.sb_dblocks=25088 > Feb 29 11:47:09 vc-00-00-350-dev kernel: [ 91.577882] > _xfs_buf_find: blkno=0 eofs=200704 >m_sb.sb_dblocks=25088 > ===> Here we start seeing the new value of "sb_dblocks" and hence > the new value of "eofs": We shouldn't see sb_dblocks change until log recovery completes the first phase of log recovery and the in-core superblock is re-read from disk in xlog_do_recover(). > Feb 29 11:47:09 vc-00-00-350-dev kernel: [ 91.606796] > _xfs_buf_find: blkno=1 eofs=204800 >m_sb.sb_dblocks=25600 > Feb 29 11:47:09 vc-00-00-350-dev kernel: [ 91.606804] > _xfs_buf_find: blkno=1 eofs=204800 >m_sb.sb_dblocks=25600 looking up AGF 0. > Feb 29 11:47:09 vc-00-00-350-dev kernel: [ 91.606988] > _xfs_buf_find: blkno=2 eofs=204800 >m_sb.sb_dblocks=25600 > Feb 29 11:47:09 vc-00-00-350-dev kernel: [ 91.606991] > _xfs_buf_find: blkno=2 eofs=204800 >m_sb.sb_dblocks=25600 Now AGI 0. > Feb 29 11:47:09 vc-00-00-350-dev kernel: [ 91.607205] > _xfs_buf_find: blkno=50177 eofs=204800 >m_sb.sb_dblocks=25600 > Feb 29 11:47:09 vc-00-00-350-dev kernel: [ 91.607210] > _xfs_buf_find: blkno=50177 eofs=204800 >m_sb.sb_dblocks=25600 > Feb 29 11:47:09 vc-00-00-350-dev kernel: [ 91.607375] > _xfs_buf_find: blkno=50178 eofs=204800 >m_sb.sb_dblocks=25600 > Feb 29 11:47:09 vc-00-00-350-dev kernel: [ 91.607378] > _xfs_buf_find: blkno=50178 eofs=204800 >m_sb.sb_dblocks=25600 AGF/AGI 1. > Feb 29 11:47:09 vc-00-00-350-dev kernel: [ 91.607512] > _xfs_buf_find: blkno=100353 eofs=204800 >m_sb.sb_dblocks=25600 > Feb 29 11:47:09 vc-00-00-350-dev kernel: [ 91.607515] > _xfs_buf_find: blkno=100353 eofs=204800 >m_sb.sb_dblocks=25600 > Feb 29 11:47:09 vc-00-00-350-dev kernel: [ 91.607652] > _xfs_buf_find: blkno=100354 eofs=204800 >m_sb.sb_dblocks=25600 > Feb 29 11:47:09 vc-00-00-350-dev kernel: [ 91.607655] > _xfs_buf_find: blkno=100354 eofs=204800 >m_sb.sb_dblocks=25600 AGF/AGI 2. > Feb 29 11:47:09 vc-00-00-350-dev kernel: [ 91.607829] > _xfs_buf_find: blkno=150529 eofs=204800 >m_sb.sb_dblocks=25600 > Feb 29 11:47:09 vc-00-00-350-dev kernel: [ 91.607832] > _xfs_buf_find: blkno=150529 eofs=204800 >m_sb.sb_dblocks=25600 > Feb 29 11:47:09 vc-00-00-350-dev kernel: [ 91.607980] > _xfs_buf_find: blkno=150530 eofs=204800 >m_sb.sb_dblocks=25600 > Feb 29 11:47:09 vc-00-00-350-dev kernel: [ 91.607982] > _xfs_buf_find: blkno=150530 eofs=204800 >m_sb.sb_dblocks=25600 > Feb 29 11:47:09 vc-00-00-350-dev kernel: [ 91.608073] AGF/AGI 4. > _xfs_buf_find: blkno=200705 eofs=204800 >m_sb.sb_dblocks=25600 > ===> and here we crash, but as you see, blkno is valid WRT eofs value. > Feb 29 11:47:09 vc-00-00-350-dev kernel: [ 91.608120] BUG: unable > to handle kernel NULL pointer dereference at 0000000000000098 AGF 5 goes splat. Which means it's through the first phase of log recovery and it's not failing in log recovery. i.e. we are now running xfs_initialize_perag_data() after log recovery. So, as I said a couple of posts back up this thread: | If log recovery succeeds, then yes, I can see that there is a | problem here because the per-ag tree is not reinitialised after | the superblock is re-read. That's a pretty easy fix, though (3-4 | lines of code in xlog_do_recover() to detect a change in | filesystem block count and call xfs_initialize_perag() again.. Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs