On 5/15/17 4:22 AM, Jan Beulich wrote:
>>>> On 12.05.17 at 17:11, <sandeen@xxxxxxxxxxx> wrote:
>
>>
>> On 5/12/17 10:04 AM, Eric Sandeen wrote:
>>> On 5/12/17 9:09 AM, Jan Beulich wrote:
>>>>>>> On 12.05.17 at 15:56, <sandeen@xxxxxxxxxxx> wrote:
>>>>> On 5/12/17 1:26 AM, Jan Beulich wrote:
>>>>>> So on the earlier instance, where I did run actual repairs (and
>>>>>> indeed multiple of them), the problem re-surfaces every time
>>>>>> I mount the volume again.
>>>>> Ok, what is the exact sequence there, from repair to re-corruption?
>>>> Simply mount the volume after repairing (with or without an
>>>> intermediate reboot) and access respective pieces of the fs
>>>> again. As said, with /var/run affected on that first occasion,
>>>> I couldn't even cleanly boot again without seeing the
>>>> corruption re-surface.
>>>
>>> Mount under what kernel, and access in what way? I'm looking for a
>>> recipe to reproduce what you've seen using the metadump you've provided.
>>>
>>> However:
>>>
>>> With further testing I see that xfs_repair v3.1.8 /does not/
>>> entirely fix the fs; if I run 3.1.8 and then run upstream repair, it
>>> finds and fixes more bad flags on inode 764 (lib/xenstored/tdb) that 3.1.8
>>> didn't touch. The verifiers in an upstream kernel may keep tripping
>>> over that until newer repair fixes it...
>>
>> (Indeed just running xfs_repair 3.1.8 finds the same corruption over and
>> over)
>>
>> Please try a newer xfs_repair, and see if it resolves your problem.
>
> It seems to have improved the situation (on the first system I had
> the issue on), but leaves me with at least "Operation not permitted"
> upon init scripts (or me manually) rm-ing (or mv-ing) /var/run/*.pid
> (or mv-ing even /var/run itself). I'm not sure how worried I need to
> be, but this surely doesn't look overly healthy yet. The kernel
> warnings are all gone, though.

xfs_repair simply makes the filesystem consistent; it doesn't perform
any other magic.
:)

The corruption we saw was related to incorrect flags set on an inode,
in some cases flags like immutable that can affect access to the file.
I'm not sure we've made much progress on the root cause of whatever set
those extra flags*, but all repair will do is make them sane from a
filesystem consistency POV, not from an OS operation POV.

Check the files in question with lsattr, and see if there are
unexpected flags still set.

-Eric

* But backing up towards root cause: you said this all started when a
4.11 kernel crashed, and the log replayed? What kind of crash, what
caused it, what were the messages?

> Jan

--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html