On Wed, Dec 21, 2016 at 9:13 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> There may be deeper issues. I just started running scalability tests
> (e.g. 16-way fsmark create tests) and about a minute in I got a
> directory corruption reported - something I hadn't seen in the dev
> cycle at all.

By "in the dev cycle", do you mean your XFS changes, or have you been
tracking the merge cycle at least for some testing?

> I unmounted the fs, mkfs'd it again, ran the
> workload again and about a minute in this fired:
>
> [628867.607417] ------------[ cut here ]------------
> [628867.608603] WARNING: CPU: 2 PID: 16925 at mm/workingset.c:461 shadow_lru_isolate+0x171/0x220

Well, part of the changes during the merge window were the shadow
entry tracking changes that came in through Andrew's tree. Adding
Johannes Weiner to the participants.

> Now, this workload does not touch the page cache at all - it's
> entirely an XFS metadata workload, so it should not really be
> affecting the working set code.

Well, I suspect that anything that creates memory pressure will end up
triggering the working set code, so ..

That said, obviously memory corruption could be involved and result in
random issues too, but I wouldn't really expect that in this code.

It would probably be really useful to get more data points - is the
problem reliably in this area, or is it going to be random and all
over the place.

That said:

> And worse, on that last error, the /host/ is now going into meltdown
> (running 4.7.5) with 32 CPUs all burning down in ACPI code:

The obvious question here is how much you trust the environment if the
host ends up also showing problems. Maybe you do end up having hw
issues pop up too.

The primary suspect would presumably be the development kernel you're
testing triggering something, but it has to be asked..

                Linus