On Tue, Jul 17, 2018 at 11:26 AM Carlos Maiolino <cmaiolino@xxxxxxxxxx> wrote:
> > > Ouch, indeed we've seen this problem on multiple nodes, said hosts
> > > belong to the same and latest shipment from the OEM. We'll run
> > > hardware diagnostics on these hosts and others we've received at
> > > another datacenter (which haven't shown issues so far but don't serve
> > > reads either).
> >
> > Update on this: we've run hw diagnostics and couldn't find anything
> > wrong, xfs_repair does fix the issue so we'll be going ahead with
> > that. Is there anything we can do to help debugging in case this
> > happens again?
> >
>
> There is a patch being discussed on list to help catch these bit corruptions
> before they reach the disk, but bear in mind we can only improve the validation
> of our metadata. Nothing actually prevents these bit flips from occurring in
> your data, in which case you are writing corrupted data into your files.

We've found no other cases of bit flips or corruption in metadata or
in the data itself, though. To recap what we've seen, a hardware bit
flip seems extremely unlikely: the same type of sb_fdblocks corruption
has appeared on four different hosts, affecting at most one third of
the xfs filesystems on each host. The corruption also always looks the
same, namely the 33rd bit flipped, which seems suspicious as well.

HTH,
Filippo