On Wed, 29 Nov 2023, Joe Thornber wrote:
> On Wed, Nov 29, 2023 at 8:24 AM Joe Thornber <thornber@xxxxxxxxxx> wrote:
>
> Hi Eric,
>
> Since integrity is below thin I would say this is likely a real issue
> with your hardware; probably causing thinp to head down the rarely used
> error path.

While I suppose this is possible, the problem presented itself on two
completely different NVMe mirror pairs, and two of the NVMes are brand new.
The system survived a 24-hour memory test and we have not seen any machine
check exceptions or ECC errors, so as far as I can tell the hardware is
working as expected.

> It looks like something is holding the rw_semaphore that protects
> accesses to the thin_metadata. This could either be a process that is
> blocked holding it (in which case you should be able to see it). Or a
> rarely used error path has omitted to release it.

Is there any kernel tooling for finding what is holding the lock, or some
kind of tracing mechanism that I could trigger while the rw_semaphore is
locked?

--
Eric Wheeler

> Looking at the code I think it's really unlikely that we've accidentally
> left it unlocked. So could you check all processes to see if any are
> holding it please, and check the kernel logs to see if any processes
> were killed for some reason.
>
> - Joe
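
One way to "check all processes" as suggested above might be to list every task in uninterruptible sleep (state D) together with its kernel stack, then look for dm-thin or rwsem frames in the output. The following is only a minimal sketch under stated assumptions: it relies on /proc/<pid>/stat and /proc/<pid>/stack, the latter needing root and a kernel built with CONFIG_STACKTRACE; the helper names parse_state and blocked_tasks are made up for illustration; and it looks only at thread-group leaders, not at individual threads under /proc/<pid>/task. A task blocked on the semaphore is not necessarily the one holding it, so the output is a starting point rather than an answer.

```python
import os


def parse_state(stat_text):
    """Return the one-letter task state from the contents of /proc/<pid>/stat."""
    # The comm field is wrapped in parentheses and may itself contain spaces
    # or ')', so split on the last ')' instead of splitting on whitespace.
    return stat_text.rsplit(")", 1)[1].split()[0]


def blocked_tasks(proc_root="/proc"):
    """Yield (pid, comm, stack) for each task in uninterruptible sleep (D)."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as "self" or "meminfo"
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "stat")) as f:
                stat_text = f.read()
            if parse_state(stat_text) != "D":
                continue
            comm = stat_text[stat_text.index("(") + 1:stat_text.rindex(")")]
        except (OSError, IndexError, ValueError):
            continue  # the task exited while we were reading it
        try:
            with open(os.path.join(base, "stack")) as f:
                stack = f.read()
        except OSError:
            # Assumption: unreadable means no root or no CONFIG_STACKTRACE.
            stack = "(stack unavailable)\n"
        yield int(entry), comm, stack


if __name__ == "__main__":
    for pid, comm, stack in blocked_tasks():
        print(f"{pid} {comm}")
        print(stack)
```

Run as root while the hang is in progress. Writing "w" to /proc/sysrq-trigger asks the kernel to dump blocked tasks to the kernel log, which gives similar information without a script, provided sysrq is enabled.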