On 3/30/22 22:48, Ming Lei wrote:
> On Wed, Mar 30, 2022 at 09:31:35AM -0400, James Bottomley wrote:
>> On Wed, 2022-03-30 at 13:59 +0100, John Garry wrote:
>>> On 30/03/2022 12:21, Andrea Righi wrote:
>>>> On Wed, Mar 30, 2022 at 11:38:02AM +0100, John Garry wrote:
>>>>> On 30/03/2022 11:11, Andrea Righi wrote:
>>>>>> Hello,
>>>>>>
>>>>>> after this commit I'm experiencing some filesystem corruptions
>>>>>> at boot on a power9 box with an aacraid controller.
>>>>>>
>>>>>> At the moment I'm running a 5.15.30 kernel; when the filesystem
>>>>>> is mounted at boot I see the following errors in the console:
>>>
>>> About "scsi: core: Reallocate device's budget map on queue depth
>>> change" being added to a stable kernel, I am not sure if this was
>>> really a fix or just a memory optimisation.
>>
>> I can see how it becomes the problem: it frees and allocates a new
>> bitmap across a queue freeze, but bits in the old one might still be in
>> use. This isn't a problem except when they return and we now possibly
>> see a tag greater than we think we can allocate coming back.
>> Presumably we don't check this and we end up doing a write to
>> unallocated memory.
>>
>> I think if you want to reallocate on queue depth reduction, you might
>> have to drain the queue as well as freeze it.
>
> After queue is frozen, there can't be any in-flight request/scsi
> command, so the sbitmap is zeroed at that time, and safe to reallocate.
>
> The problem is aacraid specific, since the driver has hard limit
> of 256 queue depth, see aac_change_queue_depth().

256 is the scsi hard limit per device... Any SAS drive has the same limit
by default since there is no way to know the max queue depth of a scsi
disk. So what is special about aacraid ?

>
>
> Thanks,
> Ming
>

-- 
Damien Le Moal
Western Digital Research