On Thu, Mar 31, 2022 at 07:30:35AM +0900, Damien Le Moal wrote:
> On 3/30/22 22:48, Ming Lei wrote:
> > On Wed, Mar 30, 2022 at 09:31:35AM -0400, James Bottomley wrote:
> >> On Wed, 2022-03-30 at 13:59 +0100, John Garry wrote:
> >>> On 30/03/2022 12:21, Andrea Righi wrote:
> >>>> On Wed, Mar 30, 2022 at 11:38:02AM +0100, John Garry wrote:
> >>>>> On 30/03/2022 11:11, Andrea Righi wrote:
> >>>>>> Hello,
> >>>>>>
> >>>>>> after this commit I'm experiencing some filesystem corruptions
> >>>>>> at boot on a power9 box with an aacraid controller.
> >>>>>>
> >>>>>> At the moment I'm running a 5.15.30 kernel; when the filesystem
> >>>>>> is mounted at boot I see the following errors in the console:
> >>>
> >>> About "scsi: core: Reallocate device's budget map on queue depth
> >>> change" being added to a stable kernel, I am not sure if this was
> >>> really a fix or just a memory optimisation.
> >>
> >> I can see how it becomes the problem: it frees and allocates a new
> >> bitmap across a queue freeze, but bits in the old one might still be in
> >> use. This isn't a problem except when they return and we now possibly
> >> see a tag greater than we think we can allocate coming back.
> >> Presumably we don't check this and we end up doing a write to
> >> unallocated memory.
> >>
> >> I think if you want to reallocate on queue depth reduction, you might
> >> have to drain the queue as well as freeze it.
> >
> > After queue is frozen, there can't be any in-flight request/scsi
> > command, so the sbitmap is zeroed at that time, and safe to reallocate.
> >
> > The problem is aacraid specific, since the driver has hard limit
> > of 256 queue depth, see aac_change_queue_depth().
>
> 256 is the scsi hard limit per device... Any SAS drive has the same limit
> by default since there is no way to know the max queue depth of a scsi
> disk. So what is special about aacraid ?

I meant that aac_change_queue_depth() sets a hard limit of 256.
Yeah, any HBA driver that implements its own .change_queue_depth() may
impose its own hard limit there. So I still don't understand why you say
'256 is the scsi hard limit per device'; where is the code that enforces it?

If both .cmd_per_lun and .can_queue are > 256, the driver uses the default
scsi_change_queue_depth(), and sdev->tagged_supported is true, then the user
is free to raise the queue depth above 256 via
/sys/block/$SDN/device/queue_depth. The same holds for SAS; see
sas_change_queue_depth().

Also, I am pretty sure some types of SCSI device can support a queue depth
> 256, including SAS, which usually has a big queue depth.

Thanks,
Ming