On 3/19/22 03:06, Christian Lamparter wrote: > On 18/03/2022 15:16, Andy Shevchenko wrote: >> On Fri, Mar 18, 2022 at 10:03:11AM +0100, Christian Lamparter wrote: >>> the driver uses libata's "tag" values from in various arrays. >>> Since the mentioned patch bumped the ATA_TAG_INTERNAL to 32, >>> the value of the SATA_DWC_QCMD_MAX needs to be bumped to 33. >>> >>> Otherwise ATA_TAG_INTERNAL cause a crash like this: >>> >>> | BUG: Kernel NULL pointer dereference at 0x00000000 >>> | Faulting instruction address: 0xc03ed4b8 >>> | Oops: Kernel access of bad area, sig: 11 [#1] >>> | BE PAGE_SIZE=4K PowerPC 44x Platform >>> | CPU: 0 PID: 362 Comm: scsi_eh_1 Not tainted 5.4.163 #0 >>> | NIP: c03ed4b8 LR: c03d27e8 CTR: c03ed36c >>> | REGS: cfa59950 TRAP: 0300 Not tainted (5.4.163) >>> | MSR: 00021000 <CE,ME> CR: 42000222 XER: 00000000 >>> | DEAR: 00000000 ESR: 00000000 >>> | GPR00: c03d27e8 cfa59a08 cfa55fe0 00000000 0fa46bc0 [...] >>> | [..] >>> | NIP [c03ed4b8] sata_dwc_qc_issue+0x14c/0x254 >>> | LR [c03d27e8] ata_qc_issue+0x1c8/0x2dc >>> | Call Trace: >>> | [cfa59a08] [c003f4e0] __cancel_work_timer+0x124/0x194 (unreliable) >>> | [cfa59a78] [c03d27e8] ata_qc_issue+0x1c8/0x2dc >>> | [cfa59a98] [c03d2b3c] ata_exec_internal_sg+0x240/0x524 >>> | [cfa59b08] [c03d2e98] ata_exec_internal+0x78/0xe0 >>> | [cfa59b58] [c03d30fc] ata_read_log_page.part.38+0x1dc/0x204 >>> | [cfa59bc8] [c03d324c] ata_identify_page_supported+0x68/0x130 >>> | [...] >>> >>> this is because sata_dwc_dma_xfer_complete() NULLs the >>> dma_pending's next neighbour "chan" (a *dma_chan struct) in >>> this '32' case right here (line ~735): >>>> hsdevp->dma_pending[tag] = SATA_DWC_DMA_PENDING_NONE; >>> >>> Then the next time, a dma gets issued; dma_dwc_xfer_setup() passes >>> the NULL'd hsdevp->chan to the dmaengine_slave_config() which then >>> causes the crash. >> >> ... >> >>> ticerex said when I've asked him about his real name+email for the patches: >>> "Please use my forum nick." >>> <https://forum.openwrt.org/t/my-book-live-duo-reboot-loop/122464/14> >>> (I know checkpatch.pl complains about that. But what can you do?) >> >> I think Reported-by: is fine to have any kind of reference to the reporter. >> I can consider it false positive. >> >> ... > > K, I've reported this back to the reporter ;). > > Documentation/process/maintainer-tip.rst and process/5.Posting.rst > provided some hints: > > "Please note that if the bug was reported in private, then ask for > permission first before using the Reported-by tag." > > and maintainer-tip.rst, the format should be: > ``Reported-by: ``Reporter <reporter@mail>`` > > (My goal here is to get "a fix" merged, so conformance is key. ;-)) > >>> -#define SATA_DWC_QCMD_MAX 32 >>> +#define SATA_DWC_QCMD_MAX 33 >> >> Can't we use >> >> #define SATA_DWC_QCMD_MAX (ATA_TAG_INTERNAL + 1) >> > > I've looked around a bit. > > include/linux/libata.h itself has the following related definitions: > > | enum { > | ATA_MAX_QUEUE = 32, > | ATA_TAG_INTERNAL = ATA_MAX_QUEUE, > | [..] > > | struct ata_port { > | [...] > | struct ata_queued_cmd qcmd[ATA_MAX_QUEUE + 1]; > | > | } > > ATA_MAX_QUEUE seems to be the "size" value. Whereas ATA_TAG_INTERNAL > is used as (tag == ATA_TAG_INTERNAL) more often. > > > I came up with a viable compromise: > > Would it be "OK" to define a "new" ATA_MAX_TAG? > > This could be either set ATA_MAX_TAG = ATA_MAX_QUEUE + 1 That will be confusing ! ATA physically allows only up to 32 tags (0 to 31) because the command tag field is only 5 bits. So we should not define anything like this defining a tag value that is invalid. ATA_TAG_INTERNAL is special and used only to identify that the command is a libata internal command, either for device scan/revalidate or for error handling. This value is *never* used as a tag for an NCQ command because *all* internal commands are in fact not NCQ ! The internal commands from libata are all non queueable commands. Meaning that if a device driver sees a qc with its tag set to ATA_TAG_INTERNAL, it means that there are no other commands on-going and all tags should be unused in the driver. In the case of the dw driver, it means that we could arbitrarily use any of the valid tag values for managing that internal command without any issue. But having looked at the driver, as I said, it is a bigger change than just faking a 33rd "tag" that is in fact not a command tag at all. For these reasons, I really prefer defining SATA_DWC_QCMD_MAX as (ATA_MAX_QUEUE + 1) with the comment above it clarifying why it is defined like that. > > or let it be assigned automatically, if it's placed after ATA_TAG_INTERNAL > in the libata.h's enum like this: (I prefer this, but it being "33" is not > obvious if you don't dabble in C all the time) > > | enum { > | [...] > | ATA_MAX_QUEUE = 32, > | ATA_TAG_INTERNAL = ATA_MAX_QUEUE, > | /* > | * ATA_MAX_TAG accounts for ATA_MAX_QUEUE TAGs + 1 special TAG/slot > | * for ATA_TAG_INTERNAL (libata's internal commands/DMA management). > | * This needs to be it in this location. > | */ > | ATA_MAX_TAG, > | [...] > > This ATA_MAX_TAG could then be used for ata_port's qcmd (from above) and > sata_dwc_460ex's SATA_DWC_QCMD_MAX. I understand your point. But again, given that ATA_TAG_INTERNAL is only a special tag value for libata and cannot be used by devices, I would rather not define this ATA_MAX_TAG macro, to avoid confusions. > > For good measure: > > BUILD_BUG_ON(ATA_TAG_INTERNAL >= ATA_MAX_TAG); > > could be added into libata.h's __ata_qc_from_tag(). > It access the qcmd array, so anyone using libata's accessors will catch > future updates. > > Is this fine or getting to close to being overbuild? Not overbuilt, but not fine either. In my opinion, the root cause of the issue is that the DW driver blindly uses the qc->tag value regardless of if the command is NCQ or not. For non-ncq commands, tag 0 could be used always for the command management arrays index, as by definition, non NCQ commands imply QD=1 operation. As I mentioned, this is a slightly bigger fix though. So I am OK with simply changing the macro definition to have the +1 for now. But I am certainly not against getting this driver correctly fixed to properly handle command types and tags :) -- Damien Le Moal Western Digital Research