Re: [PATCH v10 5/8] coresight: tmc: Add support for reading crash data

Linu Cherian <lcherian@xxxxxxxxxxx> · Tue, 29 Oct 2024 11:51:12 +0530

Hi Suzuki,

On 2024-10-21 at 18:10:40, Linu Cherian (lcherian@xxxxxxxxxxx) wrote:
> Hi Suzuki,
> 
> On 2024-10-18 at 15:16:17, Suzuki K Poulose (suzuki.poulose@xxxxxxx) wrote:
> > On 17/10/2024 12:40, Linu Cherian wrote:
> > > On 2024-10-03 at 18:55:54, Suzuki K Poulose (suzuki.poulose@xxxxxxx) wrote:
> > > > Hi Linu
> > > > 
> > > > On 16/09/2024 11:34, Linu Cherian wrote:
> > > > > * Add support for reading crashdata using special device files.
> > > > >     The special device files /dev/crash_tmc_xxx would be available
> > > > >     for read file operation only when the crash data is valid.
> > > > > 
> > > > > * User can read the crash data as below
> > > > > 
> > > > >     For example, for reading crash data from tmc_etf sink
> > > > > 
> > > > >     #dd if=/dev/crash_tmc_etfXX of=~/cstrace.bin
> > > > 
> > > > There are some comments below, please take a look.
> > > > 
> > > > > 
> > > > > Signed-off-by: Anil Kumar Reddy <areddy3@xxxxxxxxxxx>
> > > > > Signed-off-by: Tanmay Jagdale <tanmay@xxxxxxxxxxx>
> > > > > Signed-off-by: Linu Cherian <lcherian@xxxxxxxxxxx>
> > > > > ---
> > > > > Changelog from v9:
> > > > > - Removed READ_CRASHDATA mode meant for special casing crashdata read
> > > > > - Added new fields full, len, offset to struct tmc_resrv_buf
> > > > 
> > > > Why do we need "full" ? See more on that below.
> > > > 
> > > > >     so as to have a common read function for ETR and ETF
> > > > > - Introduced read file operation, tmc_crashdata_read
> > > > >     specific to crashdata reads common for both ETR and ETF
> > > > > - Introduced is_tmc_crashdata_valid function
> > > > >     Special device file /dev/crash_tmc_xxx will be available only when
> > > > >     crashdata is valid.
> > > > > - Version checks added to crashdata validity checks
> > > > > - Mark crashdata as invalid when user starts tracing with ETR sink in
> > > > >     "resrv" buffer mode
> > > > > 
> > > > >    .../hwtracing/coresight/coresight-tmc-core.c  | 206 +++++++++++++++++-
> > > > >    .../hwtracing/coresight/coresight-tmc-etf.c   |  36 +++
> > > > >    .../hwtracing/coresight/coresight-tmc-etr.c   |  63 ++++++
> > > > >    drivers/hwtracing/coresight/coresight-tmc.h   |  18 +-
> > > > >    include/linux/coresight.h                     |  12 +
> > > > >    5 files changed, 333 insertions(+), 2 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/hwtracing/coresight/coresight-tmc-core.c b/drivers/hwtracing/coresight/coresight-tmc-core.c
> > > > > index 54bf8ae2bff8..47b6b3f88750 100644
> > > > > --- a/drivers/hwtracing/coresight/coresight-tmc-core.c
> > > > > +++ b/drivers/hwtracing/coresight/coresight-tmc-core.c
> > > > > @@ -105,6 +105,125 @@ u32 tmc_get_memwidth_mask(struct tmc_drvdata *drvdata)
> > > > >    	return mask;
> > > > >    }
> > > > > +bool is_tmc_crashdata_valid(struct tmc_drvdata *drvdata)
> > > > > +{
> > > > > +	struct tmc_crash_metadata *mdata;
> > > > > +
> > > > > +	if (!tmc_has_reserved_buffer(drvdata) ||
> > > > > +	    !tmc_has_crash_mdata_buffer(drvdata))
> > > > > +		return false;
> > > > > +
> > > > > +	mdata = drvdata->crash_mdata.vaddr;
> > > > > +
> > > > > +	/* Check version match */
> > > > > +	if (mdata->version != CS_CRASHDATA_VERSION)
> > > > > +		return false;
> > > > > +
> > > > > +	/* Check data integrity of metadata */
> > > > > +	if (mdata->crc32_mdata != find_crash_metadata_crc(mdata)) {
> > > > > +		dev_dbg(&drvdata->csdev->dev,
> > > > > +			"CRC mismatch in tmc crash metadata\n");
> > > > > +		return false;
> > > > > +	}
> > > > > +	/* Check data integrity of tracedata */
> > > > > +	if (mdata->crc32_tdata != find_crash_tracedata_crc(drvdata, mdata)) {
> > > > > +		dev_dbg(&drvdata->csdev->dev,
> > > > > +			"CRC mismatch in tmc crash tracedata\n");
> > > > > +		return false;
> > > > > +	}
> > > > > +	/* Check for valid metadata */
> > > > > +	if (!mdata->valid) {
> > > > 
> > > > minor nit: This could be checked right after the VERSION and we verify
> > > > the CRC anyway later and thus could skip all the CRC calculations if
> > > > !valid.
> > > 
> > > 
> > > Ack.
> > > 
> > > > 
> > > > > +		dev_dbg(&drvdata->csdev->dev,
> > > > > +			"Data invalid in tmc crash metadata\n");
> > > > > +		return false;
> > > > > +	}
> > > > > +
> > > > > +	return true;
> > > > > +}
> > > > > +
> > > > > +int tmc_read_prepare_crashdata(struct tmc_drvdata *drvdata)
> > > > > +{
> > > > > +	int ret = 0;
> > > > > +	unsigned long flags;
> > > > > +	struct tmc_crash_metadata *mdata;
> > > > > +	struct coresight_device *csdev = drvdata->csdev;
> > > > > +
> > > > > +	spin_lock_irqsave(&drvdata->spinlock, flags);
> > > > > +
> > > > > +	if (!is_tmc_crashdata_valid(drvdata)) {
> > > > > +		ret = -ENXIO;
> > > > > +		goto out;
> > > > > +	}
> > > > > +
> > > > > +	mdata = drvdata->crash_mdata.vaddr;
> > > > > +	/*
> > > > > +	 * Buffer address given by metadata for retrieval of trace data
> > > > > +	 * from previous boot is expected to be same as the reserved
> > > > > +	 * trace buffer memory region provided through DTS
> > > > > +	 */
> > > > > +	if (drvdata->resrv_buf.paddr != mdata->trace_paddr) {
> > > > > +		dev_dbg(&csdev->dev, "Trace buffer address of previous boot invalid\n");
> > > > 
> > > > Couldn't this be made part of the "is_tmc_crashdata_valid()" and not
> > > > repeated everytime we do the read ? Surely, this can't change after
> > > > boot.
> > > 
> > > Ack. Will move.
> > > 
> > > > 
> > > > > +		ret = -EINVAL;
> > > > > +		goto out;
> > > > > +	}
> > > > > +
> > > > > +	/* Sink specific crashdata mode preparation */
> > > > > +	ret = crashdata_ops(csdev)->prepare(csdev);
> > > > > +	if (ret)
> > > > > +		goto out;
> > > > > +
> > > > > +	if (mdata->sts & 0x1)
> > > > > +		coresight_insert_barrier_packet(drvdata->buf);
> > > > > +
> > > > > +	drvdata->reading = true;
> > > > 
> > > > Why are we dealing with drvdata->reading ? That is supposed to be only
> > > > for the normal trace reading ?
> > > 
> > > Ack. Will remove, we dont need this.
> > > 
> > > > 
> > > > > +out:
> > > > > +	spin_unlock_irqrestore(&drvdata->spinlock, flags);
> > > > > +	return ret;
> > > > > +}
> > > > > +
> > > > > +int tmc_read_unprepare_crashdata(struct tmc_drvdata *drvdata)
> > > > > +{
> > > > > +	int ret;
> > > > > +	unsigned long flags;
> > > > > +	struct coresight_device *csdev = drvdata->csdev;
> > > > > +
> > > > > +	spin_lock_irqsave(&drvdata->spinlock, flags);
> > > > > +
> > > > > +	/* Sink specific crashdata mode preparation */
> > > > > +	ret = crashdata_ops(csdev)->unprepare(csdev);
> > > > > +
> > > > > +	drvdata->reading = false;
> > > > 
> > > > 
> > > > 
> > > > > +	spin_unlock_irqrestore(&drvdata->spinlock, flags);
> > > > > +
> > > > > +	return ret;
> > > > > +}
> > > > > +
> > > > > +static inline ssize_t tmc_get_resvbuf_trace(struct tmc_drvdata *drvdata,
> > > > > +					  loff_t pos, size_t len, char **bufpp)
> > > > > +{
> > > > > +	s64 offset;
> > > > > +	ssize_t actual = len;
> > > > > +	struct tmc_resrv_buf *rbuf = &drvdata->resrv_buf;
> > > > > +
> > > > > +	if (pos + actual > rbuf->len)
> > > > > +		actual = rbuf->len - pos;
> > > > > +	if (actual <= 0)
> > > > > +		return actual;
> > > > 
> > > > return 0 ? Because, we went beyond the file position, not because there was
> > > > an error. So, that it doesn't look like we are suppressing an ERROR ?
> > > 
> > > 
> > > return 0 looks fine to me. Will recheck on this.
> > > 
> > > > 
> > > > 
> > > > > +
> > > > > +	/* Compute the offset from which we read the data */
> > > > > +	offset = rbuf->offset + pos;
> > > > > +	if (offset >= rbuf->size)
> > > > > +		offset -= rbuf->size;
> > > > > +
> > > > > +	/* Adjust the length to limit this transaction to end of buffer */
> > > > > +	actual = (actual < (rbuf->size - offset)) ?
> > > > > +		actual : rbuf->size - offset;
> > > > > +
> > > > > +	*bufpp = (char *)rbuf->vaddr + offset;
> > > > > +
> > > > > +	return actual;
> > > > > +}
> > > > > +
> > > > >    static int tmc_read_prepare(struct tmc_drvdata *drvdata)
> > > > >    {
> > > > >    	int ret = 0;
> > > > > @@ -224,6 +343,70 @@ static const struct file_operations tmc_fops = {
> > > > >    	.llseek		= no_llseek,
> > > > >    };
> > > > > +static int tmc_crashdata_open(struct inode *inode, struct file *file)
> > > > > +{
> > > > > +	int ret;
> > > > > +	struct tmc_drvdata *drvdata = container_of(file->private_data,
> > > > > +						   struct tmc_drvdata,
> > > > > +						   crashdev);
> > > > > +
> > > > > +	ret = tmc_read_prepare_crashdata(drvdata);
> > > > 
> > > > I don't see the point of this "prepare" and unprepare callbacks, as they
> > > > can be made generic by populating the mdata->rrp,rwp fields accordingly ?
> > > > 
> > > > i.e., while populating the mdata-> fields, for ETR, do what you do now.
> > > > For ETF you could :
> > > > 
> > > > mdata->rrp = 0;
> > > > mdata->dba = 0;
> > > > mdata->rwp = drvdata->len;
> > > > mdata->size = drvdata->len >> 2;
> > > > mdata->sts = TMC_STS_FULL;
> > > 
> > > Agree with your point that this would get rid of sink specific
> > > callbacks.
> > > 
> > > But few points to consider before we go with the above approach,
> > > 
> > > * mdata register snapshots wont be true to their definition,
> > >    with such encodings.
> > > 
> > >    We had a similar discussion on this earlier regarding mdata->size,
> > >    ie. We decided to stick to register format instead of storing bytes.
> > >    https://lore.kernel.org/linux-arm-kernel/20240620041054.GC125816@xxxxxxxxxxxxxxxxxxxxxxxxx/
> > 
> > Understood. But whoever fills in the metdata does need to fill the
> > mdata information above ? Including calculating the hash. So, I think it
> > is fair to say that mdata is populated in a way that makes sense
> > just by looking at it. In fact, we should :
> > 
> > 
> > mdata->dba = <address of the tracedata>
> > mdata->rrp = mdata->dba;
> > mdata->rwp = mdata->rrp + drvdata->len;
> > mdata->size = drvdata->len >> 2;
> > mdata->sts = TMC_STS_FULL;
> > 
> > Rather than filling in 0's.

Just wanted to point out that, in case of watchdog reset when firmware
is updating this, dba would be the physical address while in ETR case
it could be virtual address (with IOMMU).
So in that regard, filling in zeros seems better to me for rrp and dba. 

Please let me know your preference on this.

Thanks.