Re: [PATCH v10 5/8] coresight: tmc: Add support for reading crash data

Linu Cherian <lcherian@xxxxxxxxxxx> · Mon, 21 Oct 2024 18:10:40 +0530

Hi Suzuki,

On 2024-10-18 at 15:16:17, Suzuki K Poulose (suzuki.poulose@xxxxxxx) wrote:
> On 17/10/2024 12:40, Linu Cherian wrote:
> > On 2024-10-03 at 18:55:54, Suzuki K Poulose (suzuki.poulose@xxxxxxx) wrote:
> > > Hi Linu
> > > 
> > > On 16/09/2024 11:34, Linu Cherian wrote:
> > > > * Add support for reading crashdata using special device files.
> > > >     The special device files /dev/crash_tmc_xxx would be available
> > > >     for read file operation only when the crash data is valid.
> > > > 
> > > > * User can read the crash data as below
> > > > 
> > > >     For example, for reading crash data from tmc_etf sink
> > > > 
> > > >     #dd if=/dev/crash_tmc_etfXX of=~/cstrace.bin
> > > 
> > > There are some comments below, please take a look.
> > > 
> > > > 
> > > > Signed-off-by: Anil Kumar Reddy <areddy3@xxxxxxxxxxx>
> > > > Signed-off-by: Tanmay Jagdale <tanmay@xxxxxxxxxxx>
> > > > Signed-off-by: Linu Cherian <lcherian@xxxxxxxxxxx>
> > > > ---
> > > > Changelog from v9:
> > > > - Removed READ_CRASHDATA mode meant for special casing crashdata read
> > > > - Added new fields full, len, offset to struct tmc_resrv_buf
> > > 
> > > Why do we need "full" ? See more on that below.
> > > 
> > > >     so as to have a common read function for ETR and ETF
> > > > - Introduced read file operation, tmc_crashdata_read
> > > >     specific to crashdata reads common for both ETR and ETF
> > > > - Introduced is_tmc_crashdata_valid function
> > > >     Special device file /dev/crash_tmc_xxx will be available only when
> > > >     crashdata is valid.
> > > > - Version checks added to crashdata validity checks
> > > > - Mark crashdata as invalid when user starts tracing with ETR sink in
> > > >     "resrv" buffer mode
> > > > 
> > > >    .../hwtracing/coresight/coresight-tmc-core.c  | 206 +++++++++++++++++-
> > > >    .../hwtracing/coresight/coresight-tmc-etf.c   |  36 +++
> > > >    .../hwtracing/coresight/coresight-tmc-etr.c   |  63 ++++++
> > > >    drivers/hwtracing/coresight/coresight-tmc.h   |  18 +-
> > > >    include/linux/coresight.h                     |  12 +
> > > >    5 files changed, 333 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/drivers/hwtracing/coresight/coresight-tmc-core.c b/drivers/hwtracing/coresight/coresight-tmc-core.c
> > > > index 54bf8ae2bff8..47b6b3f88750 100644
> > > > --- a/drivers/hwtracing/coresight/coresight-tmc-core.c
> > > > +++ b/drivers/hwtracing/coresight/coresight-tmc-core.c
> > > > @@ -105,6 +105,125 @@ u32 tmc_get_memwidth_mask(struct tmc_drvdata *drvdata)
> > > >    	return mask;
> > > >    }
> > > > +bool is_tmc_crashdata_valid(struct tmc_drvdata *drvdata)
> > > > +{
> > > > +	struct tmc_crash_metadata *mdata;
> > > > +
> > > > +	if (!tmc_has_reserved_buffer(drvdata) ||
> > > > +	    !tmc_has_crash_mdata_buffer(drvdata))
> > > > +		return false;
> > > > +
> > > > +	mdata = drvdata->crash_mdata.vaddr;
> > > > +
> > > > +	/* Check version match */
> > > > +	if (mdata->version != CS_CRASHDATA_VERSION)
> > > > +		return false;
> > > > +
> > > > +	/* Check data integrity of metadata */
> > > > +	if (mdata->crc32_mdata != find_crash_metadata_crc(mdata)) {
> > > > +		dev_dbg(&drvdata->csdev->dev,
> > > > +			"CRC mismatch in tmc crash metadata\n");
> > > > +		return false;
> > > > +	}
> > > > +	/* Check data integrity of tracedata */
> > > > +	if (mdata->crc32_tdata != find_crash_tracedata_crc(drvdata, mdata)) {
> > > > +		dev_dbg(&drvdata->csdev->dev,
> > > > +			"CRC mismatch in tmc crash tracedata\n");
> > > > +		return false;
> > > > +	}
> > > > +	/* Check for valid metadata */
> > > > +	if (!mdata->valid) {
> > > 
> > > minor nit: This could be checked right after the VERSION and we verify
> > > the CRC anyway later and thus could skip all the CRC calculations if
> > > !valid.
> > 
> > 
> > Ack.
> > 
> > > 
> > > > +		dev_dbg(&drvdata->csdev->dev,
> > > > +			"Data invalid in tmc crash metadata\n");
> > > > +		return false;
> > > > +	}
> > > > +
> > > > +	return true;
> > > > +}
> > > > +
> > > > +int tmc_read_prepare_crashdata(struct tmc_drvdata *drvdata)
> > > > +{
> > > > +	int ret = 0;
> > > > +	unsigned long flags;
> > > > +	struct tmc_crash_metadata *mdata;
> > > > +	struct coresight_device *csdev = drvdata->csdev;
> > > > +
> > > > +	spin_lock_irqsave(&drvdata->spinlock, flags);
> > > > +
> > > > +	if (!is_tmc_crashdata_valid(drvdata)) {
> > > > +		ret = -ENXIO;
> > > > +		goto out;
> > > > +	}
> > > > +
> > > > +	mdata = drvdata->crash_mdata.vaddr;
> > > > +	/*
> > > > +	 * Buffer address given by metadata for retrieval of trace data
> > > > +	 * from previous boot is expected to be same as the reserved
> > > > +	 * trace buffer memory region provided through DTS
> > > > +	 */
> > > > +	if (drvdata->resrv_buf.paddr != mdata->trace_paddr) {
> > > > +		dev_dbg(&csdev->dev, "Trace buffer address of previous boot invalid\n");
> > > 
> > > Couldn't this be made part of the "is_tmc_crashdata_valid()" and not
> > > repeated everytime we do the read ? Surely, this can't change after
> > > boot.
> > 
> > Ack. Will move.
> > 
> > > 
> > > > +		ret = -EINVAL;
> > > > +		goto out;
> > > > +	}
> > > > +
> > > > +	/* Sink specific crashdata mode preparation */
> > > > +	ret = crashdata_ops(csdev)->prepare(csdev);
> > > > +	if (ret)
> > > > +		goto out;
> > > > +
> > > > +	if (mdata->sts & 0x1)
> > > > +		coresight_insert_barrier_packet(drvdata->buf);
> > > > +
> > > > +	drvdata->reading = true;
> > > 
> > > Why are we dealing with drvdata->reading ? That is supposed to be only
> > > for the normal trace reading ?
> > 
> > Ack. Will remove, we dont need this.
> > 
> > > 
> > > > +out:
> > > > +	spin_unlock_irqrestore(&drvdata->spinlock, flags);
> > > > +	return ret;
> > > > +}
> > > > +
> > > > +int tmc_read_unprepare_crashdata(struct tmc_drvdata *drvdata)
> > > > +{
> > > > +	int ret;
> > > > +	unsigned long flags;
> > > > +	struct coresight_device *csdev = drvdata->csdev;
> > > > +
> > > > +	spin_lock_irqsave(&drvdata->spinlock, flags);
> > > > +
> > > > +	/* Sink specific crashdata mode preparation */
> > > > +	ret = crashdata_ops(csdev)->unprepare(csdev);
> > > > +
> > > > +	drvdata->reading = false;
> > > 
> > > 
> > > 
> > > > +	spin_unlock_irqrestore(&drvdata->spinlock, flags);
> > > > +
> > > > +	return ret;
> > > > +}
> > > > +
> > > > +static inline ssize_t tmc_get_resvbuf_trace(struct tmc_drvdata *drvdata,
> > > > +					  loff_t pos, size_t len, char **bufpp)
> > > > +{
> > > > +	s64 offset;
> > > > +	ssize_t actual = len;
> > > > +	struct tmc_resrv_buf *rbuf = &drvdata->resrv_buf;
> > > > +
> > > > +	if (pos + actual > rbuf->len)
> > > > +		actual = rbuf->len - pos;
> > > > +	if (actual <= 0)
> > > > +		return actual;
> > > 
> > > return 0 ? Because, we went beyond the file position, not because there was
> > > an error. So, that it doesn't look like we are suppressing an ERROR ?
> > 
> > 
> > return 0 looks fine to me. Will recheck on this.
> > 
> > > 
> > > 
> > > > +
> > > > +	/* Compute the offset from which we read the data */
> > > > +	offset = rbuf->offset + pos;
> > > > +	if (offset >= rbuf->size)
> > > > +		offset -= rbuf->size;
> > > > +
> > > > +	/* Adjust the length to limit this transaction to end of buffer */
> > > > +	actual = (actual < (rbuf->size - offset)) ?
> > > > +		actual : rbuf->size - offset;
> > > > +
> > > > +	*bufpp = (char *)rbuf->vaddr + offset;
> > > > +
> > > > +	return actual;
> > > > +}
> > > > +
> > > >    static int tmc_read_prepare(struct tmc_drvdata *drvdata)
> > > >    {
> > > >    	int ret = 0;
> > > > @@ -224,6 +343,70 @@ static const struct file_operations tmc_fops = {
> > > >    	.llseek		= no_llseek,
> > > >    };
> > > > +static int tmc_crashdata_open(struct inode *inode, struct file *file)
> > > > +{
> > > > +	int ret;
> > > > +	struct tmc_drvdata *drvdata = container_of(file->private_data,
> > > > +						   struct tmc_drvdata,
> > > > +						   crashdev);
> > > > +
> > > > +	ret = tmc_read_prepare_crashdata(drvdata);
> > > 
> > > I don't see the point of this "prepare" and unprepare callbacks, as they
> > > can be made generic by populating the mdata->rrp,rwp fields accordingly ?
> > > 
> > > i.e., while populating the mdata-> fields, for ETR, do what you do now.
> > > For ETF you could :
> > > 
> > > mdata->rrp = 0;
> > > mdata->dba = 0;
> > > mdata->rwp = drvdata->len;
> > > mdata->size = drvdata->len >> 2;
> > > mdata->sts = TMC_STS_FULL;
> > 
> > Agree with your point that this would get rid of sink specific
> > callbacks.
> > 
> > But few points to consider before we go with the above approach,
> > 
> > * mdata register snapshots wont be true to their definition,
> >    with such encodings.
> > 
> >    We had a similar discussion on this earlier regarding mdata->size,
> >    ie. We decided to stick to register format instead of storing bytes.
> >    https://lore.kernel.org/linux-arm-kernel/20240620041054.GC125816@xxxxxxxxxxxxxxxxxxxxxxxxx/
> 
> Understood. But whoever fills in the metdata does need to fill the
> mdata information above ? Including calculating the hash. So, I think it
> is fair to say that mdata is populated in a way that makes sense
> just by looking at it. In fact, we should :
> 
> 
> mdata->dba = <address of the tracedata>
> mdata->rrp = mdata->dba;
> mdata->rwp = mdata->rrp + drvdata->len;
> mdata->size = drvdata->len >> 2;
> mdata->sts = TMC_STS_FULL;
> 
> Rather than filling in 0's.

Okay, will make this change for ETF and avoid the sink specific
callbacks.

> 
> 
> > 
> > * We will have more comparable code between normal trace buffer reading
> >    and reading trace buffer after panic with sink specific callbacks,
> >    even though we keep the code seperate.
> 
> Lets not create "customs" here. We have a callback for a reason. i.e,
> the ETF has an internal SRAM which is not accessible directly by the CPU
> and ETR uses normal RAM.
> 
> While in this case, we have the data in CPU accessible RAM and the
> "driver" in this case is crash handling and can be agnostic of where
> the data was captured (ETF vs ETR)

Okay fine.

> 
> 
> Suzuki
> 
> 
> > 
> > Please let me know your thoughts on this.
> 
> 
>