Re: [v5 PATCH] block: introduce block_rq_error tracepoint

Yang Shi <shy828301@xxxxxxxxx> · Wed, 2 Feb 2022 09:47:42 -0800

On Thu, Jan 27, 2022 at 10:18 AM Yang Shi <shy828301@xxxxxxxxx> wrote:
>
> On Thu, Jan 27, 2022 at 1:53 AM Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
> >
> > On Wed, Jan 26, 2022 at 10:51:53AM -0800, Yang Shi wrote:
> > > +             __entry->dev       = rq->q->disk ? disk_devt(rq->q->disk) : 0;
> > > +             __assign_str(name,   rq->q->disk ? rq->q->disk->disk_name : "?");
> >
> > None f the other tracepoints has the disk name, why does this one need
> > it?  And if you add it please avoid the overly long line.
>
> I guess the disk name was added to ease some handling in userspace
> tools. But if all other tracepoints don't have disk name shown, I
> think I'd better follow the convention. I did overlook this when I
> ported this patch. Will remove it.
>
> >
> > > +             __entry->sector    = blk_rq_pos(rq);
> > > +             __entry->nr_sector = nr_bytes >> 9;
> > > +             __entry->error     = blk_status_to_errno(error);
> >
> > This still converts the block status to an errno.
>
> I may misunderstand your comments. I just followed what
> block_rq_complete tracepoint does. Or the linux-block community is
> converting all tracepoints to show blk status code instead of
> conventional errno?
>
> And the userspace tool doesn't know blk status code and still expects
> the conventional errno. For example, rasdaemon reads block_rq_complete
> events now and have the below:
>
> static const struct {
>         int             error;
>         const char      *name;
> } blk_errors[] = {
>         { -EOPNOTSUPP, "operation not supported error" },
>         { -ETIMEDOUT, "timeout error" },
>         { -ENOSPC,    "critical space allocation error" },
>         { -ENOLINK,   "recoverable transport error" },
>         { -EREMOTEIO, "critical target error" },
>         { -EBADE,     "critical nexus error" },
>         { -ENODATA,   "critical medium error" },
>         { -EILSEQ,    "protection error" },
>         { -ENOMEM,    "kernel resource error" },
>         { -EBUSY,     "device resource error" },
>         { -EAGAIN,    "nonblocking retry error" },
>         { -EREMCHG, "dm internal retry error" },
>         { -EIO,       "I/O error" },
> };

Gently ping. Should I print blk status code instead of errno for this
trace point? But I really don't get why. And block_rq_complete
tracepoint does:

        TP_fast_assign(
                __entry->dev       = rq->q->disk ? disk_devt(rq->q->disk) : 0;
                __entry->sector    = blk_rq_pos(rq);
                __entry->nr_sector = nr_bytes >> 9;
                __entry->error     = blk_status_to_errno(error); <===
convert blk status code to errno

                blk_fill_rwbs(__entry->rwbs, rq->cmd_flags);
                __get_str(cmd)[0] = '\0';
        ),

>
> This patch aims to add block_rq_err tracepoint to replace
> block_rq_complete in rasdaemon.