On Thu, Feb 3, 2022 at 6:46 PM Chaitanya Kulkarni <chaitanyak@xxxxxxxxxx> wrote:
>
> Yang,
>
> On 2/3/22 12:12, Yang Shi wrote:
> > Currently, rasdaemon uses the existing tracepoint block_rq_complete
> > and filters out non-error cases in order to capture block disk errors.
> >
> > But there are a few problems with this approach:
> >
> > 1. Even kernel trace filter could do the filtering work, there is
> > still some overhead after we enable this tracepoint.
> >
> > 2. The filter is merely based on errno, which does not align with kernel
> > logic to check the errors for print_req_error().
> >
> > 3. block_rq_complete only provides dev major and minor to identify
> > the block device, it is not convenient to use in user-space.
> >
> > So introduce a new tracepoint block_rq_error just for the error case.
> > With this patch, rasdaemon could switch to block_rq_error.
> >
>
> This patch looks good, but I've a question for you.
>
> We already have a tracepoint for the request completion
> block_rq_complete(). We are adding a new tracepoint blk_rq_error()
> that is also similar to what blk_rq_complete() reports.
> Similar call sites :-
> trace_block_rq_complete(req, error, nr_bytes);
> trace_block_rq_error(req, error, nr_bytes);
>
> The only delta between blk_rq_complete() and blk_rq_error() is
> cmd field for blk_rq_complete() in the TP_STRUCT_ENTRY() and
> __get_str(cmd) field in TP_printk() which I don't think will
> have any issue if we use that for blk_rq_error().

Yes, I agree. It's just that no user needs cmd for our use case.

>
> Question 1 :- What prevents us from using the same format for
> both blk_rq_complete() and blk_rq_error() ?

Actually, nothing, if we ignore cmd.

>
> Question 2 :- assuming that blk_rq_complete() and blk_rq_error()
> are using same format why can't we :-
>
> declare DECLARE_EVENT_CLASS(blk_rq_completion....)
> and use that class for blk_rq_complete() and blk_rq_error() ?
>
> since if I remember correctly we need to define a event class
> instead of duplicating a tracepoint with similar reporting.

Very good point. I did overlook it. The original post had the disk
name and didn't have cmd. Now the two tracepoints look much more
similar than they did in the original post, so I agree the duplicate
could be combined into an event class.

>
> -ck
>
>
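For readers less familiar with event classes: in the kernel, DECLARE_EVENT_CLASS() defines the record layout, assignment and print format once, and each DEFINE_EVENT() then only names a new event that reuses that class. The snippet below is a user-space toy model of that idea, not kernel code. Everything in it (struct rq_completion_entry, rq_completion_format(), the TOY_DEFINE_EVENT macro, and the simplified field set) is hypothetical and does not reproduce the real block_rq_complete layout or format string. It only shows that two events built from one class emit identical records and differ solely by event name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified record shared by every event of the class.
 * Stands in for TP_STRUCT__entry(); NOT the real block layer layout. */
struct rq_completion_entry {
	int major, minor;
	unsigned long long sector;
	unsigned int nr_sector;
	int error;
};

/* The single shared formatter of the "class" (stands in for TP_printk()).
 * Returns the snprintf() result, i.e. the untruncated output length. */
static int rq_completion_format(char *buf, size_t len, const char *event,
				const struct rq_completion_entry *e)
{
	assert(buf != NULL && e != NULL);
	return snprintf(buf, len, "%s: %d,%d %llu + %u [%d]", event,
			e->major, e->minor, e->sector, e->nr_sector, e->error);
}

/* Toy analogue of DEFINE_EVENT(class, name): each event only supplies its
 * name and reuses the class's record type and formatter. */
#define TOY_DEFINE_EVENT(class, name)                                        \
	static int trace_##name(char *buf, size_t len,                       \
				const struct class##_entry *e)               \
	{                                                                    \
		return class##_format(buf, len, #name, e);                   \
	}

TOY_DEFINE_EVENT(rq_completion, block_rq_complete)
TOY_DEFINE_EVENT(rq_completion, block_rq_error)
```

With an entry such as { 8, 0, 2048, 8, -5 }, trace_block_rq_complete() writes "block_rq_complete: 8,0 2048 + 8 [-5]" and trace_block_rq_error() writes "block_rq_error: 8,0 2048 + 8 [-5]". Adding a third event costs one macro line rather than a second copy of the layout and format, which is the maintenance benefit the event-class suggestion is after.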