Re: [RFT v3 1/4] perf cs-etm: Generate branch sample for missed packets

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Leo,



On 30 May 2018 at 10:45, Robert Walker <robert.walker@xxxxxxx> wrote:
>
>
> On 28/05/18 23:13, Mathieu Poirier wrote:
>>
>> Leo and/or Robert,
>>
>> On Mon, May 28, 2018 at 04:45:00PM +0800, Leo Yan wrote:
>>>
>>> Commit e573e978fb12 ("perf cs-etm: Inject capabilitity for CoreSight
>>> traces") reworks the samples generation flow from CoreSight trace to
>>> match the correct format so Perf report tool can display the samples
>>> properly.
>>>
>>> But the change has side effect for branch packet handling, it only
>>> generate branch samples by checking previous packet flag
>>> 'last_instr_taken_branch' is true, this results in below three kinds
>>> packets are missed to generate branch samples:
>>>
>>> - The start tracing packet at the beginning of tracing data;
>>> - The exception handling packet;
>>> - If one CS_ETM_TRACE_ON packet is inserted, we also miss to handle it
>>>    for branch samples.  CS_ETM_TRACE_ON packet itself can give the info
>>>    that there have a discontinuity in the trace, on the other hand we
>>>    also miss to generate proper branch sample for packets before and
>>>    after CS_ETM_TRACE_ON packet.
>>>
>>> This patch is to add branch sample handling for up three kinds packets:
>>>
>>> - In function cs_etm__sample(), check if 'prev_packet->sample_type' is
>>>    zero and in this case it generates branch sample for the start tracing
>>>    packet; furthermore, we also need to handle the condition for
>>>    prev_packet::end_addr is zero in the cs_etm__last_executed_instr();
>>>
>>> - In function cs_etm__sample(), check if 'prev_packet->exc' is true and
>>>    generate branch sample for exception handling packet;
>>>
>>> - If there has one CS_ETM_TRACE_ON packet is coming, we firstly generate
>>>    branch sample in the function cs_etm__flush(), this can save complete
>>>    info for the previous CS_ETM_RANGE packet just before CS_ETM_TRACE_ON
>>>    packet.  We also generate branch sample for the new CS_ETM_RANGE
>>>    packet after CS_ETM_TRACE_ON packet, this have two purposes, the
>>>    first one purpose is to save the info for the new CS_ETM_RANGE packet,
>>>    the second purpose is to save CS_ETM_TRACE_ON packet info so we can
>>>    have hint for a discontinuity in the trace.
>>>
>>>    For CS_ETM_TRACE_ON packet, its fields 'packet->start_addr' and
>>>    'packet->end_addr' equal to 0xdeadbeefdeadbeefUL which are emitted in
>>>    the decoder layer as dummy value.  This patch is to convert these
>>>    values to zeros for more readable; this is accomplished by functions
>>>    cs_etm__last_executed_instr() and cs_etm__first_executed_instr().  The
>>>    later one is a new function introduced by this patch.
>>>
>>> Reviewed-by: Robert Walker <robert.walker@xxxxxxx>
>>> Fixes: e573e978fb12 ("perf cs-etm: Inject capabilitity for CoreSight
>>> traces")
>>> Signed-off-by: Leo Yan <leo.yan@xxxxxxxxxx>
>>> ---
>>>   tools/perf/util/cs-etm.c | 93
>>> +++++++++++++++++++++++++++++++++++++-----------
>>>   1 file changed, 73 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
>>> index 822ba91..8418173 100644
>>> --- a/tools/perf/util/cs-etm.c
>>> +++ b/tools/perf/util/cs-etm.c
>>> @@ -495,6 +495,20 @@ static inline void
>>> cs_etm__reset_last_branch_rb(struct cs_etm_queue *etmq)
>>>   static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet
>>> *packet)
>>>   {
>>>         /*
>>> +        * The packet is the start tracing packet if the end_addr is
>>> zero,
>>> +        * returns 0 for this case.
>>> +        */
>>> +       if (!packet->end_addr)
>>> +               return 0;
>>
>>
>> What is considered to be the "start tracing packet"?  Right now the only
>> two
>> kind of packets inserted in the decoder packet buffer queue are INST_RANGE
>> and
>> TRACE_ON.  How can we hit a condition where packet->end-addr == 0?
>>
>>
>>> +
>>> +       /*
>>> +        * The packet is the CS_ETM_TRACE_ON packet if the end_addr is
>>> +        * magic number 0xdeadbeefdeadbeefUL, returns 0 for this case.
>>> +        */
>>> +       if (packet->end_addr == 0xdeadbeefdeadbeefUL)
>>> +               return 0;
>>
>>
>> As it is with the above, I find triggering on addresses to be brittle and
>> hard
>> to maintain on the long run.  Packets all have a sample_type field that
>> should
>> be used in cases like this one.  That way we know exactly the condition
>> that is
>> targeted.
>>
>> While working on this set, please spin-off another patch that defines
>> CS_ETM_INVAL_ADDR 0xdeadbeefdeadbeefUL and replace all the cases where the
>> numeral is used.  That way we stop using the hard coded value.
>>
>>> +
>>> +       /*
>>>          * The packet records the execution range with an exclusive end
>>> address
>>>          *
>>>          * A64 instructions are constant size, so the last executed
>>> @@ -505,6 +519,18 @@ static inline u64 cs_etm__last_executed_instr(struct
>>> cs_etm_packet *packet)
>>>         return packet->end_addr - A64_INSTR_SIZE;
>>>   }
>>>   +static inline u64 cs_etm__first_executed_instr(struct cs_etm_packet
>>> *packet)
>>> +{
>>> +       /*
>>> +        * The packet is the CS_ETM_TRACE_ON packet if the start_addr is
>>> +        * magic number 0xdeadbeefdeadbeefUL, returns 0 for this case.
>>> +        */
>>> +       if (packet->start_addr == 0xdeadbeefdeadbeefUL)
>>> +               return 0;
>>
>>
>> Same comment as above.
>>
>>> +
>>> +       return packet->start_addr;
>>> +}
>>> +
>>>   static inline u64 cs_etm__instr_count(const struct cs_etm_packet
>>> *packet)
>>>   {
>>>         /*
>>> @@ -546,7 +572,7 @@ static void cs_etm__update_last_branch_rb(struct
>>> cs_etm_queue *etmq)
>>>         be       = &bs->entries[etmq->last_branch_pos];
>>>         be->from = cs_etm__last_executed_instr(etmq->prev_packet);
>>> -       be->to   = etmq->packet->start_addr;
>>> +       be->to   = cs_etm__first_executed_instr(etmq->packet);
>>>         /* No support for mispredict */
>>>         be->flags.mispred = 0;
>>>         be->flags.predicted = 1;
>>> @@ -701,7 +727,7 @@ static int cs_etm__synth_branch_sample(struct
>>> cs_etm_queue *etmq)
>>>         sample.ip = cs_etm__last_executed_instr(etmq->prev_packet);
>>>         sample.pid = etmq->pid;
>>>         sample.tid = etmq->tid;
>>> -       sample.addr = etmq->packet->start_addr;
>>> +       sample.addr = cs_etm__first_executed_instr(etmq->packet);
>>>         sample.id = etmq->etm->branches_id;
>>>         sample.stream_id = etmq->etm->branches_id;
>>>         sample.period = 1;
>>> @@ -897,13 +923,28 @@ static int cs_etm__sample(struct cs_etm_queue
>>> *etmq)
>>>                 etmq->period_instructions = instrs_over;
>>>         }
>>>   -     if (etm->sample_branches &&
>>> -           etmq->prev_packet &&
>>> -           etmq->prev_packet->sample_type == CS_ETM_RANGE &&
>>> -           etmq->prev_packet->last_instr_taken_branch) {
>>> -               ret = cs_etm__synth_branch_sample(etmq);
>>> -               if (ret)
>>> -                       return ret;
>>> +       if (etm->sample_branches && etmq->prev_packet) {
>>> +               bool generate_sample = false;
>>> +
>>> +               /* Generate sample for start tracing packet */
>>> +               if (etmq->prev_packet->sample_type == 0 ||
>>
>>
>> What kind of packet is sample_type == 0 ?
>>
>>> +                   etmq->prev_packet->sample_type == CS_ETM_TRACE_ON)
>>> +                       generate_sample = true;
>>> +
>>> +               /* Generate sample for exception packet */
>>> +               if (etmq->prev_packet->exc == true)
>>> +                       generate_sample = true;
>>
>>
>> Please don't do that.  Exception packets have a type of their own and can
>> be
>> added to the decoder packet queue the same way INST_RANGE and TRACE_ON
>> packets
>> are.  Moreover exception packet containt an address that, if I'm reading
>> the
>> documenation properly, can be used to keep track of instructions that were
>> executed between the last address of the previous range packet and the
>> address
>> executed just before the exception occurred.  Mike and Rob will have to
>> confirm
>> this as the decoder may be doing all that hard work for us.
>>

clarification on the exception packets....

The Opencsd output exception packet gives you the exception number,
and optionally the preferred return address. If this address is
present does depend a lot on the underlying protocol - will normally
be there with ETMv4.
Exceptions are marked differently in the underlying protocol - the
OCSD packets abstract away these differences.

consider the code:

0x1000: <some instructions>
0x1100: BR 0x2000
....
0x2000: <some instructions>
0x2020  BZ r4

Without an exception this would result in the packets

OCSD_RANGE(0x1000,0x1104, Last instr type=Br, taken)   // recall that
range packets have start addr inclusive, end addr exclusive.
OCSD_RANGE(0x2000,0x2024, Last instr type=Br, <taken / not taken -
depends on condition>

Now consider an exception occurring before the BR 0x2000

this will result in:-
OCSD_RANGE(0x1000, 0x1100, Last instr type=Other)
OCSD_EXECEPTION(IRQ, ret-addr 0x1100)
OCSD_RANGE(IRQ_START, IRQ_END+4, Last instr type = BR, taken)    //
this is more likely to have multiple ranges / branches before any
return, but simplified here.
OCSD_EXCEPTION_RETURN()  // present if exception returns are
explicitly marked in underlying trace - may not always be depending on
circumstances.
OCSD_RANGE(0x1100,0x1104, Last=BR, taken) // continue on with short
range - just the branch
OCSD_RANGE(0x2000,0x2024, Last instr type=Br, <taken / not taken -
depends on condition>

Now consider the exception occurring after the BR, but before any
other instructions are executed.

OCSD_RANGE(0x1000,0x1104, Last instr type=Br, taken)   // recall that
range packets have start addr inclusive, end addr exclusive.
OCSD_EXECEPTION(IRQ, ret-addr 0x2000)     // here the preferred return
address is actually the target of the branch.
OCSD_RANGE(IRQ_START, IRQ_END+4, Last instr type = BR, taken)    //
this is more likely to have multiple ranges / branches before any
return, but simplified here.
OCSD_RANGE(0x2000,0x2024, Last instr type=Br, <taken / not taken -
depends on condition>

So in general it is possible to arrive in the IRQ_START range with the
previous packet having been either a taken branch, a not taken branch,
or not a branch.
Care must be taken - whether AutoFDO or normal trace disassembly not
to assume that having the last range packet as a taken branch means
that the next range packet is the target, if there is an intervening
exception packet.

Regards

Mike

>>> +
>>> +               /* Generate sample for normal branch packet */
>>> +               if (etmq->prev_packet->sample_type == CS_ETM_RANGE &&
>>> +                   etmq->prev_packet->last_instr_taken_branch)
>>> +                       generate_sample = true;
>>> +
>>> +               if (generate_sample) {
>>> +                       ret = cs_etm__synth_branch_sample(etmq);
>>> +                       if (ret)
>>> +                               return ret;
>>> +               }
>>>         }
>>>         if (etm->sample_branches || etm->synth_opts.last_branch) {
>>> @@ -922,11 +963,16 @@ static int cs_etm__sample(struct cs_etm_queue
>>> *etmq)
>>>   static int cs_etm__flush(struct cs_etm_queue *etmq)
>>>   {
>>>         int err = 0;
>>> +       struct cs_etm_auxtrace *etm = etmq->etm;
>>>         struct cs_etm_packet *tmp;
>>>   -     if (etmq->etm->synth_opts.last_branch &&
>>> -           etmq->prev_packet &&
>>> -           etmq->prev_packet->sample_type == CS_ETM_RANGE) {
>>> +       if (!etmq->prev_packet)
>>> +               return 0;
>>> +
>>> +       if (etmq->prev_packet->sample_type != CS_ETM_RANGE)
>>> +               return 0;
>>> +
>>> +       if (etmq->etm->synth_opts.last_branch) {
>>
>>
>> If you add:
>>
>>          if (!etmq->etm->synth_opts.last_branch)
>>                  return 0;
>>
>> You can avoid indenting the whole block.
>>
>>>                 /*
>>>                  * Generate a last branch event for the branches left in
>>> the
>>>                  * circular buffer at the end of the trace.
>>> @@ -939,18 +985,25 @@ static int cs_etm__flush(struct cs_etm_queue *etmq)
>>>                 err = cs_etm__synth_instruction_sample(
>>>                         etmq, addr,
>>>                         etmq->period_instructions);
>>> +               if (err)
>>> +                       return err;
>>>                 etmq->period_instructions = 0;
>>> +       }
>>>   -             /*
>>> -                * Swap PACKET with PREV_PACKET: PACKET becomes
>>> PREV_PACKET for
>>> -                * the next incoming packet.
>>> -                */
>>> -               tmp = etmq->packet;
>>> -               etmq->packet = etmq->prev_packet;
>>> -               etmq->prev_packet = tmp;
>>> +       if (etm->sample_branches) {
>>> +               err = cs_etm__synth_branch_sample(etmq);
>>> +               if (err)
>>> +                       return err;
>>>         }
>>>   -     return err;
>>> +       /*
>>> +        * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
>>> +        * the next incoming packet.
>>> +        */
>>> +       tmp = etmq->packet;
>>> +       etmq->packet = etmq->prev_packet;
>>> +       etmq->prev_packet = tmp;
>>
>>
>> Robert, I remember noticing that when you first submitted the code but
>> forgot to
>> go back to it.  What is the point of swapping the packets?  I understand
>>
>> etmq->prev_packet = etmq->packet;
>>
>> But not
>>
>> etmq->packet = tmp;
>>
>> After all etmq->packet will be clobbered as soon as
>> cs_etm_decoder__get_packet()
>> is called, which is alwasy right after either cs_etm__sample() or
>> cs_etm__flush().
>>
>
> This is code I inherited from the original versions of these patches, but it
> works because:
> - etmq->packet and etmq->prev_packet are pointers to struct cs_etm_packet
> allocated by zalloc() in cs_etm__alloc_queue()
> - cs_etm_decoder__get_packet() takes a pointer to struct cs_etm_packet and
> copies the contents of the first packet from the queue into the passed
> location with:
>    *packet = decoder->packet_buffer[decoder->head]
>
> So the swap code is only swapping the pointers over, not the contents of the
> packets.
>
> Regards
>
> Rob
>
>
>
>> Thanks,
>> Mathieu
>>
>>
>>
>>> +       return 0;
>>>   }
>>>     static int cs_etm__run_decoder(struct cs_etm_queue *etmq)
>>> --
>>> 2.7.4
>>>
>



-- 
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]

  Powered by Linux