Re: [PATCH,v2 net] net: sched: validate that class is found in qdisc_tree_decrease_qlen

On 07/20/15 15:40, Alex Gartrell wrote:
We have an application that invokes tc to delete the root every time the
config changes. As a result we stress the cleanup code and were seeing the
following panic:

   crash> bt
   PID: 630839  TASK: ffff8823c990d280  CPU: 14  COMMAND: "tc"
    [... snip ...]
    #8 [ffff8820ceec17a0] page_fault at ffffffff8160a8c2
       [exception RIP: htb_qlen_notify+24]
       RIP: ffffffffa0841718  RSP: ffff8820ceec1858  RFLAGS: 00010282
       RAX: 0000000000000000  RBX: 0000000000000000  RCX: ffff88241747b400
       RDX: ffff88241747b408  RSI: 0000000000000000  RDI: ffff8811fb27d000
       RBP: ffff8820ceec1868   R8: ffff88120cdeff24   R9: ffff88120cdeff30
       R10: 0000000000000bd4  R11: ffffffffa0840919  R12: ffffffffa0843340
       R13: 0000000000000000  R14: 0000000000000001  R15: ffff8808dae5c2e8
       ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    #9 [...] qdisc_tree_decrease_qlen at ffffffff81565375
   #10 [...] fq_codel_dequeue at ffffffffa084e0a0 [sch_fq_codel]
   #11 [...] fq_codel_reset at ffffffffa084e2f8 [sch_fq_codel]
   #12 [...] qdisc_destroy at ffffffff81560d2d
   #13 [...] htb_destroy_class at ffffffffa08408f8 [sch_htb]
   #14 [...] htb_put at ffffffffa084095c [sch_htb]
   #15 [...] tc_ctl_tclass at ffffffff815645a3
   #16 [...] rtnetlink_rcv_msg at ffffffff81552cb0
   [... snip ...]

To my understanding, the following situation is taking place.


   tc_ctl_tclass

    -> htb_delete
      -> class is deleted from clhash
    -> htb_put
      -> qdisc_destroy
        -> fq_codel_reset

=========> This part looks suspicious. Why is reset invoking
a dequeue? Shouldn't a destroy just purge the queue?

          -> fq_codel_dequeue
            -> qdisc_tree_decrease_qlen
              -> cl = htb_get # returns NULL, removed in htb_delete
                -> htb_qlen_notify(sch, NULL) # BOOM
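
For illustration only, the sequence above can be modelled in plain userspace C. This is a rough sketch, not kernel code: every model_* name is hypothetical, the "hash" is a fixed array, and the guard only shows the general idea of checking the lookup result before notifying, which is what the patch subject describes.

```c
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct model_class {
    unsigned int classid;
    int notified;                /* set when the notify callback runs */
};

#define MODEL_SLOTS 4
static struct model_class *model_clhash[MODEL_SLOTS];

/* Models the class lookup: returns NULL once the class is unlinked. */
static struct model_class *model_get(unsigned int classid)
{
    for (size_t i = 0; i < MODEL_SLOTS; i++)
        if (model_clhash[i] != NULL && model_clhash[i]->classid == classid)
            return model_clhash[i];
    return NULL;
}

/* Models the delete step: the class leaves the hash but still exists. */
static void model_delete(unsigned int classid)
{
    for (size_t i = 0; i < MODEL_SLOTS; i++)
        if (model_clhash[i] != NULL && model_clhash[i]->classid == classid)
            model_clhash[i] = NULL;
}

/* Models the notify callback, which dereferences its argument. */
static void model_qlen_notify(struct model_class *cl)
{
    cl->notified = 1;
}

/* Guarded notification: 0 if delivered, -1 if the class was not found. */
static int model_decrease_qlen(unsigned int classid)
{
    struct model_class *cl = model_get(classid);

    if (cl == NULL)              /* without this check: NULL dereference */
        return -1;
    model_qlen_notify(cl);
    return 0;
}
```

In this model, calling model_decrease_qlen() after model_delete() returns -1 instead of passing NULL to the notify callback.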


It is worrisome to fix the core code for this. The root cause seems to
be codel. I don't have time to dig in, but in general, reset would be
something like:

struct fq_codel_sched_data *q = qdisc_priv(sch);
qdisc_reset(q)

or something along those lines...
But certainly dequeue semantics don't seem right there.
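
To make the distinction concrete, here is a toy userspace sketch of the two reset styles. It is only a model under a loud assumption: every dequeue here notifies the parent, which is a simplification of what the real qdisc does, and all model_* names are hypothetical.

```c
/* Hypothetical model of a child queue; not a kernel structure. */
struct model_queue {
    int qlen;                    /* packets currently queued */
    int parent_notifications;    /* times the parent was told about a change */
};

/* Assumption: each dequeue reports the change up the tree. */
static int model_dequeue(struct model_queue *q)
{
    if (q->qlen == 0)
        return 0;
    q->qlen--;
    q->parent_notifications++;   /* this is where a stale parent gets touched */
    return 1;
}

/* Reset that drains through the dequeue path, as in the trace. */
static void model_reset_by_dequeue(struct model_queue *q)
{
    while (model_dequeue(q))
        ;
}

/* Reset that just drops everything without walking up the tree. */
static void model_reset_by_purge(struct model_queue *q)
{
    q->qlen = 0;
}
```

Both variants leave the queue empty, but only the dequeue-based one reaches the parent during teardown, which is the behaviour that looks wrong to me.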

cheers,
jamal


