On Fri, 1 Jun 2012, Alex Elder wrote:
> On 06/01/2012 11:20 AM, Sage Weil wrote:
> > The problem is that socket events queue work, which can take a while, and
> > race with, say, osd_client getting an osdmap and dropping its
> > struct ceph_osd.  The ->get and ->put ops just twiddle the containing
> > struct's refcount, in that case, so the con_work will find the (now
> > closed) ceph_connection and do nothing...
>
> I think you're saying that the connection (or its socket) needs to
> be protected from its containing structure going away.  So the
> connection needs to hold a reference to its container.  If that's
> the case then the disposal of the ceph_osd needs to clean up
> the connection fully before it goes away.

Yeah.  I think it happens already before we drop the ref:

static void __remove_osd(struct ceph_osd_client *osdc, struct ceph_osd *osd)
{
	dout("__remove_osd %p\n", osd);
	BUG_ON(!list_empty(&osd->o_requests));
	rb_erase(&osd->o_node, &osdc->osds);
	list_del_init(&osd->o_osd_lru);
	ceph_con_close(&osd->o_con);
	put_osd(osd);
}

So it's just the con reference in the workqueue that matters.

sage

>
> Anyway, I think I see why there might be a need for the ref counts
> and they obviously won't go away if they're needed...
>
> 					-Alex

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
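
For anyone following the thread, here is a rough userspace sketch of the lifetime rule discussed above: queued work pins the containing struct through the connection's ->get/->put ops, so removing the OSD while work is still pending leaves the memory valid, and the work then sees a closed connection and does nothing. This is not the libceph code. The names struct conn, struct osd, queue_con_work(), remove_osd() and the demo function are made up for illustration, the refcount is a plain int where real code would use an atomic, and the "workqueue" is just a pending flag that the demo runs by hand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct conn;

/* Hypothetical ops table, loosely modelled on the ->get/->put ops above. */
struct conn_ops {
	struct conn *(*get)(struct conn *con);
	void (*put)(struct conn *con);
};

struct conn {
	const struct conn_ops *ops;
	bool closed;
	bool work_pending;
};

struct osd {
	int refcount;		/* plain int here; real code would be atomic */
	struct conn o_con;	/* embedded, so it shares the osd's lifetime */
};

static int osds_freed;		/* demo bookkeeping only */

static void put_osd(struct osd *osd)
{
	if (--osd->refcount == 0) {
		osds_freed++;
		free(osd);
	}
}

/* The con ops just twiddle the containing struct's refcount. */
static struct conn *osd_con_get(struct conn *con)
{
	container_of(con, struct osd, o_con)->refcount++;
	return con;
}

static void osd_con_put(struct conn *con)
{
	put_osd(container_of(con, struct osd, o_con));
}

static const struct conn_ops osd_con_ops = {
	.get = osd_con_get,
	.put = osd_con_put,
};

/* Queueing work takes a reference that the work itself drops later. */
static void queue_con_work(struct conn *con)
{
	if (con->work_pending)
		return;
	con->ops->get(con);
	con->work_pending = true;
}

/* Returns true if the work would have done any I/O. */
static bool con_work(struct conn *con)
{
	bool did_io = !con->closed;

	con->work_pending = false;
	con->ops->put(con);	/* may free the container; con is dead after this */
	return did_io;
}

/* Close the connection first, then drop the owner's reference. */
static void remove_osd(struct osd *osd)
{
	osd->o_con.closed = true;	/* stands in for ceph_con_close() */
	put_osd(osd);
}

/* Returns how many osds were freed once the pending work has run. */
int demo_remove_races_with_work(void)
{
	struct osd *osd = calloc(1, sizeof(*osd));
	bool did_io;

	assert(osd != NULL);
	osd->refcount = 1;
	osd->o_con.ops = &osd_con_ops;
	osds_freed = 0;

	queue_con_work(&osd->o_con);	/* socket event: refcount is now 2 */
	remove_osd(osd);		/* new osdmap: refcount back to 1 */
	assert(osds_freed == 0);	/* pending work still pins the osd */

	did_io = con_work(&osd->o_con);	/* finds closed con, drops last ref */
	assert(!did_io);
	return osds_freed;
}
```

Calling demo_remove_races_with_work() from a main() should return 1: the osd is freed only after the pending work has dropped its reference, not at remove time.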