On Sun, Mar 18, 2012 at 1:23 PM, Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:
> On Sat, 17 Mar 2012, Ted Ts'o wrote:
>
>> I can't help thinking that the fact that we're constantly playing
>> whack-a-mole trying to fix various random crashes when devices
>> disappear that perhaps we should consider if there's a better way to
>> do things.
>
> Indeed, as Jens's patch mentions, proper reference counting for the BDI
> stuff hasn't been implemented yet. Obviously it will require somebody
> who really does know the code (i.e., not me).
>
> For example, when Paul's patch assigns &default_backing_dev_info, is
> the assignment synchronized by any sort of lock? I can't tell -- but
> if it isn't then the possibility of a race will still exist.
>

I think it's safe without a lock (assuming the assignment is atomic), but it
wouldn't hurt to take the inode's i_lock around it. That would also give you
a barrier, which is needed to propagate the assignment to other CPUs.

This is not a perfect fix, but it's pretty safe, and it is nice in that it
works independently of the filesystem or bus type.

Regards,
Mandeep

>> The fact that at the file system layer I have **no** idea that a
>> device has disappeared, and just blindly going on trying to write to a
>> device which is gone just seems a little crazy to me... why shouldn't
>> block layer inform the upper layers about something as fundamental as,
>> "the device is gone and is never coming back"?
>
> Playing devil's advocate... What would you do differently if you did
> know the device was gone? All I/O operations will fail regardless, and
> presumably with an error code like -ENODEV. Pretty much all you could
> do would be to fail them a little earlier.
>
>> > I suspect Paul's patch is the right thing to do. It might even make
>> > the ext4 fix unnecessary, although I don't understand the details well
>> > enough to verify it.
>> > Maybe Paul can check -- the commit I'm referring
>> > to is 7c2e70879fc0949b4220ee61b7c4553f6976a94d (ext4: add ext4-specific
>> > kludge to avoid an oops after the disk disappears).
>>
>> I have no idea either, because it's not obvious to me what data
>> structures can be relied upon, and what can't, and when things are
>> supposed to get freed on sudden device disconnects. The fact that
>> none of us are sure is part of what makes me think that the current
>> scheme is, perhaps, non-optimal...
>
> That's why someone like Jens or Al needs to take a close look at this
> (hint, hint).
>
> Alan Stern
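The locking suggestion in the reply above (publish &default_backing_dev_info under i_lock so the store is ordered for other CPUs) can be illustrated with a user-space C analogue. This is a minimal sketch, not kernel code: a pthread mutex stands in for i_lock, and `struct backing_dev`, `struct mapping`, `switch_to_default()` and `get_bdev()` are hypothetical names invented for illustration. What it shows is that the ordering guarantee only holds if readers take the same lock as the writer.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures under discussion;
 * these are NOT the real kernel definitions. */
struct backing_dev {
    const char *name;
};

/* Plays the role of default_backing_dev_info: always valid, never freed. */
static struct backing_dev default_bdev = { "default" };

struct mapping {
    pthread_mutex_t lock;      /* stand-in for the inode's i_lock */
    struct backing_dev *bdev;  /* pointer swapped when the device goes away */
};

/* Writer: on device removal, point the mapping at the default backing
 * device. The unlock publishes the store to any later locker. */
static void switch_to_default(struct mapping *m)
{
    pthread_mutex_lock(&m->lock);
    m->bdev = &default_bdev;
    pthread_mutex_unlock(&m->lock);
}

/* Reader: taking the same lock means it sees either the old pointer or
 * &default_bdev, with no data race on the pointer itself. */
static struct backing_dev *get_bdev(struct mapping *m)
{
    struct backing_dev *b;

    pthread_mutex_lock(&m->lock);
    b = m->bdev;
    pthread_mutex_unlock(&m->lock);
    assert(b != NULL); /* readers should never observe a NULL bdev */
    return b;
}
```

Switching to a static default object rather than NULL means a reader that races with removal still gets a valid pointer; the lock only adds ordering and freedom from data races, which is the "barrier" point made above.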