On Fri, Feb 19, 2016 at 03:51:47PM -0500, Tejun Heo wrote:
> I see, I suppose that's what distinguishes s_active and s_umount
> usages - whether pinning should block umounting?

???  ->s_active is a plain and simple count of "I hold a long-term
reference to this superblock; don't shut it down until I drop it".
->s_umount is held across some of the transitions in the struct
super_block life cycle, including the actual process of shutdown.

> > If you need details on s_active/s_umount/etc., I can give you a
> > braindump, but I suspect your real question is a lot more
> > specific.  Details, please...
>
> So, the problem is that the cgroup writeback path sometimes schedules
> a work item to change the cgroup an inode is associated with.
> Currently, only the inode is pinned and the underlying sb may go away
> while the work item is still pending.  The work item performs iput()
> at the end and that explodes if the underlying sb is already gone.
>
> As the writeback path relies on s_umount for synchronization anyway,
> I think that'd be the most natural way to hold onto the sb, but
> unfortunately there's no way to pass the down_read on to the async
> execution context, so I made it grab s_active, which worked fine but
> made the sb hang around until such work items are finished.  It's an
> unlikely race to hit but still broken.
>
> The last option would be canceling / flushing these work items from
> the sb shutdown path, which is likely more involved.
>
> What should it be doing?

Um...  What ordering requirements do you have?  You obviously
shouldn't let it continue past the shutdown - as a matter of fact, you
*can't* let it continue past generic_shutdown_super(), since any inode
references held at evict_inodes() time will make it very unhappy.
Attempts to do any IO after that will make things a lot worse than
unhappy - the data structures needed to do it might be gone (and if
you hold on a bit longer, the filesystem driver itself might very well
be gone, along with the functions you were going to call).

Grabbing ->s_active is a seriously bad idea for another reason - in a
situation where there's only one mount of a given fs, plain umount()
should _not_ return 0 before fs shutdown is over.  Sure, it is
possible that there's a binding somewhere, or that it's a lazy umount,
etc., but those are "you've asked for it" situations; having a plain
umount of e.g. ext3 on a USB stick return success before it is safe to
pull that stick out is a Bloody Bad Idea, for obvious usability
reasons.  IOW, while fs shutdown may be async, making it *always*
async would be a bad bug.  And bumping ->s_active does just that.

I'd go for a trylock inside that work + making generic_shutdown_super()
kill all such works.  I assume that such a work *can* be abandoned in
a situation where we know that sync_filesystem() is about to be called
and that said sync_filesystem() won't, in turn, schedule any such
works, of course...
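
To make the shape of that concrete, here is a minimal sketch of the
work-side half.  Everything in it except down_read_trylock() on
->s_umount is made up for illustration - struct switch_work,
do_inode_switch(), abandon_switch() and the field names do not exist
under those names in the kernel:

#include <linux/fs.h>
#include <linux/workqueue.h>

/* hypothetical per-switch request; carries the pinned inode */
struct switch_work {
	struct work_struct work;
	struct inode *inode;
};

static void inode_switch_work_fn(struct work_struct *work)
{
	struct switch_work *sw = container_of(work, struct switch_work,
					      work);
	struct super_block *sb = sw->inode->i_sb;

	/*
	 * If ->s_umount can't be taken, shutdown is underway (or about
	 * to be) and the shutdown path will flush us; bail out without
	 * touching any fs data structures.
	 */
	if (!down_read_trylock(&sb->s_umount)) {
		abandon_switch(sw);	/* drop the pinned inode, no IO */
		return;
	}

	do_inode_switch(sw);	/* the actual wb switch; ends with iput() */
	up_read(&sb->s_umount);
}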
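
And the shutdown-side half: generic_shutdown_super() (or a hook called
just before it) flushing the pending works before evict_inodes() runs,
so none of them can still be holding an inode reference by then.
Again, switch_wq is an assumed dedicated workqueue for these items,
not something that exists in this form:

static void kill_inode_switch_works(void)
{
	/*
	 * Runs with ->s_umount held for write, so every queued work
	 * has either already finished or will hit the trylock-failure
	 * path above and abandon itself; once this returns, no such
	 * work pins an inode when evict_inodes() gets to run.
	 */
	flush_workqueue(switch_wq);
}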