Re: [PATCH 04/24] sysfs: Normalize removing sysfs directories.

ebiederm@xxxxxxxxxxxx (Eric W. Biederman) · Sat, 30 May 2009 08:15:53 -0700

James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> writes:

> On Sat, 2009-05-30 at 06:07 -0700, Eric W. Biederman wrote:
>> Tejun Heo <tj@xxxxxxxxxx> writes:
>> 
>> > Eric W. Biederman wrote:
>> >>   Also, I'm quite uncomfortable with these things
>> >>> being done in non-atomic manner.  It can be made to work but things
>> >>> like this can lead to subtle race conditions and with the kind of
>> >>> layering we put on top of sysfs (kobject, driver model, driver
>> >>> midlayers and so on), it isn't all that easy to verify what's going
>> >>> on, so NACK for this one.
>> >> 
>> >> Total nonsense.
>> >> 
>> >> Mucking about with sysfs after we start deleting a directory is a bug.
>> >> At worst my change makes a buggy race slightly less deterministic.
>> >> 
>> >> I am not ready to consider keeping the current unnecessary atomic
>> >> removal step.  That unnecessary atomicity makes the following patches
>> >> more difficult, and requires a lot of unnecessary retesting.
>> >> 
>> >> What do you think the extra unnecessary atomicity helps protect?
>> >
>> > It's just not a clean API.  When people are trying to code things way
>> > up in the stack, they aren't likely to look up the code to see what
>> > assumptions are being made especially when the stack is deep and
>> > complex and sysfs is near the bottom of the tall stack.  IMHO
>> > implementing the usually expected semantics at this depth is worth
>> > every effort.  It's just good implementation style which might look
>> > like wasted effort but will harden the stack in the long run.  Plus,
>> > it's not like making it atomic is difficult or anything.
>> 
>> I guess we are going to have to disagree on this one.
>> 
>> My take is simply that a correct user has to wait until no one else
>> can find the kobject before calling kobject_del.  At which point
>> races are impossible, and it doesn't matter if sysfs_mutex is held
>> across the entire operation.
>
> I'm afraid this one isn't a valid assumption.  If you look in SCSI,
> you'll see we do get objects after they've been removed from visibility.
> We use it as part of the state model for how our objects work (objects
> removed from visibility are dying, but we still need them to be findable
> (and gettable).

I was not precise enough.  It appears I overlooked the fact that
kobject_del is not always called from kobject_put by way of
kobject_release.

Strictly the requirement is that after kobject_del we don't add,
remove or otherwise manipulate sysfs attributes.  That is we don't
call any of:

sysfs_add_file
sysfs_create_file
sysfs_create_bin_file
sysfs_remove_file
sysfs_remove_bin_file
sysfs_create_link
sysfs_remove_link
sysfs_create_group
sysfs_remove_group
sysfs_create_subdir
sysfs_remove_subdir

Those all either oops or BUG today if you try it.  So I can't see how
a subsystem could depend on those working.

Also there is sysfs_remove_dir (on a subdirectory) aka kobject_del on
a child object after kobject_del on the parent object.

As best I can tell that only works by fluke today.

> Now, this could be altered as part of an object lifetime rewrite of SCSI
> (and I suspect a few other subsystems) but it's certainly an open
> question of whether the pain is worth the gain.

I won't tell you that sysfs, the kobject layer, or the device layer
are the best thing since sliced bread.  I'm just trying to simplify
the code and get the bugs out.

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html