On Tue, May 14, 2013 at 03:48:59PM +0200, Oleg Nesterov wrote:
> On 05/13, Kent Overstreet wrote:
> >
> > +unsigned tag_alloc(struct tag_pool *pool, bool wait)
> > +{
> > +	struct tag_cpu_freelist *tags;
> > +	unsigned long flags;
> > +	unsigned ret;
> > +retry:
> > +	preempt_disable();
> > +	local_irq_save(flags);
> > +	tags = this_cpu_ptr(pool->tag_cpu);
> > +
> > +	while (!tags->nr_free) {
> > +		spin_lock(&pool->lock);
> > +
> > +		if (pool->nr_free)
> > +			move_tags(tags->free, &tags->nr_free,
> > +				  pool->free, &pool->nr_free,
> > +				  min(pool->nr_free, pool->watermark));
> > +		else if (wait) {
> > +			struct tag_waiter wait = { .task = current };
> > +
> > +			__set_current_state(TASK_UNINTERRUPTIBLE);
> > +			list_add(&wait.list, &pool->wait);
> > +
> > +			spin_unlock(&pool->lock);
> > +			local_irq_restore(flags);
> > +			preempt_enable();
> > +
> > +			schedule();
> > +			__set_current_state(TASK_RUNNING);
>
> schedule() always returns in TASK_RUNNING state
>
> > +
> > +			if (!list_empty_careful(&wait.list)) {
> > +				spin_lock_irqsave(&pool->lock, flags);
> > +				list_del_init(&wait.list);
> > +				spin_unlock_irqrestore(&pool->lock, flags);
>
> This is only theoretical, but racy.
>
> tag_free() does
>
> 	list_del_init(wait->list);
> 	/* WINDOW */
> 	wake_up_process(wait->task);
>
> in theory the caller of tag_alloc() can notice list_empty_careful(),
> return without taking pool->lock, exit, and free this task_struct.
>
> But the main problem is that it is not clear why this code reimplements
> add_wait_queue/wake_up_all, for what?

To save on locking... there's really no point in another lock for the
wait queue. Could just use the wait queue lock instead I suppose, like
wait_event_interruptible_locked() (the extra spin_lock()/unlock() might
not really cost anything but nested irqsave()/restore() is ridiculously
expensive, IME).

> I must admit, I do not understand what this code actually does ;)
> I didn't try to read it carefully though, but perhaps at least the
> changelog could explain more?

The changelog is admittedly terse, but that's basically all there is to
it:

Say you've got a device where you can have multiple outstanding
commands. You identify commands/responses by some integer (the "tag").

Typically you won't get a full 64 bits for the tag; it might be 10 or 16
or 32 bits or whatever. And even if you could use raw pointers you
wouldn't really want to, because then if the device gives you a garbage
response you're dereferencing an untrusted pointer. You want to allocate
tag structures out of a fixed array so you can validate responses.

So you preallocate all your tag structures up front, and now you can
refer to them by small fixed integers. But if you want to be able to
efficiently allocate from the same pool of tags across multiple CPUs,
well, that's what this code is for.
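
To make the "fixed array plus small integer tags" idea concrete, here is
a minimal single-threaded userspace sketch. It is not the patch's code:
struct tag_table, struct request, NR_TAGS, TAG_NONE and the tag_table_*()
helpers are all hypothetical names made up for illustration, and it has
no locking and no per-CPU freelists, which are exactly the parts the
patch is about. It only shows why a tag coming back from a device can be
validated before it is used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_TAGS  4               /* hypothetical pool size */
#define TAG_NONE ((unsigned)-1)  /* hypothetical "no tag free" value */

struct request {
	bool in_flight;          /* true while a command owns this tag */
	int payload;             /* stand-in for per-command state */
};

struct tag_table {
	struct request reqs[NR_TAGS];  /* preallocated, indexed by tag */
	unsigned free[NR_TAGS];        /* stack of currently free tags */
	unsigned nr_free;
};

static void tag_table_init(struct tag_table *t)
{
	for (unsigned i = 0; i < NR_TAGS; i++) {
		t->reqs[i].in_flight = false;
		t->reqs[i].payload = 0;
		t->free[i] = i;
	}
	t->nr_free = NR_TAGS;
}

/* Pop a free tag, or return TAG_NONE if the pool is exhausted. */
static unsigned tag_table_alloc(struct tag_table *t)
{
	unsigned tag;

	if (!t->nr_free)
		return TAG_NONE;

	tag = t->free[--t->nr_free];
	t->reqs[tag].in_flight = true;
	return tag;
}

/* Return a tag to the pool; the caller must own it. */
static void tag_table_free(struct tag_table *t, unsigned tag)
{
	assert(tag < NR_TAGS && t->reqs[tag].in_flight);

	t->reqs[tag].in_flight = false;
	t->free[t->nr_free++] = tag;
}

/*
 * Map a tag reported by the device back to its request. A garbage tag
 * (out of range, or not currently in flight) yields NULL instead of a
 * wild pointer dereference.
 */
static struct request *tag_table_lookup(struct tag_table *t, unsigned tag)
{
	if (tag >= NR_TAGS || !t->reqs[tag].in_flight)
		return NULL;
	return &t->reqs[tag];
}
```

A completion handler would call tag_table_lookup() on whatever tag the
device reports and drop the response if it gets NULL. With a single
shared free stack like this one, every CPU would contend on one lock
around alloc/free; that is the cost the per-CPU freelists in the patch
are meant to avoid.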