Re: [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held

Guoqing Jiang <guoqing.jiang@xxxxxxxxx> · Tue, 14 Dec 2021 18:03:43 +0800

On 12/14/21 5:31 PM, Donald Buczek wrote:

  -void md_reap_sync_thread(struct mddev *mddev)
+void md_reap_sync_thread(struct mddev *mddev, bool 
reconfig_mutex_held)
  {
      struct md_rdev *rdev;
      sector_t old_dev_sectors = mddev->dev_sectors;
      bool is_reshaped = false;
        /* resync has finished, collect result */
+    if (reconfig_mutex_held)
+        mddev_unlock(mddev);

If one thread got here, e.g. via action_store( /* "idle" */ ), now 
that the mutex is unlocked, is there anything which would prevent 
another thread getting  here as well, e.g. via the same path?

If not, they both might call

md_unregister_thread(&mddev->sync_thread);

Which is not reentrant:

void md_unregister_thread(struct md_thread **threadp)
{
    struct md_thread *thread = *threadp;
    if (!thread)
        return;
    pr_debug("interrupting MD-thread pid %d\n", 
task_pid_nr(thread->tsk));
    /* Locking ensures that mddev_unlock does not wake_up a
     * non-existent thread
     */
    spin_lock(&pers_lock);
    *threadp = NULL;
    spin_unlock(&pers_lock);

    kthread_stop(thread->tsk);
    kfree(thread);
}

This might be a preexisting problem, because the call site in 
dm-raid.c, which you updated to `md_reap_sync_thread(mddev, 
false);`, didn't hold the mutex anyway.

Am I missing something? Probably, I do.

Otherwise: Move the deref of threadp in md_unregister_thread() into 
the spinlock scope?

Good point, I think you are right.

And actually pers_lock does extra service to protect accesses to 
mddev->thread (I think it
also suitable for mddev->sync_thread ) when the mutex can't be held. 
Care to send a patch
for it?

I'm really sorry, but it's one thing to point to a possible problem 
and another thing to come up with a correct solution.

Yes, it is often the reality of life, and we can always correct 
ourselves if there is problem 😎.

While I think it would be easy to avoid the double free with the 
spinlock (or maybe atomic RMW) , we surely don't want to hold the 
spinlock while we are sleeping in kthread_stop(). If we don't hold 
some kind of lock, what are the side effects of another sync thread 
being started or any other reconfiguration? Are the existing flags 
enough to protect us from this? If we do want to hold the lock while 
waiting for the thread to terminate, should it be made into a mutex? 
If so, it probably shouldn't be static but moved into the mddev 
structure. I'd need weeks if not month to figure that out and to feel 
bold enough to post it.

Thanks for deep thinking about it, I think we can avoid to call 
kthread_stop with spinlock
held. Maybe something like this, but just my raw idea, please have a 
thorough review.

 void md_unregister_thread(struct md_thread **threadp)
 {
-       struct md_thread *thread = *threadp;
-       if (!thread)
+       struct md_thread *thread = READ_ONCE(*threadp);
+
+       spin_lock(&pers_lock);
+       if (!thread) {
+               spin_unlock(&pers_lock);
                return;
+       }
        pr_debug("interrupting MD-thread pid %d\n", 
task_pid_nr(thread->tsk));
        /* Locking ensures that mddev_unlock does not wake_up a
         * non-existent thread
         */
-       spin_lock(&pers_lock);
        *threadp = NULL;
        spin_unlock(&pers_lock);

-       kthread_stop(thread->tsk);
+       if (IS_ERR_OR_NULL(thread->tsk)) {
+               kthread_stop(thread->tsk);
+               thread->tsk = NULL;
+       }
        kfree(thread);
 }
 EXPORT_SYMBOL(md_unregister_thread);

I don't want to push work to others, but my own my understanding of md 
is to narrow.

Me either 😉

Thanks,
Guoqing

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/dm-devel