Re: OSD::do_mon_report - do we need holding osd_lock

Probably!  A quick glance at do_mon_report doesn't seem to turn up
anything I'd expect to be really hard to refactor.  You do need to
break out the required data (into OSDService, I'd think) so that the
lock is not necessary.
-Sam
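
To illustrate the refactoring suggested above, here is a minimal sketch, not the actual Ceph code. It assumes the data a monitor report needs can be copied into a small snapshot guarded by its own mutex. The names StatsSnapshot, ReportData, stat_lock_ and report_while_big_lock_held are hypothetical, and big_lock is only a stand-in for osd_lock.

```cpp
// Hypothetical sketch: names are illustrative and do not come from Ceph.
#include <cassert>
#include <cstdint>
#include <mutex>

// Assumed subset of the data a monitor report would need.
struct StatsSnapshot {
  uint64_t epoch = 0;
  uint64_t num_pgs = 0;
};

// Stand-in for a service object that owns the report data behind its own
// small mutex. That mutex is only held for a copy, never while taking
// another lock, so a reader cannot get stuck behind the big lock.
class ReportData {
 public:
  // Writers (which may be holding the big lock) publish a copy of the stats.
  void publish(const StatsSnapshot& s) {
    std::lock_guard<std::mutex> l(stat_lock_);
    snap_ = s;
  }

  // The reporter copies the snapshot out; it needs only stat_lock_.
  StatsSnapshot get() const {
    std::lock_guard<std::mutex> l(stat_lock_);
    return snap_;
  }

 private:
  mutable std::mutex stat_lock_;
  StatsSnapshot snap_;
};

// Models the stalled case: the big lock stays held (as if by an op waiting
// on a PG lock) while the report path reads its data.
bool report_while_big_lock_held() {
  std::mutex big_lock;  // stand-in for osd_lock
  ReportData data;

  StatsSnapshot published;
  published.epoch = 42;
  published.num_pgs = 7;
  data.publish(published);

  // Held for the rest of the function, like the stuck op in the report.
  std::lock_guard<std::mutex> stuck_op(big_lock);

  // The report path never touches big_lock, so it cannot block on it.
  StatsSnapshot s = data.get();
  return s.epoch == 42 && s.num_pgs == 7;
}
```

A caller could check the behaviour with assert(report_while_big_lock_held());. The trade-off in this sketch is that the reporter sees a copy that may be slightly stale, in exchange for never waiting on the big lock.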

On Mon, Aug 17, 2015 at 6:10 PM, GuangYang <yguang11@xxxxxxxxxxx> wrote:
> Hi Sam,
> Today I noticed a scenario in which the monitor marked an OSD down because it had not received PG stats from that OSD. Further investigation showed that the OSD did not report its stats because it failed to acquire the osd_lock. What happened was:
>   1. One PG was undergoing long-running peering (searching for missing objects).
>   2. An op held the osd_lock and tried to acquire the PG lock, which was being held by the peering in 1.
>   3. The OSD tick thread failed to acquire the osd_lock and was stuck for 10 minutes, so it could not send its stats to the monitor.
>   4. The monitor marked the OSD down.
>
> After looking at the code, we found several assertions (that osd_lock should be held) around OSD::do_mon_report. Are they required? Is there any chance of overcoming the problem described above by refactoring the locking there?
>
> Thanks,
> Guang