RE: OSD::do_mon_report - do we need holding osd_lock

Thanks Sam, I will go ahead and open a tracker for this.

Thanks,
Guang


----------------------------------------
> Date: Tue, 18 Aug 2015 08:42:04 -0700
> Subject: Re: OSD::do_mon_report - do we need holding osd_lock
> From: sjust@xxxxxxxxxx
> To: yguang11@xxxxxxxxxxx
> CC: ceph-devel@xxxxxxxxxxxxxxx
>
> Probably! A quick glance at do_mon_report doesn't seem to turn up
> anything I'd expect to be really hard to refactor. You do need to
> break out the required data (into OSDService, I'd think) so that the
> lock is not necessary.
> -Sam
>
> On Mon, Aug 17, 2015 at 6:10 PM, GuangYang <yguang11@xxxxxxxxxxx> wrote:
>> Hi Sam,
>> Today I noticed a scenario in which the monitor marked an OSD down because it did not receive PG stats from that OSD. Further investigation showed that the OSD did not report stats because it failed to acquire the osd_lock. What happened was:
>> 1. One PG is undergoing long-running peering (searching for missing objects).
>> 2. An op holds the osd_lock and tries to acquire the PG lock, which is being held by 1).
>> 3. The OSD tick thread failed to acquire osd_lock and was stuck for 10 minutes, and thus failed to report its stats to the monitor.
>> 4. The monitor marks it down.
>>
>> After looking at the code, we found several assertions (that osd_lock should be held) around OSD::do_mon_report. Are they required? Is there any chance of overcoming the problem described above by refactoring the locking there?
>>
>> Thanks,
>> Guang
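
For reference, here is a minimal sketch of the kind of refactoring Sam suggests above: copy the data the report needs into a separate holder guarded by its own small lock, so that building the report never waits on osd_lock. It is not taken from the Ceph tree. ReportData, ReportState, big_lock and report_while_big_lock_held are hypothetical names used only for illustration, and the real set of fields do_mon_report needs is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for whatever do_mon_report actually reads.
struct ReportData {
  uint64_t epoch;
  uint64_t num_pgs;
};

// Hypothetical holder (playing the role OSDService might play).
// It has its own small lock, which is only held for a short copy.
class ReportState {
 public:
  // Called by whoever updates the data, under whatever locks they hold.
  void publish(const ReportData& d) {
    std::lock_guard<std::mutex> l(lock_);
    data_ = d;
  }

  // Called by the reporting path; takes only the small lock.
  ReportData snapshot() const {
    std::lock_guard<std::mutex> l(lock_);
    return data_;
  }

 private:
  mutable std::mutex lock_;
  ReportData data_{};
};

// Returns true if a report snapshot can be read while the big lock
// (standing in for osd_lock) is held.
bool report_while_big_lock_held() {
  std::mutex big_lock;
  ReportState state;
  state.publish(ReportData{42, 7});

  // Simulate an op that holds the big lock for a long time.
  std::unique_lock<std::mutex> op_holds_big_lock(big_lock);

  // The reporting path does not touch big_lock, so it cannot stall here.
  ReportData d = state.snapshot();
  return d.epoch == 42 && d.num_pgs == 7;
}
```

Calling report_while_big_lock_held() returns true even though big_lock is held for the whole call, which is the property the tick thread would need. The trade-off is that the snapshot may be slightly stale relative to the state protected by the big lock.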



