Re: Backporting stability fixes for ceph-disk

On 04/02/2016 10:13, Ken Dreyer wrote:
> On Wed, Feb 3, 2016 at 12:10 PM, Loic Dachary <loic@xxxxxxxxxxx> wrote:
>> On 04/02/2016 00:56, Ken Dreyer wrote:
>>> What's the procedure for deactivating the Hammer udev rules, for example?
>>
>> rm /lib/udev/rules.d/*ceph*
>> udevadm control --reload # maybe superfluous
>>
> 
> I am surprised to see that we'd want to delete files from /lib. How
> would the user restore them afterwards? 

re-installing the ceph package that contains them will restore them.
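
For example, something like this (a sketch, assuming a RHEL-like or Debian-like system; the exact invocation depends on the package manager):

yum reinstall ceph                  # RHEL / CentOS
apt-get install --reinstall ceph    # Debian / Ubuntu
udevadm control --reload            # pick up the restored rules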

> Sorry if this sounds dense;
> I'm definitely a udev noob. Could you provide a "starting from
> scratch" procedure for how to handle ceph-disk failures in Hammer?

My own bias is to understand why things go wrong before fixing them, which can be complicated when udev / initsystem / ceph-disk are involved. To this day I would still not be able to write a guide explaining how to do that reliably. Only recently did I discover that messages that should be in syslog can be discarded entirely on RHEL unless the abrt package is installed. After which you have to know to collect the output from a file that is referenced in the syslog messages but whose content is not in the messages themselves.
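
For example, on RHEL, something along these lines (a sketch; the exact spool location is an assumption on my part):

yum install -y abrt             # without it the messages can be lost
grep -i abrt /var/log/messages  # find the syslog entry referencing the file
ls /var/spool/abrt/             # the referenced output typically lands here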

If there is a suspicion that udev / initsystem / ceph-disk is not doing the right thing with Hammer and understanding why is secondary, I would recommend removing the udev rules and doing things manually, as suggested in the previous mail. When there is a problem, it is usually not because individual components are at fault: it is because they race with each other in ways that were not fully understood back in Hammer.

The most frequent mistake is thinking that more partprobe / partx is better and fixes things. It is actually the opposite: when the udev rules are in play, running more partprobe / partx creates new udev events that race with those already in flight (see http://tracker.ceph.com/issues/14099 for instance). It can do even worse: partprobe /dev/sdb will remove existing partitions before adding them again, to be extra sure the kernel has an accurate view of the partition table. I'll let you imagine what that can do on a live system. partx does not have that problem, but only because it assumes the caller knows exactly what information the kernel has about the partition table. That leads to confusing situations when, for instance, a partition is added and partx is called to notify the kernel, which fires a udev event; the partition is then deleted and the caller fails to notify the kernel. If the same partition is added again, partx notifies the kernel, which does nothing instead of firing a udev event because, from its point of view, the partition still exists.
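
A contrived sketch of that partx pitfall (device and partition size are placeholders; sgdisk is used for illustration):

sgdisk --new=1:0:+10M /dev/sdb   # create partition 1
partx --add --nr 1 /dev/sdb      # kernel notified, udev add event fires
sgdisk --delete=1 /dev/sdb       # partition deleted, kernel never told
sgdisk --new=1:0:+10M /dev/sdb   # same partition created again
partx --add --nr 1 /dev/sdb      # no udev event: the kernel thinks it already exists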

In Hammer, partprobe was not consistently guarded against such races (running udevadm settle ; partprobe ; udevadm settle is enough, but that was not done consistently), and ceph-disk had to call partprobe / partx more than once, for instance right after a journal partition was created and before creating the data partition. Calls to partprobe and udevadm settle also need to be more patient than the default, especially when dmcrypt is in play. What it means in practice is that ceph-disk must call udevadm settle --timeout=600 and call partprobe a few times before declaring failure (there is no user control over the partprobe timeout). The ceph-disk suite routinely shows partprobe trying two or three times at 60 second intervals before succeeding (this is extreme because it happens in a cloud environment where performance varies a lot).
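
In shell terms, the guard and the patience look roughly like this (a sketch; the retry count and interval mirror what the ceph-disk suite observes, not the exact ceph-disk internals):

udevadm settle --timeout=600
for attempt in 1 2 3; do
    partprobe /dev/sdb && break  # may need two or three tries at 60 second intervals
    sleep 60
done
udevadm settle --timeout=600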

All these troubles go away if udev is deactivated, because partprobe won't run ceph-disk indirectly. The timeout issue may still be a concern, but I think that in real life situations, if ceph-disk prepare is done first and a separate script does the ceph-disk activate-all, the odds that ceph-disk activate fails because a partprobe run by ceph-disk prepare did not complete are very low. An automated script could do:

ceph-disk prepare /dev/sdb
ceph-disk prepare /dev/sdc
ceph-disk prepare /dev/sdd
...
udevadm settle --timeout=600
ceph-disk activate /dev/sdb1
ceph-disk activate /dev/sdc1
ceph-disk activate /dev/sdd1
...

I hope that clarifies the situation?

-- 
Loïc Dachary, Artisan Logiciel Libre


