Re: partprobe or partx or ... ?

Hi Ilya,

On 21/09/2015 12:23, Ilya Dryomov wrote:
> On Sat, Sep 19, 2015 at 11:08 PM, Loic Dachary <loic@xxxxxxxxxxx> wrote:
>>
>>
>> On 19/09/2015 17:23, Loic Dachary wrote:
>>> Hi Ilya,
>>>
>>> At present ceph-disk uses partprobe to ensure the kernel is aware of the latest partition changes after a new partition is created, or after zapping the partition table. Although it works reliably (in the sense that the kernel is indeed aware of the desired partition layout), it goes as far as to remove all partition devices of the current kernel table, only to re-add them with the new partition table. The delay it implies is not an issue because ceph-disk is rarely called. However, it generates many udev events (dozens of remove/change/add events for a two-partition disk) and almost always creates corner cases that are difficult to figure out and debug. While it is a good way to ensure that ceph-disk is idempotent and immune to race conditions, maybe it is needlessly hard.
>>>
>>> Do you know of a lightweight alternative to partprobe? In the past we've used partx, but I remember it failed to address some corner cases in non-intuitive ways. Do you know of another, simpler approach to this?
>>>
>>> Thanks in advance for your help :-)
>>>
>>
>> For the record, using /sys/block/sdX/device/rescan sounds good, but it does not exist for devices created via device mapper (used for dmcrypt and multipath).
> 
> Hi Loic,
> 
> Yeah, partprobe loops through the entire partition table, trying to
> delete/add every slot.  As an aside, the in-kernel way to do this
> (blockdev --rereadpt) is similar in that it also drops all partitions
> and re-adds them later, but it's faster and probably generates fewer
> change events.  The downside is it won't work on a busy device.
> 
> I don't think there is any alternative, except for using partx --add
> with --nr, that is, targeting specific slots in the partition table.  If
> all you are doing is adding partitions and zapping entire partition
> tables, that may work well enough.
> 
> That said, given that the resulting delay (which can be in the seconds
> range, especially if your disk happens to have a busy partition) isn't
> a problem, what difference does it make?  What are you listening to
> those events for?

This is part of the ceph-disk prepare / activate workflow:

 ceph-disk prepare creates partitions, mounts them, populates them and exits
 ceph udev rules (95-ceph-osd.rules) react to udev events when the partition type is known and run ceph-disk activate in the background

When a machine boots or a disk is hot swapped, udev rules do the same and activate: we only have one code path for all cases. The problem is to ensure all race conditions are addressed. What used to work in hammer has to be revisited because the code path was changed in infernalis. udev actions no longer call ceph-disk activate synchronously, because it can take a long time and that's not what udev is good at. Instead, udev actions run ceph-disk activate in the background, using systemd/upstart when available (falling back to the legacy synchronous behavior when they are not available).
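
To illustrate the dispatch described above, here is a minimal sketch in Python. It is not the actual ceph-disk code: the function name, the use of systemd-run, and the upstart event name are assumptions made only for illustration.

```python
def activation_command(dev, init_system=None):
    """Return (argv, runs_in_background) for activating the partition `dev`.

    init_system is "systemd", "upstart", or None when neither is available.
    """
    activate = ["ceph-disk", "activate", dev]
    if init_system == "systemd":
        # Assumption: hand the job to systemd as a transient unit and
        # return immediately instead of waiting for activation to finish.
        return ["systemd-run", "--no-block"] + activate, True
    if init_system == "upstart":
        # Assumption: "ceph-disk-activate" is a hypothetical upstart event
        # name; a matching upstart job would run the activation.
        return ["initctl", "emit", "--no-wait",
                "ceph-disk-activate", "DEV=" + dev], True
    # Legacy fallback: the udev action runs the activation itself and waits.
    return activate, False
```

The point of the sketch is only that the udev action returns quickly in the first two cases, while the fallback keeps udev waiting for the whole activation.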

I think I managed to address all race conditions with the patch series at https://github.com/ceph/ceph/pull/5999.

We should be good with partprobe :-)
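
For reference, the three approaches mentioned in this thread differ only in the command that is run. A minimal sketch in Python follows; the helper name is hypothetical and the commands are only built here, not executed.

```python
def reread_commands(dev, partition_number=None):
    """Build the argv lists for the approaches discussed in this thread."""
    commands = {
        # Loops over the whole partition table, deleting and re-adding slots.
        "partprobe": ["partprobe", dev],
        # In-kernel re-read of the partition table; fails on a busy device.
        "blockdev": ["blockdev", "--rereadpt", dev],
    }
    if partition_number is not None:
        # Targets a single slot in the partition table.
        commands["partx"] = ["partx", "--add", "--nr",
                             str(partition_number), dev]
    return commands
```

Each list could be passed to subprocess.check_call by a caller running as root, for example reread_commands("/dev/sdb", 2)["partx"] after adding the second partition.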

> 
> /sys/block/sdX/device/rescan is sd only, and AFAIK it doesn't generally
> trigger a re-read of a partition table.

Thanks a lot for your insights !

Cheers

> 
> Thanks,
> 
>                 Ilya
> 

-- 
Loïc Dachary, Artisan Logiciel Libre
