Re: puzzling disappearance of /dev/sdc1

On 18/12/2015 16:31, Ilya Dryomov wrote:
> On Fri, Dec 18, 2015 at 1:38 PM, Loic Dachary <loic@xxxxxxxxxxx> wrote:
>> Hi Ilya,
>>
>> It turns out that sgdisk 0.8.6 -i 2 /dev/vdb removes partitions and re-adds them on CentOS 7 with a 3.10.0-229.11.1.el7 kernel, in the same way partprobe does. ceph-disk uses it heavily, which inevitably leads to races where a device temporarily disappears. The same command (sgdisk 0.8.8) on Ubuntu 14.04 with a 3.13.0-62-generic kernel only generates two udev change events and does not remove / add partitions. The source code did not change significantly between sgdisk 0.8.6 and sgdisk 0.8.8, and the output of strace -e ioctl sgdisk -i 2 /dev/vdb is identical in both environments.
>>
>> ioctl(3, BLKGETSIZE, 20971520)          = 0
>> ioctl(3, BLKGETSIZE64, 10737418240)     = 0
>> ioctl(3, BLKSSZGET, 512)                = 0
>> ioctl(3, BLKSSZGET, 512)                = 0
>> ioctl(3, BLKSSZGET, 512)                = 0
>> ioctl(3, BLKSSZGET, 512)                = 0
>> ioctl(3, HDIO_GETGEO, {heads=16, sectors=63, cylinders=16383, start=0}) = 0
>> ioctl(3, HDIO_GETGEO, {heads=16, sectors=63, cylinders=16383, start=0}) = 0
>> ioctl(3, BLKGETSIZE, 20971520)          = 0
>> ioctl(3, BLKGETSIZE64, 10737418240)     = 0
>> ioctl(3, BLKSSZGET, 512)                = 0
>> ioctl(3, BLKSSZGET, 512)                = 0
>> ioctl(3, BLKGETSIZE, 20971520)          = 0
>> ioctl(3, BLKGETSIZE64, 10737418240)     = 0
>> ioctl(3, BLKSSZGET, 512)                = 0
>> ioctl(3, BLKSSZGET, 512)                = 0
>> ioctl(3, BLKSSZGET, 512)                = 0
>> ioctl(3, BLKSSZGET, 512)                = 0
>> ioctl(3, BLKSSZGET, 512)                = 0
>> ioctl(3, BLKSSZGET, 512)                = 0
>> ioctl(3, BLKSSZGET, 512)                = 0
>> ioctl(3, BLKSSZGET, 512)                = 0
>> ioctl(3, BLKSSZGET, 512)                = 0
>> ioctl(3, BLKSSZGET, 512)                = 0
>> ioctl(3, BLKSSZGET, 512)                = 0
>> ioctl(3, BLKSSZGET, 512)                = 0
>> ioctl(3, BLKSSZGET, 512)                = 0
>>
>> This leads me to the conclusion that the difference is in how the kernel reacts to these ioctls.
> 
> I'm pretty sure it's not the kernel versions that matter here, but
> systemd versions.  Those are all get-property ioctls, and I don't think
> sgdisk -i does anything with the partition table.
> 
> What it probably does though is it opens the disk for write for some
> reason.  When it closes it, udevd (systemd-udevd process) picks that
> close up via inotify and issues the BLKRRPART ioctl, instructing the
> kernel to re-read the partition table.  Technically, that's different
> from what partprobe does, but it still generates those udev events you
> are seeing in the monitor.
> 
> AFAICT udevd started doing this in v214.

That explains everything indeed.
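
To illustrate the mechanism Ilya describes, here is a minimal Python sketch showing that closing a file descriptor opened for write raises a different inotify event (IN_CLOSE_WRITE) than closing a read-only one (IN_CLOSE_NOWRITE). It is only an approximation of what udevd watches for: it uses a regular temporary file instead of a block device, it never issues BLKRRPART, and the helper name close_events is made up for this example. It assumes Linux and calls inotify through ctypes.

```python
import ctypes
import ctypes.util
import os
import struct
import tempfile

# Mask values from <sys/inotify.h>.
IN_CLOSE_WRITE = 0x00000008    # a file opened for writing was closed
IN_CLOSE_NOWRITE = 0x00000010  # a file not opened for writing was closed

libc = ctypes.CDLL(ctypes.util.find_library("c") or "libc.so.6", use_errno=True)


def close_events(path, open_flags):
    """Open and close `path` with `open_flags`; return the inotify masks seen.

    Hypothetical helper for illustration only.
    """
    ifd = libc.inotify_init1(os.O_NONBLOCK)
    if ifd < 0:
        raise OSError(ctypes.get_errno(), "inotify_init1 failed")
    try:
        wd = libc.inotify_add_watch(
            ifd, os.fsencode(path), IN_CLOSE_WRITE | IN_CLOSE_NOWRITE)
        if wd < 0:
            raise OSError(ctypes.get_errno(), "inotify_add_watch failed")
        # The open/close pair whose close event we want to observe.
        os.close(os.open(path, open_flags))
        try:
            data = os.read(ifd, 4096)
        except BlockingIOError:
            return []  # no event was queued
        # Each record is struct inotify_event: wd, mask, cookie, len, name[len].
        header = struct.calcsize("iIII")
        masks, offset = [], 0
        while offset < len(data):
            _wd, mask, _cookie, name_len = struct.unpack_from("iIII", data, offset)
            masks.append(mask)
            offset += header + name_len
        return masks
    finally:
        os.close(ifd)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as tmp:
        print(close_events(tmp.name, os.O_RDONLY) == [IN_CLOSE_NOWRITE])
        print(close_events(tmp.name, os.O_WRONLY) == [IN_CLOSE_WRITE])
```

A watcher that only subscribes to IN_CLOSE_WRITE, as udevd is described as doing above, would therefore react to the second open/close pair but not to the first.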

# strace -f -e open sgdisk -i 2 /dev/vdb
...
open("/dev/vdb", O_RDONLY)              = 4
open("/dev/vdb", O_WRONLY|O_CREAT, 0644) = 4
open("/dev/vdb", O_RDONLY)              = 4
Partition GUID code: 45B0969E-9B03-4F30-B4C6-B4B80CEFF106 (Unknown)
Partition unique GUID: 7BBAA731-AA45-47B8-8661-B4FAA53C4162
First sector: 2048 (at 1024.0 KiB)
Last sector: 204800 (at 100.0 MiB)
Partition size: 202753 sectors (99.0 MiB)
Attribute flags: 0000000000000000
Partition name: 'ceph journal'

# strace -f -e open blkid /dev/vdb2
...
open("/etc/blkid.conf", O_RDONLY)       = 4
open("/dev/.blkid.tab", O_RDONLY)       = 4
open("/dev/vdb2", O_RDONLY)             = 4
open("/sys/dev/block/253:18", O_RDONLY) = 5
open("/sys/block/vdb/dev", O_RDONLY)    = 6
open("/dev/.blkid.tab-hVvwJi", O_RDWR|O_CREAT|O_EXCL, 0600) = 4

sgdisk opens /dev/vdb with O_WRONLY|O_CREAT even for -i, whereas blkid only opens the device read-only, hence the different behavior. Replacing sgdisk with blkid fixes the issue.
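
The comparison of the two traces above can be automated with a small script. This is a rough sketch, not something ceph-disk contains: the function name write_opens is invented for this example, and the regular expression only understands the open(...) lines shown above, plus the openat(AT_FDCWD, ...) form that newer strace versions may print, which is an assumption.

```python
import re

# Matches open("path", FLAGS...) as printed by strace. The optional
# openat(AT_FDCWD, ...) form is an assumption about newer strace output.
OPEN_RE = re.compile(r'\bopen(?:at)?\((?:AT_FDCWD, )?"([^"]+)", ([A-Z0-9_|]+)')
WRITE_FLAGS = {"O_WRONLY", "O_RDWR"}


def write_opens(strace_output, device):
    """Return the flag strings of every open of `device` that allows writing.

    Hypothetical helper for illustration only.
    """
    found = []
    for line in strace_output.splitlines():
        match = OPEN_RE.search(line)
        if match is None:
            continue
        path, flags = match.groups()
        # Only exact matches on the device count; temporary files such as
        # /dev/.blkid.tab-* are opened read-write but are not the device.
        if path == device and WRITE_FLAGS & set(flags.split("|")):
            found.append(flags)
    return found


SGDISK_TRACE = '''\
open("/dev/vdb", O_RDONLY)              = 4
open("/dev/vdb", O_WRONLY|O_CREAT, 0644) = 4
open("/dev/vdb", O_RDONLY)              = 4
'''

BLKID_TRACE = '''\
open("/dev/vdb2", O_RDONLY)             = 4
open("/dev/.blkid.tab-hVvwJi", O_RDWR|O_CREAT|O_EXCL, 0600) = 4
'''

print(write_opens(SGDISK_TRACE, "/dev/vdb"))   # ['O_WRONLY|O_CREAT']
print(write_opens(BLKID_TRACE, "/dev/vdb2"))   # []
```

Any non-empty result for a command that is only supposed to read the partition table is a candidate for triggering the udev re-read described above.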

Nice catch!

> Thanks,
> 
>                 Ilya
> 

-- 
Loïc Dachary, Artisan Logiciel Libre
