On Wed, Aug 26, 2009 at 09:13:34AM +0200, Hans de Goede wrote: > Hi, > > On 08/26/2009 01:41 PM, Joel Granados wrote: >> On Wed, Aug 26, 2009 at 06:38:26AM +0200, Hans de Goede wrote: >>> Hi All, >>> >>> While testing Intel BIOS RAID using mdraid I noticed that the kernels >>> view of the partition table never changes even though we successfully >>> make Parted.Disk.commitToOs() calls. >>> >>> This has let to me diving into libparted's commit_to_os() code for Linux >>> and there are multiple issues hiding in there: >>> >>> 1) Parted reads /sys/block/foo/range to determine how many partitions >>> the device type supports and then makes BLKPG ioctl's to update the >>> kernels view of the partition table for partitions which fall into >>> this range. However for example /sys/block/sda/range contains 16, >>> there are 2 issue with libparted using this number: >>> 1) scsi major's only support 15 partitions, 1 of the range of 16 >>> is reserved for the whole device, yet libparted will try >>> to notify the kernel about 16 partitions if present >>> 2) If the major's partition minor's run out, the kernel will switch >>> to the mdp major for the other partitions, iow range no longer limits >>> the number of partitions. >>> >>> 2) libparted assumes the user knows what he is doing, and will ignore >>> -ebusy errors for partitions, assuming that the user is smart enough >>> to only change unused partitions (BAD, really really BAD) >>> >>> 3) because of 1) libparted will only sync 1 partition on /dev/md# devices >>> (would be 0 if not for the of by 1 bug as all md#p# partitions use the >>> mdp major), and it fails to even do that without reporting an error. >>> >>> ### >>> >>> Now we can fix 1) by simply not checking /sys/block/foo/range, but instead >>> just syncing as many partitions as are in the table. >> >> This is not a plausable solution as deleting a partition would not show >> up in the refresh if we only refresh the partitions that are in the >> table. >> > > Good point. > >>> 2) is more troublesome, >>> we could just make -EBUSY an error, but that may annoy / bug some users. >> >> This is something that I was working on last week and is the reason I >> asked you (hans) to run a script. It indeed returns EBUSY (sometimes) >> when the partition is not realy mounted. I have tested a mechanism by >> which I retry a arbitrary number of times and the call seems to work >> after some 4 retries. But, I don't like arbitrary values. Moreover, I >> don't thing getting this error to the user is the solution, given that >> it is easyly solvable with some retries. What I am trying to do ATM is >> come up with a mechanism that does not involve "arbitrary" numbers. >> > > Hmm, are you sure it really is randomly returning -EBUSY ?? Maybe some slow > process is using the disk every now and then. Unfortunatelly I'm not sure of what is causing the issue. I call it random because I do not understand the behavior yet. > > >>> >>> An even bigger problem IMHO is the use of the BLKPG ioctl instead of BLKRRPART >>> at all. What this does is tell the kernel parted's view of the partition table >>> and make it use that, instead of telling the kernel to reread the partition table. >>> According to the parted sources this is done for the case where the kernel does >>> not know the disklabel type. However during initial scanning, when we don't modify >>> a disk, and during boot and normal running of the system, we rely on the kernel's >>> view. So IMHO it would be much better to always use the kernels view and just >>> always call BLKRRPART in commit_to_os(), this would solve all of the above issues. >>> >> >> Well, the thing you are expressing here is the difference between >> commit_to_os and commit_to_dev. one tells the kernel what parted has in >> memory and the other writes to disk. >> > > Erm, no, these are both commit to os. The difference is BLKPG is modifying the kernels where does ped_disk_commit_to_dev call BLKPG or BLKRRPART? I see it getting to BLKFLSBUF, but not to the others. So ped_disk_commit_to_dev does not commit to os. it simply wirtes to dev. > partitiontable to represent parted's in memory view. Where as BLKRRPART tells the kernel > to re-read the table from disk and use that, since on a normal boot the kernel reads the > table from disk, telling it to re-read it seems best to me, as that leads to the kernel > actually using the same view of the table as will be used during a normal boot of the > system (although I admit the 2 views should not differ, currently they can due to multiple > issues with parted's BLKPG code). > >> I think a much better solution would be a commit_kernel_read_dev >> function, that implements just what you suggested. > > IMHO that is what commit_to_os is supposed to do, and what it does in some cases. Currently > parted is not very consistend in some scenario's it will use BLKPG and ignore ebusy errors Ack, not consistent... > and in others it will use BLKRRPART and report EBUSY errors. It seems more consistent to > me to just always use BLKRRPART. > > See the attached patch which I'm currently using for testing. > > Regards, > > Hans > diff -up parted-1.9.0/libparted/arch/linux.c~ parted-1.9.0/libparted/arch/linux.c > --- parted-1.9.0/libparted/arch/linux.c~ 2009-08-26 06:40:11.000000000 +0200 > +++ parted-1.9.0/libparted/arch/linux.c 2009-08-26 06:51:21.000000000 +0200 > @@ -41,7 +41,6 @@ > #include <libdevmapper.h> > #endif > > -#include "blkpg.h" > #include "../architecture.h" > #include "dirname.h" > > @@ -587,22 +586,6 @@ _get_linux_version () > return kver = KERNEL_VERSION (major, minor, teeny); > } > > -static int > -_have_devfs () > -{ > - static int have_devfs = -1; > - struct stat sb; ... > @@ -2621,19 +2421,6 @@ linux_disk_commit (PedDisk* disk) > return _dm_reread_part_table (disk); > #endif > if (disk->dev->type != PED_DEVICE_FILE) { > - /* The ioctl() command BLKPG_ADD_PARTITION does not notify > - * the devfs system; consequently, /proc/partitions will not > - * be up to date, and the proper links in /dev are not > - * created. Therefore, if using DevFS, we must get the kernel > - * to re-read and grok the partition table. > - */ > - /* Work around kernel dasd problem so we really do BLKRRPART */ > - if (disk->dev->type != PED_DEVICE_DASD && > - _have_blkpg () && !_have_devfs ()) { > - if (_disk_sync_part_table (disk)) > - return 1; > - } > - > return _kernel_reread_part_table (disk->dev); > } > Instead of removing a whole bunch of code I prefer addint an additional function and deprecating these that you erased. what do you think? regards. -- Joel Andres Granados Brno, Czech Republic, Red Hat.
Attachment:
pgpTHeK14puj0.pgp
Description: PGP signature
_______________________________________________ Anaconda-devel-list mailing list Anaconda-devel-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/anaconda-devel-list