On Tue, Apr 5, 2016 at 4:10 PM, Sam Yaple <samuel@xxxxxxxxx> wrote:
> On Tue, Apr 5, 2016 at 1:26 PM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>>
>> On Tue, Apr 5, 2016 at 1:48 PM, Sam Yaple <samuel@xxxxxxxxx> wrote:
>> > On Tue, Apr 5, 2016 at 10:21 AM, Loic Dachary <loic@xxxxxxxxxxx> wrote:
>> >>
>> >> Hi Ilya,
>> >>
>> >> On 05/04/2016 11:26, Ilya Dryomov wrote:
>> >> > On Tue, Apr 5, 2016 at 10:30 AM, Sebastien Han <shan@xxxxxxxxxx>
>> >> > wrote:
>> >> >> Wido, I just discussed that with Sam last week and it seems that
>> >> >> bcache only allocates a single minor when creating the device.
>> >> >> Sam ended up writing this:
>> >> >> https://yaple.net/2016/03/31/bcache-partitions-and-dkms/
>> >> >> The fix is not complex, not sure why it is not part of bcache yet...
>> >> >
>> >> > I think it's just that no one complained loud enough.
>> >> >
>> >> >> Not sure if it's ceph-disk's job to do all of this with bcache
>> >> >> though...
>> >> >> We might need to check with the bcache guys what their plans are
>> >> >> about this.
>> >> >> If this goes through at some point we might just wait; if not, we
>> >> >> could implement the partition trick in ceph-disk.
>> >> >
>> >> > Making something like this go through shouldn't be a problem. Sam's
>> >> > patch is a bit of a quick hack though - it messes up bcache device
>> >> > IDs and also limits the number of partitions to 16. Better to avoid
>> >> > another hard-coded constant, if possible.
>> >
>> > This has already been discussed with Kent Overstreet on IRC. I am
>> > looking into patching it properly (this was very much a quick-and-dirty
>> > hack), but I will admit it is not a top priority for me. As for it
>> > 'messing up bcache device IDs', I entirely disagree. For starters, this
>> > is how zfs spits out its volumes (/dev/zd0, /dev/zd16, etc.). But more
>> > importantly, up until this point bcache has been using the device minor
>> > number _as_ the bcache device number. Changing that behavior seems less
>> > than ideal to me and surely more prone to bugs. Since you can't be
>> > assured that bcache0 will be the same device after a reboot anyway,
>> > I don't see why it matters. Use PartUUIDs and other labels and be done
>> > with it.
>>
>> This is just common sense: if I create three bcache devices on my
>> system, I'd expect them to be named /dev/bcache{0,1,2} (or {1,2,3}, or
>> {a,b,c}) just like other block devices are. An out-of-tree zfs is
>> hardly the best example to follow here.
>
> They actually _won't_ be named bcache{0,1,2} on a long-running system.
> The scenario is: you create and delete a bcache device, and the module
> keeps track of that counter, so you might end up with /dev/bcache{0,4,5}.
> Maybe call it a bug? Either way, that's how it currently works. Fair
> enough about zfs, and frankly I don't have strong opinions about the
> naming. I'm worried about regressions though.

Does it? It's not a simple counter - those IDs are managed by an ida
tree. You grab an ID, and when you are done with it (i.e. delete
a device), you put it back and it becomes available for reuse.
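The usual kernel pattern is just a couple of ida calls - a minimal
sketch, not the actual bcache code (the wrapper names are made up):

  #include <linux/gfp.h>
  #include <linux/idr.h>

  static DEFINE_IDA(bcache_device_ida);

  /* allocate the smallest free device ID, starting from 0 */
  static int bcache_dev_id_get(void)
  {
          return ida_simple_get(&bcache_device_ida, 0, 0, GFP_KERNEL);
  }

  /* on device teardown, return the ID so the next device can reuse it */
  static void bcache_dev_id_put(int id)
  {
          ida_simple_remove(&bcache_device_ida, id);
  }

With that, create, delete, create gives you ID 0 back instead of an
ever-growing number.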
>> Of course if userspace tooling expects or relies on minor numbers being
>> equal to device IDs, that's a good enough reason to keep it as is. The
>> same goes for limiting the number of partitions to 16: if tools expect
>> the major to be the same for all bcache device partitions, it'd have to
>> be hard-coded.
>
> The hard-coded-ness was part of the quick-and-dirty fix. I haven't
> looked at how, say, the sd* drivers work, but they increment by 16 by
> default, so at the very least that could be a sane default. The loop
> driver has a max_part option. I'm not a big fan of that myself, but it
> is an option. And again, I really don't care about the way we solve
> this as long as it is solved. Like Kent said, we just need to find the
> "standard" way to do this, or the least bad one.

I think the most "standard" way is:

- Reuse device IDs. It looks like bcache is already using an ida tree
  for its minors (device IDs, currently), so if you are seeing {0,1,4},
  it's probably a bug.

- Preallocate 16 minors. Don't introduce max_part or anything like
  that.

- Use extended devt to support devices with more than 16 partitions.
  The catch is that those (p17+) partitions are going to have a
  different (fixed) major, so one would have to make sure that bcache
  tools are fine with that.

That's what rbd, virtblk and a bunch of other block device drivers are
doing and that's the patch that I had in mind (sketched below).

Thanks,

                Ilya
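P.S. Roughly the shape of the last two points - illustrative only, not
a real patch (BCACHE_MINORS and the function name are made up):

  #include <linux/genhd.h>

  #define BCACHE_MINORS 16        /* minors preallocated per device */

  static struct gendisk *bcache_alloc_gendisk(int bcache_major, int id)
  {
          /* reserves minors [id * 16, id * 16 + 16) under bcache_major */
          struct gendisk *disk = alloc_disk(BCACHE_MINORS);

          if (!disk)
                  return NULL;

          disk->major = bcache_major;
          disk->first_minor = id * BCACHE_MINORS;
          /* partitions beyond 16 get a dev_t under the fixed
           * extended-devt major instead, same as rbd and virtio_blk */
          disk->flags |= GENHD_FL_EXT_DEVT;
          return disk;
  }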