On Tue, 13 Sep 2022 09:43:52 -0600 Logan Gunthorpe <logang@xxxxxxxxxxxx> wrote: > On 2022-09-13 01:35, Mariusz Tkaczyk wrote: > > On Fri, 9 Sep 2022 09:47:21 -0600 > > Logan Gunthorpe <logang@xxxxxxxxxxxx> wrote: > >>>> +{ > >>>> + uint64_t range[2] = {offset, size}; > >>> Probably you don't need to specify [2] but it is not an issue I think. > >>> > >>>> + unsigned long buf[4096 / sizeof(unsigned long)]; > >>> > >>> Can you use any define for 4096? > >> > >> I don't see any appropriate defines in the code base. It really just > >> needs to be bigger than any O_DIRECT restrictions. 4096 bytes is usually > >> the worst case. > > > > See comment bellow. > > I don't see how the comment below relates to this at all. This 4k has > nothing to do with anything related to the array, it wos only a > convenient size to read and check the result. But per Martin's points, > this code will go away in v3 seeing it's more appropriate to use > WRITE_ZEROS and that interface guarantees that there will be zeros, thus > no need to check. I suggested to skip verification at all in next comment but as you said with WRITE_ZEROS it is not needed anyway. Sorry for being inaccurate. > > > >> That's correct. I discussed this in the cover letter. That's why this > >> check is here. Per some of the discussion from others I still think the > >> best course of action is to just check what the discard did and fail if > >> it is non-zero. Even though many NVMe and ATA devices have the ability > >> to control or query the behaviour, the kernel doesn't support this and > >> I don't think it can be relied upon. > > > > It is also controversial approach[1]. > > > > I think that the best we can do here is to warn user, for example: > > "Please ensure that drive you used return zeros after discard." > > We should ask for confirmation (it should be possible to skip with --run). > > I would like to leave discard feature validation on user side, it is not > > mdadm task. Simply, if you want to use it, then it is done on your own risk > > and hopefully you know what you want to achieve. > > That will simplify implementation on mdadm side. What do you think? > > I think the better approach is to just use WRITE_ZEROS. Agree. > > >>>> @@ -945,6 +983,15 @@ int Create(struct supertype *st, char *mddev, > >>>> } > >>>> if (fd >= 0) > >>>> remove_partitions(fd); > >>>> + > >>>> + if (s->discard && > >>>> + discard_device(c, fd, dv->devname, > >>>> + dv->data_offset << 9, > >>>> + s->size << 10)) { > >>>> + ioctl(mdfd, STOP_ARRAY, NULL); > >>>> + goto abort_locked; > >>>> + } > >>>> + > >>> Feel free to use up to 100 char in one line it is allowed now. > >>> Why we need dv->data_offset << 9 and s->size << 10 here? > >>> How this applies to zoned raid0? > >> > >> As I understand it the offset and size will give the bounds of the > >> data region on the disk. Do you not think it works for zoned raid0? > > > > mdadm operates on 512B, so using 4K data regions could be destructive. > > Also left shift causes that size value is increasing. We can't clear more > > that user requested. We need to use 512b sectors as mdadm does. > > I don't really follow this. I understand that you want left shit is used to round size to data region and I assumed that data_region is 4K and that is probably wrong. You are right I has no sense, my apologizes. Let's imagine that our size is for example, 2687 sectors. Left shit will cause that we will get 2751488 and that will be passed as a size to function. Similar for data_offset. That is much more than we want to clear. Do I miss something? I guess that ioctl operates on sectors too but please correct me if that is wrong. > > > I don't know how native raid0 size is passed and how zones are created but I > > suspect that s->size may not be correct for all drives. It it a global > > property but for zoned raid member size could be different. You need to > > check how it applies. > > > Also, I'm not sure if this feature is needed for raid0, because there is no > > resync. Maybe we can exclude raid0 from discard? > > I'll check raid0 size. If possible I'd rather not have the restriction > to avoid raid0 as it becomes complicated and users may have reason to > zero besides avoiding resync. Agree, thanks. > >> > >> Well it was my opinion that it was clearer in the code to just > >> explicitly include discard in the conditionals instead of making discard > >> also set assume-clean, but if you think otherwise I can change it for v3. > >> > >> What kind of user message are you thinking is necessary here? > > > > In my convention, all shape attributes should be set in mdadm.c, later they > > should be considered as const, we should not overwrite them. This structure > > represents user settings. > > This is why I consider updating assume_clean in Create.c as wrong. When > > discard is set then we are assuming that user want to skip resync too, > > otherwise it doesn't have sense. Also, if discard of any drive fails we are > > returning error and create operation will fail anyway. > > The v2 version of this patch did not modify the shape attributes in > Create.c; that was only in v1. Ok, I missed that, thanks. > > > I would expected something like: "Discard requested, setting --assume-clean > > to skip resync". > > Also, will be great if you can add some tests, at least for command-line. > > Ok. > Mariusz