On 2022-09-09 03:57, Mariusz Tkaczyk wrote: >> If all the discard requests are successful and there are no missing >> disks thin it is safe to set assume_clean as we know the array is clean. > > Please update message. We agreed in v1 that missing disks and discard features > are not related, right? Oops, yes, I'll update the commit message for v3. >> +static int discard_device(struct context *c, int fd, const char *devname, >> + unsigned long long offset, unsigned long long size) > > Will be great if you can description. Ok, will do for v3. >> +{ >> + uint64_t range[2] = {offset, size}; > Probably you don't need to specify [2] but it is not an issue I think. > >> + unsigned long buf[4096 / sizeof(unsigned long)]; > > Can you use any define for 4096? I don't see any appropriate defines in the code base. It really just needs to be bigger than any O_DIRECT restrictions. 4096 bytes is usually the worst case. >> + unsigned long i; >> + >> + if (c->verbose) >> + printf("discarding data from %lld to %lld on: %s\n", >> + offset, size, devname); >> + >> + if (ioctl(fd, BLKDISCARD, &range)) { >> + pr_err("discard failed on '%s': %m\n", devname); >> + return 1; >> + } >> + >> + if (pread(fd, buf, sizeof(buf), offset) != sizeof(buf)) { >> + pr_err("failed to readback '%s' after discard: %m\n", >> devname); >> + return 1; >> + } >> + >> + for (i = 0; i < ARRAY_SIZE(buf); i++) { >> + if (buf[i]) { >> + pr_err("device did not read back zeros after discard >> on '%s': %lx\n", >> + devname, buf[i]); > In previous version I wanted to leave the message on stderr, but just move a > data (buf[i]) to debug, or if (verbose > 0). > I think that printing binary data in error message is not necessary. I added the hex because it might be informative to know what a discard did to the device (all FFs or random data). > BTW. I'm not sure if discard ensures that data will be all zero. It causes that > drive drops all references but I doesn't mean that data is zeroed. Could you > please check it in documentation? Should we expect zeroes? That's correct. I discussed this in the cover letter. That's why this check is here. Per some of the discussion from others I still think the best course of action is to just check what the discard did and fail if it is non-zero. Even though many NVMe and ATA devices have the ability to control or query the behaviour, the kernel doesn't support this and I don't think it can be relied upon. >> @@ -945,6 +983,15 @@ int Create(struct supertype *st, char *mddev, >> } >> if (fd >= 0) >> remove_partitions(fd); >> + >> + if (s->discard && >> + discard_device(c, fd, dv->devname, >> + dv->data_offset << 9, >> + s->size << 10)) { >> + ioctl(mdfd, STOP_ARRAY, NULL); >> + goto abort_locked; >> + } >> + > Feel free to use up to 100 char in one line it is allowed now. > Why we need dv->data_offset << 9 and s->size << 10 here? > How this applies to zoned raid0? As I understand it the offset and size will give the bounds of the data region on the disk. Do you not think it works for zoned raid0? >> diff --git a/mdadm.c b/mdadm.c >> index 972adb524dfb..049cdce1cdd2 100644 >> --- a/mdadm.c >> +++ b/mdadm.c >> @@ -602,6 +602,10 @@ int main(int argc, char *argv[]) >> s.assume_clean = 1; >> continue; >> >> + case O(CREATE, Discard): >> + s.discard = true; >> + continue; >> + > > I would like to set s.assume_clean=true along with discard. Then will be no need > to modify other conditions. If we are assuming that after discard all is zeros > then we can skip resync, right? According to message, it should be. > Please add message for user and set assume_clean too. Well it was my opinion that it was clearer in the code to just explicitly include discard in the conditionals instead of making discard also set assume-clean, but if you think otherwise I can change it for v3. What kind of user message are you thinking is necessary here? Logan