Re: [PATCH mdadm v2 1/2] mdadm: Add --discard option for Create

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 2022-09-13 01:35, Mariusz Tkaczyk wrote:
> On Fri, 9 Sep 2022 09:47:21 -0600
> Logan Gunthorpe <logang@xxxxxxxxxxxx> wrote:
>>>> +{
>>>> +	uint64_t range[2] = {offset, size};  
>>> Probably you don't need to specify [2] but it is not an issue I think.
>>>   
>>>> +	unsigned long buf[4096 / sizeof(unsigned long)];  
>>>
>>> Can you use any define for 4096?   
>>
>> I don't see any appropriate defines in the code base. It really just
>> needs to be bigger than any O_DIRECT restrictions. 4096 bytes is usually
>> the worst case.
> 
> See comment bellow.

I don't see how the comment below relates to this at all. This 4k has
nothing to do with anything related to the array, it wos only a
convenient size to read and check the result. But per Martin's points,
this code will go away in v3 seeing it's more appropriate to use
WRITE_ZEROS and that interface guarantees that there will be zeros, thus
no need to check.


>> That's correct. I discussed this in the cover letter. That's why this
>> check is here. Per some of the discussion from others I still think the
>> best course of action is to just check what the discard did and fail if
>> it is non-zero. Even though many NVMe and ATA devices have the ability
>> to control or query the behaviour, the kernel doesn't support this and
>> I don't think it can be relied upon.
> 
> It is also controversial approach[1].
> 
> I think that the best we can do here is to warn user, for example:
> "Please ensure that drive you used return zeros after discard."
> We should ask for confirmation (it should be possible to skip with --run).
> I would like to leave discard feature validation on user side, it is not mdadm
> task. Simply, if you want to use it, then it is done on your own risk and
> hopefully you know what you want to achieve.
> That will simplify implementation on mdadm side. What do you think?

I think the better approach is to just use WRITE_ZEROS.

>>>> @@ -945,6 +983,15 @@ int Create(struct supertype *st, char *mddev,
>>>>  				}
>>>>  				if (fd >= 0)
>>>>  					remove_partitions(fd);
>>>> +
>>>> +				if (s->discard &&
>>>> +				    discard_device(c, fd, dv->devname,
>>>> +						   dv->data_offset << 9,
>>>> +						   s->size << 10)) {
>>>> +					ioctl(mdfd, STOP_ARRAY, NULL);
>>>> +					goto abort_locked;
>>>> +				}
>>>> +  
>>> Feel free to use up to 100 char in one line it is allowed now.
>>> Why we need dv->data_offset << 9 and  s->size << 10 here?
>>> How this applies to zoned raid0?  
>>
>> As I understand it the offset and size will give the bounds of the
>> data region on the disk. Do you not think it works for zoned raid0?
> 
> mdadm operates on 512B, so using 4K data regions could be destructive.
> Also left shift causes that size value is increasing. We can't clear more that
> user requested. We need to use 512b sectors as mdadm does.

I don't really follow this.

> I don't know how native raid0 size is passed and how zones are created but I
> suspect that s->size may not be correct for all drives. It it a global property
> but for zoned raid member size could be different. You need to check how it
> applies.

> Also, I'm not sure if this feature is needed for raid0, because there is no
> resync. Maybe we can exclude raid0 from discard?

I'll check raid0 size. If possible I'd rather not have the restriction
to avoid raid0 as it becomes complicated and users may have reason to
zero besides avoiding resync.
>>
>> Well it was my opinion that it was clearer in the code to just
>> explicitly include discard in the conditionals instead of making discard
>> also set assume-clean, but if you think otherwise I can change it for v3.
>>
>> What kind of user message are you thinking is necessary here?
> 
> In my convention, all shape attributes should be set in mdadm.c, later they
> should be considered as const, we should not overwrite them. This structure
> represents user settings.
> This is why I consider updating assume_clean in Create.c as wrong. When discard
> is set then we are assuming that user want to skip resync too, otherwise it
> doesn't have sense. Also, if discard of any drive fails we are returning error
> and create operation will fail anyway.

The v2 version of this patch did not modify the shape attributes in
Create.c; that was only in v1.

> I would expected something like: "Discard requested, setting --assume-clean to
> skip resync".
> Also, will be great if you can add some tests, at least for command-line.

Ok.

Logan



[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux