Re: [PATCH mdadm 1/2] mdadm: Add --discard option for Create

Logan Gunthorpe <logang@xxxxxxxxxxxx> · Thu, 8 Sep 2022 11:01:42 -0600

Hi Mariusz,

Thanks for the feedback. I've incorporated it into a v2 patch set which
I will send later today. Specific responses below.

On 2022-09-08 01:56, Mariusz Tkaczyk wrote:
> On Wed,  7 Sep 2022 14:03:54 -0600
> Logan Gunthorpe <logang@xxxxxxxxxxxx> wrote:
> 
>> Add the --discard option for Create which will send BLKDISCARD requests to
>> all disks before assembling the array. This is a fast way to know the
>> current state of all the disks. If the discard request zeros the data
>> on the disks (as is common but not universal) then the array is in a
>> state with correct parity. Thus the initial sync may be skipped.
> 
> Are we discarding whole device or only space for array? I can see that we are
> specifying range in ioctl.

Just the space for the array.

>> +static int discard_device(const char *devname, unsigned long long size)
>> +{
>> +	uint64_t range[2] = {0, size};
> Are we starting always from 0? If yest then it introduces bug. In IMSM there is
> a matrix array conception. Two arrays on same drives.
> I suspect that we are able to erase data from first array when we set --discard
> during second volume creation.

Yes, ok. I'll have this fixed in v2.

>> +	unsigned long buf[4096 / sizeof(unsigned long)];
>> +	unsigned long i;
>> +	int fd;
>> +
>> +	fd = open(devname, O_RDWR | O_EXCL);
> 
> Do we need to to open another RDWR descriptor, I think that mdadm will open
> something again soon. We are opening descriptors many times, it generates
> unnecessary uvents. We can incorporate it with existing logic and add
> discarding just before st->ss->add_to_super() and discard every drive one by
> one.
> We will fail on error anyway.

Ok, that makes sense and cleans things up a bit. This will be included
in v2.
>> +	for (i = 0; i < ARRAY_SIZE(buf); i++) {
>> +		if (buf[i]) {
>> +			pr_err("device did not read back zeros after discard
>> on '%s': %lx\n",
>> +			       devname, buf[i]);
>> +			goto out_err;
> Do we really need to print data here? Could you move it to debug?

This is an error message and describes why we will not be creating the
array (we error out in this case). So I think it is necessary to be a
pr_err().

>> +		if (c->verbose)
>> +			pr_err("discarding all data on: %s\n", dv->devname);
> I know that mdadm is printing messages to stderr mainly but for this one,
> stdout is enough IMO. I give it to you.

Switched to printf() for the informational message in v2.

>> +		if (first_missing >= s->raiddisks)
>> +			s->assume_clean = 1;
> 
> IMO missing drive has nothing to do with clean conception but please correct me
> I'm wrong. All assume_clean does is skipping resync after the creation and that
> is true even if some drives are missing. Array will be still clean but will be
> also degraded.
> 
> If I'm correct then it simplifies implementation because we can set
> s.assume_clean if s.discard is set too in mdadm.c.
> 
> If I'm wrong what is the point of discard when resync is started anyway with
> missing drive? Maybe those features should be mutually exclusive?

I think you are right. I've implemented v2 to just add a s.discard check
to all the cases where s.assume_clean is checked. Makes it simpler and
more clear, rather than changing assume_clean in mdadm.
>> +		case O(CREATE, Discard):
>> +			s.discard = 1;
> please use true.

Done for v2.

>>  	char	*bitmap_file;
>>  	int	assume_clean;
>> +	int	discard;
> 
> please use bool type.

Done for v2.

Logan