Re: [PATCH] dm: remake of the verity target

Mikulas Patocka <mpatocka@xxxxxxxxxx> · Fri, 16 Mar 2012 21:16:00 -0400 (EDT)

Hi Will

On Wed, 14 Mar 2012, Will Drewry wrote:

> Hi Mikulas,
> 
> This is a nice rewrite and takes advantage of your dm-bufio layer. I
> wish it'd existed (and or we wrote it :) in 2009 when we started this
> work!  Some comments below:
> 
> > ---
> > +static void verity_prefetch_io(struct dm_verity *v, struct dm_verity_io *io)
> > +{
> > +       int i;
> > +       for (i = v->levels - 2; i >= 0; i--) {
> > +               sector_t hash_block_start;
> > +               sector_t hash_block_end;
> > +               verity_hash_at_level(v, io->block, i, &hash_block_start, NULL);
> > +               verity_hash_at_level(v, io->block + io->n_blocks - 1, i, &hash_block_end, NULL);
> > +               if (!i) {
> > +                       unsigned cluster = prefetch_cluster;
> > +        /* barrier to stop GCC from re-reading prefetch_cluster again */
> > +                       barrier();
> > +                       cluster >>= v->data_dev_block_bits;
> 
> Would:
>   unsigned cluster = prefetch_cluster >> v->data_dev_block_bits;
> not have similar behavior without a barrier?  (Yeah yeah I could
> compile and see, but I was curious if you already had.)
> 
> Since the max iterations here is 61 in a worst-case, I don't think
> it's a big deal to barrier() each time, just thought I'd ask.
> 
> > +                       if (unlikely(!cluster))
> > +                               goto no_prefetch_cluster;
> > +                       if (unlikely(cluster & (cluster - 1)))
> > +                               cluster = 1 << (fls(cluster) - 1);
> > +
> > +                       hash_block_start &= ~(sector_t)(cluster - 1);
> > +                       hash_block_end |= cluster - 1;
> > +                       if (unlikely(hash_block_end >= v->hash_blocks))
> > +                               hash_block_end = v->hash_blocks - 1;
> > +               }
> > +no_prefetch_cluster:
> > +               dm_bufio_prefetch(v->bufio, hash_block_start,
> > +                                       hash_block_end - hash_block_start + 1);

The problem here is this. If you look at the code, you think that after 
the clause "if (unlikely(!cluster)) goto no_prefetch_cluster;", cluster 
can't be zero. But this assumption is wrong. The C compiler is allowed to 
transform the above code into:

unsigned cluster;
if (!(prefetch_cluster >> v->data_dev_block_bits))
	goto no_prefetch_cluster;
cluster = prefetch_cluster >> v->data_dev_block_bits;
if (unlikely(cluster & (cluster - 1)))
	cluster = 1 << (fls(cluster) - 1);

I know it's suboptimal, but the C compiler is just allowed to perform this 
transformation. Now, if you know that "prefetch_cluster" can change 
asynchronously by another thread running simultaneously, the condition "if 
(!(prefetch_cluster >> v->data_dev_block_bits))" is useless --- 
prefetch_cluster may change just after this condition and we won't catch 
the zero value. (if the cluster value is zero in the above code, it ends 
up in hash_block_end being ORed with -1 and the prefetch goes wild over 
the whole hash device).

That's why I put that "barrier()" there. It would be better to declare 
"prefetch_cluster" as volatile, but the module param macros issue warnings 
if the variable is volatile.

Or maybe I can change it this way:
"unsigned cluster = *(volatile unsigned *)&prefetch_cluster", it could be 
better than the "barrier()".

> > +       case STATUSTYPE_TABLE:
> > +               DMEMIT("%u %s %s %llu %u %s ",
> > +                       0,
> > +                       v->data_dev->name,
> > +                       v->hash_dev->name,
> 
> I understand the new approach is to use major:minor instead of the
> device name.  I don't care which, but I believe agk@ requested that.

All the device mappers report dm_dev->name in their status routine, so I 
do it this way too.

> > +static int verity_ioctl(struct dm_target *ti, unsigned cmd,
> > +                       unsigned long arg)
> > +{
> > +       struct dm_verity *v = ti->private;
> > +       int r = 0;
> > +
> > +       if (v->data_start ||
> > +           ti->len != i_size_read(v->data_dev->bdev->bd_inode) >> SECTOR_SHIFT)
> > +               r = scsi_verify_blk_ioctl(NULL, cmd);
> > +
> 
> Is it worth supporting ioctl at all given these hoops?  Nothing stops
> a privileged user from directly running the ioctl on the underlying
> device/devices, it's just very inconvenient :)

I don't know. The other dm targets attempt to pass-thru ioctls too.

You need ioctl pass-thru if you want to run it over a cd-rom because 
the iso9660 filesystem needs to send an ioctl to find its superblock. 
Other than that I don't know if other filesystems need ioctls.

> > +       if (ti->len > i_size_read(v->data_dev->bdev->bd_inode) >> SECTOR_SHIFT) {
> > +               ti->error = "Data device si too small";
> 
> s/si/is
> 
> Should this also check ti->start + ti->len to ensure it isn't reading
> off the end or do you just rely on the requests failing?

ti->start is the offset in the target table --- so it shouldn't be checked 
here (for example, you can map a verity device having 1024 blocks to a 
sector offset 1000000 in the table --- so ti->start == 1000000 and ti->len 
== 1024 --- in this case, you have test that the underlying device has at 
least 1024 blocks, but you shouldn't test it for 1000000 sectors --- 
1000000 is offset in the table, not required device size.

But this reminds me that I had the size test wrong in verity_map ... 
fixed.

> > +MODULE_AUTHOR("Mikulas Patocka <mpatocka@xxxxxxxxxx>");
> 
> As per linux/module.h, I'd welcome additional authors as per the
> lkml/patch lineage:
> MODULE_AUTHOR("Mandeep Baines <msb@xxxxxxxxxxxx>");
> MODULE_AUTHOR("Will Drewry <wad@xxxxxxxxxxxx>");

OK, I added you there.

> Regardless, I'll just be happy to see this functionality merge.
> 
> > +MODULE_DESCRIPTION(DM_NAME " target for transparent disk integrity checking");
> > +MODULE_LICENSE("GPL");
> > +
> > Index: linux-3.3-rc6-fast/drivers/md/dm-bufio.c
> 
> This should be in a separate patch I think.

Yes, it is a separate patch.

> >                b->hold_count++;
> 
> Are these hold_counts safe on architectures with weak memory models?
> Should they be atomic_ts?   I haven't looked at them in context, but
> based on what I see here they make me a bit nervous.
> 
> Thanks for jumping in to the fray!  None of my comments are blocking,
> so I believe the following is appropriate (if not
> s/Signed-off/Reviewed-by/).
> 
> Signed-off-by: Will Drewry <wad@xxxxxxxxxxxx>
> 
> cheers!
> will

hold_count is read or changed only when we hold dm_bufio_client->lock, so 
it doesn't have to be atomic.

Mikulas
--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel