Re: [git pull] Additional device mapper changes for 6.0

Mike Snitzer <snitzer@xxxxxxxxxx> · Sun, 7 Aug 2022 14:14:09 -0400

On Sun, Aug 07 2022 at  3:37P -0400,
Milan Broz <gmazyland@xxxxxxxxx> wrote:

> Hi,
> 
> Just a few notes on why we use target versions in libcryptsetup,
> as I am perhaps one user of this field there.
> 
> TL;DR: it is *only* for hinting to users what is possibly wrong
> after activation fails because there is *no* proper error reporting
> from the device-mapper.

DM's core and target versions aren't intended to be in service of
error reporting. You abusing them like that is a fundamental problem.

[[Unfortunate tangent but you've left me no choice:

Your general tone and misinformation-using-broad-strokes makes me both
sad and angry. I will restrain myself in this reply but your position
drips with general FUD and loathing. This is way more "Milan being
Milan" than I've ever experienced. Could be you've been storing it and
it all just gushed out, no idea. But it's a lot to try to take with
grace.

As you know I'm a very direct person. I speak my mind too. But I've
learned to try to avoid alarmist rhetoric that amounts to throwing
people(s) under the bus (better late than never). But if you're going
to resort to that you better be _very_ certain it's justified. Yet as
cathartic as it might seem, even then it isn't the correct answer. If
you want to remain respected please treat others with respect.

Only you know why you are flailing about with such an attitude; please
come to terms with that. I wish you well and certainly don't want DM
to be some constant or recurring source of such negativity (for you
or anyone).]]

> On 06/08/2022 20:36, Linus Torvalds wrote:
> > On Sat, Aug 6, 2022 at 11:30 AM Mike Snitzer <snitzer@xxxxxxxxxx> wrote:
> ...
> > > Yes, I know you mentioned this before and I said I'd look to switch to
> > > feature bitmasks. Yet here we are. Sorry about that, but I will take
> > > a serious look at fixing this over the next development cycle(s).
> 
> Please don't just replace it with bitmaps.
> 
> It will not bring any better interface while adding more magic with
> handling compatibility, as we need to use both... see below.

(I saw your "below", it lacked a coherent explanation for why "we need
to use both" as a rule moving forward)

When done properly it will _not_ require both. The version number would
be incremented one final time and would serve to allow existing
userspace to run unmodified. But from that point on the bitmap flags
should be used and all userspace converted to use them.
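
A minimal sketch of what converted userspace could look like, under
assumptions that are mine and not anything DM defines today: the
feature bit names, the "final bump" version 1.10.0 and the struct
layout are all hypothetical. At or past the final bump only the bitmap
is consulted; the legacy version compare is reached only on older
kernels.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits; DM does not define these today. */
#define TGT_FEAT_TASKLET (UINT64_C(1) << 0)
#define TGT_FEAT_FEC     (UINT64_C(1) << 1)

/* Hypothetical final version bump meaning "the feature bitmap is valid". */
static const unsigned int BITMAP_VERSION[3] = { 1, 10, 0 };

/* Hypothetical view of what a target reports to userspace. */
struct target_info {
	unsigned int version[3]; /* major, minor, patchlevel */
	uint64_t features;       /* only meaningful from BITMAP_VERSION on */
};

/* Return true if version a >= version b, comparing field by field. */
static bool version_at_least(const unsigned int a[3], const unsigned int b[3])
{
	for (int i = 0; i < 3; i++) {
		if (a[i] != b[i])
			return a[i] > b[i];
	}
	return true;
}

/*
 * Trust the bitmap once the kernel advertises it. On older kernels
 * fall back to the legacy (and fragile) minimum-version compare.
 */
static bool target_supports(const struct target_info *ti, uint64_t feature,
			    const unsigned int legacy_min[3])
{
	if (version_at_least(ti->version, BITMAP_VERSION))
		return (ti->features & feature) == feature;
	return version_at_least(ti->version, legacy_min);
}
```

Note that on a kernel reporting 1.9.0 the legacy branch still answers
"supported" for anything introduced at 1.8.1, whether or not that
kernel actually carries it. Only the bitmap branch avoids that guess.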

> > Well, right now we're in the situation where there are certain kernels
> > that say that they implement "version 1.9" of the thing, but they
> > don't actually implement the "version 1.8.1" extensions.
> 
> I cannot speak for the others, but for veritysetup (libcryptsetup),
> the worst that can happen is that the user will get a wrong error message
> (or just a generic message "something failed, bye").

You know how to send email to report specific problems and/or submit
patches. But I really don't recall anything in this category being
reported by you, certainly not recently... maybe you've just
internalized it or I somehow missed it?

> (All the crypto options are tricky, I would like to keep at least basic
> usability and better errors like "seems tasklets are not supported,
> retrying without tasklets flags.")

dm-verity's optional "try_verify_in_tasklet" is using tasklets as an
implementation detail. If they cannot be used (e.g. for FEC) then why
would the fallback to normal verification using a workqueue be reported?

Or are you referring to something you saw when using dm-crypt's
no_{read,write}_workqueue options?

Or are you saying that both the new dm-verity try_verify_in_tasklet
option and the dm-crypt no_{read,write}_workqueue options should
fall back to removing those flags and retrying without them?

That is a level of AI I have no interest in adding or supporting.
The user asked for something, if it isn't possible then it should
fail.

But please be more specific.

> In principle, we use activation flags/options as Linus describes - try
> to set it, then deal with the failure.
>
> And *this* is the real problem that needs to be solved - there is no proper
> userspace interface that says what went wrong.
> 
> The userspace sees only -EINVAL from ioctl() and a generic message.

"Please extend the DM ioctls to somehow add ti->error to the userspace
response" is a fine feature request. Should help no matter what.

(Can look to have a phased approach to the error reporting payload,
start with errno and error message, add more "structured" payload over
time. Are you referring to JSON or some other format? Whatever systemd
uses?).
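
As a rough sketch of that first phase (errno plus message): the struct,
its field names, the buffer size and both helper functions below are
hypothetical and not part of any existing DM ioctl. It only shows the
shape of a payload that carries the target's error string alongside
the errno.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define DM_ERR_MSG_LEN 128 /* hypothetical fixed message size */

/* Hypothetical first-phase error payload: errno plus message. */
struct dm_error_reply {
	int err;                  /* negative errno, e.g. -EINVAL */
	char msg[DM_ERR_MSG_LEN]; /* copy of the target's error string */
};

/* Fill the reply; a NULL error string leaves the message empty. */
static void dm_error_reply_fill(struct dm_error_reply *r, int err,
				const char *ti_error)
{
	r->err = err;
	snprintf(r->msg, sizeof(r->msg), "%s", ti_error ? ti_error : "");
}

/* Userspace side: prefer the target's message, else use strerror(). */
static int dm_error_reply_format(const struct dm_error_reply *r,
				 char *buf, size_t len)
{
	if (r->msg[0] != '\0')
		return snprintf(buf, len, "%s (errno %d)", r->msg, -r->err);
	return snprintf(buf, len, "%s", strerror(-r->err));
}
```

A more "structured" payload could later be appended after these two
fields without changing how the first phase is read.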

> Perhaps in the syslog is more info, but usually only at debug level
> (that is often not visible), and parsing syslog is not the option for us either.

All errors should be emitted with pr_err() using DMERR(). I've made a
conscious effort to convert DMWARN() to DMERR() when appropriate. But
I'll audit all the DM core code and then work through the various
targets.

If there are incorrect log levels being used it is a bug, please
report and/or fix.

> What is even more problematic is that the error string in DM target is
> often set (e.g. ti->error = "Integrity profile tag size mismatch.";) but later
> discarded, and it reaches neither the log nor the userspace calling the failing
> ioctl().

Again, if you see a bug: please report and/or fix it.

> If the device-mapper can fix this, we can easily trash the magic that
> consults the target version and determines what went wrong.

There is no way to properly use version numbers to derive what
actually went wrong. Could you narrow down and isolate the possible
failure based on version in specific cases? Sure.. but it is insanely
fragile (especially with stable@ and distro kernels).

> Then you can forget the version and feature bitmaps and send
> us a proper (ideally structured) error message in ioctl() reply.

OK, I can just avoid switching to feature bitmaps entirely, stop
bumping version numbers, and focus on better error reporting. Then all
of userspace can rely on the reported errors to fail and to inform
user actions.

But I'm pretty confident lvm2 will have something to say on
this... I'll take all input into consideration.

Mike