Hi Martin,
On 11/21/2018 07:13 PM, Martin K. Petersen wrote:
> Sorry about the delay. Travel got in the way.

No problem.

> BDI_CAP_STABLE_WRITES should take care of this. What's the configuration
> that fails?

Apologies if the commit description sounds unfair. I did not mean to
blame anyone. It's just the collection of issues we saw in distros over
the years. Some of the old issues might be fixed by the above zfcp patch
or by common code changes. Unfortunately, I could not resolve the DIX
issues we saw. I think DIF by itself provides much of the protection
benefit and was not affected by the issues we encountered. We would like
to give users an easy way to operate in such a setup.

> I don't have a problem with zfcp having a parameter that affects the
> host protection mask, the other drivers do that too. However, these
> knobs exist exclusively for debugging and testing purposes. They are not
> something regular users should twiddle to switch features on or off.
>
> So DIF and DIX should always be enabled in the driver. And there is no
> point in ever operating without DIF enabled if the hardware is capable.

Our long-term plan is to make the new zfcp.dif (for DIF only) default to
enabled once we have gained enough experience with zfcp stability in
this mode.
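
Just to illustrate what that would look like for users (a minimal
sketch, assuming the parameter names as discussed in this series), DIF-only
operation would simply be requested at boot time, either on the kernel
command line if zfcp is built in, or via a modprobe option file if zfcp
is built as a module:

    # kernel command line (zfcp built in)
    zfcp.dif=1 zfcp.dix=0

    # /etc/modprobe.d/zfcp.conf (zfcp built as a module)
    options zfcp dif=1 dix=0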

> If there is a desire to disable DIX protection for whatever reason
> (legacy code doing bad things), do so using the block layer sysfs
> knobs. That's where the policy of whether to generate and verify
> protection information resides, not in the HBA driver.

Yes, we came up with udev rules that set read_verify and write_generate
to 0 in order to get DIF without DIX. However, this seems complicated
for users, especially since we always have at least dm-multipath and
maybe other dm targets such as LVM on top. The setting that matters is
the one on the top-level block device of some dm (or maybe mdraid)
virtual block device stack. Getting this right becomes more complicated
if there are also disks not attached through zfcp which may need
different settings, so the udev rules would need somewhat involved
matching. The new zfcp.dif parameter makes it simpler because the SCSI
disk comes up with the desired limits and anything on top automatically
inherits these block queue limits.
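
For reference, here is a minimal sketch of the kind of rule meant above
(not our actual rules; it assumes dm-* is the top-level device and leaves
out the zfcp-vs-other-transport matching just described):

    # per top-level dm device, e.g. by hand:
    echo 0 > /sys/block/dm-0/integrity/write_generate
    echo 0 > /sys/block/dm-0/integrity/read_verify

    # or as a (simplified) udev rule:
    ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="dm-*", \
        ATTR{integrity/write_generate}="0", ATTR{integrity/read_verify}="0"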

There's one more important thing that has a performance impact: we need
to pack payload and protection data into the same queue of limited
length. So for the worst case with DIX, we have to use half the size for
sg_tablesize to get the other half for sg_prot_tablesize. This limits
the maximum I/O request size and thus throughput. Using read_verify and
write_generate does not change the table sizes, as zfcp would still
announce support for DIF and DIX. With the new zfcp.dif=1 and
zfcp.dix=0, we can use the full sg_tablesize for payload data and
sg_prot_tablesize=0. (The DIF "overhead" on the fibre still exists, of
course.)
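
To make that a bit more tangible, the effect shows up in the request
queue limits of the attached SCSI disks (sdX below is just a
placeholder), e.g.:

    # payload scatter-gather segments per request
    cat /sys/block/sdX/queue/max_segments
    # protection-information scatter-gather segments per request
    cat /sys/block/sdX/queue/max_integrity_segments

With DIX announced, max_segments and max_integrity_segments should each
reflect roughly half of what the hardware can do; with zfcp.dif=1 and
zfcp.dix=0, max_integrity_segments is 0 and the full segment count is
available as max_segments for payload.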

Are there other ways of accomplishing this that I'm not aware of?

> And if there are unaddressed issues in the I/O stack that prevent you
> from having integrity enabled, I'd prefer to know about them so they can
> be fixed rather than circumventing them through a driver module parameter.

Sure.
--
Mit freundlichen Gruessen / Kind regards
Steffen Maier
Linux on IBM Z Development
IBM Deutschland Research & Development GmbH
Vorsitzende des Aufsichtsrats: Martina Koederitz
Geschaeftsfuehrung: Dirk Wittkopp
Sitz der Gesellschaft: Boeblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294