Re: [PATCH] sd: disable logical block provisioning if 'lbpme' is not set

Jeff Mahoney <jeffm@xxxxxxxx> · Tue, 19 Feb 2019 10:07:26 -0500

On 2/18/19 5:48 AM, Jean Delvare wrote:
> Hi Martin,
> 
> On Fri, 2019-02-15 at 22:30 -0500, Martin K. Petersen wrote:
>> When evaluating the 'block limits' VPD page we need to check if the
>>> 'lbpme' (logical block provisioning management enable) bit is set in
>>> the READ CAPACITY (16) output.  If it isn't we can safely assume that
>>> we cannot use DISCARD on this device.
>>
>> That's not entirely correct. We still support devices which predate
>> logical block provisioning getting standardized in SBC. These devices
>> are typically driven by the zeroing method and don't have LBPME, nor the
>> LBP VPD page to key off of. Instead they are triggered by setting the
>> provisioning_mode via udev.
>>
>> I am not sure how many of these are still around, I am hoping very
>> few. But I'd prefer to not break anything.
> 
> So basically the condition "if (!sdkp->lbpme)" should be replaced by
> something like "if (LBP_VPD_page_is_supported && !sdkp->lbpme)"?
> 
> Unfortunately it does not look like the driver keeps memory of
> successfully executing read_capacity_16(). do we need a new flag in
> sdkp to remember read_capacity_16() success? Or can we just test
> sdkp->lbpvpd?
> 
>> Can you describe the specific problem your patch is aiming to address?
> 
> The problem that was reported to us by several customers and which this
> patch attempts to address is file system corruption. Specifically
> metadata corruption reported on XFS filesystems by
> xfs_agf_read_verify(). Investigation revealed that the problematic
> areas had been zeroed-out at the block device level. This was in Azure
> VM, using dm-stripe, with an fstrim service running automatically on a
> weekly basis (default SUSE setting, in case the system has one or more
> SSD).
> 
> The details are beyond me to be honest, but my limited understanding is
> that calling TRIM on virtual block devices for which TRIM makes no
> sense ended up zeroing storage areas and causing the corruption. Hannes
> or Jeff may be able to explain it better, hopefully.
> 

That's more or less how the problem presents itself, but it's not that
we issue an fstrim on a block and it gets zeroed.  That would be fine
from the file system perspective.  It's that the fstrim is issued and
unrelated blocks are zeroed.  Once I saw the storage layer was
responsible, I stopped digging and passed it to Hannes.

-Jeff

-- 
Jeff Mahoney
SUSE Labs