Re: sparse-read OSD op guarantees

Jeff Layton <jlayton@xxxxxxxxxx> · Mon, 02 May 2022 10:22:08 -0400

(sorry for the resend, but the first message got rejected by the list because it was from an unsubscribed address)

On Mon, 2022-05-02 at 14:05 +0200, Ilya Dryomov wrote:
> Hi Sam,
> 
> I wanted to clarify ObjectStore::fiemap API and sparse-read OSD op
> guarantees as this came up in Jeff's fscrypt work and just recently in
> RBD as well.
> 
> In fscrypt for kcephfs, Jeff has opted to use sparse-read to ensure
> that file holes (which must contain all zeroes logically) don't get
> "decrypted" into seemingly random junk.  (Unlike ecryptfs, fscrypt
> framework doesn't attempt to protect the information about existence
> and location of holes in files, so logical holes generally correspond
> to physical holes.)
> 

The fscrypt client infrastructure generally prevents you from reading a
file when you don't have the key, but you could always analyze the
backing device and determine where the holes are. The situation with
cephfs is analogous.

I imagine this is the same with ecryptfs though. I don't believe it
fills in the holes when you do a write past the EOF either. Were you
thinking of LUKS? That operates at the device level, so finding holes
there is a much different matter.

> This seems to be working with Bluestore on replicated pools because it
> tracks the allocated extents at byte level, irrespective of the actual
> data block size (bluestore_min_alloc_size, etc).  However sparse-read
> is built on ObjectStore::fiemap and there is the following comment in
> src/os/ObjectStore.h on fiemap:
> 
>     * A non-enlightened implementation is free to return the extent
>     * (offset, len) as the sole extent.
> 
> Is that true?  If not, is the implementation required to track
> everything at byte level or is it allowed to pick a granularity?
> What is SeaStore going to do here?
> 
> Going further, on EC pools, sparse-read is currently converted into
> a regular (async) read right away which is the same as fiemap returning
> a sole extent as far as the client is concerned.  Can EC pool support
> be expected in the future?
> 
> The reason I'm asking is that I have always considered sparse-read to
> be a "read with a hint" operation, where the hint to skip unallocated
> extents can be ignored by the OSD.  The above use case goes against
> that as it requires precise sub-object extent mappings.  Putting EC
> pools aside for the moment, is sparse-read safe to use for this
> purpose?
> 

Thanks for bringing this up, Ilya. I saw those same comments and had
intended to circle back to them at some point.
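
For anyone following along, here is a rough sketch of why the "sole
extent" behavior matters for this use case. It is a toy model in Python,
not the kcephfs code: toy_decrypt() is a stand-in XOR "cipher" rather
than real fscrypt crypto, the 16-byte BLOCK size is arbitrary, and
read_plaintext() and the reply layout (a list of block-aligned
(offset, length) extents plus the concatenated extent data) are
assumptions made up for illustration.

```python
from typing import List, Tuple

Extent = Tuple[int, int]  # (offset, length) within the requested range

BLOCK = 16  # toy "crypto block" size, chosen only for this example


def toy_decrypt(chunk: bytes, block_no: int) -> bytes:
    """Stand-in for per-block decryption (NOT real fscrypt crypto).

    XORs each byte with a non-zero key derived from the block number, so
    "decrypting" a block of zeroes yields non-zero junk. XOR is its own
    inverse, so the same function also serves as the toy "encrypt".
    """
    key = ((block_no * 37 + 0x5A) & 0xFF) | 1  # "| 1" keeps the key non-zero
    return bytes(b ^ key for b in chunk)


def read_plaintext(length: int, extents: List[Extent], data: bytes) -> bytes:
    """Rebuild plaintext from a sparse-read style reply.

    Only the reported extents are decrypted; everything else is treated
    as a hole and left as zeroes. Assumes block-aligned extents.
    """
    out = bytearray(length)  # holes stay zero-filled
    pos = 0                  # read cursor into the concatenated extent data
    for off, ext_len in extents:
        for block_off in range(off, off + ext_len, BLOCK):
            chunk = data[pos:pos + BLOCK]
            out[block_off:block_off + BLOCK] = toy_decrypt(chunk, block_off // BLOCK)
            pos += BLOCK
    return bytes(out)


# A 4-block object where only block 1 was ever written.
cipher_block1 = toy_decrypt(b"A" * BLOCK, 1)

# Reply with a precise extent map: just the one allocated block.
precise = read_plaintext(4 * BLOCK, [(BLOCK, BLOCK)], cipher_block1)

# Reply from a "non-enlightened" store: the whole range as a sole extent,
# with the holes returned as literal zeroes.
sole_data = bytes(BLOCK) + cipher_block1 + bytes(2 * BLOCK)
sole = read_plaintext(4 * BLOCK, [(0, 4 * BLOCK)], sole_data)

print("precise map, holes all zero:", not any(precise[:BLOCK] + precise[2 * BLOCK:]))
print("sole extent, holes all zero:", not any(sole[:BLOCK] + sole[2 * BLOCK:]))
```

In both cases the written block comes back correctly. The difference is
that with a sole extent the client cannot tell holes from data, so the
zeroes get run through the cipher and come out as junk.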

We may need some way to gate fscrypt-ability of a cephfs on whether the
data pool is an appropriate flavor. That's less than ideal, but it would
probably be an acceptable solution if it comes to that.

-- 
Jeff Layton <jlayton@xxxxxxxxxx>
