This came up again in the dev summit at cephalocon, so I figure it's
worth reviving this thread.

First, I'll try to recap the situation (Ilya, feel free to correct me
here). My understanding of the issue is that rbd has features (most
notably encryption) which depend on the librados SPARSE_READ operation
reflecting accurately which ranges have been written or trimmed at a 4k
granularity. This appears to work correctly on replicated pools on
bluestore, but erasure coded pools always return the full object
contents up to the object size including regions the client has not
written to.

I don't think this was originally a guarantee of the interface. I think
the original guarantee was simply that SPARSE_READ would return any
non-zero regions, not that it was guaranteed not to return unwritten or
trimmed regions. The OSD does not track this state above the
ObjectStore layer -- SPARSE_READ and MAPEXT both rely directly on
ObjectStore::fiemap. MAPEXT actually returns -ENOTSUPP on erasure coded
pools.

Adam: the observed behavior is that fiemap on bluestore does accurately
reflect the client's written extents at a 4k granularity. Is that
reliable, or is it a property of only some bluestore configurations?

As it appears desirable that we actually guarantee this, we probably
want to do two things:
1) codify this guarantee in the ObjectStore interface (4k in all
   cases?), and ensure that all configurations satisfy it going forward
   (including seastore)
2) update the ec implementation to track allocation at the granularity
   of an EC stripe. HashInfo is the natural place to put the
   information, probably? We'll need to also implement ZERO.

Radek: I know you're looking into EC for crimson, perhaps you can
evaluate how much work would be required here?
-Sam

On Mon, May 2, 2022 at 5:21 PM Sam Just <sjust@xxxxxxxxxx> wrote:
>
> I don't think fiemap was ever intended as anything more than an
> optimization to permit a user to avoid transferring unnecessary
> zeroes.
> SeaStore will probably not track sparseness at more than a 4k
> granularity. I don't think the EC implementation is clever about
> sparse reads/writes at all since that information would probably need
> to be duplicated above the objectstore in the object_info.
> -Sam
>
> On Mon, May 2, 2022 at 7:47 AM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> >
> > On Mon, 2022-05-02 at 16:41 +0200, Ilya Dryomov wrote:
> > > On Mon, May 2, 2022 at 4:22 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > > >
> > > > (sorry for the resend, but the first message got rejected by the
> > > > list because it was from an unsubscribed address)
> > > >
> > > > On Mon, 2022-05-02 at 14:05 +0200, Ilya Dryomov wrote:
> > > > > Hi Sam,
> > > > >
> > > > > I wanted to clarify ObjectStore::fiemap API and sparse-read OSD op
> > > > > guarantees as this came up in Jeff's fscrypt work and just recently in
> > > > > RBD as well.
> > > > >
> > > > > In fscrypt for kcephfs, Jeff has opted to use sparse-read to ensure
> > > > > that file holes (which must contain all zeroes logically) don't get
> > > > > "decrypted" into seemingly random junk. (Unlike ecryptfs, fscrypt
> > > > > framework doesn't attempt to protect the information about existence
> > > > > and location of holes in files, so logical holes generally correspond
> > > > > to physical holes.)
> > > > >
> > > >
> > > > The fscrypt client infrastructure generally prevents you from reading a
> > > > file when you don't have the key, but you could always analyze the
> > > > backing device and determine where the holes are. The situation with
> > > > cephfs is analogous.
> > >
> > > Yup.
> > >
> > > >
> > > > I imagine this is the same with ecryptfs though. I don't believe it
> > > > fills in the holes when you do a write past the EOF either. Were you
> > > > thinking of LUKS? That operates at the device level, so finding holes
> > > > there is a much different matter.
> > > >
> > > I'm pretty sure ecryptfs always fills holes by encrypting logical zeroes and
> > > writing the resulting ciphertext out to the backing filesystem. Quoting the
> > > FAQ:
> > >
> > > eCryptfs does not currently support sparse files. Sequences of encrypted
> > > extents with all 0's could be interpreted as sparse regions in eCryptfs
> > > without too much implementation complexity. However, this would open up
> > > a possible attack vector, since the fact that certain segments of data are
> > > all 0's could betray strategic information that the user does not
> > > necessarily want to reveal to an attacker. For instance, if the attacker
> > > knows that a certain database file with patient medical data keeps
> > > information about viral infections in one region of the file and
> > > information about diabetes in another section of the file, then the very
> > > fact that the segment for viral infection data is populated with data at
> > > all would reveal that the patient has a viral infection.
> > >
> >
> > I stand corrected then! That tends to be pretty horrible for performance
> > though. Prepare to wait for a while if you do create a file and then
> > start writing at the 2G offset.
> >
> > In principle, we could also have the client fill in holes instead. It
> > may be worthwhile to have a mode where it does that. That might also give
> > us a way to support this on non-bluestore pools (if it's not feasible to
> > allow for sparseness there).
> > --
> > Jeff Layton <jlayton@xxxxxxxxxx>
> >
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx
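
To make the concern in this thread concrete, here is a minimal sketch of
why an encrypting client cares whether a sparse read reports holes
accurately. Everything in it is an assumption for illustration only: the
XOR "cipher" is a toy stand-in for real encryption, and `assemble` and
the `(offset, data)` extent list are hypothetical, not the librados or
kernel client API. It compares a result that reports only the written
4k block with one that returns the whole range, holes included, as data.

```python
BLOCK = 4096  # the 4k granularity discussed in the thread


def toy_decrypt(block: bytes, key: int = 0x5A) -> bytes:
    """Toy stand-in for a block cipher: XOR each byte with a key."""
    return bytes(b ^ key for b in block)


def assemble(extents, length):
    """Rebuild plaintext from a hypothetical sparse-read style result.

    extents: list of (offset, data) pairs, assumed block aligned.
    Ranges not covered by any extent are treated as holes and are
    left as zeroes without being passed through the cipher.
    """
    out = bytearray(length)  # holes stay zero-filled
    for off, data in extents:
        for i in range(0, len(data), BLOCK):
            out[off + i:off + i + BLOCK] = toy_decrypt(data[i:i + BLOCK])
    return bytes(out)


# A 12 KiB range with a single written block in the middle.
plain = b"A" * BLOCK
cipher = toy_decrypt(plain)  # XOR is its own inverse

# Case 1: the extent map reports only the written block.
accurate = assemble([(BLOCK, cipher)], 3 * BLOCK)

# Case 2: the whole range comes back as one extent, holes as zeroes.
full = bytes(BLOCK) + cipher + bytes(BLOCK)
inflated = assemble([(0, full)], 3 * BLOCK)
```

In the first case the holes stay zero and the written block decrypts
normally. In the second case the written block still decrypts, but the
zero-filled holes are run through the cipher and come out as non-zero
bytes, which is the "seemingly random junk" outcome described above.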