[ Moving from ceph-devel@vger to dev@ceph ] Do we need to change the protocol so you can provide a limit on the number of extents when you send a sparse read op? I rather suspect the workloads generating this are the worst case scenario and that we may want server-side limits in any case On Mon, Nov 6, 2023 at 5:02 AM Xiubo Li <xiubli@xxxxxxxxxx> wrote: > > > On 11/6/23 20:32, Ilya Dryomov wrote: > > On Mon, Nov 6, 2023 at 1:15 PM Xiubo Li <xiubli@xxxxxxxxxx> wrote: > >> > >> On 11/6/23 19:38, Ilya Dryomov wrote: > >>> On Mon, Nov 6, 2023 at 1:14 AM Xiubo Li <xiubli@xxxxxxxxxx> wrote: > >>>> On 11/3/23 18:07, Ilya Dryomov wrote: > >>>> > >>>> On Fri, Nov 3, 2023 at 4:41 AM <xiubli@xxxxxxxxxx> wrote: > >>>> > >>>> From: Xiubo Li <xiubli@xxxxxxxxxx> > >>>> > >>>> There is no any limit for the extent array size and it's possible > >>>> that when reading with a large size contents. Else the messager > >>>> will fail by reseting the connection and keeps resending the inflight > >>>> IOs. > >>>> > >>>> URL: https://tracker.ceph.com/issues/62081 > >>>> Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx> > >>>> --- > >>>> net/ceph/osd_client.c | 12 ------------ > >>>> 1 file changed, 12 deletions(-) > >>>> > >>>> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c > >>>> index 7af35106acaf..177a1d92c517 100644 > >>>> --- a/net/ceph/osd_client.c > >>>> +++ b/net/ceph/osd_client.c > >>>> @@ -5850,8 +5850,6 @@ static inline void convert_extent_map(struct ceph_sparse_read *sr) > >>>> } > >>>> #endif > >>>> > >>>> -#define MAX_EXTENTS 4096 > >>>> - > >>>> static int osd_sparse_read(struct ceph_connection *con, > >>>> struct ceph_msg_data_cursor *cursor, > >>>> char **pbuf) > >>>> @@ -5882,16 +5880,6 @@ static int osd_sparse_read(struct ceph_connection *con, > >>>> > >>>> if (count > 0) { > >>>> if (!sr->sr_extent || count > sr->sr_ext_len) { > >>>> - /* > >>>> - * Apply a hard cap to the number of extents. > >>>> - * If we have more, assume something is wrong. > >>>> - */ > >>>> - if (count > MAX_EXTENTS) { > >>>> - dout("%s: OSD returned 0x%x extents in a single reply!\n", > >>>> - __func__, count); > >>>> - return -EREMOTEIO; > >>>> - } > >>>> - > >>>> /* no extent array provided, or too short */ > >>>> kfree(sr->sr_extent); > >>>> sr->sr_extent = kmalloc_array(count, > >>>> -- > >>>> 2.39.1 > >>>> > >>>> Hi Xiubo, > >>>> > >>>> As noted in the tracker ticket, there are many "sanity" limits like > >>>> that in the messenger and other parts of the kernel client. First, > >>>> let's change that dout to pr_warn_ratelimited so that it's immediately > >>>> clear what is going on. > >>>> > >>>> Sounds good to me. > >>>> > >>>> Then, if the limit actually gets hit, let's > >>>> dig into why and see if it can be increased rather than just removed. > >>>> > >>>> The test result in https://tracker.ceph.com/issues/62081#note-16 is that I just changed the 'len' to 5000 in ceph PR https://github.com/ceph/ceph/pull/54301 to emulate a random writes to a file and then tries to read with a large size: > >>>> > >>>> [ RUN ] LibRadosIoPP.SparseReadExtentArrayOpPP > >>>> extents array size : 4297 > >>>> > >>>> For the 'extents array size' it could reach up to a very large number, then what it should be ? Any idea ? > >>> Hi Xiubo, > >>> > >>> I don't think it can be a very large number in practice. > >>> > >>> CephFS uses sparse reads only in the fscrypt case, right? > >> Yeah, it is. > >> > >> > >>> With > >>> fscrypt, sub-4K writes to the object store aren't possible, right? > >> Yeah. > >> > >> > >>> If the answer to both is yes, then the maximum number of extents > >>> would be > >>> > >>> 64M (CEPH_MSG_MAX_DATA_LEN) / 4K = 16384 > >>> > >>> even if the object store does tracking at byte granularity. > >> So yeah, just for the fscrypt case if we set the max extent number to > >> 16384 it should be fine. But the sparse read could also be enabled in > >> non-fscrypt case. > > Ah, I see that it's also exposed as a mount option. If we expect > > people to use it, then dropping MAX_EXTENTS altogether might be the > > best approach after all -- it doesn't make sense to warn about > > something that we don't really control. > > > > How about printing the number of extents only if the allocation fails? > > Something like: > > > > if (!sr->sr_extent) { > > pr_err("failed to allocate %u sparse read extents\n", count); > > return -ENOMEM; > > } > > Yeah, this also looks good to me. > > I will fix it. > > Thanks > > - Xiubo > > > > Thanks, > > > > Ilya > > > > _______________________________________________ Dev mailing list -- dev@xxxxxxx To unsubscribe send an email to dev-leave@xxxxxxx