On Wed, 2019-08-21 at 14:07 +0200, Ilya Dryomov wrote: > We can't rely on ->peer_features in calc_target() because it may be > called both when the OSD session is established and open and when it's > not. ->peer_features is not valid unless the OSD session is open. If > this happens on a PG split (pg_num increase), that could mean we don't > resend a request that should have been resent, hanging the client > indefinitely. > > In userspace this was fixed by looking at require_osd_release and > get_xinfo[osd].features fields of the osdmap. However these fields > belong to the OSD section of the osdmap, which the kernel doesn't > decode (only the client section is decoded). > > Instead, let's drop this feature check. It effectively checks for > luminous, so only pre-luminous OSDs would be affected in that on a PG > split the kernel might resend a request that should not have been > resent. Duplicates can occur in other scenarios, so both sides should > already be prepared for them: see dup/replay logic on the OSD side and > retry_attempt check on the client side. > > Cc: stable@xxxxxxxxxxxxxxx > Fixes: 7de030d6b10a ("libceph: resend on PG splits if OSD has RESEND_ON_SPLIT") > Reported-by: Jerry Lee <leisurelysw24@xxxxxxxxx> > Signed-off-by: Ilya Dryomov <idryomov@xxxxxxxxx> > --- > net/ceph/osd_client.c | 9 ++++----- > 1 file changed, 4 insertions(+), 5 deletions(-) > > diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c > index fed6b0334609..4e78d1ddd441 100644 > --- a/net/ceph/osd_client.c > +++ b/net/ceph/osd_client.c > @@ -1514,7 +1514,7 @@ static enum calc_target_result calc_target(struct ceph_osd_client *osdc, > struct ceph_osds up, acting; > bool force_resend = false; > bool unpaused = false; > - bool legacy_change; > + bool legacy_change = false; > bool split = false; > bool sort_bitwise = ceph_osdmap_flag(osdc, CEPH_OSDMAP_SORTBITWISE); > bool recovery_deletes = ceph_osdmap_flag(osdc, > @@ -1602,15 +1602,14 @@ static enum calc_target_result calc_target(struct ceph_osd_client *osdc, > t->osd = acting.primary; > } > > - if (unpaused || legacy_change || force_resend || > - (split && con && CEPH_HAVE_FEATURE(con->peer_features, > - RESEND_ON_SPLIT))) > + if (unpaused || legacy_change || force_resend || split) > ct_res = CALC_TARGET_NEED_RESEND; > else > ct_res = CALC_TARGET_NO_ACTION; > > out: > - dout("%s t %p -> ct_res %d osd %d\n", __func__, t, ct_res, t->osd); > + dout("%s t %p -> %d%d%d%d ct_res %d osd%d\n", __func__, t, unpaused, > + legacy_change, force_resend, split, ct_res, t->osd); > return ct_res; > } > Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx>