Hi Greg,

Thanks for your quick reply.

On Fri, Sep 11, 2020 at 01:58:16PM +0200, Greg Kroah-Hartman wrote:
> On Thu, Sep 10, 2020 at 09:43:19PM +0200, Salvatore Bonaccorso wrote:
> > Hi,
> >
> > On Tue, Jun 23, 2020 at 09:57:50PM +0200, Greg Kroah-Hartman wrote:
> > > From: Bob Peterson <rpeterso@xxxxxxxxxx>
> > >
> > > [ Upstream commit 83d060ca8d90fa1e3feac227f995c013100862d3 ]
> > >
> > > Before this patch, transactions could be merged into the system
> > > transaction by function gfs2_merge_trans(), but the transaction ail
> > > lists were never merged. Because the ail flushing mechanism can run
> > > separately, bd elements can be attached to the transaction's buffer
> > > list during the transaction (trans_add_meta, etc) but quickly moved
> > > to its ail lists. Later, in function gfs2_trans_end, the transaction
> > > can be freed (by gfs2_trans_end) while it still has bd elements
> > > queued to its ail lists, which can cause it to either lose track of
> > > the bd elements altogether (memory leak) or worse, reference the bd
> > > elements after the parent transaction has been freed.
> > >
> > > Although I've not seen any serious consequences, the problem becomes
> > > apparent with the previous patch's addition of:
> > >
> > > 	gfs2_assert_warn(sdp, list_empty(&tr->tr_ail1_list));
> > >
> > > to function gfs2_trans_free().
> > >
> > > This patch adds logic into gfs2_merge_trans() to move the merged
> > > transaction's ail lists to the sdp transaction. This prevents the
> > > use-after-free. To do this properly, we need to hold the ail lock,
> > > so we pass sdp into the function instead of the transaction itself.
> > >
> > > Signed-off-by: Bob Peterson <rpeterso@xxxxxxxxxx>
> > > Signed-off-by: Andreas Gruenbacher <agruenba@xxxxxxxxxx>
> > > Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>
> > > ---
> > >  fs/gfs2/log.c | 11 +++++++++--
> > >  1 file changed, 9 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
> > > index d3f0612e33471..06752db213d21 100644
> > > --- a/fs/gfs2/log.c
> > > +++ b/fs/gfs2/log.c
> > > @@ -877,8 +877,10 @@ void gfs2_log_flush(struct gfs2_sbd *sdp, struct gfs2_glock *gl, u32 flags)
> > >   * @new: New transaction to be merged
> > >   */
> > >
> > > -static void gfs2_merge_trans(struct gfs2_trans *old, struct gfs2_trans *new)
> > > +static void gfs2_merge_trans(struct gfs2_sbd *sdp, struct gfs2_trans *new)
> > >  {
> > > +	struct gfs2_trans *old = sdp->sd_log_tr;
> > > +
> > >  	WARN_ON_ONCE(!test_bit(TR_ATTACHED, &old->tr_flags));
> > >
> > >  	old->tr_num_buf_new += new->tr_num_buf_new;
> > > @@ -890,6 +892,11 @@ static void gfs2_merge_trans(struct gfs2_trans *old, struct gfs2_trans *new)
> > >
> > >  	list_splice_tail_init(&new->tr_databuf, &old->tr_databuf);
> > >  	list_splice_tail_init(&new->tr_buf, &old->tr_buf);
> > > +
> > > +	spin_lock(&sdp->sd_ail_lock);
> > > +	list_splice_tail_init(&new->tr_ail1_list, &old->tr_ail1_list);
> > > +	list_splice_tail_init(&new->tr_ail2_list, &old->tr_ail2_list);
> > > +	spin_unlock(&sdp->sd_ail_lock);
> > >  }
> > >
> > >  static void log_refund(struct gfs2_sbd *sdp, struct gfs2_trans *tr)
> > > @@ -901,7 +908,7 @@ static void log_refund(struct gfs2_sbd *sdp, struct gfs2_trans *tr)
> > >  	gfs2_log_lock(sdp);
> > >
> > >  	if (sdp->sd_log_tr) {
> > > -		gfs2_merge_trans(sdp->sd_log_tr, tr);
> > > +		gfs2_merge_trans(sdp, tr);
> > >  	} else if (tr->tr_num_buf_new || tr->tr_num_databuf_new) {
> > >  		gfs2_assert_withdraw(sdp, test_bit(TR_ALLOCED, &tr->tr_flags));
> > >  		sdp->sd_log_tr = tr;
> > > --
> > > 2.25.1
> >
> > In Debian two users confirmed issues on writing on a GFS2 partition
> > with this
> > commit applied. The initial Debian report is at
> > https://bugs.debian.org/968567 and Daniel Craig reported it into
> > Bugzilla at https://bugzilla.kernel.org/show_bug.cgi?id=209217 .
> >
> > Writing to a gfs2 filesystem fails and results in a soft lockup of the
> > machine for kernels with that commit applied. I cannot reproduce the
> > issue myself due to not having a respective setup available, but Daniel
> > described a minimal series of steps to reproduce the issue.
> >
> > This might affect as well other stable series where this commit was
> > applied, as there was a similar report for someone running 5.4.58 in
> > https://www.redhat.com/archives/linux-cluster/2020-August/msg00000.html
>
> Can you report this to the gfs2 developers?

Sure! Bob Peterson and Andreas Gruenbacher were already on the
recipient list but I forgot cluster-devel@xxxxxxxxxx . I can send a
separate report there as a follow-up if still needed.

Regards,
Salvatore
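
P.S. For readers who want to see what the quoted patch does mechanically,
here is a minimal userspace sketch. It is not the gfs2 code: struct
toy_trans, toy_merge_trans(), toy_ail_lock and the small list helpers are
hypothetical stand-ins for struct gfs2_trans, gfs2_merge_trans(),
sdp->sd_ail_lock and the kernel's list_head API, and a pthread mutex takes
the place of the spinlock. It only illustrates the idea from the commit
message: splice the merged transaction's ail lists onto the surviving
transaction while holding the ail lock, so that nothing stays queued on a
transaction that is about to be freed.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's circular, doubly linked list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h; h->prev = h; }
static int list_is_empty(const struct list_head *h) { return h->next == h; }

static void list_add_to_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Same idea as the kernel's list_splice_tail_init(): append everything on
 * 'from' to the tail of 'to', then reinitialise 'from' as an empty list. */
static void splice_tail_init(struct list_head *from, struct list_head *to)
{
	if (list_is_empty(from))
		return;
	from->next->prev = to->prev;
	to->prev->next = from->next;
	from->prev->next = to;
	to->prev = from->prev;
	list_init(from);
}

static size_t list_count(const struct list_head *h)
{
	size_t n = 0;
	for (const struct list_head *p = h->next; p != h; p = p->next)
		n++;
	return n;
}

/* Hypothetical, heavily simplified transaction: only the two ail lists. */
struct toy_trans { struct list_head ail1, ail2; };

/* Hypothetical stand-in for sdp->sd_ail_lock (a spinlock in the kernel). */
static pthread_mutex_t toy_ail_lock = PTHREAD_MUTEX_INITIALIZER;

static void toy_trans_init(struct toy_trans *tr)
{
	list_init(&tr->ail1);
	list_init(&tr->ail2);
}

/* Move the merged transaction's ail lists to the surviving one, under the lock. */
static void toy_merge_trans(struct toy_trans *old, struct toy_trans *merged)
{
	pthread_mutex_lock(&toy_ail_lock);
	splice_tail_init(&merged->ail1, &old->ail1);
	splice_tail_init(&merged->ail2, &old->ail2);
	pthread_mutex_unlock(&toy_ail_lock);
}

/* Returns 1 if, after the merge, 'merged' holds no ail entries and 'old'
 * holds all of them; returns 0 otherwise. */
int toy_demo(void)
{
	struct toy_trans old, merged;
	struct list_head a, b, c;

	toy_trans_init(&old);
	toy_trans_init(&merged);
	list_add_to_tail(&a, &old.ail1);
	list_add_to_tail(&b, &merged.ail1);
	list_add_to_tail(&c, &merged.ail2);

	toy_merge_trans(&old, &merged);

	return list_is_empty(&merged.ail1) && list_is_empty(&merged.ail2) &&
	       list_count(&old.ail1) == 2 && list_count(&old.ail2) == 1;
}
```

After toy_merge_trans() the merged transaction's lists are empty, which is
the condition the gfs2_assert_warn() mentioned in the commit message checks
for tr_ail1_list before a transaction is freed. The sketch says nothing
about the soft lockup reported above; it only shows the intended effect of
the splice.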