Hi Bob,

On Fri, Sep 11, 2020 at 08:49:14AM -0400, Bob Peterson wrote:
> ----- Original Message -----
> > On Fri, Sep 11, 2020 at 08:08:35AM -0400, Bob Peterson wrote:
> > > ----- Original Message -----
> > > > On Thu, Sep 10, 2020 at 09:43:19PM +0200, Salvatore Bonaccorso wrote:
> > > > > Hi,
> > > > >
> > > > > On Tue, Jun 23, 2020 at 09:57:50PM +0200, Greg Kroah-Hartman wrote:
> > > > > > From: Bob Peterson <rpeterso@xxxxxxxxxx>
> > > > > >
> > > > > > [ Upstream commit 83d060ca8d90fa1e3feac227f995c013100862d3 ]
> > > > > >
> > > > > > Before this patch, transactions could be merged into the system
> > > > > > transaction by function gfs2_merge_trans(), but the transaction ail
> > > > > > lists were never merged. Because the ail flushing mechanism can run
> > > > > > separately, bd elements can be attached to the transaction's buffer
> > > > > > list during the transaction (trans_add_meta, etc) but quickly moved
> > > > > > to its ail lists. Later, in function gfs2_trans_end, the transaction
> > > > > > can be freed (by gfs2_trans_end) while it still has bd elements
> > > > > > queued to its ail lists, which can cause it to either lose track of
> > > > > > the bd elements altogether (memory leak) or worse, reference the bd
> > > > > > elements after the parent transaction has been freed.
> > > > > >
> > > > > > Although I've not seen any serious consequences, the problem becomes
> > > > > > apparent with the previous patch's addition of:
> > > > > >
> > > > > > gfs2_assert_warn(sdp, list_empty(&tr->tr_ail1_list));
> > > > > >
> > > > > > to function gfs2_trans_free().
> > > > > >
> > > > > > This patch adds logic into gfs2_merge_trans() to move the merged
> > > > > > transaction's ail lists to the sdp transaction. This prevents the
> > > > > > use-after-free. To do this properly, we need to hold the ail lock,
> > > > > > so we pass sdp into the function instead of the transaction itself.
> > > > > >
> > > > > > Signed-off-by: Bob Peterson <rpeterso@xxxxxxxxxx>
> > > > > > Signed-off-by: Andreas Gruenbacher <agruenba@xxxxxxxxxx>
> > > > > > Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>
> > > (snip)
> > > > >
> > > > > In Debian two users confirmed issues on writing on a GFS2 partition
> > > > > with this commit applied. The initial Debian report is at
> > > > > https://bugs.debian.org/968567 and Daniel Craig reported it into
> > > > > Bugzilla at https://bugzilla.kernel.org/show_bug.cgi?id=209217 .
> > > > >
> > > > > Writing to a gfs2 filesystem fails and results in a soft lockup of the
> > > > > machine for kernels with that commit applied. I cannot reproduce the
> > > > > issue myself due to not having a respective setup available, but Daniel
> > > > > described a minimal series of steps to reproduce the issue.
> > > > >
> > > > > This might affect as well other stable series where this commit was
> > > > > applied, as there was a similar report for someone running 5.4.58 in
> > > > > https://www.redhat.com/archives/linux-cluster/2020-August/msg00000.html
> > > >
> > > > Can you report this to the gfs2 developers?
> > > >
> > > > thanks,
> > > >
> > > > greg k-h
> > >
> > > Hi Greg,
> > >
> > > No need. The patch came from the gfs2 developers. I think he just wants
> > > it added to a stable release.
> >
> > What commit needs to be added to a stable release?
> >
> > confused,
> >
> > greg k-h
>
> Sorry Greg,
>
> It's pretty early here and the caffeine hadn't quite hit my system.
> The problem is most likely that 4.19.132 is missing this upstream patch:
>
> cbcc89b630447ec7836aa2b9242d9bb1725f5a61
>
> I'm not sure how or why 83d060ca8d90fa1e3feac227f995c013100862d3 got
> put into stable without a stable CC but cbcc89b6304 is definitely
> required.
>
> I'd like to suggest Salvatore try cherry-picking this patch to see if
> it fixes the problem, and if so, perhaps Greg can add it to stable.

Thanks, I will ask the affected users if they can test this (as said, I
cannot test it myself in this case).

If it is true that we also need to cherry-pick
cbcc89b630447ec7836aa2b9242d9bb1725f5a61, then all of v4.14.y, v4.19.y
and v5.4.y would need to have it included as well
(83d060ca8d90fa1e3feac227f995c013100862d3 was applied in v4.14.186,
v4.19.130, v5.4.49 and v5.7.6 (EOL)).

Regards,
Salvatore