On Thu 13-08-15 11:29:40, Dave Chinner wrote: > On Wed, Aug 12, 2015 at 03:53:14PM +0200, Jan Kara wrote: > > struct xfs_attr_leafblock contains 'entries' array which is declared > > with size 1 altough it can in fact contain much more entries. Since this > > array is followed by further struct members, gcc (at least in version > > 4.8.3) thinks that the delared size of the array is the real one and > > thus optimizes away all accesses beyond the end of array resulting in > > non-working code. > > What are the symptoms displayed by this "non-working code"? I'm > using 4.9.3 (more recent than 4.8.3) for all my kernel and xfsprogs > testing, and I'm not seeing any issues.... Sorry for not revealing all the details. The real failure actually happened with somewhat older version of xfsprogs (3.1.8) to which I was backporting xfs_metadump fixes. The manifestation of the problem was that loop in process_attr_block() was taken only once instead of being taken 'nentries' times. Somehow the code in current xfsprogs is reorganized enough that the same compiler didn't decide to do the optimization. Not sure why but certainly the compiler is allowed to optimize away accesses beyond end of array since they are undefined and gcc heuristics on identifying "struct hack" (flexible sized array) don't trigger when there are further elements in the struct. > > Signed-off-by: Jan Kara <jack@xxxxxxxx> > > --- > > include/xfs_da_format.h | 6 ++++++ > > 1 file changed, 6 insertions(+) > > > > diff --git a/include/xfs_da_format.h b/include/xfs_da_format.h > > index 11f142078e12..39bfeb042844 100644 > > --- a/include/xfs_da_format.h > > +++ b/include/xfs_da_format.h > > @@ -1180,8 +1180,14 @@ typedef struct xfs_attr_leaf_name_remote { > > typedef struct xfs_attr_leafblock { > > xfs_attr_leaf_hdr_t hdr; /* constant-structure header block */ > > xfs_attr_leaf_entry_t entries[1]; /* sorted on key, not name */ > > + /* > > + * Definitions below are commented out so that gcc doesn't optimize > > + * away accesses into 'entries' for indexes larger than 1 > > + */ > > +#if 0 > > xfs_attr_leaf_name_local_t namelist; /* grows from bottom of buf */ > > xfs_attr_leaf_name_remote_t valuelist; /* grows from bottom of buf */ > > +#endif > > } xfs_attr_leafblock_t; > > So this same declaration is in the kernel code. > > FWIW, if you look at the struct xfs_attr3_leafblock, this > structure is declared like this: > > struct xfs_attr3_leafblock { > struct xfs_attr3_leaf_hdr hdr; > struct xfs_attr_leaf_entry entries[1]; > > /* > * The rest of the block contains the following structures after the > * leaf entries, growing from the bottom up. The variables are never > * referenced, the locations accessed purely from helper functions. > * > * struct xfs_attr_leaf_name_local > * struct xfs_attr_leaf_name_remote > */ > }; > > This situation is the same - the xfs_attr_leafblock variables are > only accessed by helper functions that return pointers. Can you > modify the declaration to match the xfs_attr3_leafblock declaration? Sure I didn't notice the kernel has done it differently. I'll send an updated patch. > FWIW, if gcc 4.8.3 is producing non-working xfsprogs code due to > these declarations, then I have to wonder if it is also generating > bad kernel code. There's also a bigger issue: XFS uses the array[1] > declaration technique for several variable size structures across > the code base. Are all of these declarations resulting in gcc 4.8.3 > doing the wrong thing? Or does the problem only manifest when the > array definition is followed by more variables? So using array[1] for flexible sized array has always been undefined according to C standard but people use it and compiler people kept it working. But it has always been a fuzzy thing - apparently there are heuristics in gcc determining whether array[1] is actually a flexible size array (accesses beyond end are kept) or normal array (accesses beyond end are undefined). When the array is at the end of struct, I'd hope it is always considered a flexible size array (otherwise too much code would break and we'd notice quickly). When the array is in the middle of the struct I can understand that the compiler decided this isn't a flexible size array and started to optimize. > If the problem does not manifest in the second case, why is this > inconsistency not considered a broken compiler optimisation? > > Hence I'm guessing we need to convert all the "array[1]" > definitions for variable sized arrays in XFS structures to "array[]" > to avoid gcc making wrong assumptions about the code? i.e: all of > these in kernel space: Using array[] is definitely a safer option than array[1] since there the behavior is well defined by C99 standard and you don't have to rely on gcc heuristics (similarly safe is also array[0]). > $ git grep "\[1\];" fs/xfs/*.h fs/xfs/*/*.h > fs/xfs/libxfs/xfs_attr_sf.h: __uint8_t nameval[1]; /* name & value bytes concatenated */ > fs/xfs/libxfs/xfs_attr_sf.h: } list[1]; /* variable sized array */ This one is interesting but likely OK... > fs/xfs/libxfs/xfs_da_format.h: __u8 nameval[1]; /* name/value bytes */ > fs/xfs/libxfs/xfs_da_format.h: __u8 name[1]; /* name bytes */ These two are safe. > fs/xfs/libxfs/xfs_da_format.h: xfs_attr_leaf_entry_t entries[1]; /* sorted on key, not name */ Here gcc could optimize I think. > fs/xfs/libxfs/xfs_da_format.h: struct xfs_attr_leaf_entry entries[1]; > fs/xfs/libxfs/xfs_log_format.h: xfs_extent_t efi_extents[1]; /* array of extents to free */ > fs/xfs/libxfs/xfs_log_format.h: xfs_extent_32_t efi_extents[1]; /* array of extents to free */ > fs/xfs/libxfs/xfs_log_format.h: xfs_extent_64_t efi_extents[1]; /* array of extents to free */ > fs/xfs/libxfs/xfs_log_format.h: xfs_extent_t efd_extents[1]; /* array of extents freed */ > fs/xfs/libxfs/xfs_log_format.h: xfs_extent_32_t efd_extents[1]; /* array of extents freed */ > fs/xfs/libxfs/xfs_log_format.h: xfs_extent_64_t efd_extents[1]; /* array of extents freed */ > fs/xfs/xfs_attr.h: __s32 al_offset[1]; /* byte offsets of attrs [var-sized] */ > fs/xfs/xfs_attr.h: char a_name[1]; These are all safe. > ---- > > FWIW, I'm getting quite worried by the reports over the past year or > so about different versions of gcc compilers that produce > "non-working" XFS code. e.g. the ARM do_div() kernel bug that only > manifests on certain compiler versions with certain optimisation > options, the reports of certain versions of gcc intel->arm > cross-compilers producing broken code, etc. > > I used to not have to think about whether the compiler generated > good code or not - if we now have to start considering that the > compiler is equally suspect as a random user's hardware then we're > being put in a really, really bad situation here.... Yeah, compiler issues are bad (it took me the whole morning to figure this particular issue out). The issue is that we sometimes rely on undefined behavior and when compiler people decide how that behaves, we are screwed (and I'm sometimes surprised by the amount of things that are undefined by the standard). Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs