Re: Ceph broken on big-endian systems

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Ulrich,

On Thu, 18 Jul 2019, Ulrich Weigand wrote:
> Hello,
> 
> we've been trying to get Ceph to work on IBM Z, a big-endian system,
> and have been running into various serious issues relating to endian
> conversion code.
> 
> The main issue we've been seeing is that while the old-style
> decode/encode machinery in include/encoding.h automatically
> byte-swaps all integers on big-endian systems (to ensure the
> serialized format is always little endian), the *new-style*
> machinery in include/denc.h does not.
> 
> This seemed confusing at first since there is quite a bit of code
> there that appears intended to perform exactly that function, e.g.
> 
> template<typename T>
> struct ExtType<T, std::enable_if_t<std::is_same_v<T, int32_t> ||
>                                    std::is_same_v<T, uint32_t>>> {
>   using type = __le32;
> };
> 
> However, it turns out that at this point __le32 is actually
> just an alias for __u32, so this whole machinery doesn't
> really do anything at all.
> 
> Looking at the old code in encoding.h, I notice that it works
> similarly, but uses ceph_le32 instead of __le32.  The former
> is a C++ class that actually does perform byte-swap on access.
> 
> Even more confusing, there is this code in include/types.h:
> 
> // temporarily remap __le* to ceph_le* for benefit of shared
> kernel/userland headers
> #define __le16 ceph_le16
> #define __le32 ceph_le32
> #define __le64 ceph_le64
> #include "ceph_fs.h"
> #include "ceph_frag.h"
> #include "rbd_types.h"
> #undef __le16
> #undef __le32
> #undef __le64
> 
> which --sometimes-- redefines __le32 as ceph_le32, but those
> redefines are not active at the point denc.h is included.
> 
> So it would appear that the usage of __le32 in denc.h is incorrect,
> and this code should be using ceph_le32 instead.  Is this right?
> 
> But even so, grepping for __le32 throughout the code base shows
> quite a bit of additional places where it is used, most of which
> also appear to make the assumption that byte-swaps automatically
> happen.  In addition, there appear to be some places where e.g.
> ceph_fs.h is included directly, without going via types.h -- and
> in those places, we suddenly no longer get the byte-swaps ...
> 
> 
> Now I was wondering whether the best way forward might be to just
> have __le32 always be defined as ceph_le32 when compiling user
> space code.  But then I noticed that it used to be that way,
> but that was deliberated changed by this commit back in 2010:
> 
> commit 737b5043576153817a6b4195b292672585df10d3
> Author: Sage Weil <sage@xxxxxxxxxxxx>
> Date:   Fri May 7 13:45:00 2010 -0700
> 
>     endian: simplify __le* type hackery
> 
>     Instead of preventing linux/types.h from being included, instead name
>     our types ceph_le*, and remap using #define _only_ when including the
>     shared kernel/userspace headers.
> 
> 
> So I'm a bit at a loss to understand how all this is supposed to
> be working.  Any suggestions would be welcome -- we'd be willing
> to implement whatever's needed, but would like some guidance as
> to how the solution should look like ...

I think the cleanest thing is to s/__le/ceph_le/ everywhere *except* 
msgr.h, rados.h, and ceph_fs.h.  It seems like what happened was we mostly 
forgot to always use the ceph_ variant after 2010.

That should resolve the issue, right?

sage
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx



[Index of Archives]     [CEPH Users]     [Ceph Devel]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux