RGW (Swift) failures during upgrade from Jewel to Luminous

We recently began our upgrade testing for going from Jewel (10.2.10) to
Luminous (12.2.5) on our clusters.  The first part of the upgrade went
pretty smoothly (upgrading the mon nodes, adding the mgr nodes, upgrading
the OSD nodes).  However, when we got to the RGWs we started seeing internal
server errors (500s) on the Jewel RGWs once the first RGW was upgraded to
Luminous.  Further testing found two different problems:

The first problem (internal server error) was seen when the container and
object were created by a Luminous RGW, but then a Jewel RGW attempted to
list the container.

The second problem (container appears to be empty) was seen when the
container was created by a Luminous RGW, an object was added using a Jewel
RGW, and then the container was listed by a Luminous RGW.

Here are all the tests I performed (the version in parentheses is the RGW
that handled that step):

Test #1: Create container (Jewel),    Add object (Jewel),    List container (Jewel),    Result: Success
Test #2: Create container (Jewel),    Add object (Jewel),    List container (Luminous), Result: Success
Test #3: Create container (Jewel),    Add object (Luminous), List container (Jewel),    Result: Success
Test #4: Create container (Jewel),    Add object (Luminous), List container (Luminous), Result: Success
Test #5: Create container (Luminous), Add object (Jewel),    List container (Jewel),    Result: Success
Test #6: Create container (Luminous), Add object (Jewel),    List container (Luminous), Result: Failure (Container appears empty)
Test #7: Create container (Luminous), Add object (Luminous), List container (Jewel),    Result: Failure (Internal Server Error)
Test #8: Create container (Luminous), Add object (Luminous), List container (Luminous), Result: Success
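
For anyone who wants to track the same matrix on their own cluster, the eight combinations above can be written down as a small Python sketch. This only encodes the results reported in this message; it does not talk to an RGW, and the names `VERSIONS`, `OBSERVED_FAILURES` and `build_test_matrix` are made up for illustration.

```python
from itertools import product

VERSIONS = ("Jewel", "Luminous")

# Failures observed in the tests above, keyed by the RGW version that
# handled each step: (create container, add object, list container).
OBSERVED_FAILURES = {
    ("Luminous", "Jewel", "Luminous"): "Container appears empty",
    ("Luminous", "Luminous", "Jewel"): "Internal Server Error",
}


def build_test_matrix():
    """Enumerate all create/add/list version combinations in test order."""
    rows = []
    combos = product(VERSIONS, repeat=3)
    for number, (create, add, listing) in enumerate(combos, start=1):
        failure = OBSERVED_FAILURES.get((create, add, listing))
        result = "Success" if failure is None else f"Failure ({failure})"
        rows.append((number, create, add, listing, result))
    return rows


if __name__ == "__main__":
    for number, create, add, listing, result in build_test_matrix():
        print(f"Test #{number}: create={create}, add={add}, "
              f"list={listing}, result={result}")
```

Both failing rows have the container created by a Luminous RGW; every test with a Jewel-created container succeeded.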

It appears that we ran into these bugs because our load balancer was
alternating between the RGWs while they were running a mixture of the two
versions (as you would expect during an upgrade).

Has anyone run into this problem as well?  Is there a way to work around it
besides disabling half the RGWs, upgrading that half, swinging all the
traffic to the upgraded RGWs, upgrading the other half, and then enabling
the second half?
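
To make the ordering of that half-and-half workaround explicit, here is a minimal Python sketch that only produces a checklist. The host names and the function `plan_split_upgrade` are hypothetical, and nothing here touches a load balancer or Ceph; the actual disable/upgrade/enable commands depend on the environment.

```python
def plan_split_upgrade(rgw_hosts):
    """Return ordered (action, hosts) steps for a two-half RGW upgrade.

    The goal is that clients never see Jewel and Luminous RGWs serving
    traffic at the same time.
    """
    if len(rgw_hosts) < 2:
        raise ValueError("need at least two RGWs to keep one half serving")
    middle = len(rgw_hosts) // 2
    first_half = list(rgw_hosts[:middle])
    second_half = list(rgw_hosts[middle:])
    return [
        ("disable in load balancer", first_half),
        ("upgrade to Luminous", first_half),
        # Swing traffic: the upgraded half takes over and the
        # not-yet-upgraded half is taken out of rotation.
        ("swing all traffic to", first_half),
        ("upgrade to Luminous", second_half),
        ("enable in load balancer", second_half),
    ]


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    for action, hosts in plan_split_upgrade(["rgw1", "rgw2", "rgw3", "rgw4"]):
        print(f"{action}: {', '.join(hosts)}")
```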

Thanks,
Bryan

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


