When run under Python 3, git-p4 fails to handle data from Perforce that is not valid UTF-8.

In large repositories it is very likely that such data exists, since Perforce itself does no validation of the data by default.

Historically, git-p4 has simply passed whatever bytes it got from Perforce on to git. This seems like a sensible approach: git-p4 has no idea what encoding was used, and it is likely that different encodings are used within a single repository.

I tried to do a more thorough job by moving more of git-p4 over to using bytes. Unfortunately, the changes end up being large and hard to review. In most cases it is probably sufficient to just avoid decoding the commit messages.

There have been a couple of previous proposals to decode this data using a user-configured encoding:

http://public-inbox.org/git/CAE5ih7-F9efsiV5AQmw3ocjiy+BT6ZAT5fA0Lx0OSkVTO8Kqjg@xxxxxxxxxxxxxx/T/
http://public-inbox.org/git/20210409153815.7joohvmlnh6itczc@tb-raspi4/T/
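
To illustrate the failure mode, here is a minimal sketch. It is not taken from git-p4: the sample description and the helper names `decode_strict` and `passthrough` are hypothetical. It only shows why decoding a commit message as UTF-8 breaks on such data, while leaving it as bytes does not.

```python
# Hypothetical commit description as it might come back from Perforce.
# The name was stored as Latin-1, so the byte 0xE9 is not valid UTF-8.
raw_desc = b"Fix typo reported by Andr\xe9\n"


def decode_strict(data):
    """Decode as UTF-8, as a str-based Python 3 code path would."""
    return data.decode("utf-8")


def passthrough(data):
    """Leave the description as bytes so it can reach git unmodified."""
    return data


try:
    decode_strict(raw_desc)
    strict_failed = False
except UnicodeDecodeError:
    # Strict decoding rejects the stray 0xE9 byte.
    strict_failed = True

print("strict decode failed:", strict_failed)
print("bytes preserved:", passthrough(raw_desc) == raw_desc)
```

Keeping the description as bytes avoids having to guess an encoding at all, which matters when different encodings are mixed within one repository.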