A key drawback in the use of full system virtualization is the performance penalty introduced by hypervisors. This problem is especially present on ARM, which has significantly higher overhead for some workloads compared to x86, due to differences in the hardware virtualization support. The key reason for the overhead on ARM is the need to multiplex kernel mode state between the hypervisor and VMs, which each run their own kernel. This talk will cover how we have redesigned and optimized KVM/ARM, resulting in an order of magnitude reduction in overhead, and resulted in less overhead than x86 on key hypervisor operations. Our optimizations rely on new hardware support in ARMv8.1, the Virtualization Host Extensions (VHE), but also support legacy hardware through invasive modifications to Linux to support running the kernel in the hypervisor-specific CPU mode, EL2.