IOMMUs, "IO Memory Management Units", are hardware devices that translate device DMA addresses to machine addresses. Isolation capable IOMMUs perform a valuable system service, preventing rogue devices from performing errant or malicious DMAs, thereby substantially increasing the system's reliability and availability. Without an IOMMU, a peripheral device could be programmed to overwrite any part of the system's memory. An isolation capable IOMMU restricts a device so that it can only access parts of memory it has been explicitly granted access to. Operating systems utilize IOMMUs to isolate device drivers; hypervisors utilize IOMMUs to grant secure direct hardware access to virtual machines. With the imminent publication of the PCI-SIG's IO Virtualization standard, as well as Intel and AMD's introduction of isolation capable IOMMUs in all new servers, IOMMUs will become ubiquitous.
IOMMUs can impose a performance penalty due to the extra memory accesses required to perform DMA operations. The exact performance degradation depends on the IOMMU design, its caching architecture, the way it is programmed and the workload. In this paper, we present the performance characteristics of the Calgary and DART IOMMUs in Linux, both on bare metal and hypervisors. We measure the throughput and CPU utilization of several IO workloads with and without an IOMMU and analyze the results. We then discuss potential strategies for mitigating the IOMMU's costs. We conclude by presenting a set of optimizations we have implemented and the resulting performance improvements.
Joint work with Jimi Xenidis and Michal Ostrowski (IBM Research), Leendert van Doorn (AMD), Karl Rister and Alexis Bruemmer (IBM LTC).
Back to the Club's homepage