Arm Performance Libraries 25.04 and Arm Toolchain for Linux 20.1 Release

June 17, 2025

We are happy to announce the releases of Arm Performance Libraries 25.04 and Arm Toolchain for Linux 20.1. In this blog post, we outline how to get these releases, their new features, and some performance highlights.

Arm Performance Libraries 25.04

Arm Performance Libraries (ArmPL) provides optimized standard core math libraries for numerical applications on 64-bit Arm (AArch64) processors. It includes optimized implementations of BLAS, LAPACK, FFT, RNG and sparse linear algebra functions. These are built with OpenMP parallelism to maximize performance in multi-processor environments. It also includes high-performing scalar and vector math.h routines through the libamath library.

An important new feature for users is that ArmPL can now be installed and updated using Linux package managers. For example, on Ubuntu 22 and Ubuntu 24 you can now do:

sudo apt update
. /etc/os-release
curl "https://developer.arm.com/packages/arm-toolchains:${NAME,,}-${VERSION_ID/%.*/}/${VERSION_CODENAME}/Release.key" | sudo tee /etc/apt/trusted.gpg.d/developer-arm-com.asc
echo "deb https://developer.arm.com/packages/arm-toolchains:${NAME,,}-${VERSION_ID/%.*/}/${VERSION_CODENAME}/ ./" | sudo tee /etc/apt/sources.list.d/developer-arm-com.list
sudo apt update
sudo apt install arm-performance-libraries

For instructions specific to other Linux distributions, see the 25.04 release note. The version available from the package manager is compatible with GCC, NVHPC, LLVM, and Arm Toolchain for Linux (ATfL). The existing ArmPL download page continues to provide all versions. Note that this includes downloads compatible with the legacy Arm Compiler for Linux (ACfL) 24.10, for users who are not ready to move their Fortran codes to the new Arm Toolchain for Linux compiler, detailed below.

On macOS, ArmPL is already available through the arm-performance-libraries package on Homebrew. For Windows users, winget now similarly has a package called Arm.ArmPerformanceLibraries. For Windows users running GitHub Actions in their CI, we recommend directly using "msiexec /quiet ACCEPT_EULA=1" on the downloaded package.

ArmPL 25.04 features new functionality and performance improvements across the various constituent components. In the following sections, we highlight some of these for random number generation and maths functions. Other changes are described in the full release note.

New random number generation functionality

ArmPL 25.04 adds MT2203 and ARS5 random number generators (RNGs) to its suite of generators. Both are designed for use in parallel applications such as large-scale simulations, scientific modelling, and Monte Carlo methods. MT2203 can produce up to 6024 independent random number sequences from a set of 6024 Mersenne Twister generators, while ARS5 is a counter-based RNG that uses the AES encryption algorithm. The source code for these new generators is also publicly available in OpenRNG 25.04 on Arm's GitLab.
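
To illustrate what "counter-based" means in practice, here is a minimal sketch in C. Each output is a pure function of a key and a counter, so parallel workers can generate disjoint parts of the same sequence without sharing any state. The mixing function is the SplitMix64 finalizer, swapped in for AES purely to keep the example short, and no claim is made about its statistical quality. The names toy_mix64, toy_cbrng and toy_fill are hypothetical; none of this is ARS5 or the OpenRNG interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy 64-bit mixer (the SplitMix64 finalizer), standing in for the AES
 * rounds of a real generator. This is NOT ARS5 and NOT the OpenRNG API. */
uint64_t toy_mix64(uint64_t z)
{
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Counter-based generation: output number `counter` of the stream selected
 * by `key` is a pure function of (key, counter); no state is carried. */
uint64_t toy_cbrng(uint64_t key, uint64_t counter)
{
    return toy_mix64(toy_mix64(counter) ^ key);
}

/* Fill out[0..n) with outputs first, first+1, ..., first+n-1 of a stream. */
void toy_fill(uint64_t key, uint64_t first, size_t n, uint64_t *out)
{
    assert(out != NULL || n == 0);
    for (size_t i = 0; i < n; ++i)
        out[i] = toy_cbrng(key, first + i);
}
```

With this structure, one worker can call toy_fill(key, 0, 4, out) while another calls toy_fill(key, 4, 4, out + 4), and the combined result matches a single call covering all eight values.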

Choosing the accuracy mode of libamath, and performance improvements

To give users more control over the balance between accuracy and speed, libamath now has three accuracy modes: high (1.0 ULP), default (3.5 ULP), and low (half correct bits).

The high-accuracy functions are suffixed with _u10 and the low-accuracy versions with _umax. The default-accuracy implementations have no suffix. Users can select a specific accuracy mode explicitly by calling these functions from intrinsics code. For example, to evaluate the single-precision exponential of 0.1 using Neon, you would use one of these three alternatives:

float32x4_t vxf = vdupq_n_f32(0.1f);
float32x4_t vef_u10  = armpl_vexpq_f32_u10(vxf);  // high accuracy (1.0 ULP)
float32x4_t vef_u35  = armpl_vexpq_f32(vxf);      // default accuracy (3.5 ULP)
float32x4_t vef_umax = armpl_vexpq_f32_umax(vxf); // low accuracy (half correct bits)

Maximum error estimates for math symbols in each mode are documented in the ArmPL Reference Guide.
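
To make the ULP figures concrete, here is a minimal sketch of how one might measure the error of a single-precision result, in ULPs, against a double-precision reference. It uses only standard C library calls. The helper name ulp_error_f32 is hypothetical and not part of libamath, and this is not necessarily how the documented bounds are derived.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Hypothetical helper, not part of libamath: error of a single-precision
 * result, in ULPs, measured against a double-precision reference value. */
double ulp_error_f32(float result, double reference)
{
    /* Assumption for this sketch: the reference is finite and non-zero. */
    assert(isfinite(reference) && reference != 0.0);

    /* reference = m * 2^raw_exp with 0.5 <= |m| < 1, so the binary
     * exponent of the reference is raw_exp - 1. */
    int raw_exp;
    (void)frexp(reference, &raw_exp);
    int e = raw_exp - 1;

    /* Below the smallest normal float the spacing stops shrinking. */
    if (e < FLT_MIN_EXP - 1)
        e = FLT_MIN_EXP - 1;

    /* Spacing between adjacent floats near the reference: 2^(e - 23). */
    double ulp = ldexp(1.0, e - (FLT_MANT_DIG - 1));

    return fabs((double)result - reference) / ulp;
}
```

For a float input x and a float result y from one of the exponential routines, ulp_error_f32(y, exp((double)x)) gives the error in ULPs, which can then be compared with the bound quoted for the accuracy mode in use.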

The 25.04 release also adds the C23 vector routines tanpi(f), asinpi(f), acospi(f), atanpi(f), and atan2pi(f).

The new Linux build offers optimized versions of the following single and double precision functions in Neon and SVE:

	Single precision	Double precision
Neon	`expf`, `exp10f`, `asinf`, `atanf`, `atan2f`	`expm1`, `acos`, `atan`, `atan2`, `sinh`, `tanh`, `asinh`, `atanh`
SVE	`modff`, `expf`, `log1pf`, `atanf`, `coshf`	`pow`, `modf`, `acos`, `asin`, `atan`, `atan2`

The graphs show the speed-ups achieved over using glibc 2.35 in both scalar and vector forms.

Figure: libamath results on Neoverse V1

Arm Toolchain for Linux 20.1

Introduction and positioning of ATfL

Arm Toolchain for Linux (ATfL) is Arm’s open-source compiler suite for AArch64, built on the modern LLVM ecosystem. It replaces the older Arm Compiler for Linux (ACfL), a downstream LLVM fork, with a toolchain that closely follows upstream LLVM development. The full source code is publicly available on GitHub at arm/arm-toolchain, enabling transparency and collaboration. The ATfL 20.1 release is based on LLVM 20.1.0, providing access to the latest compiler infrastructure, features, and improvements delivered by the LLVM community.

As Arm-based systems see increasing adoption across data centres and cloud platforms, ATfL provides a reliable and high-performance solution for developers targeting Armv8-A and Armv9-A Linux server systems. Whether you’re working on scientific computing, AI, or cloud-native services, ATfL is engineered to support modern, performance-critical workloads on Arm architecture. ATfL packages are available for major Linux distributions, making it easy to get started on commonly used platforms.

Features

The toolchain is optimized for Arm server-class processors like Neoverse, and provides comprehensive support for AArch64, including advanced vector extensions (SVE and SVE2). To enable optimized vector math routines, ATfL depends on ArmPL, which provides high-performance implementations of core mathematical functions.

ATfL includes armclang and armclang++ for C and C++ development, and armflang for Fortran, all based on LLVM’s Clang and Flang frontends. Detailed information on language standard support can be found in the C/C++/Fortran Standards Support section, and OpenMP support is covered in its own documentation. Note that OpenMP support in Flang is currently experimental.

Getting started and porting

Documentation and installation instructions are available on the Arm Developer website, including how to install using Linux package managers.

For developers transitioning from ACfL, ATfL offers a familiar development experience with a more modern and upstream-aligned foundation. A key distinction is that ATfL adopts LLVM’s Flang frontend for Fortran, whereas ACfL was based on Classic Flang. Migrating to ATfL is recommended to take advantage of ongoing development and access to new features. For detailed guidance on porting from ACfL to ATfL, please refer to the porting guide.

Performance

We benchmarked the new ATfL 20.1 release — based on upstream LLVM — against the legacy ACfL 24.04 compiler on Neoverse V2 across key HPC and financial workloads. ATfL delivers solid gains on workloads like Open Radioss (+12%) and SNAP (+3%), and holds parity on widely used applications including Gromacs, LAMMPS, and Monte Carlo European.

We also benchmarked a wider set of workloads, such as mini apps frequently used in HPC procurements. On some of these, notably the Thornado Mini and e3sm kernels, we see cases where the legacy ACfL compiler shows slightly better performance. These specific issues have been fixed, and performance will reach parity with ACfL in the next release, ATfL 21, later this year. We will continue to invest in the performance of LLVM for HPC, scientific and cloud workloads. If you have any feedback on the performance of ATfL, we'd be very interested in hearing from you. Please see our contribution guide on GitHub.

Summary

The latest versions of Arm Performance Libraries and Arm Toolchain for Linux are out now. These are ideal for accelerating HPC and Cloud workloads and continue the investment by Arm into these ecosystems. For users of macOS and Windows, the Arm Performance Libraries release also includes optimized versions for these platforms. Download today, or better yet, set up your package managers to get all future updates automatically!

Chris Goodyer
