Copy Fail: why a local Linux bug matters in Kubernetes and CI/CD

On 30 April 2026, my Radar feed surfaced CERT-EU advisory 2026-005 concerning "Copy Fail", tracked as CVE-2026-31431.

That early signal mattered.

At first glance, this was "only" a local privilege escalation vulnerability in the Linux kernel. It was not described as a remote unauthenticated exploit. An attacker would need some form of local code execution first: a shell account, a compromised application, a build job, a container workload, or some other foothold on a vulnerable system.

But in modern infrastructure, that distinction is not always comforting.

For cloud providers, Kubernetes operators, CI/CD environments and platforms that execute untrusted or semi-trusted workloads, local code execution is often part of the operating model. Containers run code. Build systems run code. Automation pipelines run code. Customer workloads run code.

That is why a local privilege escalation vulnerability can quickly become a platform boundary problem.

The initial advisory described broad Linux exposure, a public proof-of-concept exploit, no broadly available fixed vendor kernel packages yet, and a clear recommendation to prioritise Kubernetes nodes and CI/CD runners exposed to untrusted workloads.

A few days later, the picture had already evolved.

Vendor mitigations and patched kernel packages started to appear. The operator conclusion, however, did not become less serious. If anything, it became clearer: this was not noise. It was exactly the kind of infrastructure-level signal that should move quickly from detection to human review, mitigation and patch planning.

What we knew on 30 April

CERT-EU described CVE-2026-31431 as a high-severity local privilege escalation flaw in the Linux kernel's algif_aead module. This module provides the AEAD interface of the kernel's userspace crypto API, which is exposed through AF_ALG sockets.

The vulnerability had a CVSS score of 7.8, classified as High.

The vulnerability affects kernels containing the in-place AEAD design introduced upstream in October 2017 (commit 72548b093ee3). CERT-EU lists verified-vulnerable kernel lines 6.12, 6.17 and 6.18, with examples confirmed on Ubuntu 6.17.0, Amazon Linux 6.18.8, and RHEL/SUSE 6.12.0. Ubuntu 26.04 LTS and later are confirmed unaffected. The upstream fix is mainline commit a664bf3d603d, committed 1 April 2026.
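For fleet triage, the kernel lines CERT-EU lists can be checked mechanically. The following is a minimal POSIX shell sketch that assumes `uname -r`-style release strings. The function name `copyfail_line_check` is hypothetical. A match only means "look closer". Vendors backport fixes without changing the major.minor line, and a kernel outside the list is not thereby known to be safe, so the vendor advisory stays authoritative.

```sh
#!/bin/sh
# Hypothetical first-pass triage helper. It only checks whether a kernel
# release string belongs to a line CERT-EU lists as verified-vulnerable
# (6.12, 6.17, 6.18). It is NOT a vulnerability verdict: vendors backport
# fixes, so the release string alone cannot show whether the fix is present.

copyfail_line_check() {
  local release line
  release="${1:-$(uname -r)}"
  # Keep only "major.minor", e.g. "6.17.0-22-generic" -> "6.17"
  line="$(printf '%s\n' "$release" | cut -d. -f1,2)"
  case "$line" in
    6.12|6.17|6.18)
      echo "$release: listed line $line; check vendor advisory for fixed build" ;;
    *)
      echo "$release: not in CERT-EU verified list; still check vendor advisory" ;;
  esac
}

copyfail_line_check "$@"
```

Run with no argument, it checks the running kernel. Run with a release string, it can triage inventory data collected elsewhere.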

The advisory traced the bug to that 2017 in-place optimisation. By chaining an AF_ALG socket operation with splice(), an unprivileged local user could perform a controlled 4-byte write to a page-cache-backed page. The write could target a setuid binary such as /usr/bin/su to obtain root.

CERT-EU also stated that, as of the advisory date, no distribution had yet shipped a fixed kernel package, even though the upstream fix had landed almost a month earlier.

That is the kind of timing gap operators care about: public exploitability, broad exposure, upstream fix known, vendor packages still catching up.

The practical recommendation was therefore clear:

Act immediately, prioritise systems that execute untrusted workloads, and do not wait passively for normal patch cycles if interim mitigation is available.

What we know now

The vulnerability is still local: it is not remotely exploitable on its own.

That matters. It means Copy Fail is not equivalent to an unauthenticated internet-facing remote code execution vulnerability.

But "local" does not mean "low risk".

Microsoft's analysis describes the vulnerability as a bug in the Linux kernel crypto subsystem that can allow an attacker to corrupt the cache of any readable file, including setuid binaries. That corruption can result in execution with root privileges.

The important point is that the modification happens in memory, through the page cache. The file on disk may remain unchanged. That makes this different from a simple file tampering issue and can reduce the usefulness of traditional integrity checks that compare on-disk checksums.

The original research from Xint/Theori described Copy Fail as a deterministic bug: a controlled 4-byte write into the page cache of any readable file on the system. Their proof-of-concept was reported to work across multiple major Linux distributions, including Ubuntu, Amazon Linux, RHEL and SUSE, without per-distribution offsets or recompilation.

That reliability is what makes the vulnerability operationally uncomfortable.

Ubuntu has since published a mitigation through the kmod package, which disables the affected algif_aead module while patched kernel packages are rolled out. Ubuntu also notes that in deployments without container workloads, the published exploit allows a local user to elevate to root. In container deployments that execute potentially malicious workloads, the vulnerability may facilitate container escape.

Xint/Theori has published only a host-only proof-of-concept so far, but explicitly framed it as "Part 1" and signalled that a follow-up will cover Kubernetes container escape. Ubuntu therefore correctly describes the currently published exploit as host privilege escalation, while both Ubuntu and CERT-EU highlight container and CI/CD environments as high-priority risk areas.

AlmaLinux has also published patched-kernel guidance and explicitly called out multi-tenant hosts, container build farms, CI runners and systems where untrusted users can get a shell.

So the correct operator wording has changed from:

No distribution has shipped a fixed kernel package yet.

to:

Check your vendor's current advisory immediately. Apply a patched kernel where available. Where a patched kernel is not yet available, or where rebooting cannot happen immediately, apply the vendor-supported mitigation.

That change is not a contradiction. It is how fast-moving vulnerability response works.

The first advisory is a snapshot. The operator response must keep moving.

Why this matters for cloud, Kubernetes and CI/CD

For an ordinary single-user server, the response is relatively straightforward: update, reboot, verify.

For infrastructure providers and platform operators, the risk model is different.

Copy Fail matters more where:

users can run build jobs;
containers may include untrusted or customer-supplied code;
application compromise can become local code execution;
shared hosts run multiple workloads;
automation systems execute third-party scripts;
developers or customers have low-privilege shell access;
setuid binaries exist inside reachable host contexts;
workload isolation depends on the host kernel behaving correctly.

That is why this should not be treated as a narrow Linux host issue.

It is a workload isolation issue.

In Kubernetes and CI/CD environments, "local execution" may already exist by design. A malicious build job, a compromised container, a supply-chain payload or a low-privilege shell can be enough to bring a local privilege escalation vulnerability into scope.

The absence of a remote exploit does not make the issue harmless.

It means the vulnerability becomes dangerous when combined with the kind of execution paths modern platforms already expose.

What operators should do now

Start with inventory.

Do not limit the check to internet-facing systems. This is a kernel vulnerability, so the relevant question is not only whether a host is reachable from the internet. It is also whether code can execute locally on a vulnerable kernel.

Prioritise:

Kubernetes worker nodes;
CI/CD runners;
build hosts;
container hosts;
multi-tenant systems;
shared developer machines;
bastion hosts;
application servers where a web compromise could become local code execution;
systems with low-privilege users;
systems running customer or third-party workloads.
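To turn that priority list into a work queue, an inventory can be sorted by role. The sketch below assumes a hypothetical plain-text inventory with one `hostname role` pair per line. The role labels in `PRIORITY_ROLES` and the function name `prioritise_hosts` are illustrative and do not follow any standard. Hosts whose role maps to the list are printed first. Everything else follows afterwards, in inventory order.

```sh
#!/bin/sh
# Hypothetical inventory format: one "hostname role" pair per line,
# lines starting with '#' are ignored. Role labels are illustrative only.

# Roles that map to the priority list: places where untrusted or
# semi-trusted code can run locally on the host kernel.
PRIORITY_ROLES="k8s-worker ci-runner build-host container-host multi-tenant shared-dev bastion app-server customer-workload"

prioritise_hosts() {
  awk -v roles="$PRIORITY_ROLES" '
    BEGIN { n = split(roles, r, " "); for (i = 1; i <= n; i++) want[r[i]] = 1 }
    NF >= 2 && !/^#/ {
      if ($2 in want) print "first:  " $1 " (" $2 ")"
      else            later[++m] = "later:  " $1 " (" $2 ")"
    }
    END { for (i = 1; i <= m; i++) print later[i] }
  ' "$1"
}
```

For example, `prioritise_hosts inventory.txt` prints the CI runners and Kubernetes workers before static web servers. The split is a blunt first pass. It does not replace judgement about which hosts really run untrusted code.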

Check the currently running kernel:

uname -r

Check whether the affected module is loaded:

grep -E '^algif_aead ' /proc/modules || true
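Across a fleet, it helps to collect both checks as one grep-friendly line per host. You can gather these lines through whatever remote execution or configuration management tool is already in place. This is a sketch only, and `copyfail_status` is a hypothetical name. The modules file is a parameter so the function can be tested against a saved copy of /proc/modules. An unreadable file is reported as `unknown` rather than silently treated as safe.

```sh
#!/bin/sh
# Sketch: one status line per host, e.g.
#   node7 kernel=6.17.0-22-generic algif_aead=loaded
copyfail_status() {
  local modules state
  modules="${1:-/proc/modules}"
  if [ ! -r "$modules" ]; then
    state="unknown"            # cannot tell; do not assume safe
  elif grep -qE '^algif_aead ' "$modules"; then
    state="loaded"
  else
    state="not-loaded"
  fi
  printf '%s kernel=%s algif_aead=%s\n' "$(uname -n)" "$(uname -r)" "$state"
}

copyfail_status "$@"
```

Note that `not-loaded` does not mean protected. Without a mitigation, the module can still be loaded on demand.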

On Ubuntu systems, check whether the kmod mitigation has been applied by comparing the installed kmod version with the fixed version listed in Ubuntu's advisory:

dpkg -l kmod

Where vendor kernel packages are available, update and reboot. On Debian/Ubuntu-style systems:

sudo apt update
sudo apt upgrade
sudo reboot

On RHEL/AlmaLinux/Rocky-style systems, follow the vendor advisory and update the relevant kernel packages:

sudo dnf update 'kernel*'
sudo reboot
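Mixed fleets need the right command for each host. The sketch below uses a hypothetical function, `update_plan`, which reads `ID` and `ID_LIKE` from /etc/os-release. It prints the corresponding commands from above instead of running them, so you can review the output before anything reboots. The distribution mapping is an assumption. Older yum-based systems that declare `rhel` or `fedora` in `ID_LIKE` would get a dnf command they may not have. SUSE and other families fall through to the vendor-advisory branch.

```sh
#!/bin/sh
# Sketch: print (not run) the update commands for this host's distribution
# family, based on ID and ID_LIKE in os-release. The mapping is an assumption.
update_plan() {
  local file ids
  file="${1:-/etc/os-release}"
  if [ ! -r "$file" ]; then
    echo "unknown distribution: follow the vendor advisory"
    return
  fi
  # '.' searches PATH for names without a slash, so force a path.
  case "$file" in */*) ;; *) file="./$file" ;; esac
  # os-release is shell-compatible KEY=value; source it in a subshell so
  # its variables do not leak into the caller.
  ids="$(. "$file" && printf '%s %s' "${ID:-}" "${ID_LIKE:-}")"
  case " $ids " in
    *" debian "*|*" ubuntu "*)
      echo "sudo apt update && sudo apt upgrade && sudo reboot" ;;
    *" rhel "*|*" fedora "*|*" centos "*)
      echo "sudo dnf update 'kernel*' && sudo reboot" ;;
    *)
      echo "unknown distribution: follow the vendor advisory" ;;
  esac
}

update_plan "$@"
```

Printing rather than executing is deliberate. Reboots on Kubernetes nodes and CI runners usually need draining or job scheduling first.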

After reboot, verify the running kernel, not only the installed package inventory:

uname -r

Package inventory alone is not enough. A patched kernel package does not protect the host until the system is actually running the patched kernel.
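One way to catch the installed-but-not-running gap is to compare the running release with the newest kernel image in /boot. This sketch assumes images are named `vmlinuz-<release>`, as on common Debian/Ubuntu and RHEL-family layouts. It uses GNU `sort -V` for version ordering, and `reboot_pending` is a hypothetical name. The check is a heuristic. The newest installed kernel is not necessarily the default boot entry, and it is not necessarily a patched one.

```sh
#!/bin/sh
# Sketch: compare the running kernel with the newest vmlinuz-<release> image.
reboot_pending() {
  local boot_dir running newest f
  boot_dir="${1:-/boot}"
  running="${2:-$(uname -r)}"
  # List installed releases, skip rescue images, keep the highest version.
  newest="$(
    for f in "$boot_dir"/vmlinuz-*; do
      [ -e "$f" ] || continue
      printf '%s\n' "${f##*/vmlinuz-}"
    done | grep -v rescue | sort -V | tail -n 1
  )"
  if [ -z "$newest" ]; then
    echo "unknown: no vmlinuz-* images found in $boot_dir"
  elif [ "$newest" = "$running" ]; then
    echo "ok: running newest installed kernel $running"
  else
    echo "reboot pending: running $running, newest installed $newest"
  fi
}

reboot_pending "$@"
```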

Interim mitigation

Where a patched kernel is not yet available, or where rebooting cannot happen immediately, apply the vendor-supported mitigation.

A common interim mitigation is to prevent the affected module from loading:

echo "install algif_aead /bin/false" | sudo tee /etc/modprobe.d/manual-disable-algif_aead.conf
sudo rmmod algif_aead 2>/dev/null || true

Then verify whether the module is still loaded:

grep -qE '^algif_aead ' /proc/modules && echo "still loaded" || echo "not loaded"

Ubuntu notes that rebooting ensures the mitigation is applied, but that unloading the module can be sufficient if rebooting is not immediately possible.
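The two checks above, the install rule and the loaded state, can be combined into one verdict per host. In this sketch, `mitigation_status` is hypothetical. The pattern matches the `install algif_aead /bin/false` line used above, or a `/bin/true` variant. Vendor-shipped mitigations may use a different file or directive, so adjust the pattern to your vendor's advisory. The sketch also cannot detect a kernel where algif_aead is built in rather than built as a module. In that case a modprobe rule has no effect.

```sh
#!/bin/sh
# Sketch: combine the install-rule check and the loaded-module check.
# Usage: mitigation_status [modules_file] [modprobe_dir ...]
mitigation_status() {
  local modules rule loaded
  modules="${1:-/proc/modules}"
  [ "$#" -gt 0 ] && shift
  # Default to the usual modprobe configuration directories.
  [ "$#" -gt 0 ] || set -- /etc/modprobe.d /lib/modprobe.d /usr/lib/modprobe.d

  rule=no
  # modprobe only reads *.conf files; -s hides errors for missing directories.
  if grep -rqsE --include='*.conf' \
      '^[[:space:]]*install[[:space:]]+algif_aead[[:space:]]+/bin/(false|true)' "$@"; then
    rule=yes
  fi

  if [ ! -r "$modules" ]; then
    loaded=unknown
  elif grep -qE '^algif_aead ' "$modules"; then
    loaded=yes
  else
    loaded=no
  fi

  case "$rule/$loaded" in
    */unknown) echo "unknown: cannot read $modules" ;;
    yes/no)    echo "mitigated: install rule present, algif_aead not loaded" ;;
    yes/yes)   echo "partial: install rule present, algif_aead still loaded (rmmod or reboot)" ;;
    no/yes)    echo "exposed: algif_aead loaded and no install rule" ;;
    *)         echo "exposed: no install rule, algif_aead may be loaded on demand" ;;
  esac
}

mitigation_status "$@"
```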

There is some regression risk. The mitigation disables algif_aead, the userspace interface to the kernel's AEAD ciphers, which some applications use to reach hardware-accelerated cryptography. Most applications should fall back to userspace cryptographic implementations, but operators should still monitor affected systems after applying the mitigation.

For Kubernetes and CI/CD environments, use seccomp to block workloads from creating AF_ALG sockets (address family 38). CERT-EU references the Docker and Kubernetes seccomp profiles for this. Where operators already enforce a default-deny seccomp profile, AF_ALG is typically blocked already. Verify this rather than assuming it.

This should be treated as a defence-in-depth step, not as a replacement for patching.
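As a starting point, the rule can be expressed in the JSON seccomp profile format used by Docker and by Kubernetes `Localhost` profiles. The sketch below writes a profile with a single rule: `socket()` fails when its first argument, the address family, equals 38 (AF_ALG). Its `defaultAction` of `SCMP_ACT_ALLOW` permits every other syscall, so on its own it is weaker than Docker's default profile. The `syscalls` entry is meant to be merged into the profile you already enforce. The function name and default output path are hypothetical.

```sh
#!/bin/sh
# Sketch: write a Docker/OCI-format seccomp profile whose only rule makes
# socket() fail when its first argument (address family) is 38 (AF_ALG).
# WARNING: defaultAction SCMP_ACT_ALLOW permits every other syscall, so this
# file alone is weaker than Docker's default profile. Merge the "syscalls"
# entry into the profile you already enforce.
write_afalg_block_profile() {
  local out
  out="${1:-block-af-alg.json}"   # hypothetical default file name
  cat > "$out" <<'EOF'
{
  "defaultAction": "SCMP_ACT_ALLOW",
  "syscalls": [
    {
      "names": ["socket"],
      "action": "SCMP_ACT_ERRNO",
      "args": [
        { "index": 0, "value": 38, "op": "SCMP_OP_EQ" }
      ]
    }
  ]
}
EOF
  echo "wrote $out"
}
```

For a quick Docker test, run `write_afalg_block_profile block-af-alg.json` and then `docker run --security-opt seccomp=block-af-alg.json ...`. In Kubernetes, place a copy under the kubelet's seccomp directory and reference it from a pod's `securityContext.seccompProfile` with `type: Localhost`.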

What not to overstate

Copy Fail should not be described as "every Linux server is instantly rootable from the internet".

That is not accurate.

The attacker needs local code execution first.

But it should also not be dismissed because it is "only local".

In modern infrastructure, local execution is often the second step after an application compromise. It is also the normal execution model for containers, build jobs and automation pipelines.

A reliable local root path can turn a contained incident into host compromise. On shared infrastructure, that can become a tenant-boundary problem. In Kubernetes, it can become a node-integrity problem. In CI/CD, it can become a build-environment trust problem.

The severity is therefore environment-dependent.

For ordinary systems, this is a serious patching task.

For cloud, Kubernetes and CI/CD environments, it is an urgent host-integrity and workload-isolation issue.

What follow-up looks like

Security advisories are snapshots. Vendor status changes, mitigations appear, kernel packages roll out, and exploitation risk shifts as proof-of-concepts spread and defenders respond. The first advisory tells you what to look at; the human follow-up is what closes the loop:

verify exposure;
identify high-risk workload environments;
apply vendor mitigations where needed;
patch kernels where available;
reboot into fixed kernels;
restrict risky workload capabilities where possible;
confirm the running state after remediation.

A local kernel bug is not always a local problem.

In cloud, CI/CD and container environments, it can become a platform boundary problem very quickly.
