Implementing 1GB Transparent Huge Pages: A Developer's Step-by-Step Guide

Introduction

Transparent Huge Pages (THP) have long been a cornerstone of Linux memory management, automatically backing eligible memory with huge pages (typically 2MB on x86). Although the hardware also supports 1GB (PUD-level) pages, using them transparently has long been considered impractical because of fragmentation, allocation latency, and page table complexity. Recent work by Usama Arif, presented at the 2026 Linux Storage, Filesystem, Memory Management, and BPF Summit, aims to change that. This guide walks through the conceptual steps required to evaluate and implement transparent 1GB huge pages on x86-64 Linux systems.

What You Need

Step-by-Step Guide

Step 1: Understand Current THP Implementation

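Before scaling to 1GB, review how THP works today. The default huge page size (PMD-level) is 2MB on x86. THP installs huge pages in two ways: the page fault handler can allocate a 2MB page directly when a suitably aligned region is first touched, and the khugepaged kernel thread scans eligible regions in the background and collapses runs of 4KB pages into a single huge page. In both cases the result is one PMD entry that maps the whole 2MB range. Key files to study: mm/huge_memory.c, mm/khugepaged.c, include/linux/huge_mm.h, and Documentation/admin-guide/mm/transhuge.rst in the kernel tree.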
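The size relationships between the page table levels are easier to keep straight with the arithmetic written out. The following is a minimal userspace sketch, assuming x86-64 with 4KB base pages; the EX_ constants and ex_ functions are hypothetical names chosen for illustration and only mirror the values the kernel uses, they are not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative userspace constants for x86-64 with 4KB base pages.
 * They mirror the kernel's shift values but are not kernel symbols. */
enum {
    EX_PAGE_SHIFT = 12,      /* 4KB base page (PTE level) */
    EX_PMD_SHIFT = 21,       /* 2MB, mapped by one PMD entry */
    EX_PUD_SHIFT = 30,       /* 1GB, mapped by one PUD entry */
    EX_PTRS_PER_TABLE = 512  /* entries in one page table page */
};

/* Bytes mapped by a single entry at the given shift. */
uint64_t ex_level_size(unsigned shift)
{
    assert(shift < 64);
    return UINT64_C(1) << shift;
}

/* How many smaller units fit into one larger unit. */
uint64_t ex_units_per(unsigned big_shift, unsigned small_shift)
{
    assert(big_shift >= small_shift && big_shift - small_shift < 64);
    return UINT64_C(1) << (big_shift - small_shift);
}

/* Index of the PMD entry, within its PMD table, that covers addr. */
unsigned ex_pmd_index(uint64_t addr)
{
    return (unsigned)((addr >> EX_PMD_SHIFT) & (EX_PTRS_PER_TABLE - 1));
}

/* Index of the PUD entry, within its PUD table, that covers addr. */
unsigned ex_pud_index(uint64_t addr)
{
    return (unsigned)((addr >> EX_PUD_SHIFT) & (EX_PTRS_PER_TABLE - 1));
}
```

For example, ex_units_per(EX_PUD_SHIFT, EX_PMD_SHIFT) gives the number of 2MB pages covered by one 1GB page, and ex_units_per(EX_PUD_SHIFT, EX_PAGE_SHIFT) gives the number of 4KB pages.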

Step 2: Verify Hardware and Firmware Support for 1GB Pages

Most x86-64 CPUs support 1GB pages, but confirm it rather than assume it: look for the pdpe1gb flag in /proc/cpuinfo, and remember that firmware or a hypervisor can hide the feature from the operating system. Ensure the system can actually provide 1GB of physically contiguous memory aligned to a 1GB boundary. Use numactl --hardware to check how memory is distributed across NUMA nodes.
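On x86 Linux, 1GB page capability shows up as the pdpe1gb flag in /proc/cpuinfo. A minimal sketch of a programmatic check follows; the function names are hypothetical, and the reader only looks at the first "flags" line, assuming all CPUs in the system report the same feature set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return true if `flag` appears as a whole whitespace-delimited token
 * in `flags_line`, so that "pdpe1gbx" does not match "pdpe1gb". */
bool ex_has_cpu_flag(const char *flags_line, const char *flag)
{
    assert(flags_line != NULL && flag != NULL);

    size_t n = strlen(flag);
    const char *p = flags_line;

    if (n == 0)
        return false;
    while ((p = strstr(p, flag)) != NULL) {
        bool starts = (p == flags_line) || p[-1] == ' ' || p[-1] == '\t';
        bool ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\n';
        if (starts && ends)
            return true;
        p += n;
    }
    return false;
}

/* Read a cpuinfo-style file and test the first "flags" line for pdpe1gb.
 * Returns 1 if present, 0 if absent, -1 if the file cannot be opened. */
int ex_cpu_supports_1g_pages(const char *path)
{
    char line[8192];
    int result = 0;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "flags", 5) == 0) {
            result = ex_has_cpu_flag(line, "pdpe1gb") ? 1 : 0;
            break;
        }
    }
    fclose(f);
    return result;
}
```

Calling ex_cpu_supports_1g_pages("/proc/cpuinfo") on the target machine answers the hardware question; it says nothing about whether enough contiguous memory is available.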

Step 3: Modify Page Table Entry Handling for PUD-Level Huge Pages

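For anonymous memory, THP currently operates at the PMD level. PUD-sized mappings already exist elsewhere in the kernel (for example in hugetlbfs and for DAX device mappings), so part of the work is extending existing code rather than starting from scratch. Audit the page table helpers that allocate, test, and install PUD entries, such as __pmd_alloc, pud_huge, and set_pud_at, and make sure every page table walker distinguishes a PUD that points to a PMD table from a PUD that maps a 1GB page directly. Exact helper names vary between kernel versions, so check the tree you are working on. Use a PUD-level size constant (e.g., HPAGE_PUD_SHIFT) wherever the code currently assumes the PMD size, and decide how VM_HUGEPAGE should apply to the new size. Update the follow_page_mask path and the page fault path to handle PUD entries; note that hugetlb_fault belongs to hugetlbfs, which is separate from THP, though it is a useful reference.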
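The distinction a walker has to make at the PUD level can be modeled in userspace. The sketch below assumes the x86-64 entry layout (bit 0 is the present bit, bit 7 is the page-size bit, and a 1GB mapping stores its physical base in bits 51:30). The EX_ macros and ex_ functions are hypothetical stand-ins, not the kernel's own helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_ENTRY_PRESENT (UINT64_C(1) << 0)  /* entry is valid */
#define EX_ENTRY_PSE     (UINT64_C(1) << 7)  /* entry maps a page, not a table */
#define EX_PUD_SIZE      (UINT64_C(1) << 30) /* 1GB */
#define EX_PUD_MASK      (~(EX_PUD_SIZE - 1))
#define EX_PHYS_MASK     ((UINT64_C(1) << 52) - 1) /* drops NX and other high bits */

/* True if the PUD entry maps a 1GB page directly instead of
 * pointing to a lower-level PMD table. */
bool ex_pud_is_leaf(uint64_t pud)
{
    uint64_t need = EX_ENTRY_PRESENT | EX_ENTRY_PSE;
    return (pud & need) == need;
}

/* Translate a virtual address through a leaf PUD entry: the physical
 * base comes from the entry, the low 30 bits come from the address. */
uint64_t ex_pud_leaf_translate(uint64_t pud, uint64_t vaddr)
{
    assert(ex_pud_is_leaf(pud));

    uint64_t base = pud & EX_PHYS_MASK & EX_PUD_MASK;
    return base | (vaddr & ~EX_PUD_MASK);
}
```

A walker that forgets the leaf check would treat the 1GB page's physical address as a pointer to a PMD table, which is the class of bug this step is meant to rule out.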

Step 4: Manage Memory Compaction and Fragmentation

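A 1GB page needs 1GB of physically contiguous, 1GB-aligned memory. With 4KB base pages that is an order-18 allocation, well beyond the largest block the buddy allocator normally tracks (order 10, or 4MB, by default on x86-64), so it cannot simply be requested from the free lists. Implement a compaction target for this size (a hypothetical COMPACT_PUD mode, for example) that isolates and migrates pages to create large free ranges, and modify mm/compaction.c to work at PUD granularity. Use ZONE_MOVABLE or CMA (Contiguous Memory Allocator) as a fallback. Tune the khugepaged sleep and scan rates to avoid excessive CPU usage.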
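To see how far a 1GB request is from what the free lists track, it helps to compute the required order and compare it with /proc/buddyinfo, which lists free block counts per order for each zone. The following is a rough sketch with hypothetical function names; it assumes the usual "Node N, zone NAME count0 count1 ..." line layout.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Smallest buddy order whose block covers `bytes`, given base pages
 * of (1 << page_shift) bytes. */
unsigned ex_order_for_size(uint64_t bytes, unsigned page_shift)
{
    assert(page_shift < 32);

    uint64_t page = UINT64_C(1) << page_shift;
    uint64_t pages = (bytes + page - 1) >> page_shift;
    unsigned order = 0;

    while ((UINT64_C(1) << order) < pages)
        order++;
    return order;
}

/* Parse one /proc/buddyinfo line and return the number of free blocks
 * at `order`, or -1 if the line has no column for that order. */
long ex_buddyinfo_free_blocks(const char *line, unsigned order)
{
    assert(line != NULL);

    const char *p = strstr(line, "zone");
    if (p == NULL)
        return -1;
    p += 4;
    while (*p == ' ')            /* spaces before the zone name */
        p++;
    while (*p != '\0' && *p != ' ')  /* the zone name, e.g. DMA32 */
        p++;

    for (unsigned i = 0; ; i++) {
        char *end;
        long count = strtol(p, &end, 10);

        if (end == p)            /* no more numeric columns */
            return -1;
        if (i == order)
            return count;
        p = end;
    }
}
```

On a typical system the line has columns for orders 0 through 10 only, so asking for order 18 returns -1: the contiguity has to be created by compaction or reserved up front rather than found on a free list.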

Step 5: Implement Transparent Promotion to 1GB Pages

The core of the feature: detect when a process’s memory region can be upgraded from 4KB or 2MB pages to a 1GB huge page. This involves scanning PMD-level huge pages and checking whether 512 consecutive 2MB pages (512 × 2MB = 1GB) are physically contiguous and aligned to a 1GB boundary. If so, collapse them into a single PUD entry. Add a kernel tunable such as /sys/kernel/mm/transparent_hugepage/defrag_pud and statistics in /proc/meminfo (e.g., AnonHugePud); these names are illustrative proposals. You may also want a new madvise hint along the lines of MADV_HUGEPAGE_PUD.
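The eligibility test has two parts: the virtual range must contain a full 1GB-aligned window, and the 512 PMD-sized pages (512 × 2MB = 1GB) backing that window must be physically contiguous and start on a 1GB boundary. A minimal userspace model of both checks follows, assuming 4KB base pages. All names are hypothetical, and a real implementation could also copy into a freshly allocated 1GB page instead of requiring existing contiguity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EX_PUD_BYTES    (UINT64_C(1) << 30) /* 1GB */
#define EX_PMDS_PER_PUD 512u                /* 2MB pages per 1GB page */
#define EX_PFNS_PER_PMD 512u                /* 4KB frames per 2MB page */
#define EX_PFNS_PER_PUD (EX_PFNS_PER_PMD * EX_PMDS_PER_PUD)

/* Find the first 1GB-aligned address in [start, start + len) that has a
 * full 1GB of the range after it. Returns false if no such window exists. */
bool ex_find_pud_window(uint64_t start, uint64_t len, uint64_t *aligned)
{
    uint64_t a = (start + EX_PUD_BYTES - 1) & ~(EX_PUD_BYTES - 1);

    if (a < start)                    /* rounding up overflowed */
        return false;
    uint64_t skip = a - start;
    if (skip > len || len - skip < EX_PUD_BYTES)
        return false;
    *aligned = a;
    return true;
}

/* head_pfn[i] is the first page frame number of the i-th 2MB page in the
 * window. The pages can share one PUD entry only if the first frame is
 * 1GB-aligned and each page follows the previous one with no gap. */
bool ex_pmds_collapsible(const uint64_t *head_pfn, size_t n)
{
    if (n != EX_PMDS_PER_PUD)
        return false;
    if (head_pfn[0] % EX_PFNS_PER_PUD != 0)
        return false;
    for (size_t i = 1; i < n; i++) {
        if (head_pfn[i] != head_pfn[0] + i * EX_PFNS_PER_PMD)
            return false;
    }
    return true;
}
```

Keeping the two checks separate reflects that virtual alignment is a property of the mapping, while physical contiguity depends on what the allocator and compaction managed to deliver.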

Step 6: Test and Benchmark the Implementation

Use synthetic workloads that allocate large memory regions (e.g., 4GB+). Monitor with perf stat, numastat, and custom tracepoints. Measure TLB misses, page fault latency, and throughput for memory-intensive applications such as databases or HPC codes. Compare against two baselines: THP disabled and THP with 2MB pages only. Expect higher allocation latency, since assembling and zeroing 1GB takes time, but better steady-state performance for workloads with large working sets. Also test under stress: memory pressure, swap, and NUMA balancing.
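A synthetic workload needs a region whose start address is aligned to the huge page size, otherwise no huge mapping can cover it. One common way to get that from userspace is to over-allocate with mmap and trim the excess. The sketch below uses hypothetical function names and the existing MADV_HUGEPAGE hint, which on current kernels requests PMD-sized THP only; it assumes len is a multiple of the base page size.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Map `len` bytes of anonymous memory starting at a multiple of `align`
 * (a power of two). Maps len + align bytes, then unmaps the unused head
 * and tail. Returns NULL on failure. */
void *ex_map_aligned(size_t len, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    size_t span = len + align;
    if (span < len)                 /* size overflow */
        return NULL;

    uint8_t *raw = mmap(NULL, span, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return NULL;

    uintptr_t up = ((uintptr_t)raw + align - 1) & ~((uintptr_t)align - 1);
    uint8_t *aligned = (uint8_t *)up;
    size_t head = (size_t)(aligned - raw);
    size_t tail = span - head - len;

    if (head != 0)
        munmap(raw, head);
    if (tail != 0)
        munmap(aligned + len, tail);

#ifdef MADV_HUGEPAGE
    /* Advisory only; the region stays usable if the kernel declines. */
    (void)madvise(aligned, len, MADV_HUGEPAGE);
#endif
    return aligned;
}

/* Write one byte per base page so every page is faulted in.
 * Returns the number of pages touched. */
size_t ex_touch(void *buf, size_t len, size_t page_size)
{
    assert(page_size != 0);

    volatile uint8_t *p = buf;
    size_t n = 0;

    for (size_t off = 0; off < len; off += page_size, n++)
        p[off] = 1;
    return n;
}
```

For a benchmark run, call ex_map_aligned with a 4GB length and 1GB alignment, time ex_touch around the first pass to capture fault latency, then time later passes for steady-state behavior while perf stat collects TLB counters.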

Tips and Best Practices

Ultimately, scaling transparent huge pages to 1GB is a significant but tractable kernel engineering challenge. By following these steps and engaging with the community, you can help bring the performance benefits of 1GB pages to mainstream Linux.
