drm/radeon: add large PTE support for NI, SI and CIK v5
authorChristian König <christian.koenig@amd.com>
Sat, 10 May 2014 10:17:55 +0000 (12:17 +0200)
committerAlex Deucher <alexander.deucher@amd.com>
Mon, 2 Jun 2014 14:25:02 +0000 (10:25 -0400)
This patch implements support for VRAM page table entry compression.
PTE construction is enhanced to identify physically contiguous page
ranges and mark them in the PTE fragment field. L1/L2 TLB support is
enabled for 64KB (SI/CIK) and 256KB (NI) PTE fragments, significantly
improving TLB utilization for VRAM allocations.

Linear store bandwidth is improved from 60GB/s to 125GB/s on Pitcairn.
Unigine Heaven 3.0 sees an average improvement from 24.7 to 27.7 FPS
on default settings at 1920x1200 resolution with vsync disabled.

See main comment in radeon_vm.c for a technical description.

v2 (chk): rebased and simplified.
v3 (chk): add missing hw setup
v4 (chk): rebased on current drm-fixes-3.15
v5 (chk): fix comments and commit text

Signed-off-by: Jay Cornwall <jay@jcornwall.me>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drivers/gpu/drm/radeon/cik.c
drivers/gpu/drm/radeon/ni.c
drivers/gpu/drm/radeon/radeon.h
drivers/gpu/drm/radeon/radeon_vm.c
drivers/gpu/drm/radeon/si.c

index 199eb19..dbb5b2e 100644 (file)
@@ -5328,6 +5328,7 @@ static int cik_pcie_gart_enable(struct radeon_device *rdev)
        WREG32(MC_VM_MX_L1_TLB_CNTL,
               (0xA << 7) |
               ENABLE_L1_TLB |
+              ENABLE_L1_FRAGMENT_PROCESSING |
               SYSTEM_ACCESS_MODE_NOT_IN_SYS |
               ENABLE_ADVANCED_DRIVER_MODEL |
               SYSTEM_APERTURE_UNMAPPED_ACCESS_PASS_THRU);
@@ -5340,7 +5341,8 @@ static int cik_pcie_gart_enable(struct radeon_device *rdev)
               CONTEXT1_IDENTITY_ACCESS_MODE(1));
        WREG32(VM_L2_CNTL2, INVALIDATE_ALL_L1_TLBS | INVALIDATE_L2_CACHE);
        WREG32(VM_L2_CNTL3, L2_CACHE_BIGK_ASSOCIATIVITY |
-              L2_CACHE_BIGK_FRAGMENT_SIZE(6));
+              BANK_SELECT(4) |
+              L2_CACHE_BIGK_FRAGMENT_SIZE(4));
        /* setup context0 */
        WREG32(VM_CONTEXT0_PAGE_TABLE_START_ADDR, rdev->mc.gtt_start >> 12);
        WREG32(VM_CONTEXT0_PAGE_TABLE_END_ADDR, rdev->mc.gtt_end >> 12);
Simple merge
Simple merge
Simple merge
Simple merge