AMD Video Driver Issues

I have had some graphics hangs on my HP z640 workstation, which always seem to happen after about 4 days of uptime. In one instance running Debian kernel 6.16.12+deb14+1 I got the following kernel error:

kernel: amdgpu 0000:02:00.0: [drm] *ERROR* [CRTC:58:crtc-0] flip_done timed out

Then I got the following errors from kwin_wayland:

kwin_wayland_wrapper[19598]: kwin_wayland_drm: Pageflip timed out! This is a bug in the amdgpu kernel driver
kwin_wayland_wrapper[19598]: kwin_wayland_drm: Please report this at https://gitlab.freedesktop.org/drm/amd/-/issues
kwin_wayland_wrapper[19598]: kwin_wayland_drm: With the output of 'sudo dmesg' and 'journalctl --user-unit plasma-kwin_wayland --boot 0'

In another instance running Debian kernel 6.12.48+deb13 I got the kernel errors at the bottom of the post (not in the RSS feed).

A Google result suggested putting the following on the kernel command line. It has the downside of increasing the idle power, but given that it’s a low power GPU (which I selected when I was using a system without a PCIe power cable) a bit of extra power use shouldn’t matter much. However, it didn’t seem to change anything.

amdgpu.runpm=0 amdgpu.dcdebugmask=0x10
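
When options like these appear to have no effect, one thing worth ruling out is that they never reached the driver. The following is a minimal Python sketch of such a check. It assumes the amdgpu module exposes its parameters as readable files under /sys/module/amdgpu/parameters, which is the usual place for module parameters but is an assumption here rather than something verified on this machine. The function names (parse_cmdline, live_amdgpu_params, report) are hypothetical helpers made up for this example.

```python
from pathlib import Path


def parse_cmdline(cmdline):
    """Return the module.param=value options found in a kernel command line."""
    options = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # Module options look like "amdgpu.runpm=0": a dotted key with a value.
        if sep and "." in key:
            options[key] = value
    return options


def live_amdgpu_params(names, base=Path("/sys/module/amdgpu/parameters")):
    """Read the values the loaded module reports (assumed sysfs location)."""
    values = {}
    for name in names:
        path = base / name
        # None means the parameter file was not found at the assumed path.
        values[name] = path.read_text().strip() if path.exists() else None
    return values


def report(cmdline_path=Path("/proc/cmdline")):
    """Print the requested amdgpu options next to what the driver reports."""
    requested = parse_cmdline(cmdline_path.read_text())
    wanted = {
        key.split(".", 1)[1]: value
        for key, value in requested.items()
        if key.startswith("amdgpu.")
    }
    print("requested on command line:", wanted)
    print("reported by the driver:  ", live_amdgpu_params(wanted))


# Parsing the options quoted above, without touching the running system.
print(parse_cmdline("amdgpu.runpm=0 amdgpu.dcdebugmask=0x10"))
```

Calling report() on the affected machine would show both sets of values side by side. The sysfs files usually print plain decimal numbers, so a mask given as 0x10 on the command line would be expected to read back as 16.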

I also tried the Debian/Unstable kernel 6.16.12-2, which didn’t work with my USB speakers, had problems with HDMI sound through my monitor, and still had the AMD GPU issues.

This all seemed to start with the PCIe errors being reported on this system [1]. So I’m now wondering if the PCIe errors were from the GPU rather than the socket/motherboard. The GPU in question is a Radeon RX560 4G which cost $246.75 back in about 2021 [2]. I could buy a new one of those on eBay for $149, or one of the faster AMD cards like the Radeon RX570 that are around the same price. I probably have a Radeon R7 260X in my collection of spare parts that would do the job too (2G of VRAM is more than sufficient for my desktop computing needs).

Any suggestions on how I should proceed from here?

[419976.222647] amdgpu 0000:02:00.0: amdgpu: GPU fault detected: 146 0x0138482c
[419976.222659] amdgpu 0000:02:00.0: amdgpu:  for process mpv pid 141328 thread vo pid 141346
[419976.222662] amdgpu 0000:02:00.0: amdgpu:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00101427
[419976.222664] amdgpu 0000:02:00.0: amdgpu:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0404802C
[419976.222666] amdgpu 0000:02:00.0: amdgpu: VM fault (0x2c, vmid 2, pasid 32810) at page 1053735, read from 'TC0' (0x54433000) (72)
[419986.245051] amdgpu 0000:02:00.0: amdgpu: Dumping IP State
[419986.245061] amdgpu 0000:02:00.0: amdgpu: Dumping IP State Completed
[419986.255152] amdgpu 0000:02:00.0: amdgpu: ring gfx timeout, signaled seq=11839646, emitted seq=11839648
[419986.255158] amdgpu 0000:02:00.0: amdgpu: Process information: process mpv pid 141328 thread vo pid 141346
[419986.255209] amdgpu 0000:02:00.0: amdgpu: GPU reset begin!
[419986.503030] amdgpu: cp is busy, skip halt cp
[419986.658198] amdgpu: rlc is busy, skip halt rlc
[419986.659270] amdgpu 0000:02:00.0: amdgpu: BACO reset
[419986.884672] amdgpu 0000:02:00.0: amdgpu: GPU reset succeeded, trying to resume
[419986.885398] [drm] PCIE GART of 256M enabled (table at 0x000000F402000000).
[419986.885413] [drm] VRAM is lost due to GPU reset!
[419987.021051] [drm] UVD and UVD ENC initialized successfully.
[419987.120999] [drm] VCE initialized successfully.
[419987.193302] amdgpu 0000:02:00.0: amdgpu: GPU reset(1) succeeded!
[419987.194117] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[419997.509120] amdgpu 0000:02:00.0: amdgpu: Dumping IP State
[419997.509131] amdgpu 0000:02:00.0: amdgpu: Dumping IP State Completed
[419997.519145] amdgpu 0000:02:00.0: amdgpu: ring gfx timeout, signaled seq=11839650, emitted seq=11839652
[419997.519152] amdgpu 0000:02:00.0: amdgpu: Process information: process kwin_wayland pid 3577 thread kwin_wayla:cs0 pid 3615
[419997.519158] amdgpu 0000:02:00.0: amdgpu: GPU reset begin!
[419997.772966] amdgpu: cp is busy, skip halt cp
[419997.928138] amdgpu: rlc is busy, skip halt rlc
[419997.929165] amdgpu 0000:02:00.0: amdgpu: BACO reset
[419998.164705] amdgpu 0000:02:00.0: amdgpu: GPU reset succeeded, trying to resume
[419998.165412] [drm] PCIE GART of 256M enabled (table at 0x000000F402000000).
[419998.165427] [drm] VRAM is lost due to GPU reset!
[419998.311054] [drm] UVD and UVD ENC initialized successfully.
[419998.411006] [drm] VCE initialized successfully.
[419998.476272] amdgpu 0000:02:00.0: amdgpu: GPU reset(2) succeeded!
[419998.476363] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[420008.773202] amdgpu 0000:02:00.0: amdgpu: Dumping IP State
[420008.773212] amdgpu 0000:02:00.0: amdgpu: Dumping IP State Completed
[420008.773240] amdgpu 0000:02:00.0: amdgpu: ring gfx timeout, but soft recovered
=== the above sequence of 3 repeated many times (narrator's voice "but it did not recover") ===
[420130.933612] rfkill: input handler disabled
[420135.594195] rfkill: input handler enabled
[420145.734076] amdgpu 0000:02:00.0: amdgpu: Dumping IP State
[420145.734085] amdgpu 0000:02:00.0: amdgpu: Dumping IP State Completed
[420145.744099] amdgpu 0000:02:00.0: amdgpu: ring gfx timeout, signaled seq=11839790, emitted seq=11839792
[420145.744105] amdgpu 0000:02:00.0: amdgpu: Process information: process kwin_wayland pid 3577 thread kwin_wayla:cs0 pid 3615
[420145.744111] amdgpu 0000:02:00.0: amdgpu: GPU reset begin!

There were more kernel messages, but they were just repeats and after a certain stage there probably isn’t any more data worth getting.
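
To compare hangs across boots it can help to reduce a log like the one above to a few numbers: how far into the uptime the first fault occurred, and how many ring timeouts and reset attempts followed. The following Python sketch does that for dmesg-style lines with a leading [seconds] timestamp. The summarize function and the keys of the dictionary it returns are hypothetical names chosen for this example, and the matched strings are simply the message texts that appear in the log above.

```python
import re

# Matches "[419976.222647] message" as printed by dmesg.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def summarize(log_text):
    """Count amdgpu fault, timeout and reset messages in dmesg-style text."""
    first_fault = None
    ring_timeouts = 0
    resets_started = 0
    for line in log_text.splitlines():
        match = TIMESTAMP.match(line)
        if not match:
            continue  # skip lines without a kernel timestamp
        seconds, message = float(match.group(1)), match.group(2)
        if first_fault is None and "GPU fault detected" in message:
            first_fault = seconds
        if "ring gfx timeout" in message:
            ring_timeouts += 1
        if "GPU reset begin!" in message:
            resets_started += 1
    return {
        # Uptime at the first fault, converted from seconds to days.
        "first_fault_days": None if first_fault is None else first_fault / 86400,
        "ring_timeouts": ring_timeouts,
        "resets_started": resets_started,
    }


# Three lines copied from the log above.
SAMPLE = """\
[419976.222647] amdgpu 0000:02:00.0: amdgpu: GPU fault detected: 146 0x0138482c
[419986.255152] amdgpu 0000:02:00.0: amdgpu: ring gfx timeout, signaled seq=11839646, emitted seq=11839648
[419986.255209] amdgpu 0000:02:00.0: amdgpu: GPU reset begin!
"""

print(summarize(SAMPLE))
```

The first fault timestamp of 419976 seconds works out to roughly 4.86 days, which is consistent with the hangs appearing after about 4 days of uptime.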

etbe – Russell Coker
