Kernel panic when CPU is hot for a long time
Hi everyone!
TLDR: T2 / PCH related kernel panic if the CPU is hot for a long time; it boots again only after I let it cool down. No kernel panics during normal/light use. Possibly a component on the motherboard with a bad connection? Could it be fixed if I open it up?
EDIT: More tests in the comments
Long version:
I obtained a faulty 2020 iMac 5K with an i7-10700K and 5500XT to use as the base for a DIY 5K project. The ad said that the GPU is faulty and that the machine randomly restarts, but that there is no problem during normal use.
The screen is the most important part for me (only a slight pink hue around the edge, which is no problem), so I did not really care about the restarts. But what I discovered made me more interested in fixing the issue.
I benchmarked the GPU using Heaven Benchmark for 1-2 hours at max fan speed. The GPU sat at 80-90 degrees and the machine did not restart.
Then I benchmarked the CPU using Cinebench. It survived the 10-minute single-core run but crashed 2-3 seconds after starting multi-core. Later, when it had cooled down, I ran multi-core again and it lasted a lot longer, but still not the full 10 minutes.
After it restarts, it sometimes crashes during boot, but mostly it gets to the login screen and can stay there for hours. Once I enter my password and it starts loading everything, it crashes again, and keeps doing so until I let it cool down so it has some thermal headroom. Macs Fan Control turns up the fan speed immediately after login, but that is still not early enough. I also turned off Intel Turbo Boost to reduce heat generation.
The kernel panic logs (when present) show T2 / PCH / SEP related crashes (BAD MAGIC, x86 global reset detected - CORE 0 is the one that panicked / void AppleEmbeddedPCIeUpLinkMgmt::_linkInterruptAction(IOInterruptEventSource *, int): A link timeout has been seen after 650000 microseconds and 49999 iterations - CORE 0 is the one that panicked).
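If several panic logs have accumulated, counting which signature shows up in each one could tell whether a single failure dominates. A rough sketch in Python, assuming the log text has already been read into strings; the labels and the function name are made up for illustration, and the search strings are just the ones quoted above:

```python
# Substrings taken from the panic messages quoted in this post.
# The dictionary keys are hypothetical labels, not Apple terminology.
SIGNATURES = {
    "bad_magic": "BAD MAGIC",
    "global_reset": "x86 global reset detected",
    "pcie_link_timeout": "A link timeout has been seen",
}


def count_signatures(panic_texts):
    """Count how many panic logs contain each known signature."""
    counts = {label: 0 for label in SIGNATURES}
    for text in panic_texts:
        for label, needle in SIGNATURES.items():
            if needle in text:
                counts[label] += 1
    return counts


# Two shortened example strings built from the messages above.
examples = [
    "BAD MAGIC, x86 global reset detected - CORE 0 is the one that panicked",
    "A link timeout has been seen after 650000 microseconds and 49999 iterations",
]
print(count_signatures(examples))
```

A log can match more than one signature, so the counts do not have to add up to the number of logs.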
But the weird thing is that I have been using this thing every day for basic tasks: logging in, sleeping, password authentication, and everything seems to work as usual. I guess normal tasks use the T2 as well, but maybe it does not heat up that much?
What I'm planning to do in the coming weeks is open it up, check the motherboard for visible defects, and get an LGA1200 PC motherboard to test whether the CPU is okay.
The whole issue seems to happen only when the CPU stays above 75-80 degrees for a longer period of time, when the nearby components have heated up as well. I suspect a faulty connection somewhere that stops making proper contact when hot. Maybe the T2 chip's connection is bad or something?
What do you think would be the best steps to troubleshoot this issue? Is there a tool that stress tests only the T2 chip and not the CPU? Or maybe a feature in macOS that really stresses it?
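To check the "hot for a long time" theory against real numbers, one option is to log the CPU temperature at regular intervals and measure how long it stayed above the threshold before each crash. A minimal sketch in Python, assuming the samples are already available as (seconds, degrees) pairs from whatever logging tool is used; the function name, the 75-degree default, and the sample data are made up for illustration:

```python
def longest_hot_stretch(samples, threshold=75.0):
    """Return the longest continuous time (in seconds) spent at or above threshold.

    samples: list of (timestamp_seconds, temperature) pairs, sorted by time.
    """
    longest = 0.0
    start = None  # timestamp where the current hot stretch began
    for t, temp in samples:
        if temp >= threshold:
            if start is None:
                start = t
            longest = max(longest, t - start)
        else:
            # Temperature dropped below the threshold, so the stretch ends.
            start = None
    return longest


# Hypothetical log, one sample every 10 seconds (not real measurements).
log = [(0, 60), (10, 76), (20, 81), (30, 83), (40, 70), (50, 78), (60, 79)]
print(longest_hot_stretch(log))
```

Comparing this value across crashes and non-crashes would show whether the duration above the threshold actually predicts the panic, or whether peak temperature alone is enough.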
Thank you in advance!