STRESS TESTING YOUR COMPUTER
BACKGROUND
----------
Today's computers are not perfect. Even brand new systems from major
manufacturers can have hidden flaws. If any of several key components such
as CPU, memory, cooling, etc. are not up to spec, it can lead to incorrect
calculations and/or unexplained system crashes.
Overclocking is the practice of increasing the speed of the CPU and/or
memory to make a machine faster at little cost. Typically, overclocking
involves pushing a machine past its limits and then backing off just a
little bit.
For these reasons, both non-overclockers and overclockers need programs
that test the stability of their computers. This is done by running
programs that put a heavy load on the computer. Though not originally
designed for this purpose, Prime95 is an excellent way of placing a heavy
load on a computer system. It includes a "torture test" that lets users do
so without interfering with the prime search.
RESOURCES
---------
This program is a good stress test primarily for CPUs and RAM, as well as
cooling systems and power supplies. The torture tests run continuously, checking
that the calculations stay within certain parameters while they are in progress and
comparing your computer's final results to results that are known to be correct.
Any mismatch and you've got a problem!
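The idea of checking a computation against known-correct answers can be sketched in a few lines of Python. This is an illustration only and not Prime95's actual code: it runs the Lucas-Lehmer test on a handful of small exponents and compares each result with the published list of Mersenne prime exponents. The names KNOWN_PRIME_EXPONENTS, lucas_lehmer and self_check are made up for this sketch.

```python
# Illustrative sketch only: a tiny self-checking workload, not Prime95's code.
# Exponents p below 2000 for which 2**p - 1 is known to be prime
# (2 is omitted because the test below is only valid for odd primes).
KNOWN_PRIME_EXPONENTS = {3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127, 521, 607, 1279}


def lucas_lehmer(p: int) -> bool:
    """Return True if 2**p - 1 is prime. Valid for odd prime p only."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def self_check(exponents) -> list:
    """Return the exponents whose computed result disagrees with the known answer."""
    return [p for p in exponents
            if lucas_lehmer(p) != (p in KNOWN_PRIME_EXPONENTS)]


if __name__ == "__main__":
    # Odd prime exponents below 2000 only; a healthy machine reports no mismatches.
    mismatches = self_check([3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 61, 89, 107, 127])
    print("mismatches:", mismatches)
```

On a healthy machine the list of mismatches is empty. A workload this small puts almost no load on the hardware, so it only demonstrates the comparison step and is no substitute for the torture test.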
You'll need other programs to stress test other system components, monitor
the state of the system while testing and help identify the source of any faults
that arise.
Other stress tests for CPUs, memory, graphics cards etc:
OCCT -
OCBASE/OCCT : Free, all-in-one stability, stress test, benchmark and monitoring tool for your PC
Linpack -
Linpack Xtreme 1.1.5 Download
Realbench -
ROG - Republic of Gamers|Global | For Those Who Dare
MemTest64 -
TechPowerUp
FurMark -
FurMark Homepage
GpuTest -
GpuTest - Cross-Platform GPU Stress Test and OpenGL Benchmark for Windows, Linux and OS X | Geeks3D.com
PassMark BurnInTest -
PassMark BurnInTest software - PC Reliability and Load Testing
Aida64 -
https://www.aida64.com/
Utilities and monitoring software:
Intel XTU -
https://downloadcenter.intel.com/download/24075
Ryzen Master -
https://www.amd.com/en/technologies/ryzen-master
Afterburner -
https://www.msi.com/page/afterburner
HWiNFO -
https://www.hwinfo.com/
CPU-Z -
https://www.cpuid.com/softwares/cpu-z.html
GPU-Z -
https://www.techpowerup.com/gpuz/
Useful websites and forums with pertinent information:
http://www.overclockers.com
http://www.overclock.net
http://www.anandtech.com
http://www.tomshardware.com
http://www.hardocp.com
http://linustechtips.com/main/
http://ark.intel.com
http://www.amd.com/en/products/specifications/processors
A number of subreddits exist where assistance may be found - please make sure
you understand the scope of each and read their rules before posting:
http://www.reddit.com/r/pchelp
http://www.reddit.com/r/buildapchelp
http://www.reddit.com/r/techsupport
http://www.reddit.com/r/overclocking
http://www.reddit.com/r/Intel
http://www.reddit.com/r/AMD
WHAT TO DO IF A PROBLEM IS FOUND?
---------------------------------
The exact cause of a hardware problem can be very hard to find.
If you are not overclocking, the most likely cause is memory. It is not uncommon
for memory to not run correctly at its rated speed (incorrectly "binned"). This is
most easily tested by swapping it with memory from another compatible computer and
retesting. If that is not possible you can try underclocking memory or increasing
memory voltage a tiny bit. Overheating is another possible source of problems.
You can check the temperatures using monitoring software like HWiNFO to make sure
the CPU is below its rated temperature limit. If not, the cooler may be incorrectly
mounted or may have come loose in transit, or the thermal paste
between the CPU and the cooler may not have been applied properly. YouTube is an
excellent place to find videos demonstrating correct cooler/paste application methods.
Occasionally, the power supply is incapable of supplying sufficient power to the
system under heavy load. You can often diagnose this by monitoring the 12v, 5v and
3.3v voltages: a substantial drop in these voltages when the system is put under
load generally means the PSU needs to be replaced with a more capable unit.
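As a rough aid for judging whether a voltage drop is "substantial", the sketch below compares readings against the nominal rail values. The 5% tolerance is an assumption taken from the usual ATX figure for these rails, and the function names and sample readings are hypothetical. Voltages reported by monitoring software can be imprecise, so treat the output as a hint rather than a verdict.

```python
# Nominal rail voltages. The +/-5% tolerance is an assumption based on the
# usual ATX figure for these rails; check the specification for your PSU.
NOMINAL_RAILS = {"12V": 12.0, "5V": 5.0, "3.3V": 3.3}
TOLERANCE = 0.05


def rail_deviation(rail: str, measured: float) -> float:
    """Return the fractional deviation of a reading from its nominal voltage."""
    nominal = NOMINAL_RAILS[rail]
    return (measured - nominal) / nominal


def out_of_tolerance(readings: dict) -> list:
    """Return the rails whose reading deviates by more than TOLERANCE."""
    return [rail for rail, volts in readings.items()
            if abs(rail_deviation(rail, volts)) > TOLERANCE]


if __name__ == "__main__":
    # Hypothetical readings taken while the system is under load.
    under_load = {"12V": 11.3, "5V": 4.95, "3.3V": 3.28}
    for rail, volts in under_load.items():
        print(f"{rail}: {volts:.2f} V ({rail_deviation(rail, volts):+.1%})")
    print("outside tolerance:", out_of_tolerance(under_load))
```

Comparing the same readings at idle and under load shows how far each rail sags, which is the behaviour described above.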
If you are overclocking, the most likely problems are either the CPU core
voltage being set too low or drooping too far under heavy load. You should
either increase the voltage or adjust the load line calibration to deal
with these issues. Another frequently seen issue is the motherboard failing
to set a suitable voltage for the memory controller when an XMP profile is
enabled.
The above causes are far from a comprehensive list of possible causes. Diagnosing
the exact cause can be a very difficult process.
***NB:*** You should always thoroughly research the voltage tolerances of any
specific component before you start changing it. Memory controllers integrated
into modern CPUs in particular are very sensitive to increased running voltages
and can functionally degrade very quickly if the voltage is set too high. Also make
sure you have accurate temperature monitoring in place while stress testing with
increased voltages and clock speeds, as heat output rises steeply with both (dynamic
power scales roughly with clock speed times the square of the voltage). 1st generation
Ryzen CPUs specifically develop heat-related stability problems when running at
temperatures above 70 degC.
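To get a feel for how quickly heat output grows, the sketch below uses the common first-order approximation that dynamic CPU power is proportional to clock speed times the square of the core voltage. It ignores leakage current, which also rises with voltage and temperature, so real increases are usually somewhat larger. The function name and the example voltages and clocks are hypothetical.

```python
def relative_dynamic_power(v_old: float, f_old: float,
                           v_new: float, f_new: float) -> float:
    """Estimate new/old dynamic power using P ~ f * V**2 (leakage ignored)."""
    return (v_new / v_old) ** 2 * (f_new / f_old)


if __name__ == "__main__":
    # Hypothetical overclock: 1.20 V at 4.0 GHz raised to 1.35 V at 4.6 GHz.
    ratio = relative_dynamic_power(1.20, 4.0, 1.35, 4.6)
    print(f"estimated dynamic power: {ratio:.2f}x the original")
```

In this example a 15% clock increase combined with a 12.5% voltage increase raises the estimated dynamic power by about 46%, which the cooler then has to remove.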
CAN I IGNORE THE PROBLEM?
-------------------------
Ignoring the problem is a matter of personal preference. There are
two schools of thought on this subject.
The first holds that most programs you run will not stress your computer enough
to cause a wrong result or system crash. If you ignore the problem, certain
workloads may still stress your machine enough to crash it. You should also
stay away from distributed computing projects, where an incorrect calculation
might cause you to return wrong results. Bad data will not help these
projects! In short, if you are comfortable with a small risk of an
occasional system crash then feel free to live a little dangerously! Keep in
mind that the faster prime95 finds a hardware error, the more likely it is that
other programs will experience problems.
The second school of thought is, "Why run a stress test if you are going
to ignore the results?" These people want a guaranteed 100% rock solid
machine. Passing these stability tests gives them the ability to run
CPU intensive programs with confidence.
FREQUENTLY ASKED QUESTIONS
--------------------------
Q) My machine is not overclocked. If I'm getting an error, then there must
be a bug in the program, right?
A) The torture test is comparing your machine's results against
KNOWN CORRECT RESULTS. If your machine cannot generate correct
results, you have a hardware problem. HOWEVER, if you are failing
the torture test in the SAME SPOT with the SAME ERROR MESSAGE
every time, then ask for help at
http://mersenneforum.org - it is
possible that a recent change to the torture test code may have
introduced a software bug.
Q) How long should I run the torture test?
A) I recommend running it for somewhere between 6 and 24 hours.
The program has been known to fail only after several hours and in
some cases several weeks of operation. In most cases though, it will
fail within a few minutes on a flaky machine. When overclocking it is
entirely feasible to run short 10-15min tests at each increase in
clock speed to quickly assess the feasibility of running at those
speeds, then run longer tests later.
Q) Prime95 reports errors during the torture test, but other stability
tests don't. Do I have a problem?
A) Stability tests are not equal in their ability to detect problems.
Some don't apply a heavy enough load for their results to be reliable,
while others apply loads so heavy that only extremely edge-case
real-world workloads would compare. There also may be significant
differences between stress tests regarding the CPU features they make
use of, so specific tests that don't make use of them may not be valid
for specific use cases where those features are required (e.g. AVX
instructions, virtualization technologies).
Q) A forum member said "Don't bother with prime95, it always pukes on me,
and my system is stable!" What do you make of that?
or
"We had a server at work that ran for 2 MONTHS straight, without a reboot.
I installed Prime95 on it and ran it - a couple minutes later I get an error.
You are going to tell me that the server wasn't stable?"
A) If a system can be easily crashed or made to generate incorrect
results from mathematical calculations simply by running a program on it,
it is impossible to argue that it is reliable. The consequences of
that unreliability are up to the user to be aware of, but many faults
go unnoticed for a long time. Glitches in games may be assumed to be
bugs in the programming. Vital data in long term storage may already
have been corrupted without anyone knowing about it. The question to
ask yourself is whether or not the responsibilities of that system are
unimportant enough to make continuing without remedial action worth
the risk.