Carbonite

South Africa's Top Online Tech Classifieds!
Home of C.U.D.

Constant BSOD, I've run out of ideas, need tech shop in Cape Town

I suspect the PSU could be the culprit.

I had a somewhat similar issue: odd, random bluescreens (not as often as yours), and my SATA SSD would also randomly disconnect and reconnect.
My OS was on an NVMe drive, so the system stayed up most times when it happened.

After I replaced the PSU, all the issues with my SATA SSD and the bluescreens disappeared. Turns out it was the PSU all along, and the only symptoms were varying bluescreens and my randomly disconnecting SATA SSD.
 
Is there a way to test a PSU? Mine should still be in warranty...
 
I moved the SATA cables around too, last night...
Did you swop the SATA cable and port as I mentioned earlier in the thread? We want to eliminate as many things as possible using the info you've provided. :)

Keep in mind that if Driver Verifier is still running, it will stress test your drivers and try to FORCE a bluescreen.
 
Yes: run a graphics-intensive load like FurMark AND a CPU load like Prime95 or IntelBurnTest at the same time, so the PSU has to supply the GPU and CPU at full load together.

BTW - is your PC on a multiplug or plugged directly into the wall?
 
Did not swap SATA cables, only changed the ports on the mobo.
Indeed, verifier is still running.
Multiplug... only a single plug point in the room :\
 
Remember, Driver Verifier will force a BSOD.

Let's see how it goes with the new SSD you ordered. :)
 
OK, ran FurMark and Prime95 together for about 46 minutes. The PC didn't crash, but Prime95 reported some errors...

[Feb 7 10:23] Test 1 (thread 2 of 2), 6000000 Lucas-Lehmer in-place iterations of M104799 using FMA3 FFT length 5K.
[Feb 7 10:24] FATAL ERROR: Rounding was 0.5, expected less than 0.4
[Feb 7 10:24] Hardware failure detected running 5K FFT size, consult stress.txt file.
[Feb 7 10:24] Test 2 (thread 1 of 2), 6000000 Lucas-Lehmer in-place iterations of M102991 using FMA3 FFT length 5K.
[Feb 7 10:25] FATAL ERROR: Rounding was 0.5, expected less than 0.4
[Feb 7 10:25] Hardware failure detected running 5K FFT size, consult stress.txt file.

[Feb 7 11:05] Test 2 (thread 1 of 2), 5000000 Lucas-Lehmer in-place iterations of M125281 using FMA3 FFT length 6K, Pass1=128, Pass2=48, clm=2.
[Feb 7 11:07] FATAL ERROR: Rounding was 2.608650448e+11, expected less than 0.4
[Feb 7 11:07] Hardware failure detected running 6K FFT size, consult stress.txt file.
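To see at a glance which FFT sizes failed and how large the reported round-off values were, the worker output can be tallied with a short script. This is a minimal Python sketch that assumes the log text has been copied into a string (Prime95 also saves worker output to a results file, but treat the exact file name for your version as an assumption). The helper name `summarize` is hypothetical, and the two regular expressions only match the line formats shown in the log above.

```python
import re
from collections import Counter

# Matches lines like "Hardware failure detected running 5K FFT size, consult stress.txt file."
FAILURE_RE = re.compile(r"Hardware failure detected running (\d+K) FFT size")
# Matches lines like "FATAL ERROR: Rounding was 0.5, expected less than 0.4"
ROUNDING_RE = re.compile(r"FATAL ERROR: Rounding was ([0-9.eE+-]+), expected less than ([0-9.]+)")


def summarize(log_text: str) -> dict:
    """Hypothetical helper: count failures per FFT size and collect the round-off values."""
    failures = Counter(FAILURE_RE.findall(log_text))
    rounding_values = [float(value) for value, _limit in ROUNDING_RE.findall(log_text)]
    return {"failures_by_fft": dict(failures), "rounding_values": rounding_values}


SAMPLE_LOG = """\
[Feb 7 10:24] FATAL ERROR: Rounding was 0.5, expected less than 0.4
[Feb 7 10:24] Hardware failure detected running 5K FFT size, consult stress.txt file.
[Feb 7 10:25] FATAL ERROR: Rounding was 0.5, expected less than 0.4
[Feb 7 10:25] Hardware failure detected running 5K FFT size, consult stress.txt file.
[Feb 7 11:07] FATAL ERROR: Rounding was 2.608650448e+11, expected less than 0.4
[Feb 7 11:07] Hardware failure detected running 6K FFT size, consult stress.txt file.
"""

if __name__ == "__main__":
    print(summarize(SAMPLE_LOG))
```

Run against the lines from this post, it reports two failures at the 5K FFT size and one at 6K.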
 
Also bit the bullet and ordered RAM that's on the QVL for my mobo, as well as a new, bigger AIO... Hardware monitor shows 91°C max on the CPU during stress tests...
 
Under stress testing, very high temps are to be expected :) Running a CPU + GPU stress test will also give some insight into your case cooling and how efficiently it removes hot air from the case, which is a good thing to check irrespective of how basic or advanced your cooling solution is. :)

I personally would not have ordered RAM and a new cooler :) at least not until you've ruled out the PSU ("always the last place you look"). The errors in Prime95 *could* be due to the CPU not receiving "smooth/clean" enough power, OR to the fact that the P95 logs are being written to the SSD, which already shows signs of "extended" wear and tear. :)
 
I have not. I've had a previous experience where I may, or may not, have bent pins... not on this PC though.

I switched off verifier, restarted, and no BSOD yet...
 
Update. Almost 24 hours later, no BSOD.
I'm not getting my hopes up. My PC has been fine for a few days in a row before, and then it starts again.
 
prime95 said this: consult stress.txt file

Did you actually read the file to see where the errors were reported from?
 
Yes, I did. It's not a log, it's a readme type of document.
STRESS TESTING YOUR COMPUTER

BACKGROUND
----------

Today's computers are not perfect. Even brand new systems from major
manufacturers can have hidden flaws. If any of several key components such
as CPU, memory, cooling, etc. are not up to spec, it can lead to incorrect
calculations and/or unexplained system crashes.

Overclocking is the practice of increasing the speed of the CPU and/or
memory to make a machine faster at little cost. Typically, overclocking
involves pushing a machine past its limits and then backing off just a
little bit.

For these reasons, both non-overclockers and overclockers need programs
that test the stability of their computers. This is done by running
programs that put a heavy load on the computer. Though not originally
designed for this purpose, Prime95 is an excellent way of placing a heavy
load on a computer system and includes a "torture test" to allow users to do
so without interfering in the prime search.


RESOURCES
---------

This program is a good stress test primarily for CPUs and RAM, as well as
cooling systems and power supplies. The torture tests run continuously, checking
the calculations are within certain parameters while they are in progress and
comparing your computer's final results to results that are known to be correct.
Any mismatch and you've got a problem!
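As a rough illustration of the "checked while in progress" part: the errors in the log above read "Rounding was 0.5, expected less than 0.4". My understanding is that Prime95 squares very large numbers with floating-point FFTs and then rounds every output back to a whole number; on healthy hardware each output lands very close to an integer, so a value far from any integer means something corrupted the data. The Python below is a toy sketch of that idea, not Prime95's code: it uses a simple recursive radix-2 FFT with 8-bit digits, and the `fault` parameter is a hypothetical knob that injects an error into one frequency bin to mimic a hardware glitch. The 0.4 threshold is only borrowed from the log.

```python
import cmath


def fft(values, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT (unnormalised); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out


def square_with_roundoff(number, base=2**8, fault=None):
    """Square a non-negative integer via floating-point FFT convolution of its digits.

    Returns (square, max_roundoff), where max_roundoff is the largest distance of any
    convolution output from the nearest integer. `fault=(bin_index, delta)` is a
    hypothetical knob that adds `delta` to one frequency bin to mimic a glitch.
    """
    digits = []
    while number:
        number, digit = divmod(number, base)
        digits.append(digit)
    size = 1
    while size < 2 * len(digits):  # room for the full-length product
        size *= 2
    padded = [complex(d) for d in digits] + [0j] * (size - len(digits))
    spectrum = [z * z for z in fft(padded)]  # pointwise square = self-convolution
    if fault is not None:
        bin_index, delta = fault
        spectrum[bin_index] += delta
    raw = [z.real / size for z in fft(spectrum, invert=True)]
    max_roundoff = max(abs(x - round(x)) for x in raw)
    # Rounding each term and weighting it by base**i also takes care of the carries.
    square = sum(round(x) * base**i for i, x in enumerate(raw))
    return square, max_roundoff


if __name__ == "__main__":
    n = 3**300  # 60 base-256 digits, so a 128-point transform
    square, roundoff = square_with_roundoff(n)
    print(square == n * n, roundoff < 0.4)  # expected: True True
    bad_square, bad_roundoff = square_with_roundoff(n, fault=(1, 256.0))
    print(bad_square == n * n, bad_roundoff > 0.4)  # expected: False True
```

The clean run gives the exact square with round-off in the order of a millionth or less; the injected glitch pushes some outputs close to halfway between integers and produces a wrong result, which is the same pattern the torture test reports. The `fault=(1, 256.0)` values assume the 128-point transform that 3**300 produces with 8-bit digits.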

You'll need other programs to stress test other system components, monitor
the state of the system while testing and help identify the source of any faults
that arise.

Other stress tests for CPUs, memory, graphics cards etc:
OCCT - OCBASE/OCCT : Free, all-in-one stability, stress test, benchmark and monitoring tool for your PC
Linpack - Linpack Xtreme 1.1.5 Download
Realbench - ROG - Republic of Gamers|Global | For Those Who Dare
MemTest64 - TechPowerUp
FurMark - FurMark Homepage
GpuTest - GpuTest - Cross-Platform GPU Stress Test and OpenGL Benchmark for Windows, Linux and OS X | Geeks3D.com
PassMark BurnInTest - PassMark BurnInTest software - PC Reliability and Load Testing
Aida64 - https://www.aida64.com/

Utilities and monitoring software:
Intel XTU - https://downloadcenter.intel.com/download/24075
Ryzen Master - https://www.amd.com/en/technologies/ryzen-master
Afterburner - https://www.msi.com/page/afterburner
HWiNFO - https://www.hwinfo.com/
CPU-Z - https://www.cpuid.com/softwares/cpu-z.html
GPU-Z - https://www.techpowerup.com/gpuz/

Useful websites and forums with pertinent information:
http://www.overclockers.com
http://www.overclock.net
http://www.anandtech.com
http://www.tomshardware.com
http://www.hardocp.com
http://linustechtips.com/main/
http://ark.intel.com
http://www.amd.com/en/products/specifications/processors

A number of subreddits exist where assistance may be found - please make sure
you understand the scope of each and read their rules before posting:
http://www.reddit.com/r/pchelp
http://www.reddit.com/r/buildapchelp
http://www.reddit.com/r/techsupport
http://www.reddit.com/r/overclocking
http://www.reddit.com/r/Intel
http://www.reddit.com/r/AMD


WHAT TO DO IF A PROBLEM IS FOUND?
---------------------------------

The exact cause of a hardware problem can be very hard to find.

If you are not overclocking, the most likely cause is memory. It is not uncommon
for memory to not run correctly at its rated speed (incorrectly "binned"). This is
most easily tested by swapping it with memory from another compatible computer and
retesting. If that is not possible you can try underclocking memory or increasing
memory voltage a tiny bit. Overheating is another possible source of problems.
You can check the temperatures using monitoring software like HWiNFO to make sure
the CPU is below its rated temperature limit. If not, the cooler may be incorrectly
mounted or disconnected from the system while in transit, or the thermal paste
between the CPU and the cooler may not have been applied properly - YouTube is an
excellent place to find videos demonstrating correct cooler/paste application methods.
Occasionally, the power supply is incapable of supplying sufficient power to the
system under heavy load. You can often diagnose this by monitoring the 12V, 5V and
3.3V rails: you will typically observe a substantial drop in these voltages when
the system is put under load, which generally means the PSU itself needs to be
replaced with a more capable unit.
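The rail check described above can be made concrete. The sketch below compares idle and full-load readings against the ±5% window that the ATX power supply design guide allows on the +12V, +5V and +3.3V rails. The readings in the example are made up for illustration, not measurements from this thread; in practice they would come from a monitoring tool such as HWiNFO, whose motherboard sensor readings are only approximate, so a multimeter is more trustworthy for a borderline result. `check_rails` is a hypothetical helper name.

```python
# Nominal voltages for the main ATX rails; the ATX design guide allows +/-5% on each.
NOMINAL_VOLTS = {"12V": 12.0, "5V": 5.0, "3.3V": 3.3}
TOLERANCE = 0.05


def check_rails(idle: dict, load: dict) -> dict:
    """Hypothetical helper: per-rail droop (% of nominal) and whether the load reading is in spec."""
    report = {}
    for rail, nominal in NOMINAL_VOLTS.items():
        low, high = nominal * (1 - TOLERANCE), nominal * (1 + TOLERANCE)
        droop_pct = (idle[rail] - load[rail]) / nominal * 100
        report[rail] = {"droop_pct": round(droop_pct, 2), "in_spec": low <= load[rail] <= high}
    return report


if __name__ == "__main__":
    # Made-up example readings (volts), not measurements from this thread.
    idle_readings = {"12V": 12.10, "5V": 5.04, "3.3V": 3.34}
    load_readings = {"12V": 11.32, "5V": 4.98, "3.3V": 3.31}
    for rail, result in check_rails(idle_readings, load_readings).items():
        print(rail, result)
```

With these sample numbers the 12V rail sags to 11.32V, below the 11.4V lower limit, so it is flagged as out of spec, while the 5V and 3.3V rails pass.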

If you are overclocking, the most likely problems are either the CPU core
voltage being set too low or drooping too far under heavy load. You should
either increase the voltage or adjust the load line calibration to deal
with these issues. Another frequently seen issue is the motherboard failing
to set a suitable voltage for the memory controller when an XMP profile is
enabled.

The above causes are far from a comprehensive list of possible causes. Diagnosing
the exact cause can be a very difficult process.

***NB:*** You should always thoroughly research the voltage tolerances of any
specific component before you start changing it. Memory controllers integrated
into modern CPUs in particular are very sensitive to increased running voltages
and can functionally degrade very quickly if set too high. Also make sure you
have accurate temperature monitoring in place while stress testing with increased
voltages and clock speeds as heat outputs increase exponentially - 1st generation
Ryzen CPUs specifically develop heat-related stability problems when running at
temperatures above 70degC.
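For a sense of scale behind the NB above, a common first-order model for a CPU's switching (dynamic) power is P ≈ αCV²f, where α is the activity factor, C the switched capacitance, V the core voltage and f the clock frequency. It ignores leakage, which also rises with temperature, so treat it only as a rough guide. The voltage and frequency figures below are illustrative, not from this thread:

```latex
P_{\text{dyn}} \approx \alpha C V^{2} f
\qquad\Rightarrow\qquad
\frac{P_{2}}{P_{1}} = \left(\frac{V_{2}}{V_{1}}\right)^{2}\frac{f_{2}}{f_{1}}
= \left(\frac{1.30\,\text{V}}{1.20\,\text{V}}\right)^{2}\times\frac{4.4\,\text{GHz}}{4.0\,\text{GHz}}
\approx 1.174 \times 1.10 \approx 1.29
```

So a 0.1V bump plus a 10% clock increase means roughly 29% more switching power for the cooler to remove, before leakage is counted.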


CAN I IGNORE THE PROBLEM?
-------------------------

Ignoring the problem is a matter of personal preference. There are
two schools of thought on this subject.

Most programs you run will not stress your computer enough to cause a
wrong result or system crash. If you ignore the problem, then certain
workloads may stress your machine resulting in a system crash. Also,
stay away from distributed computing projects where an incorrect calculation
might cause you to return wrong results. Bad data will not help these
projects! In conclusion, if you are comfortable with a small risk of an
occasional system crash then feel free to live a little dangerously! Keep in
mind that the faster prime95 finds a hardware error the more likely it is that
other programs will experience problems.

The second school of thought is, "Why run a stress test if you are going
to ignore the results?" These people want a guaranteed 100% rock solid
machine. Passing these stability tests gives them the ability to run
CPU intensive programs with confidence.


FREQUENTLY ASKED QUESTIONS
--------------------------

Q) My machine is not overclocked. If I'm getting an error, then there must
be a bug in the program, right?

A) The torture test compares your machine's results against
KNOWN CORRECT RESULTS. If your machine cannot generate correct
results, you have a hardware problem. HOWEVER, if you are failing
the torture test in the SAME SPOT with the SAME ERROR MESSAGE
every time, then ask for help at http://mersenneforum.org - it is
possible that a recent change to the torture test code may have
introduced a software bug.
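The "known correct results" idea can be shown in miniature. The log lines earlier in the thread mention Lucas-Lehmer iterations on Mersenne numbers M_p = 2^p - 1. The sketch below runs the textbook Lucas-Lehmer test in Python on every odd prime exponent below 100 and compares each answer with the known list of Mersenne-prime exponents in that range. It uses Python's exact big integers rather than Prime95's FFT arithmetic, so it illustrates the self-check principle and is not a stress test; the function names are hypothetical.

```python
def lucas_lehmer(p: int) -> bool:
    """Return True if the Mersenne number 2**p - 1 is prime (p must be an odd prime)."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


# Odd exponents below 100 for which 2**p - 1 is a known Mersenne prime.
KNOWN_MERSENNE_EXPONENTS = {3, 5, 7, 13, 17, 19, 31, 61, 89}
ODD_PRIMES_BELOW_100 = [
    p for p in range(3, 100, 2)
    if all(p % d for d in range(3, int(p ** 0.5) + 1, 2))
]


def self_test() -> list:
    """Return the exponents whose result disagrees with the known answer (empty when healthy)."""
    return [
        p for p in ODD_PRIMES_BELOW_100
        if lucas_lehmer(p) != (p in KNOWN_MERSENNE_EXPONENTS)
    ]


if __name__ == "__main__":
    mismatches = self_test()
    print("all results match" if not mismatches else f"mismatch at p = {mismatches}")
```

On any working machine this prints "all results match"; a non-empty list would mean the hardware produced an answer that contradicts a known result, which is exactly the situation the FAQ answer describes.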

Q) How long should I run the torture test?

A) I recommend running it for somewhere between 6 and 24 hours.
The program has been known to fail only after several hours and in
some cases several weeks of operation. In most cases though, it will
fail within a few minutes on a flaky machine. When overclocking it is
entirely feasible to run short 10-15min tests at each increase in
clock speed to quickly assess the feasibility of running at those
speeds, then run longer tests later.

Q) Prime95 reports errors during the torture test, but other stability
tests don't. Do I have a problem?

A) Stability tests are not equal in their ability to detect problems.
Some don't apply a heavy enough load for their results to be reliable,
while others apply loads so heavy that only extremely edge-case
real-world workloads would compare. There also may be significant
differences between stress tests regarding the CPU features they make
use of, so specific tests that don't make use of them may not be valid
for specific use cases where those features are required (e.g. AVX
instructions, virtualization technologies).

Q) A forum member said "Don't bother with prime95, it always pukes on me,
and my system is stable!" What do you make of that?

or

"We had a server at work that ran for 2 MONTHS straight, without a reboot.
I installed Prime95 on it and ran it, and a couple of minutes later I got an error.
You are going to tell me that the server wasn't stable?"

A) If a system can be easily crashed or made to generate incorrect
results to mathematical functions simply by running a program on it,
it is impossible to argue that it is reliable. The consequences of
that unreliability are up to the user to be aware of, but many faults
go unnoticed for a long time. Glitches in games may be assumed to be
bugs in the programming. Vital data in long term storage may already
have been corrupted without anyone knowing about it. The question to
ask yourself is whether or not the responsibilities of that system are
unimportant enough to make continuing without remedial action worth
the risk.
 
Question: have you installed the new RAM?

If yes, have you run P95 again since then?

Errors in P95 often mean unstable CPU or RAM.
 
Not as yet. The new parts are scheduled for delivery today.
Since switching off driver verifier, no BSOD.
The one big change I made was to put two dedicated power cables onto the 3080. If I do get another BSOD, I will be running a fresh Windows install on the new SSD. If still BSOD, then change RAM. If still BSOD, I'll ask around for a known working PSU...
 
I read up on that rounding error; apparently adjusting your CPU voltage up a little bit solves it. The suggestion is to go up in increments of 0.05V at a time, retesting after each step, until the errors stop.
 
I may have missed this, but did you reset the BIOS to complete stock? You said you set the RAM speed to 2400, but if your CPU/mobo doesn't enjoy the XMP timings, that could 100% cause all the issues you've been seeing, and the P95 errors support that hypothesis.
 
The rounding error could be caused by any CPU or RAM instability, including setting the RAM to run at XMP - I had a similar issue where a Ryzen setup didn't like the XMP profile on my RAM and setting it manually to DRAM Calculator timings solved it (even though the RAM clock was faster than XMP lol).

RAM instability seems a very likely culprit. Step 1 is to reset BIOS to complete stock and rerun P95 to look for errors.

Set P95 to run large FFTs only. That will pick up the issue quicker. Run for at least 8 hours or until you get an error (even one error is enough). You'll usually see the error within the first 15 minutes, though.

Increasing the voltage probably refers more to the context where you're overclocking the CPU and using P95 as a stability test.
 
Agreed.
I'm 99% certain this is a RAM issue.

We'll see later today when you get your new modules.
 
Irritatingly it might not even mean that the RAM is faulty. Just the mem controller being weirdly picky.

I had a set that would give errors at stock settings (2133) but when overclocked to 3600C16 it was rock solid.
 
If Driver Verifier was on and giving you BSODs, that means there is still instability, which will carry through once Verifier is off; the BSODs will just be far less frequent.

Were you getting BSODs with Verifier WITH the 2 dedicated power cables plugged in?
 

Yeah exactly

If you check my posts in this thread, you'll see I had huge memory issues with a mate's PC.
 
Not much was changed in BIOS. I'm not overclocking, XMP isn't switched on, and initially I had the clock for memory set to 3600, the same as the RAM is specced. Earlier this week I took it down to 2400. I still got a BSOD after that, but driver verifier was still running at that time.

The current RAM is not on the QVL list for my mobo (Asrock B550 Steel Legend), but the new RAM is.

I'm definitely moving to new SSD though... Enough of you are calling for it :D

I already spent the money buying parts, so in essence I did throw money at the problem... but to soothe my mind I'm waiting for the problem to reappear before adding the new parts.
 
Eish... If I knew there would be an exam I woulda made notes...

Going through the chat, and when I made posts, and comparing the dump files and when they were created, I believe I had no BSOD since switching off driver verifier.

I agree the issue could still persist... this is why I will wait before installing new RAM. New SSD I will do anyway.
 
Yeah, there's your issue. You can't just increase the memory clock without enabling XMP 😬 The rated RAM speed depends on changing the timings too, which enabling XMP does. Manually adjusting the RAM speed, even just to 2400, is very likely to cause instability. In fact, I'm surprised it even POSTed at all like that.

So yeah, in short, you can't just change the RAM speed without tuning the timings. Definitely rather enable XMP.
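To put numbers on why speed and timings belong together: DDR timings are counted in clock cycles, and because DDR transfers data twice per clock, one cycle lasts 2000 / (data rate in MT/s) nanoseconds. The same CL number therefore means a different real delay at each speed, which is why an XMP profile stores frequency, timings and voltage as one set. This is a minimal Python sketch; the DDR4-2133 CL15 and DDR4-2400 CL16 entries are illustrative examples, not the specs of the kits in this thread, while 3600 CL16 is the setting mentioned earlier.

```python
def cas_latency_ns(cas_cycles: int, data_rate_mts: int) -> float:
    """Approximate first-word latency in nanoseconds.

    DDR moves data on both clock edges, so the memory clock is data_rate / 2 MHz
    and one clock cycle lasts 2000 / data_rate nanoseconds.
    """
    return cas_cycles * 2000 / data_rate_mts


if __name__ == "__main__":
    for label, cl, rate in [("DDR4-2133 CL15", 15, 2133),
                            ("DDR4-2400 CL16", 16, 2400),
                            ("DDR4-3600 CL16", 16, 3600)]:
        print(f"{label}: {cas_latency_ns(cl, rate):.2f} ns")
```

CL16 works out to about 13.33 ns at 2400 but 8.89 ns at 3600, so a timing value only means something when it is paired with a specific speed.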
 