GPU overclock suddenly unstable after many years?

AceSwiftShooter

will keep this short.

Decided to play Rust, and would get kicked with an "invalid session" error of some kind. Couldn't figure it out.
Then tried to play Valheim again after not playing for quite a while, and the game crashes. Event Viewer shows the NVIDIA driver crashed and recovered.

I had recently upgraded from a 9700K to a 9900KS, so I started thinking this was the problem.
I had also managed to OC my RAM even further with the 9900KS, so this could also be the problem.

I decided to remove all RAM and CPU overclocks, uninstalled every program I didn't need, verified game files, basically every fix the interwebs suggested.
I ignored all advice to remove the GPU OC, as I believed this could not be the problem since it had been stable for many years.

Long story short, set the GPU to stock and both games are playable.

My overclock: +400 on mem, and an undervolt on core. Power limit 120%. Temps in the mid 60s.

So can my GPU have deteriorated slightly, even with an undervolt and decent temps?

Or is my GPU suddenly trying to work harder due to the more powerful CPU?
 
Run the custom curve (undervolt) without touching the memory?
 
Could also just be drivers.
Just because an OC worked on one set of drivers doesn't mean it will work on all.
When was the last time you updated your drivers?
As @heinreich said it could be your PSU and as @Switch said, everything deteriorates over time.
 
My suggestion would be this.
Remove the OC and see if all is fine.
If so, start with the core and see if that is fine.
Then move on to mem.
 
I update drivers as soon as they're available. I did try going back a few versions and it still crashed.
 
Yep, it is fine with the OC removed. My guess is my undervolt was always borderline, and now the GPU just needs a bit more juice.
 
Look, I am no expert, but I'd like to point out something that many people miss. You said you went from a 9700K to a 9900KS. The 9900KS at stock runs faster than the 9700K, if I'm not mistaken, in both base frequency and max boost frequency. One thing I've learned over the years is that a faster processor will push the GPU further than a lower-end CPU. I've seen this clearly in the GPU score of some benchmarks when comparing the same run with the same GPU but a different CPU. So it may be that the OC on your GPU was never stable, and it took the upgrade to the 9900KS to show that to you.

If I may ask: in your opening post you wrote "my overclock: +400 on mem, and an undervolt on core. power 120%. temps in mid 60's." So was the overclock only on the memory, or did you also bump up the GPU core clock? I take it your GPU is the one in your sig, the EVGA 1080 Ti SC2. If so, I don't think the +400MHz on memory should be an issue, unless you got dud memory chips on that card. I've had a number of GPUs over the years and I have a fair understanding of what their limits are, and what we get here (South Africa) is way worse than what people in the USA get.

I do not believe that it is the PSU, but then again I'm not sure what PSU you are using. Usually PSU issues would show themselves quickly, and not just in games.
 
Thanks for all the ideas.
I have started fiddling a bit and ya. It's weird.
@AceSwiftShooter if I may ask, sorry I did not read all your replies, but did you say that with the card running stock, without any overclock, everything works fine? If so, go back to the overclock but raise only the memory offset, which was +400MHz and is now 0, by 50MHz and test it. If that works, bump it another 50MHz to +100MHz and test again, and keep going until you find where it fails. I'm saying +50MHz steps as memory usually has a lot more headroom than the core clocks.
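
For anyone who wants the step-up procedure above written out, here is a minimal Python sketch. It does not talk to the GPU or to any overclocking tool: `is_stable` is a hypothetical callback standing in for "apply this memory offset by hand, play or stress test for a while, report whether it held". The function name, the 50MHz step and the 400MHz ceiling are only illustrative defaults taken from the numbers in this thread.

```python
from typing import Callable, Optional


def find_max_stable_offset(
    is_stable: Callable[[int], bool],
    step_mhz: int = 50,
    ceiling_mhz: int = 400,
) -> Optional[int]:
    """Walk the memory offset up in fixed steps until a test fails.

    is_stable is a hypothetical callback: it should return True if the
    card survived a test run at the given offset (in MHz).
    Returns the highest offset that passed, or None if even stock (0) fails.
    """
    if not is_stable(0):
        return None  # unstable at stock, so the offset is not the culprit

    last_good = 0
    offset = step_mhz
    while offset <= ceiling_mhz:
        if not is_stable(offset):
            break  # first failing step; keep the previous one
        last_good = offset
        offset += step_mhz
    return last_good


if __name__ == "__main__":
    # Pretend card that starts crashing above +250 MHz (made-up number).
    pretend_card = lambda offset_mhz: offset_mhz <= 250
    print(find_max_stable_offset(pretend_card))
```

In practice each call to `is_stable` is a manual test session, so the loop is really just a checklist: note the last offset that passed and stop at the first one that crashes.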
 
Yes, ya, it runs fine at stock. It also seems to run fine with just +400 on mem. What it doesn't like is moving the power slider over 100%...
 
I think it may be that the GPU is pushing the core clocks higher with the added power envelope, which might cause issues depending on what the GPU core is capable of. But it is difficult to say whether that is the case; it is just strange that it only causes issues when raising the power limit. Maybe it does point to the PSU, but from where I'm sitting I cannot say for sure.
 
What's your max voltage and frequency on the curve set to?
 
1987MHz @ 1050mV
Many 10 series chips seem to have run stable at ~1900-2050MHz.

To get it stable again, either take the frequency down, perhaps to 1950, or up the voltage if your model allows it.

You shouldn't see any real performance difference by going down 50-100MHz.

Also, for interest's sake, when did you last change the thermal paste and pads and clean the card?
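
To put a rough number on the "no real performance difference" point, here is a small Python sketch that works out how much lower the core clock ends up, in percent, for the drops mentioned above. The 1987MHz starting point is the figure quoted in this thread. Treating the clock drop as an upper bound on lost frame rate is an assumption (that performance scales at most linearly with core clock), not a measurement.

```python
def clock_drop_percent(current_mhz: float, target_mhz: float) -> float:
    """Percentage reduction in core clock when going from current to target."""
    return (current_mhz - target_mhz) / current_mhz * 100.0


if __name__ == "__main__":
    current = 1987  # MHz, top point of the curve quoted in the thread
    for target in (1950, current - 50, current - 100):
        drop = clock_drop_percent(current, target)
        print(f"{current} -> {target} MHz: {drop:.1f}% lower clock")
```

This works out to roughly 1.9% for 1950MHz, 2.5% for a 50MHz drop and 5.0% for a 100MHz drop.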
 
Thanks, ya, I've yet to put time into trying to undervolt it again.

I repasted and padded with Thermal Grizzly in November 2020.
Card is still quite clean.
 

Try these settings for interest's sake. I know it says RTX 3080/3090, but Pascal cards undervolt at very similar voltages.

Perhaps just add 10-20mV max at each step for stability.
 
Did you replace the heat sink paste?

What is the core temp rise in games?
 
Still haven't bothered to look into this much, but discovered something a few minutes ago. Someone, :unsure:, has connected my PC through an old UPS (from my work, weird), and this UPS outputs 480W max.
Now I'm not Elon Musk, but I think my PC may be trying to pull more than that hunk o' junk can give.

Epic fail from whoever did this!! Outrage.
 
So I ditched the UPS, which fixed the issue of my PC switching off when drawing too much power.

However, the issue of the NVIDIA driver crashing when increasing the power limit is still persistent.
I tried increasing voltage, thinking it's being starved, but nada.

I was monitoring with HWiNFO, and peak power draw reached 306 watts. This is an 8+6 pin GPU.
The theoretical limit is therefore 300 watts.
So this may be why it's crashing?
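
For reference, the 300 watt figure comes from adding up the nominal PCI Express limits: 75W from the slot, 75W from the 6-pin and 150W from the 8-pin. The Python sketch below just does that sum and compares it with the 306W peak reported above. The function name is made up for illustration. Whether going a few watts over the nominal budget is actually what crashes the driver is not established by this arithmetic.

```python
# Nominal limits from the PCI Express power specs, in watts.
SLOT_W = 75        # x16 slot
SIX_PIN_W = 75     # each 6-pin connector
EIGHT_PIN_W = 150  # each 8-pin connector


def board_power_budget(six_pin: int, eight_pin: int) -> int:
    """Nominal power budget for a card with the given aux connectors."""
    return SLOT_W + six_pin * SIX_PIN_W + eight_pin * EIGHT_PIN_W


if __name__ == "__main__":
    budget = board_power_budget(six_pin=1, eight_pin=1)
    peak_w = 306  # peak draw reported from HWiNFO in this thread
    over_w = peak_w - budget
    print(f"Nominal budget: {budget} W")
    print(f"Reported peak:  {peak_w} W ({over_w} W, {over_w / budget:.1%} over)")
```

So the reported peak is 6W, or 2%, above the nominal 300W budget.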
 
