Massive Data Duplicate Situation

pseudomeaningful

Hello fellow carbies

I'm hoping your combined knowledge can help me with a problem I usually turn my back on, because it makes me cringe to see external HDDs in a box on top of a computer tower.

Unfortunately I am the son of a massive data hoarder, and not the cool r/datahoarders kind. My mother has approximately 16TB of data spread across 8-10 external hard drives. To make matters worse, they are literally all 2.5" Seagate drives 😖, one of which has already failed because... Seagate.

The drives mostly contain backups of her pictures as well as backups of her two work desktops. The issue is that there is an astonishing number of duplicate files. Of the approximately 16TB, I estimate there is at most 2-3TB of actual data. I plugged as many drives as I could into my computer at once and used CCleaner's simple duplicate file tool to scan them, and the resulting .txt file was 130MB... I have entire microbiology textbook PDFs that are smaller. There are literally 5 copies of the same image on some of the drives.

I looked into all the types of de-duplication software available, and the most comprehensive I could find was Diskover, which requires a working knowledge of Python that I simply don't have time to learn right now with university.

Thus I decided the simplest solution would be to build an unRAID server (I have also always wanted a home server) that she can back up her pictures and local machine to. I will probably make the company switch to Macrium Reflect instead of the Cobian they currently use, to provide some redundancy and safety for storage. At a later stage I will add offsite storage through AWS, Backblaze, or similar. I have already acquired most of the components necessary for the server.

FINALLY, the actual question. I rather quickly came to the realization that de-duping this much data is way, way beyond my abilities. Thus, could anyone recommend an individual or company in the Durban area that deals with data management and would be able to create two master copies of the de-duped files? Obviously there will be a cost involved, but to continue buying hard drives is ridiculous.

I figured this would be the best place to start and if necessary the thread can be moved to a more appropriate section by the mods.

TL;DR - Much data, so duplicated. Let me know of anyone who can de-dupe a lot of hard drives.
 

JollyJamma

I have a weird way of going through duplicate data:

Search *.* for all files and then sort by file size or type.

Really manual and not great for a ton of data but it does work.

I had some software that could find duplicate data for you but it wasn’t good unless you paid for it.
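
The size-sorting idea above can be automated. What follows is only a minimal sketch, not a vetted tool: the function names `file_digest` and `find_duplicates` are made up for illustration, and using SHA-256 to decide that two files are identical is an assumption of this sketch, not something any of the software mentioned in this thread is known to do. It groups files by size first, then hashes only the files that share a size with at least one other file, so most of the data never has to be read.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(roots):
    """Return lists of paths whose contents are identical (hypothetical helper)."""
    # Pass 1: bucket by size. A file with a unique size cannot have a duplicate.
    by_size = defaultdict(list)
    for root in roots:
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                try:
                    by_size[os.path.getsize(path)].append(path)
                except OSError:
                    continue  # unreadable file, skip it

    # Pass 2: hash only the files that share a size with another file.
    by_hash = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        for path in paths:
            try:
                by_hash[(size, file_digest(path))].append(path)
            except OSError:
                continue

    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

It could be called with the mount points of the plugged-in drives, for example `find_duplicates(["E:\\", "F:\\"])` (the drive letters are placeholders). The sketch only reports groups of identical files and never deletes anything, so choosing which copy to keep stays a manual decision.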
 

souljazk

I struggle with this with my mother too, lol. And clients... Check out FreeNAS; it has a data deduplication thingy, though I'm not sure it will do what you want.

Look at an app called FolderSize. It might help you see where the large files/folders are. Before you start on this, copy the important data to another drive and to one cloud service. Verify the copy and the cloud backup before moving forward.
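
The "verify the copy" step can be checked with a script rather than by eye. This is a rough sketch under stated assumptions: `sha256_of` and `verify_copy` are hypothetical names, both folders are assumed to be reachable as local paths (so it covers the second drive, not the cloud service), and comparing SHA-256 digests is simply one reasonable way to confirm the bytes match.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_copy(source_root, copy_root):
    """Return (relative_path, problem) pairs for files missing or changed in the copy."""
    problems = []
    for dirpath, _dirs, files in os.walk(source_root):
        for name in files:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, source_root)
            dst = os.path.join(copy_root, rel)
            if not os.path.isfile(dst):
                problems.append((rel, "missing"))
            elif sha256_of(src) != sha256_of(dst):
                problems.append((rel, "differs"))
    return problems
```

An empty list means every file under the source folder exists in the copy with identical contents. Extra files that exist only in the copy are not reported by this sketch.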
 

Nemo415

For the Photos...

If you aren't pedantic about Google and cloud storage, use Google Photos. They will store unlimited photos for free. They will, however, compress the photos, so if these pics were taken with a DSLR camera and 1 image is 100 MB, then this won't work.
 
