pseudomeaningful (Senior Member, joined Aug 28, 2019)
Hello fellow Carbies,
I'm hoping your combined knowledge can help me with a problem I'm facing. Usually I just turn my back on it, because it makes me cringe to see external HDDs piled in a box on top of a computer tower.
Unfortunately I am the son of a massive data hoarder, and not the cool r/DataHoarder kind. My mother has approximately 16TB of data spread across 8-10 external hard drives. To make matters worse, they are literally all 2.5" Seagate drives 😖, one of which has already failed because... Seagate.
The drives mostly contain backups of her pictures as well as backups of her two work desktops. The issue is that there is an astonishing number of duplicate files: of the approximately 16TB, I estimate there is at most 2-3TB of actual data. I plugged as many drives as I could into my computer at once and scanned them with CCleaner's simple duplicate file tool, and the resulting .txt file was 130MB. I have entire microbiology textbook PDFs that are smaller. There are literally 5 copies of the same image on some of the drives.
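For anyone curious how a duplicate scan like that works: a common approach is to group files by size first, then hash only the files whose sizes collide, since files of different sizes cannot be identical. Below is a minimal Python sketch of that idea, assuming the drives are mounted and readable. `find_duplicates` and `file_digest` are hypothetical helper names for illustration, not part of CCleaner or Diskover, and the sketch only reports duplicate groups; it deletes nothing.

```python
import hashlib
import os
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(roots: list[str]) -> list[list[Path]]:
    """Group files under `roots` whose contents are identical.

    Files are first bucketed by size (cheap, no reading), and only files
    that share a size with another file are hashed.
    """
    by_size: dict[int, list[Path]] = defaultdict(list)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                p = Path(dirpath) / name
                # Skip symlinks so the same file is not counted twice.
                if p.is_file() and not p.is_symlink():
                    by_size[p.stat().st_size].append(p)

    by_hash: dict[str, list[Path]] = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # no other file has this size, so it cannot be a duplicate
        for p in paths:
            by_hash[file_digest(p)].append(p)

    # Keep only digests shared by two or more files.
    return [group for group in by_hash.values() if len(group) > 1]
```

Usage would be something like `find_duplicates(["E:/", "F:/", "G:/"])`, where the drive letters are placeholders. This only catches byte-identical files, so a resized or re-saved copy of the same photo will not match. Matching SHA-256 digests mean identical content for all practical purposes, but a byte-for-byte check with `filecmp.cmp(a, b, shallow=False)` before deleting anything is a cheap extra safeguard.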
I looked into the de-duplication software available, and the most comprehensive I could find was Diskover, which requires a working knowledge of Python that I simply don't have time to learn right now with university.
So I decided the simplest solution would be to build an Unraid server (I've also always wanted a home server) that she can back up her pictures and local machine to, which will provide some redundancy and safety for her storage. I'll probably also get the company to switch to Macrium Reflect instead of the Cobian they currently use. At a later stage I will add offsite storage through AWS, Backblaze, or similar. I have already acquired most of the components needed for the server.
FINALLY, the actual question. I quickly realized that de-duping this much data is way, way beyond my abilities. Could anyone recommend an individual or company in the Durban area that deals with data management and could create two master copies of the de-duped files? Obviously there will be a cost involved, but continuing to buy hard drives is ridiculous.
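For context on what that job involves, the core step of building a de-duplicated master copy can be sketched in a few lines of Python: walk each source, hash every file, and copy only the first file seen with each digest. This is a minimal sketch under stated assumptions, not a finished tool. `build_master_copy` is a hypothetical name; the destination must sit outside every source folder (otherwise the copies would be rescanned); keeping the first copy encountered is assumed to be acceptable; and each source's folder layout is kept under a hypothetical `<index>_<folder name>` subfolder so identical paths on different drives do not clash. It hashes every file, which is simple but slow over 16TB of USB drives, and it never modifies the originals.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_master_copy(sources: list[str], dest: str) -> int:
    """Copy one instance of each distinct file from `sources` into `dest`.

    Assumes `dest` is outside every source. Files keep their relative path
    under dest/<index>_<source folder name>/. Returns the number copied.
    """
    seen: set[str] = set()
    copied = 0
    dest_root = Path(dest)
    for i, src in enumerate(sources):
        src_root = Path(src)
        # A drive root such as "E:/" has an empty .name, so fall back to "drive".
        prefix = f"{i}_{src_root.name or 'drive'}"
        for p in sorted(src_root.rglob("*")):
            if not p.is_file() or p.is_symlink():
                continue  # skip folders and symlinks
            digest = file_digest(p)
            if digest in seen:
                continue  # identical content was already copied
            seen.add(digest)
            target = dest_root / prefix / p.relative_to(src_root)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(p, target)  # copy2 also preserves timestamps
            copied += 1
    return copied
```

The second master copy would then simply be a plain copy of the finished destination folder onto another drive, rather than a second de-duplication pass.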
I figured this would be the best place to start; if necessary, the mods can move the thread to a more appropriate section.
TL;DR - Much data, so duplicated. Let me know of anyone who can de-dupe a lot of hard drives.