Duplicate Photo Finder

iamgigglz

Wasn't sure which board to post this in.

I have two folders with 20,000+ photos in each. I need to establish which photos in folder A are missing from folder B.
The kicker is that the folder structure is not the same and I can't rely on file names.

There are a million duplicate photo finders out there, all claiming to be free, all with heavy limitations.

Can the Carbs recommend a good one? Free would be great but I don't mind paying for good software.
 
This interests me in general. My backup drives keep filling to capacity because of shitloads of duplicate files. :(
 
Questions:
  • Are the photos in Folder B the same in terms of pixels or have they been altered?
  • Do you just require a list of files that are not present in Folder B? Or do you want the files to automagically be copied to a desired location for further perusal?
  • How big are these folders in GB? Is it transferable in order to test with?
I think this covers some of what I would need to be able to write a quick thing. Should be a relatively simple mission to write something in C# that will compare your selected folders and tell you which files are not in the other folder. A simple hasher to build an array of hashes along with image location, then you just compare the hashes from folder A to B. WHERE NOT IN kind of thing.

If the images have been altered in any way it would be a bit trickier. Not sure how I would approach it in that case.
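
The hash-and-compare idea above can be sketched roughly as follows. This is a minimal illustration in Python rather than C#, and not anyone's actual tool from this thread. The function names (file_hash, hash_tree, missing_from) and the choice of SHA-256 are assumptions made for the example. It matches on file content only, so differing folder structures and file names do not matter, and an edited photo counts as a different file.

```python
import hashlib
from pathlib import Path


def file_hash(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file's bytes, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large photos are never fully held in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(root):
    """Map content hash -> list of files under root that have that content."""
    hashes = {}
    for path in Path(root).rglob("*"):
        if path.is_file():
            hashes.setdefault(file_hash(path), []).append(path)
    return hashes


def missing_from(folder_a, folder_b):
    """Files in folder_a whose content hash appears nowhere in folder_b."""
    hashes_b = hash_tree(folder_b)
    missing = []
    for digest, paths in hash_tree(folder_a).items():
        # The "WHERE NOT IN" step: keep A files with no matching hash in B.
        if digest not in hashes_b:
            missing.extend(paths)
    return sorted(missing)
```

Calling missing_from with the two folder paths returns the list of files in A that have no byte-identical copy anywhere under B. Matching SHA-256 digests are treated as identical content here; a byte-by-byte comparison of matched pairs could be added if that is not good enough.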
 
I haven't used any of these, but:
looks interesting.

There is a technical article

@eyesuc's questions refer, but there will 100% be something that you can already use, although scanning will take a while. Good thing computers don't get bored doing the same task over and over and over and over...
 
  • I'd only want to search for true duplicates. Edited photos should not be considered duplicates of the originals. So a byte-level file content search would work, rather than pics that "look" the same.

  • Ideally I'd either like the duplicates or the non-duplicates to be moved to some other folder. I'll sort them into the right folders manually from there. Deleting the duplicates is also an option.

  • Ok so my numbers were for the sake of the question. In actuality Folder A is about 33GB (10k files). Folder B is 340GB (113k files)
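
Moving the files out for manual sorting could look something like the sketch below. It assumes a list of paths to move has already been worked out by some comparison step. The function name move_to_review, the dry_run flag, and the idea of keeping each file's relative path under the review folder (so two files with the same name cannot overwrite each other) are assumptions for illustration, not something specified in the thread.

```python
import shutil
from pathlib import Path


def move_to_review(paths, source_root, review_root, dry_run=True):
    """Move files into review_root, keeping their path relative to source_root.

    Returns the list of (source, target) pairs. With dry_run=True nothing is
    touched, so the plan can be checked before any file is moved.
    """
    source_root = Path(source_root)
    review_root = Path(review_root)
    planned = []
    for path in paths:
        path = Path(path)
        # relative_to raises ValueError if a path is not under source_root.
        target = review_root / path.relative_to(source_root)
        planned.append((path, target))
        if not dry_run:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(path), str(target))
    return planned
```

Running it once with the default dry_run=True and reading the returned pairs is a cheap safety check before repeating the call with dry_run=False. Deleting instead of moving is deliberately left out of this sketch.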
 
Interesting - I'll check it out. Thanks.
 
I'm a C# dev too :) The thought crossed my mind (and tbh it would be rewarding af once working) but it all sounded way too much like more work.
 
Could also be entertaining to grab one of the Python libraries and pick up a new skill.

Obviously, for varying definitions of 'entertaining' :D
 
Nice! So let rip! This should be a simple one. 30 mins to 1 hour to have something totally usable.
 
Perfect, then a straightforward byte array hash should work.
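
With Folder B at roughly 340GB, hashing every file in it could take a while. One possible shortcut, sketched below in Python, is to hash only the files in B whose size in bytes matches some file in A, since two byte-identical files must have the same size. The size pre-filter and all names here (sha256_of, files_under, missing_with_size_prefilter) are assumptions added for illustration; nobody in the thread proposed them.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """SHA-256 hex digest of a file's bytes, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_under(root):
    """All regular files below root, at any depth."""
    return [p for p in Path(root).rglob("*") if p.is_file()]


def missing_with_size_prefilter(folder_a, folder_b):
    """Files in folder_a with no byte-identical copy anywhere in folder_b."""
    files_a = files_under(folder_a)
    sizes_a = {p.stat().st_size for p in files_a}
    # A file in B can only equal a file in A if the sizes match, so any
    # B file with a size not seen in A is skipped without being read.
    hashes_b = {
        sha256_of(p)
        for p in files_under(folder_b)
        if p.stat().st_size in sizes_a
    }
    return sorted(p for p in files_a if sha256_of(p) not in hashes_b)
```

How much time this saves depends on how many files in B happen to share a size with something in A, so treat it as an optimisation to measure rather than a guaranteed win.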

Geeeezus that's a LOT of photos.
 
30+ years of photos from my wife and me, plus I'm a backup service for my parents and my in-laws.
Also trust me when I say your photo collection starts getting insane when you have a kid...or two in my case...
 
I did not know WinMerge could do folder scans!

Also check out the portable app version - nothing to (un)install

portableapps.com/
 
