Physically de-dupe the library

I am finding that there are duplicate copies in the library itself. Obviously photostructure recognizes that they’re the same picture, but it still copied all of them into the library.

I understand after reading the FAQ that this is likely because there is some minute metadata difference between the files (added by Google or other software), so photostructure errs on the side of caution and copies them all into the library.

So that’s my feature request: I would like the ability to generate a physical copy of the library where only the “best” version of a picture is saved. Just like photostructure decides which “best” picture to display, I would hope that at some point we’d have the ability to have that same determination applied to the “plus” library.

My library looks like yours: tons of dupes.

I’ve been hesitant to delete files from the library that aren’t considered the “primary,” given

In thinking about this more, though, I think I could do the following and still be “safe”:

  1. Only copy new asset variations into the library if they are the new “best” variation
  2. Only remove prior-copied variations from the library if there’s an existing copy of the file that PhotoStructure has found on a different volume, the volume is mounted, and the file’s prior SHA matches the current SHA (to ensure that there’s no data loss by deleting the copy)
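
To make rule 2 concrete, here is a minimal sketch of the check. This is only an illustration, not PhotoStructure's actual code: `KnownCopy`, `sha256_of`, and `safe_to_remove` are hypothetical names, SHA-256 is assumed as the hash, "the volume is mounted" is approximated by "the file is currently readable at its recorded path", and I'm assuming the other copy must have the same recorded SHA as the library copy.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path


@dataclass
class KnownCopy:
    """Hypothetical record of a file previously seen on some volume."""
    path: Path       # where the copy was last seen
    volume_id: str   # identifier of the volume holding the copy
    sha: str         # SHA-256 hex digest recorded when the file was last scanned


def sha256_of(path: Path) -> str:
    """Hash the file's current contents in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def safe_to_remove(library_copy: KnownCopy, other_copies: list[KnownCopy]) -> bool:
    """True only if a verified, identical copy exists on a different volume."""
    for copy in other_copies:
        if copy.volume_id == library_copy.volume_id:
            continue  # must live on a different volume
        if copy.sha != library_copy.sha:
            continue  # a different variation, not a copy of this file
        if not copy.path.is_file():
            continue  # volume not mounted (assumption) or file is gone
        if sha256_of(copy.path) == copy.sha:
            return True  # prior SHA matches current SHA: no data loss
    return False
```

Anything that fails a check is simply left in the library, so the default is always "keep the file".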

What do you think?

(Edit: in re-reading your post, I may have misunderstood: are you wanting PhotoStructure to delete duplicate files that are outside of your library?)

Your initial reading was correct. I don’t want photostructure to delete anything outside of the “plus” library. In fact, I mount the scanned folders read-only just to be extra sure.

I’d like the “plus” library to have only the “best” version of every asset. Basically a “physical” de-duping and not just a logical de-duping. Of course, all of the duplicates in the source paths would still be there should I disagree with a decision that photostructure made, so there really is no data loss should photostructure do something stupid.

So your ideas sound ok to me.

An additional idea that just came to mind, maybe worth exploring: could one specify (either through the UI or configuration) a source folder that should always take precedence? Or even a ranking/weight for each folder? Thinking about my situation: there is one source path that I actively manage (edit metadata, post new pictures) while the other paths are more historical. You can see 3 paths in my screenshot: “ApplePhotos”, “GoogleTakeout” and “oldNas”. Really, ApplePhotos is the one copy that should always win in my book, with “oldNas” coming second and “GoogleTakeout” last. So the paths could be given extra consideration in the heuristic.

I could add a “volume precedence” setting which would just be a list of volumes, but specifying volumes by mountpoints is problematic. I could accept volume labels, volshas, and mountpoints, I guess?
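
A rough sketch of how such a list might be applied, purely as an illustration: `Volume` and `precedence_rank` are hypothetical names, not existing PhotoStructure settings or APIs. Each entry in the list is compared against a volume's label, volsha, and mountpoint, and the first match determines the rank; volumes that match nothing sort last.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Volume:
    """Hypothetical description of a mounted volume."""
    label: Optional[str]    # volume label, if it has one
    volsha: Optional[str]   # opaque volume identifier, if known
    mountpoint: str         # current mountpoint


def precedence_rank(volume: Volume, precedence: list[str]) -> int:
    """Return the index of the first matching entry (lower wins).

    Volumes matching no entry get len(precedence), i.e. lowest priority.
    """
    for i, entry in enumerate(precedence):
        if entry in (volume.label, volume.volsha, volume.mountpoint):
            return i
    return len(precedence)
```

With a list like `["ApplePhotos", "oldNas", "GoogleTakeout"]` (treated here as volume labels, which is an assumption), `sorted(volumes, key=lambda v: precedence_rank(v, precedence))` would put the ApplePhotos volume first. The rank could then be one more input to the existing “best” variation heuristic rather than a replacement for it.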

Let’s add this suggestion as a new feature: