Physically de-dupe the library

I am finding that there are duplicate copies in the library itself. Now, obviously photostructure recognizes that it’s the same picture, but it still copied all of them into the library.

I understand after reading the FAQ that this is likely because there is some minute metadata difference between the files (added by google or other software), so photostructure errs on the side of caution and copies them all into the library.

So that’s my feature request: I would like the ability to generate a physical copy of the library where only the “best” version of a picture is saved. Just like photostructure decides which “best” picture to display, I would hope that at some point we’d have the ability to have that same determination applied to the “plus” library.

My library looks like yours: tons of dupes.

I’ve been hesitant to delete files from the library that aren’t considered the “primary,” given

In thinking about this more, though, I think I could do the following and still be “safe”:

  1. Only copy new asset variations into the library if they are the new “best” variation
  2. Only remove prior-copied variations from the library if there’s an existing copy of the file that PhotoStructure has found on a different volume, the volume is mounted, and the file’s prior SHA matches the current SHA (to ensure that there’s no data loss by deleting the copy)
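A rough sketch of how the check in rule 2 might look. This is not PhotoStructure's actual code: `KnownCopy`, `sha256_of`, and `safe_to_remove` are hypothetical names, SHA-256 is an assumed hash, and I'm reading "existing copy of the file" to mean that the other copy's recorded SHA equals the library copy's SHA.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path


@dataclass
class KnownCopy:
    """Hypothetical record of a copy previously found on a different volume."""
    path: Path            # where the copy was last seen
    volume_mounted: bool  # whether that volume is currently mounted
    recorded_sha: str     # SHA recorded when the copy was last scanned


def sha256_of(path: Path) -> str:
    """Hash the file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def safe_to_remove(library_sha: str, other_copies: list[KnownCopy]) -> bool:
    """Return True only if at least one other copy passes every check in rule 2."""
    for copy in other_copies:
        if not copy.volume_mounted:
            continue  # volume is offline, so the copy cannot be verified
        if copy.recorded_sha != library_sha:
            continue  # not the same bytes as the library copy (assumed reading)
        if not copy.path.is_file():
            continue  # the file has gone missing since the last scan
        if sha256_of(copy.path) == copy.recorded_sha:
            return True  # prior SHA matches current SHA: still intact
    return False
```

The library copy would only be removed when `safe_to_remove(...)` returns `True`. Anything that can't be verified right now (unmounted volume, missing file, changed contents) leaves the library copy in place.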

What do you think?

(Edit: in re-reading your post, I may have misunderstood: are you wanting PhotoStructure to delete duplicate files that are outside of your library?)

Your initial reading was correct. I don’t want photostructure to delete anything outside of the “plus” library. In fact, I mount the scanned folders read-only just to be extra sure.

I’d like the “plus” library to have only the “best” version of every asset. Basically a “physical” de-duping and not just a logical de-duping. Of course, all of the duplicates in the source paths would still be there should I disagree with a decision that photostructure made, so there really is no data loss should photostructure do something stupid.

So your ideas sound ok to me.

An additional idea that just came to mind, maybe worth exploring: could one specify (either through the UI or configuration) a source folder that should always take precedence? Or even a ranking/weight for each folder? Thinking about my situation: there is one source path that I actively manage (edit metadata, post new pictures) while the other paths are more historical. You can see in my screenshot 3 paths: “ApplePhotos”, “GoogleTakeout” and “oldNas”. Really, ApplePhotos is the one copy that should always win in my book, with “oldNas” coming second and “GoogleTakeout” last. So the paths could be given extra consideration in the heuristic.

I could add a “volume precedence” setting which would just be a list of volumes, but specifying volumes by mountpoints is problematic. I could accept volume labels, volshas, and mountpoints, I guess?
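To make that concrete, here is a minimal sketch of how such a list could be matched, assuming each entry may be a volume label, a volsha, or a mountpoint. The `Volume` class, both function names, and all of the values in the usage example are hypothetical placeholders, not an existing PhotoStructure setting or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Volume:
    """Hypothetical description of a volume, identified three different ways."""
    label: str
    volsha: str
    mountpoint: str


def precedence_rank(volume: Volume, precedence: list[str]) -> int:
    """Index of the first entry matching the volume's label, volsha, or mountpoint.

    Lower is better. Volumes not listed rank after every listed volume.
    """
    for index, entry in enumerate(precedence):
        if entry in (volume.label, volume.volsha, volume.mountpoint):
            return index
    return len(precedence)


def pick_preferred(volumes: list[Volume], precedence: list[str]) -> Volume:
    """Choose the volume with the best rank; ties keep the earliest in the list."""
    return min(volumes, key=lambda v: precedence_rank(v, precedence))
```

For example, with placeholder volumes named after the three paths mentioned above:

```python
apple = Volume("ApplePhotos", "volsha-a", "/mnt/apple")
nas = Volume("oldNas", "volsha-b", "/mnt/nas")
takeout = Volume("GoogleTakeout", "volsha-c", "/mnt/takeout")

# Each entry can use whichever identifier is most stable for that volume.
precedence = ["ApplePhotos", "volsha-b", "/mnt/takeout"]
preferred = pick_preferred([takeout, nas, apple], precedence)  # apple
```

A rank like this would presumably be one input to the existing "best" heuristic rather than a replacement for it.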

Let’s add this suggestion as a new feature:

If/when this is implemented, it would be slick to have a “Help me deduplicate” experience that displays the “best” asset alongside a variant and asks the user to confirm deleting the worse one (or to delete the “best” one and replace it). Most of the related thread, Deleting / hiding photos - #3 by mrm, seems to address deleting entire assets, not variants of assets, but @codepoet did mention:

I’d like to be able to verify the duplicates in a side-by-side and then delete the “other” one

I think you could use the same “Delete” action you already made for managing entire assets, but I do not think the “Archive” or “Remove” actions would apply.

And of course the variants would not actually be deleted from disk until you click the “Empty trash” button from the “View trash” search.

Lastly I think the “View trash” search should have some indicator differentiating between variants and whole assets. And the variants should include a button to compare again with the “best” variant.

Thanks for sharing those thoughts!

The v2.1 implementation of hide/remove/delete is actually only at the asset level: searches don’t know about asset files yet (it’s why you can’t search for filenames yet).

I could add a “delete now” option to each file in the asset info panel, that didn’t have the option of undo, but even that seems like it’d be dangerous. I’ll think about how this could work.