After automatic organization is done, what do I do with my originals?

I’ve had a couple of users ask this over email now:

I’m enjoying testing out your application. I have a question which I can’t find a straight answer to in the documentation. I’ve used the “Automatic library organization” feature to copy my photos (/drive/Photos) into the Library (/drive/PhotoStructure). I’d rather not have two copies of everything, so:

  1. Is it advisable to have the only copy of a photo in a PhotoStructure library?

The best way to not lose files is to have several copies, preferably on different devices on different computers.

Several beta users have asked for PhotoStructure’s automatic organization feature to move (rather than copy) their files into their library: Ability to delete originals after importing with "automatic organization" (please vote on this if you’d like me to prioritize it!)

  2. Can I delete my originals?

PhotoStructure’s file copy process is pretty robust: after it copies the file, it re-reads the copied file and verifies that the contents of the source and destination are the same. That said, I’d remove your originals from their source directory only if another full backup of that directory already exists.
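To make the copy-then-verify idea concrete, here is a minimal Python sketch. It is not PhotoStructure's actual code: the function names `sha256_of` and `copy_and_verify` are hypothetical, and comparing SHA-256 digests is just one assumed way to check that source and destination contents match.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large photos and videos need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src: Path, dest: Path) -> None:
    """Copy src to dest, then re-read both files and fail if their contents differ."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dest)
    if sha256_of(src) != sha256_of(dest):
        dest.unlink()  # don't leave a corrupt copy behind
        raise OSError(f"verification failed for {dest}")
```

Even with a check like this, the copy only proves the bytes matched at copy time, which is why a separate backup still matters before deleting anything.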

I’d hate to have you lose any of your files! Many years ago I had a couple of family members lose large swaths of their photo libraries because they “clicked the wrong button” on Picasa and Apple Photos. I’m trying to make PhotoStructure not have any of these “foot guns.”

I think any “delete” inside the app itself should be buried behind “are you sure? we warned you…” but that aside, I think if you can assert a high confidence signal “here is where it is inside ME” then I’d like to have this.

I want to confirm that assets which are de-duped ARE imported as-is, but are just not displayed as the primary selection. And, for identical assets which are distinct files on the original sources but detected as being in more than one “Album”, can you tell us if you hard link these? If it’s done through the DB, it could actually be much the same; it’s just that if I wanted to ‘export’ the state of what came in, I want some confidence it can re-create what I delete.

Yes: every file PhotoStructure imports, including any sidecars it can find, is copied into your library verbatim. PhotoStructure even tries to keep the prior stat metadata (like the ctime, atime, and mtime), with differing success on different platforms.
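As a rough illustration of carrying stat metadata across a copy, here is a small Python sketch. The function name `copy_preserving_times` is hypothetical and this is not PhotoStructure's implementation. It restores only atime and mtime: on POSIX systems ctime is the inode change time and cannot be set by an ordinary program, which is one reason results differ by platform.

```python
import os
import shutil
from pathlib import Path


def copy_preserving_times(src: Path, dest: Path) -> None:
    """Copy file contents, then restore the source's access and modification times."""
    before = src.stat()  # capture times first: reading src may update its atime
    shutil.copyfile(src, dest)
    # Nanosecond values avoid the rounding that float timestamps can introduce.
    os.utime(dest, ns=(before.st_atime_ns, before.st_mtime_ns))
```

The standard library's `shutil.copy2` does something similar in one call, and also attempts to copy permission bits.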

That said, any inferred metadata (collected from sibling files in the source) may not be able to be re-inferred once the file lives in the library, so in v2.1, a sidecar with this inferred information will be added next to the original file.

I actually copy the file: PhotoStructure doesn’t use soft or hard links (yet).

I guess I could try to hard-link files when PhotoStructure copies files into your library, and if ln --physical $src $dest fails (because it crosses filesystems), do a copy? That behavior would have to be opt-in, though: I suspect it may be a confusing default.
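The link-then-fall-back idea could look something like this in Python. This is only a sketch of the approach floated above, not something PhotoStructure does today; `link_or_copy` is a hypothetical name, and treating `EXDEV` and `EPERM` as the "fall back to copying" cases is an assumption.

```python
import errno
import os
import shutil
from pathlib import Path


def link_or_copy(src: Path, dest: Path) -> str:
    """Try to hard-link src at dest; fall back to a real copy if linking isn't possible."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    try:
        os.link(src, dest)
        return "linked"
    except OSError as err:
        # EXDEV: src and dest are on different filesystems.
        # EPERM: the filesystem or OS policy does not allow this hard link.
        if err.errno not in (errno.EXDEV, errno.EPERM):
            raise
        shutil.copy2(src, dest)
        return "copied"
```

One reason this would need to be opt-in: a hard-linked library file shares its bytes with the original, so editing one in place changes the other, and the link is not a second copy for backup purposes.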

Yes: every file PhotoStructure imports, including any sidecars it can find, is copied into your library verbatim. PhotoStructure even tries to keep the prior stat metadata (like the ctime, atime, and mtime), with differing success on different platforms.

That’s great. Thanks! So, if I have data in some 3-2-1 backup regime after I have imported, I can delete the local temp copy used for import (to get round the filesystem/network issues) and rely on the PhotoStructure/ dir to have everything, even if I have to do some sqlite3 to find it. Perfect!

Probably? Once I understand how Google albums work in this model, I’ll be using PhotoStructure as the canonical form.

I guess I could try to hard-link files when PhotoStructure copies files into your library, and if ln --physical $src $dest fails (because it crosses filesystems), do a copy? That behavior would have to be opt-in, though: I suspect it may be a confusing default.

Ah, no, this isn’t quite what I meant. Sure, for copies where you know the source is on the same filesystem, a link(2) call works. What I mean is that, irrespective of source, if a file a also appears in album b and as another copy c, then once a has been copied in and you detect that album b/a “is identical” and other copy c “is identical”, their representation inside PhotoStructure as distinct named objects could be a hard link.

“is identical” being “it’s a 1-to-1 bit copy, jim”, not the heuristic “same/duplicate”. At the source they may well have been discretely different files; they’re just true copies.

I had assumed this was true for Google albums. I just checked, and horror of horrors, it’s not: I just found out that the shasum hash for images in a Google album and the pre-album version aren’t the same. My primary drive for this has just shrunk massively. :frowning:

[edit: sometimes. I can also find instances where they are identical. So… ]

Ah: yeah, if the same SHA is already known to exist in the library, PhotoStructure won’t copy those bytes into your library again.

(hopefully, that’s what you meant?)
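In rough terms, "don't copy bytes whose SHA is already in the library" might be sketched like this in Python. Everything here is an assumption for illustration: the `known_hashes` dict stands in for whatever the library database actually tracks, SHA-256 is an arbitrary choice of hash, and the digest-prefixed file name is a hypothetical naming scheme, not PhotoStructure's layout.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def import_file(src: Path, library_dir: Path, known_hashes: dict) -> Path:
    """Copy src into the library unless identical bytes are already there."""
    digest = sha256_of(src)
    existing = known_hashes.get(digest)
    if existing is not None:
        return existing  # same bytes already imported: skip the copy
    library_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming: prefix with part of the digest so that different
    # files that happen to share a name cannot overwrite each other.
    dest = library_dir / f"{digest[:12]}-{src.name}"
    shutil.copy2(src, dest)
    known_hashes[digest] = dest
    return dest
```

Note that this only catches bit-identical files. Copies whose tags were rewritten (as Google Photos does) hash differently and need separate heuristics.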

I’ve found that Google Photos will change almost every tag associated with a photo, including GPS location, exposure information, and even captured-at tags. I have a lot of heuristics in PhotoStructure to deal with these shenanigans.

What was that? Trying to dedupe your Google Photos account?

Ah: yeah, if the same SHA is already known to exist in the library, PhotoStructure won’t copy those bytes into your library again. (hopefully, that’s what you meant?)

Yes. If this is represented as an alternate dir/name path to the “asset” in SQL, I’m good. I want to be able to re-create all external paths which this file instantiated, ideally by … hard links.

What was that? Trying to dedupe your Google Photos account?

Trying to de-dupe my own, my partner’s, Google, and locally maintained stashes of photos into a single canonical tree of photos. Across the years we’ve had multiple competing approaches, and “use Picasa on my laptop, always” breaks down in the face of reality.

I think I found 7 instances of more than 1 photo across the mix. Four I could understand (my Mac, her PC with Picasa, her Google Photos backup/album from Picasa, my Google Photos) but this is getting out of hand.

Seven, huh?

Seriously, though, I have assets with over 20 duplicates scattered over that many hard drives: and at least a handful of unique SHAs. …I had to make the info panel scrollable, and run parallel stats to render in a reasonable amount of time…

See https://photostructure.com/about/introducing-photostructure/#my-digital-disarray

So you’re in good company here.

I definitely would not delete the originals, especially since PS is still very much WIP and I basically re-organize/rebuild after each release to see what nifty new thing it’s doing when organizing my pics :slight_smile:

Strong agree. Although every release (even alphas and betas) passes some 10k unit and integration tests:

  1. disk space is cheap
  2. there will always be new edge and corner cases that I haven’t seen yet.

Another backup is never a bad thing… Having seen someone lose their pictures due to hardware failure (not related to PhotoStructure!!!), it’s terrible.