Library as Content Addressable Storage

I was poking around to see how PS organizes its library, and I’m surprised to see it’s a typical date tree.

Was any consideration given to CAS? For instance, fingerprinting the content and storing it by fingerprint/hash?

Organizing assets into SHA-based folders was something I originally considered; in fact, I use this approach in PhotoStructure’s image cache.

There are a couple reasons why I didn’t use it for the asset library, though:

  1. SHA-based directories wouldn’t place asset variants nearby

  2. What happens when assets are edited? Would they need to be moved into a different hierarchy after every edit?

  3. Several PhotoStructure versions back, assets had a single “asset fingerprint” that was stable across edits. That would fix the first and second issues above, but in my experience, Google Photos’ penchant for wiping metadata and mucking with image contents meant a single fingerprint wasn’t a robust de-duplication strategy, which led to the current multi-fingerprint de-duping approach.

  4. It seems that most people already organize their photos in something like yyyy/yyyy-mm-dd-event, so I adopted that, following the principle of least astonishment.
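
To make points 1 and 2 concrete, here is a minimal Python sketch of a hash-addressed layout. The function name `cas_path`, the two-level sharding, and the sample byte strings are all hypothetical; this is not how PhotoStructure's image cache is actually laid out.

```python
import hashlib
from pathlib import PurePosixPath


def cas_path(content: bytes, ext: str) -> PurePosixPath:
    """Derive a storage path from the SHA-256 of the file contents.

    The first two byte-pairs of the digest become directory levels so that
    no single directory grows too large (an assumed, common sharding choice).
    """
    digest = hashlib.sha256(content).hexdigest()
    return PurePosixPath(digest[:2], digest[2:4], digest + ext)


# Stand-ins for a RAW file, its sibling JPEG, and an edited copy of the JPEG.
raw = b"raw sensor bytes"
jpg = b"in-camera JPEG bytes"
edited = jpg + b" plus an edit"

raw_path = cas_path(raw, ".dng")
jpg_path = cas_path(jpg, ".jpg")
edited_path = cas_path(edited, ".jpg")

# The three paths share nothing predictable: variants of one photo are
# scattered, and saving an edit yields a brand-new address.
for p in (raw_path, jpg_path, edited_path):
    print(p)
```

Because the address depends on every byte, the RAW and JPEG variants of the same shot have unrelated paths, and any edit would force either a move or a second copy.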

I may be misunderstanding your question, though?

Also: the date tree is customizable! See the assetSubdirectoryDatestampFormat setting:

# +------------------------------------+
# |  assetSubdirectoryDatestampFormat  |
# +------------------------------------+
#
# If you chose to copy assets into your library, they will be copied into
# <originals directory>/<result of this pattern>/<original imagename>.
#
# - See the originalsDir system setting for what your <originals directory> is
# (it defaults to your library root directory).
#
# - Please encode this path with forward-slashes, even if you're on Windows.
#
# - If you want to add a static path, escape the pathname with single quotes
# (like "'photos'/y/MM/dd").
#
# - This will always be interpreted as a relative path from your
# PhotoStructure library.
#
# - See
# <https://moment.github.io/luxon/docs/class/src/datetime.js~DateTime.html#instance-method-toFormat>
# and
# <https://moment.github.io/luxon/docs/manual/formatting.html#table-of-tokens>.
# (env: "PS_ASSET_SUBDIRECTORY_DATESTAMP_FORMAT", env aliases:
# ["PS_ASSET_SUBDIR_FORMAT"])
#
# assetSubdirectoryDatestampFormat = "y/y-MM-dd"
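
As a rough illustration of how such a pattern turns into a directory, here is a small Python sketch. It only mimics a tiny subset of the Luxon tokens (`y`, `yyyy`, `MM`, `dd`, and single-quoted literals); the real setting is evaluated by Luxon's `toFormat`, which supports many more tokens, and the helper names `format_subdir` and `destination` are hypothetical.

```python
import re
from datetime import date
from pathlib import PurePosixPath

# Assumed subset of Luxon tokens; anything else is passed through unchanged,
# which is NOT what Luxon itself would do.
_TOKENS = {
    "yyyy": lambda d: f"{d.year:04d}",
    "y": lambda d: str(d.year),
    "MM": lambda d: f"{d.month:02d}",
    "dd": lambda d: f"{d.day:02d}",
}
_PATTERN = re.compile(r"'([^']*)'|yyyy|y|MM|dd")


def format_subdir(fmt: str, d: date) -> str:
    """Expand a datestamp pattern such as "y/y-MM-dd" for the given date."""
    def repl(match: re.Match) -> str:
        if match.group(1) is not None:  # single-quoted static text
            return match.group(1)
        return _TOKENS[match.group(0)](d)
    return _PATTERN.sub(repl, fmt)


def destination(originals_dir: str, fmt: str, d: date, name: str) -> PurePosixPath:
    """<originals directory>/<result of this pattern>/<original imagename>"""
    return PurePosixPath(originals_dir) / format_subdir(fmt, d) / name


taken = date(2021, 3, 7)
print(format_subdir("y/y-MM-dd", taken))         # 2021/2021-03-07
print(format_subdir("'photos'/y/MM/dd", taken))  # photos/2021/03/07
print(destination("/library", "y/y-MM-dd", taken, "IMG_0001.jpg"))
```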

I don’t see that as a problem. At some point PS should be the primary interface to the archive. If you want to be fancy you could provide exports, symlink trees, or FUSE mounts into the archive.
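
A symlink tree of the kind mentioned above could look roughly like the Python sketch below. It assumes a POSIX-style filesystem where symlinks are permitted, and the `build_date_view` name and the shape of the `assets` mapping are made up for illustration; nothing here reflects an existing PhotoStructure feature.

```python
from datetime import date
from pathlib import Path


def build_date_view(view_root: Path, assets: dict[Path, tuple[date, str]]) -> list[Path]:
    """Create a browsable yyyy/yyyy-mm-dd tree of symlinks into a hash-addressed archive.

    `assets` maps each stored (hash-named) file to its capture date and the
    human-friendly filename it should appear under in the view.
    """
    links = []
    for stored, (taken, name) in assets.items():
        link = view_root / str(taken.year) / taken.isoformat() / name
        link.parent.mkdir(parents=True, exist_ok=True)
        if link.is_symlink():
            link.unlink()  # refresh a stale link from an earlier run
        link.symlink_to(stored)
        links.append(link)
    return links
```

The archive itself stays untouched; the view can be deleted and rebuilt at any time, since it contains only links.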

There are different ways to make the hash address: a perceptual hash (pHash), a SHA of just the image data (i.e., with metadata removed), etc. It’d also make sense to store that original hash in the image as a unique ID. If the image metadata is updated, the hash will still match as long as you always exclude metadata from the hash. If the image itself changes but it carries the unique ID tag, there’s no need to recompute. Essentially: hash once on import, and files compared later for deduplication should match.
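
One way to read "SHA on just the image" is sketched below in Python for JPEG files: walk the segments before the scan data and leave the APPn and COM segments (where EXIF, XMP, and comments live) out of the hash. This is a simplified illustration under stated assumptions, not a robust parser: the name `image_only_sha` is hypothetical, some APPn segments (JFIF, Adobe) influence how pixels are interpreted, and a tool that re-encodes the pixels will still change the result.

```python
import hashlib
import struct

SOI = b"\xff\xd8"   # start-of-image marker
SOS = 0xDA          # start-of-scan: compressed image data follows
COM = 0xFE          # comment segment


def image_only_sha(data: bytes) -> str:
    """SHA-256 of a JPEG with APPn/COM metadata segments skipped."""
    if data[:2] != SOI:
        raise ValueError("not a JPEG stream")
    h = hashlib.sha256()
    h.update(SOI)
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == SOS:
            h.update(data[pos:])  # scan data through end of file
            return h.hexdigest()
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if length < 2:
            raise ValueError("corrupt segment length")
        end = pos + 2 + length    # length counts its own two bytes
        is_metadata = 0xE0 <= marker <= 0xEF or marker == COM
        if not is_metadata:
            h.update(data[pos:end])  # quantization/Huffman tables, frame header, ...
        pos = end
    raise ValueError("no start-of-scan marker found")
```

Two copies of a file that differ only in their EXIF block get the same digest here, so that digest could be written back into the file once, on import, as the unique ID.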

I can’t speak for other tools that change metadata. I also know that in your docs you write that you don’t want to assume that PS is the authoritative source for the images. I appreciate that. Writing tags back to the images and ensuring that the images are the record, so that your database is just a cache of those records, is a great start. That’s more about avoiding vendor lock-in than sharing.

Remember most of us skip from product to product trying to get a handle on this, which causes more duplication. At the moment PS has a great interface and you’re improving imports and the backend. Your stated goals are to make a permanent home for photos. You can’t do that without trying to assume authority over the photos in some way.

Do you really expect users will frequently manage photos in other products once they have them in PS?

Yes, that’s common. I’d expect an export like that at minimum.

This reminded me that the suggestion to add a WebDAV export may address what you’re asking for, if it offered, say, SHA as an exported path as well as tag paths.

I guess it matters what you mean by “manage.”

If you mean “organize”, then, at some point, I’d be delighted if PhotoStructure got this job done for most people. I expect more advanced users will already have a “workflow” that works for them that uses other software that they don’t want to give up.

If you mean “edit,” though: PhotoStructure won’t ever have the editing features of, say, Photoshop or Pixelmator. I do want to support quick edits, à la Google Photos’ editing sliders, but I expect people will always want to be able to edit photos in their favorite app.

WebDAV. I’m so very very sorry. (Edit: Added very, out of sympathy)

Organizing is certainly the key. Non-trivial editing is separate. For that, I’d suggest a temporary export, or giving the user the real filename. Otherwise, if I were an artist or something, I’d only be indexing photos in PS after they are complete.


LOL. It’s a very chatty protocol, but it’s very widely supported.

If there’s something else you’d recommend, feel free to comment in that topic!