Store inferred/extracted metadata with assets

TL;DR:

PhotoStructure applies a ton of heuristics when it extracts metadata from assets.

This work should be stored so that if you stop using PhotoStructure, this metadata is still available.

Metadata? Heuristics?

PhotoStructure has very robust heuristics to pull in metadata for your files: see how PhotoStructure extracts just the captured-at field as an example.

Data stability and portability

Several users have suggested that PhotoStructure should store extracted metadata along with the asset.

Ideally there’d be a way for me to push this work back into the original asset, but in a lossless way (so if there were bugs in parsing or inference, a later bugfix could be applied to patch up the damage from the earlier bug).

If this work was stored with the asset:

  1. Subsequent sync runs wouldn’t need to re-do that work
  2. If anything around the asset file changes (say, sidecars are deleted or added), PhotoStructure has more metadata to play with if it can remember what prior values were.
  3. People who downgrade from PLUS back to LITE don’t lose data.

Implementation goals

  1. To make the data “portable,” we need to store the final value of a given tag in a standard place.

  2. We also need to retain the original metadata so we can repair or improve fields as bugs are fixed and parsing and inference improve. This can be “proprietary” to PhotoStructure (and indeed I believe it will have to be, as there doesn’t seem to be a standard field that holds this sort of value).

Implementation and schema

This section is a work in progress

PhotoStructure already applies sidecar metadata to your assets via “layering.”

Normally, the “topmost” layer with a value wins, but, as discussed above, heuristics added in the future may favor lower-level layers from other sources to fix or improve data extraction.
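A minimal sketch of the “topmost layer with a value wins” rule, assuming layers are kept bottom-to-top (the last element is the most recently pushed layer) and that a null or missing value means the layer has nothing to say about that tag. The name resolveTopmost is hypothetical, not PhotoStructure's actual code:

```typescript
type TagValue = string | number | boolean | string[];
type Tags = Record<string, TagValue | null | undefined>;

// Layers are ordered bottom-to-top: the last element is the "topmost" layer.
function resolveTopmost(layers: Tags[]): Record<string, TagValue> {
  const result: Record<string, TagValue> = {};
  for (const layer of layers) {
    for (const [key, value] of Object.entries(layer)) {
      // Higher layers overwrite lower ones, but only when they actually have a value.
      if (value !== null && value !== undefined) result[key] = value;
    }
  }
  return result;
}
```

Because every layer is kept, a future heuristic could walk the same array differently (for example, preferring a lower layer from a more trusted source) without losing anything.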

What’s in a layer?

Each layer contains:

  1. Data: a set of typed key/value pairs, stored with native keys and typed values
  2. Metadata about the layer:
    a. the URI to the source file (optional)
    b. when the layer was produced
    c. the curator name and version that produced the layer
    d. any other information the curator may need in the future to restore prior context (optional)
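The layer fields listed above could be typed roughly like this. This is a hypothetical TypeScript shape, not PhotoStructure's schema: it nests tag values under data (the JSON example below flattens them next to meta), and makeLayer and isLayer are illustrative helper names.

```typescript
type TagValue = string | number | boolean | string[];

interface LayerMeta {
  src?: string;                      // URI of the source file, if any
  at: number;                        // when the layer was produced (ms since the epoch)
  curator: string;                   // name of the curator that produced the layer
  v: string;                         // curator version
  context?: Record<string, unknown>; // optional extra state to restore prior context
}

interface Layer {
  data: Record<string, TagValue>;    // native tag keys with typed values
  meta: LayerMeta;
}

function makeLayer(data: Record<string, TagValue>, curator: string, v: string, src?: string): Layer {
  // Only include `src` when there is one, so serialized layers stay minimal.
  return { data, meta: { ...(src === undefined ? {} : { src }), at: Date.now(), curator, v } };
}

// Runtime check for layers read back from a file (JSON.parse returns `unknown` data).
function isLayer(x: unknown): x is Layer {
  if (typeof x !== "object" || x === null) return false;
  const { data, meta } = x as { data?: unknown; meta?: unknown };
  if (typeof data !== "object" || data === null || Array.isArray(data)) return false;
  if (typeof meta !== "object" || meta === null) return false;
  const m = meta as Partial<LayerMeta>;
  return typeof m.at === "number" && typeof m.curator === "string" && typeof m.v === "string";
}
```

A check like isLayer matters because a layer array read back out of a file may have been edited by some other tool.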

Layers aren’t always additive

When editing keywords for an asset, you may want to delete previously added keywords. If you then edit a variation of the asset externally, PhotoStructure may start treating that variation as the “primary,” and your keyword edit could be inadvertently reverted.

This can be handled gracefully by having an “edit” layer that contains time of edit, the edited field, the previous value, and the new value. This enables arbitrary metadata editing.
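A sketch of how such an edit layer might be applied, assuming edits are replayed in time order on top of whatever the other layers resolve to. EditLayer and applyEdits are hypothetical names. Comparing each edit's previous value with the current one gives a cheap way to notice when the underlying data changed out from under an edit (for example, because a different variant became the primary):

```typescript
// Hypothetical "edit" layer: one user edit to one field.
type Value = string | number | string[] | undefined;

interface EditLayer {
  at: number;  // when the edit was made (ms since the epoch)
  field: string;
  prev: Value; // value the user saw before editing
  next: Value; // value after editing (undefined means the field was cleared)
}

function sameValue(a: Value, b: Value): boolean {
  return JSON.stringify(a) === JSON.stringify(b);
}

// Replay edits oldest-first on top of the resolved values from all other layers.
// Edits are always applied; ones whose `prev` no longer matches are also
// reported as conflicts so a caller could flag or review them.
function applyEdits(
  base: Record<string, Value>,
  edits: EditLayer[],
): { result: Record<string, Value>; conflicts: EditLayer[] } {
  const result: Record<string, Value> = { ...base };
  const conflicts: EditLayer[] = [];
  for (const e of [...edits].sort((a, b) => a.at - b.at)) {
    if (!sameValue(result[e.field], e.prev)) conflicts.push(e);
    if (e.next === undefined) delete result[e.field];
    else result[e.field] = e.next;
  }
  return { result, conflicts };
}
```

Deleting a keyword is then just an edit whose next value omits it, and it survives a different variant becoming the primary because it is replayed on top of whichever values win.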

Layer storage

When we overwrite prior data or generate novel metadata, we’ll push a new “layer” onto the array of prior layers stored with an asset.

This stack of layers will then be encoded as JSON and stored with the asset. Sidecars can contain layers as well, and the final layer array is built by merging and deduplicating them.
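Merging and deduping could look something like this sketch, assuming two layers are duplicates when they serialize identically once object keys are sorted, and that ordering by the meta.at timestamp gives the correct bottom-to-top stack. mergeLayerStacks and canonical are hypothetical helpers:

```typescript
// Hypothetical layer shape for this sketch (tags nested under `data`).
interface Layer {
  data: Record<string, unknown>;
  meta: { src?: string; v: string; at: number };
}

// Stable serialization: object keys are sorted, so {a, b} and {b, a}
// produce the same string and are treated as the same layer.
function canonical(value: unknown): string {
  if (Array.isArray(value)) return "[" + value.map(canonical).join(",") + "]";
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const parts = Object.keys(obj)
      .sort()
      .map((k) => JSON.stringify(k) + ":" + canonical(obj[k]));
    return "{" + parts.join(",") + "}";
  }
  return String(JSON.stringify(value)); // String() turns an undefined result into "undefined"
}

// Merge the layer stacks found in the asset and in each sidecar,
// dropping exact duplicates and ordering oldest-first (topmost last).
function mergeLayerStacks(...stacks: Layer[][]): Layer[] {
  const seen = new Set<string>();
  const merged: Layer[] = [];
  for (const layer of stacks.flat()) {
    const key = canonical(layer);
    if (seen.has(key)) continue;
    seen.add(key);
    merged.push(layer);
  }
  return merged.sort((a, b) => a.meta.at - b.meta.at);
}
```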

For existing tags, see the XMP and IPTC specifications. We can have a Setting that specifies where this “prior” metadata is stored; IPTC:DocumentNotes and IPTC:ExifCameraInfo seem generic enough.
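Tying the two implementation goals together, a sketch of what gets written might look like this: final values go to their standard tags, and the whole layer stack is JSON-encoded into the one configurable tag. The setting name priorMetadataTag, its default, and the helper names are assumptions; actually writing the tags to a file (for example, via ExifTool) is out of scope here:

```typescript
type TagValue = string | number | string[];

interface Layer {
  data: Record<string, TagValue>;
  meta: { src?: string; v: string; at: number };
}

// Hypothetical Setting: which tag holds the JSON-encoded layer stack.
const DEFAULT_SETTINGS = { priorMetadataTag: "IPTC:DocumentNotes" };

// Goal 1: final values in their standard tags (portable).
// Goal 2: the full layer stack in one "proprietary" tag (repairable later).
function tagsToWrite(
  finalValues: Record<string, TagValue>,
  layers: Layer[],
  settings = DEFAULT_SETTINGS,
): Record<string, TagValue> {
  return { ...finalValues, [settings.priorMetadataTag]: JSON.stringify(layers) };
}

function readLayers(tags: Record<string, unknown>, settings = DEFAULT_SETTINGS): Layer[] {
  const raw = tags[settings.priorMetadataTag];
  if (typeof raw !== "string") return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    return Array.isArray(parsed) ? (parsed as Layer[]) : [];
  } catch {
    return []; // unparseable: treat as "no prior layers" rather than failing
  }
}
```

Falling back to an empty stack on unreadable JSON means a corrupted or hand-edited tag degrades to “no history” rather than breaking a sync.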

Example

In this example, "v" is the version of PhotoStructure that did the parsing, and "at" is when the parsing happened, in milliseconds since the epoch (this may not be needed):

```json
[
  {
    "ImageWidth": 12345,
    "SubSecDateTimeOriginal": "2003:01:26 15:58:17",
    "meta": {
      "src": "/path/to/img.jpg",
      "v": "1.0.1",
      "at": 1626199509740
    }
  },
  {
    "albumData.title": "Album Name",
    "albumData.description": "Album Description",
    "meta": {
      "src": "/path/to/metadata.json",
      "v": "1.0.1",
      "at": 1626199509740
    }
  }
]
```

The ability to take the “combined” metadata and push it back to all the “dupes” so they stay in sync would be desirable to me as well… Maybe that’s implied in what you said above, but, for example, I have JPEG+RAW pairs where a person got tagged in the JPEG but not in the RAW. The pair gets detected as duplicates and presented together, but it would be nice to push those tags across to bring both in sync.

Ah: so you want the tags added to both the RAW and the JPEG files?

FWIW, sidecars for “image.JPG” are frequently named “image.XMP”, which will also match as a sidecar for “image.CR2” (which means you don’t need to double up the sidecars).

PhotoStructure defaults to writing to “image.JPG.XMP” to avoid issues with large directories where “IMG_0001.JPG” and “IMG_0001.CR2” may actually be different images.
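To illustrate the two naming conventions, here's a sketch that lists the sidecar names an asset would match, assuming only an uppercase .XMP suffix (a real implementation would also need to handle case, such as .xmp). sidecarCandidates is a hypothetical helper:

```typescript
// Strip the final extension from a file path, handling "/" and "\" separators.
function stripExtension(p: string): string {
  const lastSep = Math.max(p.lastIndexOf("/"), p.lastIndexOf("\\"));
  const dot = p.lastIndexOf(".");
  // Only strip if the dot is inside the file name and isn't its first character.
  return dot > lastSep + 1 ? p.slice(0, dot) : p;
}

// Sidecar names that would match an asset, most specific first.
function sidecarCandidates(assetPath: string): string[] {
  return [
    assetPath + ".XMP",                 // "image.JPG.XMP": PhotoStructure's default write target
    stripExtension(assetPath) + ".XMP", // "image.XMP": may be shared with image.CR2, etc.
  ];
}
```

The shared IMG_0001.XMP shows up as a candidate for both IMG_0001.JPG and IMG_0001.CR2, which is why one sidecar can serve a JPEG+RAW pair, and also why it can be wrongly shared when those two files are actually different images.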

Like everything, there are probably many permutations here. For the example JPEG+RAW pair, a shared XMP file would be fine (I prefer sidecars for RAW anyhow), but what if I don’t store my RAW files in the same folder as the JPEGs (many people separate them)?

Or for whatever reason I do have multiple JPEGs that are “dupes” in PS, stored all over the place…

Whether that means multiple XMP files or writing to all the files or whatever…

I’m mostly just proposing, at a high level, a button or option that brings all the metadata “in line” (or at least the tags, faces, who, etc.). Obviously some metadata (filetype, resolution, etc.) is unique per picture.
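One way to sketch that button: compute the union of the “shared” tags across all dupes, then produce per-file patches containing only what each file is missing. The allowlist below (Keywords, PersonInImage) is an illustrative assumption; anything not on it, like ImageWidth, is treated as per-file and left alone. The function names are hypothetical:

```typescript
type FileTags = Record<string, unknown>;

// Hypothetical allowlist: list-valued tags that describe the scene, not the file.
const SHARED_LIST_TAGS = ["Keywords", "PersonInImage"] as const;

function combinedSharedTags(dupes: FileTags[]): Record<string, string[]> {
  const out: Record<string, string[]> = {};
  for (const tag of SHARED_LIST_TAGS) {
    const values = new Set<string>();
    for (const d of dupes) {
      const v = d[tag];
      if (Array.isArray(v)) v.forEach((x) => values.add(String(x)));
      else if (typeof v === "string") values.add(v);
    }
    if (values.size > 0) out[tag] = [...values];
  }
  return out;
}

// Order-insensitive comparison key, so ["a", "b"] and ["b", "a"] match.
function norm(v: unknown): string {
  const list = Array.isArray(v) ? v.map(String) : typeof v === "string" ? [v] : [];
  return JSON.stringify([...list].sort());
}

// One patch per dupe, in the same order: only the shared tags that differ.
function syncPatches(dupes: FileTags[]): Record<string, string[]>[] {
  const combined = combinedSharedTags(dupes);
  return dupes.map((d) => {
    const patch: Record<string, string[]> = {};
    for (const [tag, values] of Object.entries(combined)) {
      if (norm(d[tag]) !== norm(values)) patch[tag] = values;
    }
    return patch;
  });
}
```

For a JPEG+RAW pair where only the JPEG has a person tagged, the JPEG's patch is empty and the RAW's patch adds just that person.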

I like the idea of storing the metadata I’ve spent a lot of time creating (like face tagging and geocoding) inside the picture itself, rather than locked in some application-specific database. In fact, I’m waiting impatiently for those features (tag and metadata editing) to be included in PhotoStructure. But I wonder if this is something that should perhaps only be done on the “plus” copy in the library, leaving the original in the source folder as-is?

Maybe if we want to push metadata to the source copies it should only be through sidecars?

Or, like everything else, when in doubt, let the user choose through settings.

This is my preference. I really want PhotoStructure to write to the image’s EXIF tags directly, as I’m not interested in managing sidecar files.

Edit: I moved this answer here:

So I may be unclear on something… my understanding is that PhotoStructure doesn’t write any metadata at this point. It’s only once we have the ability to edit tags that this will be relevant. Do I have that wrong?

When you rotate assets, it writes Orientation as a tag to the variants of the current asset, and that normally gets emitted to a sidecar.

(and it figures out the correct Orientation value for every variant so they all end up displayed the same way, because I had a bunch of copies of images that all had different orientations)
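For the non-mirrored EXIF Orientation values (1, 6, 3, 8, meaning rotate 0°, 90°, 180°, and 270° clockwise for display), a rotation can be applied to each variant relative to that variant's own current value, which is how copies with different stored orientations end up with different tag values. A minimal sketch; mirrored values (2, 4, 5, 7) are deliberately rejected rather than handled, and rotateOrientation is a hypothetical name:

```typescript
// Non-mirrored EXIF Orientation value -> quarter turns clockwise needed for display.
const QUARTER_TURNS_CW: Partial<Record<number, number>> = { 1: 0, 6: 1, 3: 2, 8: 3 };
// Inverse mapping, indexed by quarter turns clockwise (0..3).
const ORIENTATION_FOR_QUARTERS = [1, 6, 3, 8];

// Apply a display-relative rotation (in clockwise quarter turns; negative for
// counter-clockwise) to one variant's current Orientation. A missing tag means 1.
function rotateOrientation(current: number | undefined, quarterTurnsCW: number): number {
  const q = QUARTER_TURNS_CW[current ?? 1];
  if (q === undefined) {
    throw new Error(`Orientation ${current} is mirrored or invalid; not handled in this sketch`);
  }
  // ((x % 4) + 4) % 4 keeps negative turn counts in the 0..3 range.
  return ORIENTATION_FOR_QUARTERS[(((q + quarterTurnsCW) % 4) + 4) % 4];
}
```

For example, rotating 90° clockwise turns a variant tagged 1 into 6, but a variant already tagged 6 into 3.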

But you’re correct: this is the only metadata that PhotoStructure edits currently. I’m happy with this new layer design, though, so as soon as v1.0 ships, I can build that out.

I understand that rotation is the only thing the PhotoStructure UI can edit at this time, but does it write all the metadata extracted through other means (like EXIF) back into a sidecar?

No: it’s an append operation, so if the sidecar already exists, Orientation will be added (or overwrite a prior value) in the existing file.

If the sidecar doesn’t exist, a very small .XMP file is added next to the original asset.
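For the case where no sidecar exists yet, a minimal XMP packet carrying only Orientation (as tiff:Orientation in the standard TIFF XMP namespace) could be generated like this. It's an illustration of the XMP structure, not necessarily the exact file PhotoStructure writes, and minimalOrientationXmp is a hypothetical name:

```typescript
// Build a tiny XMP sidecar body that sets only tiff:Orientation.
function minimalOrientationXmp(orientation: number): string {
  // EXIF/TIFF Orientation is defined for the integers 1 through 8.
  if (!Number.isInteger(orientation) || orientation < 1 || orientation > 8) {
    throw new RangeError(`Orientation must be an integer from 1 to 8, got ${orientation}`);
  }
  return [
    '<x:xmpmeta xmlns:x="adobe:ns:meta/">',
    ' <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">',
    '  <rdf:Description rdf:about=""',
    '    xmlns:tiff="http://ns.adobe.com/tiff/1.0/">',
    `   <tiff:Orientation>${orientation}</tiff:Orientation>`,
    "  </rdf:Description>",
    " </rdf:RDF>",
    "</x:xmpmeta>",
    "",
  ].join("\n");
}
```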