TL;DR:
PhotoStructure applies a ton of heuristics when it extracts metadata from assets.
This work should be stored so that if you stop using PhotoStructure, this metadata is still available.
Metadata? Heuristics?
PhotoStructure has very robust heuristics to pull in metadata for your files: see how PhotoStructure extracts just the captured-at field as an example.
Data stability and portability
Several users have suggested that PhotoStructure should store extracted metadata along with the asset.
Ideally there’d be a way for me to push this work back into the original asset, but in a lossless way (so if there were bugs in parsing or inference, the bugfix could be applied and patch up the damage from the prior bug).
If this work was stored with the asset:
- Subsequent sync runs wouldn’t need to re-do that work.
- If anything around the asset file changes (say, sidecars are deleted or added), PhotoStructure has more metadata to play with if it can remember what prior values were.
- People that downgrade from PLUS back to LITE don’t lose data.
Implementation goals
- To make the data “portable,” we need to store the final value of a given tag in a standard place.
- We also need to retain the prior original metadata so we can repair or improve fields as bugs are fixed and improvements are added to parsing and inference. This can be “proprietary” to PhotoStructure (and indeed I believe it will have to be, as there doesn’t seem to be a standard field that holds this sort of value).
Implementation and schema
This section is a work in progress
PhotoStructure already applies sidecar metadata to your assets via “layering.”
Normally, the “topmost” layer with a value wins, but, as discussed before, heuristics added in the future may weigh lower-level layers from different sources more favorably to fix or improve data extraction.
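As a rough illustration of the default "topmost layer with a value wins" rule, here is a minimal sketch. The `Layer` shape, the `resolveTag` function, and the `Title` values are all hypothetical names made up for this example, not PhotoStructure's actual code; only the tag data matters here.

```typescript
// Hypothetical types for illustration; not PhotoStructure's actual schema.
type TagValue = string | number | boolean;

interface Layer {
  // Tag name -> typed value, e.g. { ImageWidth: 12345 }
  data: Record<string, TagValue>;
}

// Layers are ordered bottom-to-top: the last array element is the topmost.
function resolveTag(layers: Layer[], tag: string): TagValue | undefined {
  for (const layer of [...layers].reverse()) {
    const value = layer.data[tag];
    if (value !== undefined) return value;
  }
  return undefined; // no layer has a value for this tag
}

const exampleLayers: Layer[] = [
  { data: { ImageWidth: 12345, Title: "From the image file" } },
  { data: { Title: "From the sidecar" } },
];

console.log(resolveTag(exampleLayers, "Title")); // the sidecar (topmost) value
console.log(resolveTag(exampleLayers, "ImageWidth")); // falls through to the lower layer
```

Because the lower layers are kept rather than overwritten, a future heuristic could replace `resolveTag` with something smarter without losing any of the inputs.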
What’s in a layer?
Each layer contains:
- Data: a set of typed key/value pairs, stored with native keys and typed values
- Metadata about the layer:
  a. the URI to the source file (optional)
  b. when the layer was produced
  c. the curator name and version that produced the layer
  d. any other information the curator may need in the future to restore prior context (optional)
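The list above could be written down as a type along these lines. This is only a sketch: the field names (`data`, `meta`, `src`, `at`, `curator`, `v`, `context`) and the curator name are assumptions for illustration, and nesting the tags under `data` is a choice made here for type safety rather than a settled schema.

```typescript
// Hypothetical names throughout; no schema has been fixed yet.
type TagValue = string | number | boolean | string[];

interface LayerMeta {
  src?: string; // (a) URI of the source file, optional
  at: number; // (b) when the layer was produced, ms since the Unix epoch
  curator: string; // (c) name of the curator that produced the layer
  v: string; // (c) version of that curator
  context?: Record<string, unknown>; // (d) anything needed to restore prior context
}

interface Layer {
  data: Record<string, TagValue>; // native tag names with typed values
  meta: LayerMeta;
}

const sidecarLayer: Layer = {
  data: { "albumData.title": "Album Name" },
  meta: {
    src: "/path/to/metadata.json",
    at: 1626199509740,
    curator: "example-sidecar-curator", // made-up curator name
    v: "1.0.1",
  },
};

// Every field is plain data, so a layer survives a JSON round trip unchanged.
const restoredLayer: Layer = JSON.parse(JSON.stringify(sidecarLayer));
console.log(restoredLayer.meta.v); // 1.0.1
```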
Layers aren’t always additive
When editing keywords for an asset, you may want to delete previously added keywords. If you then edit a variation of the asset externally, PhotoStructure may consider that variation the “primary,” and your keyword edit would be reverted inadvertently.
This can be handled gracefully by having an “edit” layer that contains time of edit, the edited field, the previous value, and the new value. This enables arbitrary metadata editing.
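To make the idea concrete, here is a minimal sketch of an edit layer and how it might be replayed over whatever values the other layers resolve to. The `EditLayer` shape and `applyEdits` function are hypothetical, and using `null` to mean "unset" is an assumption chosen because it survives JSON encoding.

```typescript
// Hypothetical types for illustration only.
type TagValue = string | number | string[];

interface EditLayer {
  at: number; // time of the edit, ms since the Unix epoch
  field: string; // the edited field
  prev: TagValue | null; // previous value (null if the field was unset)
  next: TagValue | null; // new value (null removes the field)
}

// Replays edits, oldest first, on top of the values resolved from other layers.
function applyEdits(
  resolved: Record<string, TagValue>,
  edits: EditLayer[],
): Record<string, TagValue> {
  const result: Record<string, TagValue> = { ...resolved };
  const oldestFirst = [...edits].sort((a, b) => a.at - b.at);
  for (const edit of oldestFirst) {
    if (edit.next === null) {
      delete result[edit.field];
    } else {
      result[edit.field] = edit.next;
    }
  }
  return result;
}

// The new "primary" variation still carries the keyword the user deleted.
const fromNewPrimary: Record<string, TagValue> = {
  Keywords: ["beach", "sunset", "draft"],
};
const keywordEdit: EditLayer = {
  at: 1626199509740,
  field: "Keywords",
  prev: ["beach", "sunset", "draft"],
  next: ["beach", "sunset"],
};

const edited = applyEdits(fromNewPrimary, [keywordEdit]);
console.log(edited["Keywords"]); // the "draft" keyword stays removed
```

Keeping `prev` alongside `next` means an edit could also be undone, or checked for conflicts if the underlying value no longer matches what the user saw when editing.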
Layer storage
When we overwrite prior data or generate novel metadata, we’ll push a new “layer” onto the array of prior layers stored with an asset.
This stack of layers will then be encoded as JSON and stored with the asset. Sidecars can contain layers as well, and the final layer array is produced by merging and deduping the layers from every source.
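A minimal sketch of that merge-and-dedupe step might look like the following. The `Layer` shape, `layerKey`, and `mergeLayers` are hypothetical, and the dedupe rule (same source, version, and data, ignoring the timestamp) is an assumption; the real rule may well differ.

```typescript
// Hypothetical types for illustration only.
type TagValue = string | number | boolean;

interface Layer {
  data: Record<string, TagValue>;
  meta: { src?: string; v: string; at: number };
}

// Assumption: two layers are duplicates when source, version, and data all match.
// `at` is ignored, so re-parsing an unchanged file does not add a new layer.
function layerKey(layer: Layer): string {
  const sortedData = Object.keys(layer.data)
    .sort()
    .map((key) => [key, layer.data[key]]);
  return JSON.stringify([layer.meta.src ?? null, layer.meta.v, sortedData]);
}

// Merges stacks in the order given, keeping the first copy of each duplicate.
function mergeLayers(...stacks: Layer[][]): Layer[] {
  const seen = new Set<string>();
  const merged: Layer[] = [];
  for (const stack of stacks) {
    for (const layer of stack) {
      const key = layerKey(layer);
      if (!seen.has(key)) {
        seen.add(key);
        merged.push(layer);
      }
    }
  }
  return merged;
}

const storedWithAsset: Layer[] = [
  { data: { ImageWidth: 12345 }, meta: { src: "/path/to/img.jpg", v: "1.0.1", at: 1626199509740 } },
];
const storedInSidecar: Layer[] = [
  // Same layer as above, produced again by a later run
  { data: { ImageWidth: 12345 }, meta: { src: "/path/to/img.jpg", v: "1.0.1", at: 1626199999999 } },
  { data: { "albumData.title": "Album Name" }, meta: { src: "/path/to/metadata.json", v: "1.0.1", at: 1626199509740 } },
];

const mergedStack = mergeLayers(storedWithAsset, storedInSidecar);
const encoded = JSON.stringify(mergedStack); // the string that would be stored with the asset
console.log(mergedStack.length); // 2
console.log(encoded);
```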
There doesn’t seem to be a standard tag for this (for existing tags, see XMP and IPTC), but we can have a Setting that specifies where we store this “prior” metadata. It seems like IPTC:DocumentNotes and IPTC:ExifCameraInfo are generic enough.
Example
[
  {
    "ImageWidth": 12345,
    "SubSecDateTimeOriginal": "2003:01:26 15:58:17",
    "meta": {
      "src": "/path/to/img.jpg",
      "v": "1.0.1", // < version of PhotoStructure that did this parsing
      "at": 1626199509740 // < when the parsing happened. May not be needed.
    }
  },
  {
    "albumData.title": "Album Name",
    "albumData.description": "Album Description",
    "meta": {
      "src": "/path/to/metadata.json",
      "v": "1.0.1",
      "at": 1626199509740
    }
  }
]