OK, circling back (thanks for the reminder on Discord!).
There are a bunch of new features/changes in v23.8 to handle these cases and help debug them in the future:
- The new `assetAggregation` setting, which should avoid the nondeterminism issue where a matches b and b matches c, but a doesn't match c.
```
# ----------------------------------------
# PS_ASSET_AGGREGATION or assetAggregation
# ----------------------------------------
#
# How should assets be aggregated?
#
# - "union" will allow asset file variants to join an asset if they match
# *any* existing variant.
#
# - "intersection" will only allow asset file variants to join an asset if
# they match *all* existing variants.
#
# Versions prior to 23.8 defaulted to "union" behavior.
```
- The new `allowFuzzyDateImageHashMatches` setting, which lets photos whose captured-at time comes from `stat` try to match against similar images. Note that the default is `false`, though, because enabling it can cause scanned images to be aggregated too aggressively. You'll want to set this to `true`.
```
# ------------------------------------------------------------------------
# PS_ALLOW_FUZZY_DATE_IMAGE_HASH_MATCHES or allowFuzzyDateImageHashMatches
# ------------------------------------------------------------------------
#
# For images that don't have a reliable precise captured-at time (say, from
# "stat" or datestamp from pathname), can we aggregate assets purely by exact
# image hash matches?
#
# See https://forum.photostructure.com/t/deduplicate-shenanigans/1732/11 for
# more details.
```
- The `info` tool now handles more than two files by automatically adding a `clusters` field, which lets you try out different deduplication settings:
```
$ ./photostructure info $(find '/tmp/nuk' -type f) --filter clusters
{
  clusters: [
    [ 'psfile://2NMQsMVCK/tmp/nuk/20200427_225616.jpg' ],
    [ 'psfile://2NMQsMVCK/tmp/nuk/20200427_225538.jpg' ],
    [ 'psfile://2NMQsMVCK/tmp/nuk/review/806525515_15743.jpg' ],
    [ 'psfile://2NMQsMVCK/tmp/nuk/806525515_15743.jpg' ],
    [ 'psfile://2NMQsMVCK/tmp/nuk/107918631_157209.jpg' ]
  ]
}
$ allowFuzzyDateImageHashMatches=1 ./photostructure info $(find '/tmp/nuk' -type f) --filter clusters
{
  clusters: [
    [
      'psfile://2NMQsMVCK/tmp/nuk/20200427_225616.jpg',
      'psfile://2NMQsMVCK/tmp/nuk/review/806525515_15743.jpg',
      'psfile://2NMQsMVCK/tmp/nuk/806525515_15743.jpg'
    ],
    [
      'psfile://2NMQsMVCK/tmp/nuk/20200427_225538.jpg',
      'psfile://2NMQsMVCK/tmp/nuk/107918631_157209.jpg'
    ]
  ]
}
```
(so I believe that aggregation is what you’re expecting, correct?)
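To make the `"union"` versus `"intersection"` behavior of the `assetAggregation` setting more concrete, here is a minimal Python sketch. It is not PhotoStructure's code: `aggregate`, `matches`, and the first-fit strategy (each file joins the first existing asset that accepts it) are hypothetical. The match relation is a toy one in which a matches b and b matches c, but a does not match c.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def aggregate(files: List[T], matches: Callable[[T, T], bool], mode: str) -> List[List[T]]:
    """Hypothetical first-fit aggregation: each file joins the first asset that accepts it."""
    if mode not in ("union", "intersection"):
        raise ValueError(f"unknown mode: {mode!r}")
    assets: List[List[T]] = []
    for f in files:
        for asset in assets:
            hits = [matches(f, variant) for variant in asset]
            # "union": match *any* existing variant; "intersection": match *all* of them.
            accepted = any(hits) if mode == "union" else all(hits)
            if accepted:
                asset.append(f)
                break
        else:
            # No existing asset accepted this file, so it starts a new one.
            assets.append([f])
    return assets


# Hypothetical, non-transitive match relation: a~b and b~c, but not a~c.
MATCHING_PAIRS = {frozenset(("a", "b")), frozenset(("b", "c"))}


def matches(x: str, y: str) -> bool:
    return frozenset((x, y)) in MATCHING_PAIRS


if __name__ == "__main__":
    for mode in ("union", "intersection"):
        for order in ("abc", "acb"):
            print(mode, order, aggregate(list(order), matches, mode))
```

With `"union"`, the result depends on the order the files are seen: `abc` yields `[['a', 'b', 'c']]` (a and c end up together even though they don't match), while `acb` yields `[['a', 'b'], ['c']]`. With `"intersection"`, both orders in this example yield `[['a', 'b'], ['c']]`, because every member of an asset must match every other member, so a and c can never share an asset.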
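The `allowFuzzyDateImageHashMatches` description suggests a decision rule roughly like the one below. This is a hypothetical Python sketch, not PhotoStructure's implementation. `FileInfo`, `may_match`, the `captured_at_source` labels, and `max_skew_seconds` are all made up for illustration, and the branch for two precise dates is a deliberately simplified placeholder.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels for where a captured-at time came from. Per the setting's
# description, "stat" and a datestamp parsed from the pathname are not precise.
IMPRECISE_SOURCES = {"stat", "pathname"}


@dataclass(frozen=True)
class FileInfo:
    path: str
    captured_at: Optional[float]  # seconds since the epoch, if known
    captured_at_source: str  # e.g. "exif", "stat", "pathname" (hypothetical labels)
    image_hash: str  # stand-in for whatever image hash is compared


def has_precise_date(f: FileInfo) -> bool:
    return f.captured_at is not None and f.captured_at_source not in IMPRECISE_SOURCES


def may_match(
    a: FileInfo,
    b: FileInfo,
    allow_fuzzy_date_image_hash_matches: bool = False,
    max_skew_seconds: float = 1.0,
) -> bool:
    if has_precise_date(a) and has_precise_date(b):
        # Simplified placeholder: trustworthy dates must agree, and the images must match.
        return (
            abs(a.captured_at - b.captured_at) <= max_skew_seconds
            and a.image_hash == b.image_hash
        )
    # At least one date is fuzzy: only an exact image-hash match can link the
    # files, and only if the user opted in (the default is off).
    return allow_fuzzy_date_image_hash_matches and a.image_hash == b.image_hash


if __name__ == "__main__":
    scan = FileInfo("a.jpg", 1_000.0, "stat", "hash-1")
    copy = FileInfo("b.jpg", 9_000.0, "pathname", "hash-1")
    print(may_match(scan, copy))  # False: the opt-in is off by default
    print(may_match(scan, copy, allow_fuzzy_date_image_hash_matches=True))  # True
```

In this sketch, once either date is fuzzy, an identical image hash is the only evidence that two files belong together. That is one plausible reading of why the setting defaults to `false` and why turning it on can group scanned images too aggressively.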