OK, circling back (thanks for the reminder on Discord!).
There are a bunch of new features/changes in v23.8 to handle these cases and help debug them in the future:
- The new assetAggregation setting, which should avoid the a-matches-b-matches-c-but-a-doesn’t-match-c nondeterminism issue.
# ----------------------------------------
# PS_ASSET_AGGREGATION or assetAggregation
# ----------------------------------------
#
# How should assets be aggregated?
#
# - "union" will allow asset file variants to join an asset if they match
# *any* existing variant.
#
# - "intersection" will only allow asset file variants to join an asset if
# they match *all* existing variants.
#
# Versions prior to 23.8 defaulted to "union" behavior.
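To make the difference concrete, here is a toy sketch of the two modes. This is not PhotoStructure's actual code: `aggregate`, `toy_matches`, and the greedy one-pass grouping are all assumptions made up for illustration. The only part taken from the setting description is the any-vs-all rule.

```python
from typing import Callable, List

def aggregate(files: List[str],
              matches: Callable[[str, str], bool],
              mode: str = "union") -> List[List[str]]:
    """Greedily group files into assets (hypothetical helper, not a real API)."""
    if mode not in ("union", "intersection"):
        raise ValueError(f"unknown mode: {mode}")
    # "union": joining requires a match with ANY existing variant.
    # "intersection": joining requires a match with ALL existing variants.
    quantifier = any if mode == "union" else all
    assets: List[List[str]] = []
    for f in files:
        for asset in assets:
            if quantifier(matches(f, v) for v in asset):
                asset.append(f)
                break
        else:
            # No existing asset accepted this file, so it starts a new one.
            assets.append([f])
    return assets

# Toy relation: a matches b, b matches c, but a does NOT match c.
PAIRS = {frozenset("ab"), frozenset("bc")}

def toy_matches(x: str, y: str) -> bool:
    return frozenset((x, y)) in PAIRS

print(aggregate(["a", "b", "c"], toy_matches, "union"))         # [['a', 'b', 'c']]
print(aggregate(["a", "c", "b"], toy_matches, "union"))         # [['a', 'b'], ['c']]
print(aggregate(["a", "b", "c"], toy_matches, "intersection"))  # [['a', 'b'], ['c']]
```

With "union", a and c can land in the same asset even though they don't match each other, and whether they do depends on the order the files are seen. With "intersection", every variant in an asset matches every other variant. In this toy version, b can still end up with either a or c depending on order, so treat it as an illustration of the rule rather than of the real implementation.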
- The new allowFuzzyDateImageHashMatches setting, which will allow stat-based captured-at photos to try to match against similar images. Note that the default is false, though, as it will cause scanned images to be aggregated possibly too aggressively. You’ll want to set this to true.
# ------------------------------------------------------------------------
# PS_ALLOW_FUZZY_DATE_IMAGE_HASH_MATCHES or allowFuzzyDateImageHashMatches
# ------------------------------------------------------------------------
#
# For images that don't have a reliable precise captured-at time (say, from
# "stat" or datestamp from pathname), can we aggregate assets purely by exact
# image hash matches?
#
# See https://forum.photostructure.com/t/deduplicate-shenanigans/1732/11 for
# more details.
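Roughly, the setting acts as a gate on hash-only matching. The sketch below is a simplified guess at that rule, not the real matcher: the `Variant` fields, the source labels, and `can_aggregate` are hypothetical, and it compares hashes for exact equality only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Variant:
    image_hash: str
    captured_at: str         # e.g. an ISO-8601 timestamp
    captured_at_source: str  # hypothetical labels: "exif", "stat", "pathname"

# Sources that don't give a reliable, precise captured-at time (assumption).
UNRELIABLE_SOURCES = {"stat", "pathname"}

def can_aggregate(a: Variant, b: Variant,
                  allow_fuzzy_date_image_hash_matches: bool = False) -> bool:
    """Decide whether two variants may share an asset (illustrative only)."""
    if a.image_hash != b.image_hash:
        return False
    fuzzy_date = (a.captured_at_source in UNRELIABLE_SOURCES
                  or b.captured_at_source in UNRELIABLE_SOURCES)
    if fuzzy_date:
        # Without a trustworthy date, the hash match alone decides,
        # and only if the setting permits it.
        return allow_fuzzy_date_image_hash_matches
    # Both dates are reliable, so they must also agree.
    return a.captured_at == b.captured_at

original = Variant("h1", "2020-04-27T22:56:16", "exif")
copy = Variant("h1", "2021-01-01T00:00:00", "stat")
print(can_aggregate(original, copy))        # False (the default)
print(can_aggregate(original, copy, True))  # True
```

This is also why the default is false: scanned images often have only stat-based dates, so a hash match alone can merge them too eagerly.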
- The info tool now handles more than 2 files by automatically adding a “clusters” field that lets you try out different deduping settings:
$ ./photostructure info $(find '/tmp/nuk' -type f) --filter clusters
{
  clusters: [
    [ 'psfile://2NMQsMVCK/tmp/nuk/20200427_225616.jpg' ],
    [ 'psfile://2NMQsMVCK/tmp/nuk/20200427_225538.jpg' ],
    [ 'psfile://2NMQsMVCK/tmp/nuk/review/806525515_15743.jpg' ],
    [ 'psfile://2NMQsMVCK/tmp/nuk/806525515_15743.jpg' ],
    [ 'psfile://2NMQsMVCK/tmp/nuk/107918631_157209.jpg' ]
  ]
}
$ allowFuzzyDateImageHashMatches=1 ./photostructure info $(find '/tmp/nuk' -type f) --filter clusters
{
  clusters: [
    [
      'psfile://2NMQsMVCK/tmp/nuk/20200427_225616.jpg',
      'psfile://2NMQsMVCK/tmp/nuk/review/806525515_15743.jpg',
      'psfile://2NMQsMVCK/tmp/nuk/806525515_15743.jpg'
    ],
    [
      'psfile://2NMQsMVCK/tmp/nuk/20200427_225538.jpg',
      'psfile://2NMQsMVCK/tmp/nuk/107918631_157209.jpg'
    ]
  ]
}
(so I believe that aggregation is what you’re expecting, correct?)