Deduplicate shenanigans

OK, circling back (thanks for the reminder on Discord!).

There are a bunch of new features/changes in v23.8 to handle these cases and help debug them in the future:

  1. The new assetAggregation setting, which should avoid the a-matches-b-matches-c-but-a-doesn’t-match-c nondeterminism issue.
# ----------------------------------------
# PS_ASSET_AGGREGATION or assetAggregation
# ----------------------------------------
#
# How should assets be aggregated?
#
# - "union" will allow asset file variants to join an asset if they match
# *any* existing variant.
#
# - "intersection" will only allow asset file variants to join an asset if
# they match *all* existing variants.
#
# Versions prior to 23.8 defaulted to "union" behavior.
  2. The new allowFuzzyDateImageHashMatches setting, which lets photos whose captured-at time is only stat-based try to match against similar images. Note that it defaults to false, because enabling it may aggregate scanned images too aggressively. You’ll want to set this to true.
# ------------------------------------------------------------------------
# PS_ALLOW_FUZZY_DATE_IMAGE_HASH_MATCHES or allowFuzzyDateImageHashMatches
# ------------------------------------------------------------------------
#
# For images that don't have a reliable precise captured-at time (say, from
# "stat" or datestamp from pathname), can we aggregate assets purely by exact
# image hash matches?
#
# See https://forum.photostructure.com/t/deduplicate-shenanigans/1732/11 for
# more details.
  3. The info tool now handles more than 2 files by automatically adding a “clusters” field, which lets you try out different deduping settings:
$ ./photostructure info $(find '/tmp/nuk' -type f) --filter clusters
{
  clusters: [
    [ 'psfile://2NMQsMVCK/tmp/nuk/20200427_225616.jpg' ],
    [ 'psfile://2NMQsMVCK/tmp/nuk/20200427_225538.jpg' ],
    [ 'psfile://2NMQsMVCK/tmp/nuk/review/806525515_15743.jpg' ],
    [ 'psfile://2NMQsMVCK/tmp/nuk/806525515_15743.jpg' ],
    [ 'psfile://2NMQsMVCK/tmp/nuk/107918631_157209.jpg' ]
  ]
}

$ allowFuzzyDateImageHashMatches=1 ./photostructure info $(find '/tmp/nuk' -type f) --filter clusters
{
  clusters: [
    [
      'psfile://2NMQsMVCK/tmp/nuk/20200427_225616.jpg',
      'psfile://2NMQsMVCK/tmp/nuk/review/806525515_15743.jpg',
      'psfile://2NMQsMVCK/tmp/nuk/806525515_15743.jpg'
    ],
    [
      'psfile://2NMQsMVCK/tmp/nuk/20200427_225538.jpg',
      'psfile://2NMQsMVCK/tmp/nuk/107918631_157209.jpg'
    ]
  ]
}

(So I believe that second aggregation, with allowFuzzyDateImageHashMatches enabled, is what you’re expecting, correct?)
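
In case the "union" vs "intersection" distinction is easier to see in code, here is a toy sketch of the a-matches-b-matches-c-but-a-doesn't-match-c case. To be clear, this is not PhotoStructure's actual implementation: the `aggregate` and `matches` functions and the `PAIRS` table are hypothetical names made up for illustration, and the greedy "join the first cluster that accepts you" loop is my assumption about the general shape of the problem.

```python
# Toy model only: hypothetical names, not PhotoStructure's real code.

# Assumed pairwise matches: a~b and b~c, but NOT a~c.
PAIRS = {frozenset(("a", "b")), frozenset(("b", "c"))}


def matches(x, y):
    """Return True if the two (hypothetical) file variants match."""
    return frozenset((x, y)) in PAIRS


def aggregate(files, matches, mode="union"):
    """Greedily cluster files in the order given.

    "union": a file joins a cluster if it matches ANY existing member.
    "intersection": a file joins only if it matches ALL existing members.
    """
    accept = any if mode == "union" else all
    clusters = []
    for f in files:
        for cluster in clusters:
            if accept(matches(f, member) for member in cluster):
                cluster.append(f)
                break
        else:
            # No existing cluster accepted the file, so start a new one.
            clusters.append([f])
    return clusters


if __name__ == "__main__":
    for order in (["a", "b", "c"], ["a", "c", "b"]):
        for mode in ("union", "intersection"):
            print(order, mode, aggregate(order, matches, mode))
```

In this toy version, "union" puts a, b, and c into one cluster when b shows up before c, but splits them when c shows up before b, so the result depends on import order. With "intersection", a and c never end up in the same cluster, because every member has to match every other member. (Which of the two b lands with can still depend on order in this sketch, so take it as an illustration of the idea rather than a claim about exactly what the setting guarantees.)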