All assets copied to the Library, but fewer shown in the UI

Hey,

I continue to experiment with the “organization” aspect of PhotoStructure (PS). My assets (jpeg, png, mov and m4v) sit on a NAS in one messy folder: 526 assets in total. I also set some very “aggressive” env vars:

PS_MIN_ASSET_FILE_SIZE_BYTES=1000
PS_MIN_IMAGE_DIMENSION=10
PS_MIN_VIDEO_DURATION_SEC=1

The goal is to make sure almost nothing is left behind. I let the Ubuntu Node PLUS alpha.3 version do its magic and copy everything to the Library directory.

I checked how many files are in the YYYY/YYYY-MM directory structure in the library:

~/Pictures$ find 20* -type f | wc -l
526

and the numbers match: 526. (The 20* glob limits the count to the YYYY directories, which all start with 20, and excludes the .photostructure directory.)
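The same count can be split per month, which makes a directory-by-directory comparison less tedious. Here is a minimal Python sketch that roughly mirrors find 20* -type f: it only walks top-level directories whose names start with 20 and skips symlinks. The function name count_files_per_month is hypothetical, and it assumes the YYYY/YYYY-MM layout described above.

```python
import os
from collections import Counter
from pathlib import Path


def count_files_per_month(library_root):
    """Count regular files per YYYY/YYYY-MM folder of a library.

    Only top-level directories starting with "20" are scanned (so
    .photostructure is skipped), and symlinks are not counted, roughly
    like `find 20* -type f`.
    """
    root = Path(library_root)
    counts = Counter()
    for year_dir in sorted(root.glob("20*")):
        if not year_dir.is_dir():
            continue
        for dirpath, _dirnames, filenames in os.walk(year_dir):
            # Key each file by its first two path components, e.g. "2021/2021-03".
            key = "/".join(Path(dirpath).relative_to(root).parts[:2])
            for name in filenames:
                path = os.path.join(dirpath, name)
                if os.path.isfile(path) and not os.path.islink(path):
                    counts[key] += 1
    return counts
```

For example, sum(count_files_per_month(os.path.expanduser("~/Pictures")).values()) should match the total from the find command, while the individual entries show which months hold which files.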

Now, when I look at the UI, it shows 484 assets.

Aside from going directory by directory and comparing what’s missing under each month, what other options do I have to figure this out?

Thanks,
Konstantin.

So: there’s either an issue with tag counts, or you’ve got a bunch of duplicates.

If you open the About page and scroll down to the library metrics, do those numbers match up with the tag counts on the home page?

If they do, then PhotoStructure thinks you’ve got duplicates. The easiest way to find those duplicates is one of the following:

  1. If you’re using PhotoStructure for Servers, use the list tool.

  2. If you’re on PhotoStructure for Desktops, you can open your library database directly with a tool and see what’s going on.

For this, install DB Browser for SQLite (it’s free and open source: consider donating!)

Then open your PhotoStructure library database (it lives in $library/.photostructure/models/db.sqlite3).

Click the Execute SQL tab and enter:

SELECT
  URI
FROM
  AssetFile
  JOIN Asset ON Asset.id = AssetFile.assetId
WHERE
  Asset.shown = 1
  AND AssetFile.shown = 0

These are all the AssetFiles that are associated with an imported Asset but are not its “primary variation” (the one that’s “shown” in the asset view).
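For a scripted version of the same check, here is a minimal sketch using Python’s built-in sqlite3 module. It opens db.sqlite3 read-only (SQLite’s mode=ro URI parameter), so this connection can’t write to the library database, and runs the query above unchanged. The helper names open_library_db and non_primary_uris are hypothetical, and the sketch assumes only the Asset/AssetFile columns the query already uses.

```python
import sqlite3
from pathlib import Path

# The query above, unchanged: files attached to a shown Asset that are not
# that Asset's primary ("shown") variation.
NON_PRIMARY_SQL = """
SELECT
  URI
FROM
  AssetFile
  JOIN Asset ON Asset.id = AssetFile.assetId
WHERE
  Asset.shown = 1
  AND AssetFile.shown = 0
"""


def open_library_db(db_path):
    """Open a SQLite file read-only (mode=ro), so this connection cannot write to it."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def non_primary_uris(conn):
    """Return the URI of every non-primary AssetFile of a shown Asset."""
    return [row[0] for row in conn.execute(NON_PRIMARY_SQL)]
```

Usage would look like non_primary_uris(open_library_db("/path/to/library/.photostructure/models/db.sqlite3")), with the path adjusted to your library. If PhotoStructure is running, pointing the script at a copy of the file is the more cautious option.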

If you only want to show the files that are in your library, and want to see all the columns (SELECT * returns every AssetFile and Asset column), change the SELECT to * and add AssetFile.uri LIKE 'pslib:%' to the WHERE clause:

SELECT
  *
FROM
  AssetFile
  JOIN Asset ON Asset.id = AssetFile.assetId
WHERE
  Asset.shown = 1
  AND AssetFile.shown = 0
  AND AssetFile.uri LIKE 'pslib:%'

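Right… I forgot about de-duplication. The library metrics match what I see: 488 assets (it changed from 484 after a restart :man_shrugging:), and (image files + video files) / 2 = 526, presumably because every file is counted once on the NAS and once as its copy in the library.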

Now I need to play with the list tool a bit more to get the list of assets that were considered duplicates, because

./photostructure list --where "Asset.shown=1 AND AssetFile.shown=0"

gives me all 526 files in the NAS directory and 38 files in the Library directory, but I have no idea how to “match” them. Since 526 - 38 = 488, one can hope there are 38 duplicates, only one duplicate per asset, and no other issues. :slight_smile:
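The 526 - 38 = 488 reasoning can be written down as a quick sanity check. This sketch assumes you already have the URIs as strings (for example from the SQL query in the earlier reply) and that the scheme before the first colon tells library copies (pslib:) apart from originals (psfile:). Both function names are hypothetical, and the arithmetic assumes every shown asset has exactly one primary file in the library.

```python
from collections import Counter


def tally_by_scheme(uris):
    """Count URIs by the scheme before the first colon, e.g. 'pslib' vs 'psfile'."""
    return Counter(uri.split(":", 1)[0] for uri in uris)


def expected_shown_assets(library_files, non_primary_library_files):
    """Shown assets implied by the file counts.

    Assumes every shown asset has exactly one primary ("shown") file in the
    library, so each non-primary library file is an extra copy merged into
    some other asset rather than becoming an asset of its own.
    """
    return library_files - non_primary_library_files
```

With the numbers from this thread, expected_shown_assets(526, 38) returns 488, which matches the library metrics; tally_by_scheme on the list/SQL output is one way to get the 38.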

I looked at defaults.env in the photostructure/photostructure-for-servers repository on GitHub, hoping to find a way to turn off deduping completely (crazy, eh), but couldn’t find one. Should I flip PS_STRICT_DEDUPING to “true”?

To be honest, I’m probably fine letting PS do the huge amount of work of sorting out thousands of assets backed up from various sources, and letting “her” arrange them in a neat folder structure. It’s just that my OCD kicks in every time I see mismatched numbers.

K

Update: after a bit more SQLing I was able to find the offenders. iPhone selfie burst mode is the outlaw: it creates 10 identical photos… These really are duplicates. I think I can construct a query that shows what’s going on.
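To double-check that burst shots like these really are identical files, you can hash their contents. This is a minimal sketch with Python’s hashlib that groups byte-for-byte identical files under a directory. It is not PhotoStructure’s deduplication heuristic (this thread doesn’t describe how that works), so frames that look identical but differ in bytes won’t be grouped. sha256_of and byte_identical_groups are hypothetical helper names.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large videos don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def byte_identical_groups(directory):
    """Group files under `directory` whose contents are byte-for-byte identical."""
    by_hash = defaultdict(list)
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file():
            by_hash[sha256_of(path)].append(path)
    # Only hashes shared by more than one file indicate exact duplicates.
    return [paths for paths in by_hash.values() if len(paths) > 1]
```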

K

Thanks for the update. :+1:

Always feel free to send me incorrectly deduplicated image sets so I can adjust the heuristics accordingly.

Sorry: there isn’t currently a way to completely disable deduping.

I just emailed a couple of photos that were de-duped too aggressively.

K

This is my “lame-o” SQL query to list the files (id, assetId and URI) behind every asset that was de-duped, ordered by assetId so each asset’s files sit together. This way I can analyze the dups and delete them. It’s not ideal, and I need to test it on another set of assets, but it’s a start.

SELECT id, assetId, uri
FROM AssetFile
WHERE assetId IN (
  -- assets that have at least one non-primary copy inside the library
  SELECT Asset.id
  FROM AssetFile
    JOIN Asset ON Asset.id = AssetFile.assetId
  WHERE
    Asset.shown = 1
    AND AssetFile.shown = 0
    AND AssetFile.uri LIKE 'pslib:%'
  GROUP BY AssetFile.assetId
)
-- ...and list only their files outside the library (psfile: URIs)
AND uri LIKE 'psfile:%'
ORDER BY assetId;
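The same query can be run from Python and its rows folded into one entry per asset, which makes the NAS files behind each merged asset easier to review side by side. This is a sketch under the same assumptions as the query (only the Asset/AssetFile columns it already uses). It adds id to the ORDER BY for a stable order and drops the inner GROUP BY, since IN ignores duplicate ids anyway. The function name deduped_files_by_asset is hypothetical.

```python
import sqlite3
from itertools import groupby

# The query from this post, with a deterministic ORDER BY.
DEDUPED_SQL = """
SELECT id, assetId, uri FROM AssetFile
WHERE assetId IN (
  SELECT Asset.id FROM AssetFile
    JOIN Asset ON Asset.id = AssetFile.assetId
  WHERE Asset.shown = 1
    AND AssetFile.shown = 0
    AND AssetFile.uri LIKE 'pslib:%'
)
AND uri LIKE 'psfile:%'
ORDER BY assetId, id
"""


def deduped_files_by_asset(conn):
    """Return {assetId: [uri, ...]} so each merged asset's files sit together."""
    rows = conn.execute(DEDUPED_SQL).fetchall()
    # groupby only merges adjacent rows, which the ORDER BY assetId guarantees.
    return {
        asset_id: [uri for _id, _asset, uri in group]
        for asset_id, group in groupby(rows, key=lambda r: r[1])
    }
```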

K