I have recently started using PhotoStructure to manage my growing collection of digital photos, and I am impressed with its capabilities so far. However, I have a few questions about best practices for organizing and optimizing large photo libraries within the platform.
My current photo library consists of approximately 50,000 images spread across multiple folders and external drives. I am particularly interested in understanding how PhotoStructure handles metadata and duplicate detection in such a large dataset. Are there specific settings or features I should be aware of to ensure that metadata is accurately read and synced across all images?
I have read that PhotoStructure can automatically categorize and tag photos. How well does this feature perform with a diverse collection of images, including those with various file formats and resolutions? Are there any tips for improving the accuracy of automatic tagging and categorization?
Another aspect I am curious about is the performance of PhotoStructure when dealing with a substantial amount of data. Are there any known issues or recommendations for maintaining optimal performance and avoiding potential slowdowns as the library continues to grow?
PhotoStructure has no automated tagging capabilities.
There are a number of settings you can use to “tweak” how PhotoStructure interprets tags embedded in your pictures or sidecars. The defaults are pretty good, so you only really need to worry about them if you’re not seeing the results you were expecting. Please refer to this document:
Within the page above, there is a link to this document that lists all the possible configurations. Of particular interest are the Library.deduping, Library.parsing, and Library.tagging sections.
No integrations that I am aware of, but PhotoStructure aims to play well with others. Many of us use PhotoStructure in conjunction with other tools. I personally use digiKam to manage the tagging of my photo library. TagMyPhoto is also frequently mentioned.
In my experience, modifying image metadata after import is a mess unless you disconnect your source volumes from PhotoStructure first. Otherwise, PhotoStructure will notice the difference between the original image and the file with updated metadata… and import the original again.
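To illustrate why this happens (my own sketch, not anything from PhotoStructure’s code): a whole-file hash changes whenever any byte changes, metadata included, so the retagged copy hashes differently from the original. The file names below are just placeholders.

```ts
// Illustration only (not PhotoStructure code): a whole-file SHA changes as
// soon as any byte changes, so a metadata-only edit made by an external tool
// makes the file look "new" to anything that dedupes on whole-file hashes.
import { createHash } from "node:crypto";
import { readFile } from "node:fs/promises";

async function fileSha(path: string): Promise<string> {
  const bytes = await readFile(path);
  return createHash("sha256").update(bytes).digest("hex");
}

async function main(): Promise<void> {
  // Placeholder paths: the same photo before and after retagging in digiKam.
  const before = await fileSha("IMG_0001.jpg");
  const after = await fileSha("IMG_0001-retagged.jpg");
  console.log(before === after ? "hashes match" : "hashes differ: looks like a new file");
}

main().catch(console.error);
```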
My 190,000-image collection is rife with PhotoStructure-induced duplicates. I’m going to have to wipe and re-import the entire thing, but I’m waiting to do it over my next vacation since the job takes about 10 days.
This particular problem is enough to make me look for an alternative to PhotoStructure, but nothing comes close enough to be an improvement. PS is pretty good, but I struggle massively with integrated workflows.
Yes, that would be a problem. Even though I am a Plus subscriber, I don’t use the “copy” feature. So for me the source volume (which I mount read-only) is the only copy of the files, and of course that’s what I continue to manage/edit outside of PS (I use digiKam as my DAM). So I have no issues with duplicates getting created in PS.
I do wonder though if this is a “feature” or a “bug”. Sounds like something @mrm should have an answer to.
By design, every unique file SHA gets copied into the library.
With the next release, I’m adding an “image data SHA,” which doesn’t change when only metadata changes. I could skip copying a file into the library if there’s already a variation with the same image data SHA, but that means I might not be copying the “latest” image variation (say, if you fixed the orientation of an image, you’d want the one with the correct rotation!)
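Roughly, the idea is to hash only the image payload rather than the whole file. Here’s a simplified sketch of that idea (not the actual implementation; it uses the sharp library purely for illustration):

```ts
// Sketch of an "image data SHA": hash decoded pixels only, ignoring
// EXIF/IPTC/XMP, so metadata-only edits produce the same hash.
// Uses the `sharp` image library for illustration; not PhotoStructure's code.
import { createHash } from "node:crypto";
import sharp from "sharp";

async function imageDataSha(path: string): Promise<string> {
  // .raw() decodes to uncompressed pixel data with no embedded metadata.
  const pixels = await sharp(path).raw().toBuffer();
  return createHash("sha256").update(pixels).digest("hex");
}
```

Note that this sketch has exactly the caveat described above: an orientation fix is a metadata change, so it wouldn’t change this hash either.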
Alternatively, I think the amount of code I’d have to write is pretty small to keep only the “primary” copy in your library for every asset, and that would address your duplication problem, correct?
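To make that concrete, here’s a deliberately simplified sketch of what “keep only the primary” could mean, assuming variations are already grouped by the image data SHA above and using file size and modification time as tie-breakers (an illustrative heuristic, not a spec):

```ts
// Simplified, hypothetical heuristic for picking a "primary" variation among
// files that share the same image data SHA: prefer the largest file, then the
// most recently modified one, assuming that copy carries the latest edits.
import { stat } from "node:fs/promises";

interface Variation {
  path: string;
  size: number;
  mtimeMs: number;
}

async function loadVariation(path: string): Promise<Variation> {
  const s = await stat(path);
  return { path, size: s.size, mtimeMs: s.mtimeMs };
}

function pickPrimary(variations: Variation[]): Variation {
  return [...variations].sort(
    (a, b) => b.size - a.size || b.mtimeMs - a.mtimeMs
  )[0];
}
```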
I’d be concerned about whether PhotoStructure would do a perfect job of picking which of your variations is truly the “primary” one, though. Have you ever found that it picked the wrong one? (If so, please send me the variations and tell me which you think should “win,” and I can see what’s going on.)