PhotoStructure should be able to automatically classify image contents and assign those tags.
One of the larger tasks here is to come up with a reasonable default set of terms, where each term fits into a hierarchy, or “taxonomy.” Examples could be what/scene/beach, what/animal/cat, what/object/car.
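To make the examples above concrete, here's a minimal sketch of how slash-delimited taxonomy paths could be handled. The function names and the path scheme are illustrative, not PhotoStructure's actual schema:

```python
# Hypothetical sketch: hierarchical tags represented as slash-delimited
# "taxonomy paths" like "what/scene/beach". Not PhotoStructure's schema.

def split_tag(path: str) -> list[str]:
    """Split a taxonomy path into its levels, ignoring empty segments."""
    return [level for level in path.split("/") if level]

def ancestors(path: str) -> list[str]:
    """Return every ancestor path, so an asset tagged "what/scene/beach"
    also matches broader searches like "what/scene"."""
    levels = split_tag(path)
    return ["/".join(levels[:i]) for i in range(1, len(levels))]

print(ancestors("what/scene/beach"))  # -> ['what', 'what/scene']
```

Storing ancestor paths alongside the leaf tag is one way to make hierarchical search cheap.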
If you’ve seen freely licensed taxonomies or ontologies, or have other suggestions, please reply!
ImageNet has a set of terms, but it’s pretty horrible for actual end-user use because it’s both extremely (and oddly) specific for a bunch of species, and glaringly missing very common items.
Another idea (not free) I’ll throw out there: integrate Google’s Cloud Vision API. This would definitely be for more advanced users (like myself) willing to set up GCP, since each individual user would have to provide their own Google API keys and pay Google for usage. This feature could be in addition to something implemented locally, say for more “advanced” detection.
That may be something I’d be willing to pay for on an initial load, or on demand (for a given file and/or folder), or I could just leverage the 1,000 free images each month. For example, I have vacation pics in front of landmarks, I can’t remember what those landmarks are, and no GPS coordinates are available. Google has an API for landmark detection, and I might like to “ask Google” from within PS.
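A hedged sketch of what that “ask Google” flow could look like. The real call requires the google-cloud-vision package plus the user’s own GCP credentials, so it is only shown as a comment; the helper below works on a plain-dict rendering of the response, and the tag format is purely illustrative:

```python
# The actual API call would look roughly like this (needs credentials):
#
#   from google.cloud import vision
#   client = vision.ImageAnnotatorClient()
#   with open(path, "rb") as f:
#       response = client.landmark_detection(
#           image=vision.Image(content=f.read()))
#
# Below, we only demonstrate turning landmark annotations (as dicts)
# into hypothetical taxonomy tags, so this runs without the library.

def landmarks_to_tags(annotations: list[dict], min_score: float = 0.5) -> list[str]:
    """Keep confident landmark names as tags (hypothetical tag scheme)."""
    return ["what/landmark/" + a["description"]
            for a in annotations
            if a.get("score", 0.0) >= min_score]

sample = [{"description": "Eiffel Tower", "score": 0.93},
          {"description": "Louvre", "score": 0.22}]
print(landmarks_to_tags(sample))  # -> ['what/landmark/Eiffel Tower']
```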
Interesting: I’ve toyed with supporting “external” tag curation: PhotoStructure would pass in a path to a photo or video (along with any metadata it has already collected), and the external tool would return JSON that complies with some given schema (at minimum, a set of tags to add to the asset, but other bits may be interesting as well).
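A minimal sketch of what that external-tagger contract could look like. The schema fields here (`schemaVersion`, `tags`, `confidence`) are purely hypothetical stand-ins, not a spec PhotoStructure has published:

```python
# Sketch of an "external tag curation" contract: PhotoStructure invokes
# a user-provided program and reads JSON back. The schema is hypothetical.
import json

def fake_external_tagger(asset_path: str, metadata: dict) -> str:
    # In reality PhotoStructure would shell out to the user's program;
    # this stub just emits JSON in the assumed shape.
    return json.dumps({
        "schemaVersion": 1,
        "tags": ["what/animal/cat"],            # tags to add to the asset
        "confidence": {"what/animal/cat": 0.92},
    })

def parse_tagger_output(raw: str) -> list[str]:
    """Accept the payload only if it carries a list of string tags."""
    result = json.loads(raw)
    tags = result.get("tags", [])
    if not all(isinstance(t, str) for t in tags):
        raise ValueError("tags must be strings")
    return tags

print(parse_tagger_output(fake_external_tagger("/photos/cat.jpg", {})))
```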
These all take different approaches, but the common theme is shared knowledge and model tuning. As a shortcut, instead of combining models, you could just run multiple models in sequence, looping until one returns a high-confidence result, and then have one model update another locally.
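The “loop through models until high confidence” shortcut can be sketched as a simple cascade. The stub classifiers and the 0.7 threshold are illustrative placeholders for real models:

```python
# Sketch of a model cascade: try each classifier in order and stop at
# the first high-confidence answer. Stubs stand in for real models.

def cascade(classifiers, image, threshold=0.7):
    """Return the first (label, confidence) meeting the threshold,
    else the best prediction seen across all classifiers."""
    best = ("unknown", 0.0)
    for classify in classifiers:
        label, confidence = classify(image)
        if confidence >= threshold:
            return label, confidence
        if confidence > best[1]:
            best = (label, confidence)
    return best

# Stub models: a fast/cheap one and a slower/more accurate one.
fast = lambda img: ("what/object/car", 0.55)
slow = lambda img: ("what/object/car", 0.91)

print(cascade([fast, slow], image=None))  # -> ('what/object/car', 0.91)
```

Ordering the cascade cheapest-first keeps the common case fast while still escalating hard images.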
I like the idea of a central PS model, but you’d be doing something paper-worthy.
I was going to use MobileNet models, but CLIP looks super promising: it isn’t tied to MobileNet’s taxonomy (which is geared toward model research rather than consumer use), and it doesn’t require transfer learning to adapt to a new taxonomy.
I just replaced my prior comment with a link to a google spreadsheet to take more suggestions.
Note that CLIP may or may not be able to actually apply these topics.
From what I’ve read, CLIP only knows what it was trained to know. Would there be a way to train a local model against CLIP first, and then, whenever CLIP’s confidence is below 70%, have the user go through and tell the local model what the subject is? Sort of knowledge transfer by proxy combined with human-led reinforcement training.
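That routing idea can be sketched in a few lines: trust CLIP’s label when it is confident, and queue the photo for human review when it isn’t, with both paths feeding the local model’s training set. The 0.70 cutoff and all names are illustrative:

```python
# Sketch of "knowledge transfer by proxy": confident CLIP predictions
# become training labels; low-confidence ones go to a human. Stub data.

def route(clip_prediction, threshold=0.70):
    """Decide where each (label, confidence) pair goes."""
    label, confidence = clip_prediction
    if confidence >= threshold:
        return ("auto", label)   # trust CLIP; train the local model on it
    return ("human", label)      # ask the user to confirm or correct

training_set, review_queue = [], []
for photo, pred in [("a.jpg", ("what/scene/beach", 0.88)),
                    ("b.jpg", ("what/animal/cat", 0.41))]:
    decision, label = route(pred)
    (training_set if decision == "auto" else review_queue).append((photo, label))

print(training_set)  # -> [('a.jpg', 'what/scene/beach')]
print(review_queue)  # -> [('b.jpg', 'what/animal/cat')]
```

Human-confirmed labels from the review queue would then be appended to the training set before the local model is retrained.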
I personally would be willing to share my model with the community whenever PS has this so if a hybrid approach is possible then count me in for sharing.
Just started playing around with PS today and so far it’s great. The “tagging” feature would be extremely helpful for people like me with tens of thousands of photos. Currently I use Plex for hosting my images (long and frustrating story; I’ve had enough). PS seems like the best solution for me to get away from Plex for managing my libraries.
I was messing around with DeepStack last weekend and thought it did a great job at object, scene, and face detection out of the box. Additionally, the API has hooks to “train” the ML on an unknown face, object, or scene. I trained it on my face and wrote a script to go through one of my libraries (about 5k pictures) and identify each object in each picture, as well as my face (if it was there). It ran overnight, and the next morning every photo was tagged in Plex.
Overall, it seems like a powerful open-source tool that’s worth looking into. It can all be hosted locally and can run on the CPU or GPU (NVIDIA).
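For anyone curious what that script looks like in miniature: DeepStack’s documented `/v1/vision/detection` endpoint returns JSON with a `predictions` list of label/confidence objects. The helper below only demonstrates turning that response into tags; the taxonomy mapping and threshold are my own illustrative choices:

```python
# Sketch: convert a DeepStack /v1/vision/detection response into tags.
# The response shape matches DeepStack's documented output; the
# "what/object/..." tag mapping is illustrative, not a standard.

def detections_to_tags(response: dict, min_confidence: float = 0.6) -> list[str]:
    if not response.get("success"):
        return []
    return sorted({
        "what/object/" + p["label"]
        for p in response.get("predictions", [])
        if p.get("confidence", 0.0) >= min_confidence
    })

sample = {
    "success": True,
    "predictions": [
        {"label": "person", "confidence": 0.89},
        {"label": "dog", "confidence": 0.35},  # below threshold, dropped
    ],
}
print(detections_to_tags(sample))  # -> ['what/object/person']
```

In the real script, the `response` dict would come from POSTing each image to a locally hosted DeepStack instance.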
I appreciate all the work you are putting into PS and will continue to test it out along the way!