PhotoStructure should be able to automatically classify image contents and assign the corresponding tags.
One of the larger tasks here is to come up with a reasonable default set of terms, where each term fits into a hierarchy, or "taxonomy." Examples could be what/scene/beach, what/animal/cat, and what/object/car.
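To make the shape concrete, here's a minimal sketch (in Python, with hypothetical tag paths) of how slash-delimited terms could expand into a nested hierarchy, so parent tags like what and what/scene come for free:

```python
# Sketch only: expand slash-delimited tag paths (hypothetical examples)
# into a nested dict representing the taxonomy tree.
TAXONOMY_PATHS = [
    "what/scene/beach",
    "what/animal/cat",
    "what/object/car",
]

def build_tree(paths):
    tree = {}
    for path in paths:
        node = tree
        for term in path.split("/"):
            node = node.setdefault(term, {})
    return tree

print(build_tree(TAXONOMY_PATHS))
# {'what': {'scene': {'beach': {}}, 'animal': {'cat': {}}, 'object': {'car': {}}}}
```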
If you've seen freely licensed taxonomies or ontologies, or have other suggestions, please reply!
ImageNet has a set of terms, but it's pretty horrible for actual end-user use because it's both extremely (and oddly) specific for a bunch of species, and glaringly missing very common items.
Another idea (not free) I'll throw out there: integrate Google's Cloud Vision API. This would definitely be for more advanced users (like myself) willing to set up GCP, since each user would have to provide their own Google API keys and pay Google for usage. This feature could be in addition to something implemented locally, let's say for more "advanced" detection.
That may be something I'd be willing to pay for on an initial load, or on demand (for a given file and/or folder), or just by leveraging the 1,000 free images each month. For example, I have vacation pics in front of landmarks, I can't seem to remember what those landmarks are, and no GPS coordinates are available. Google has an API for landmark detection, so I might like to "ask Google" from within PS.
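For what it's worth, the calls themselves are short with the google-cloud-vision Python client. A sketch, assuming credentials are already configured via GOOGLE_APPLICATION_CREDENTIALS and using a made-up image path:

```python
# Sketch: label + landmark detection with Google Cloud Vision.
# Assumes `pip install google-cloud-vision` and GCP credentials in
# GOOGLE_APPLICATION_CREDENTIALS; the image path is hypothetical.
from google.cloud import vision

client = vision.ImageAnnotatorClient()
with open("vacation/IMG_1234.jpg", "rb") as f:
    image = vision.Image(content=f.read())

# General labels (objects, scenes, ...)
for label in client.label_detection(image=image).label_annotations:
    print(f"{label.description}: {label.score:.2f}")

# Landmark detection, e.g. for those unremembered vacation landmarks
for landmark in client.landmark_detection(image=image).landmark_annotations:
    print(f"landmark: {landmark.description}")
```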
Interesting: I've toyed with supporting "external" tag curation: it'd pass in a path to a photo or video (and possibly any metadata that PhotoStructure has already collected), and return JSON that complies with some given schema (at least supporting a set of tags to add to the asset, but other bits may be interesting as well).
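As a sketch of what that contract might look like (the schema and field names here are hypothetical, nothing is settled):

```python
# Sketch of a hypothetical "external tagger" contract: PhotoStructure
# would invoke the script with an asset path, and the script prints
# JSON matching an agreed-upon (not yet designed) schema.
import json
import sys

def tag(path: str) -> dict:
    # ... run whatever classifier you like here ...
    return {
        "tags": ["what/scene/beach", "what/animal/cat"],  # example output
        "confidence": {"what/scene/beach": 0.91},          # optional extras
    }

if __name__ == "__main__":
    print(json.dumps(tag(sys.argv[1])))
```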
Places: Home, Building, School
Nature: River, Lake, Ocean
Activities: Band, Concert, Dog Show (don't ask)
As for methodologies, have you considered knowledge-transfer training and sharing? We could keep it all in the PS community where, if we agree, we can contribute to a shared ML model (or models).
These all take different approaches, but the common theme is shared knowledge and model tuning. You could take a shortcut where, instead of combining models, you use multiple models and loop through them until one reports a high confidence level, as in the sketch below. Then have one model update another locally.
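A minimal sketch of that loop (the model interface here is hypothetical):

```python
# Sketch of "loop until confident": try each model in turn and stop at
# the first prediction that clears a confidence threshold. `models` is
# a hypothetical list of objects with predict(path) -> (tag, confidence).
def classify(path, models, threshold=0.7):
    best = None
    for model in models:
        tag, confidence = model.predict(path)
        if confidence >= threshold:
            return tag, confidence       # confident enough; stop here
        if best is None or confidence > best[1]:
            best = (tag, confidence)     # remember the best fallback
    return best                          # nothing cleared the bar
```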
I like the idea of a PS central model, but you would be doing something paper-worthy.
I was going to use MobileNet models, but CLIP looks super promising (and it doesn't have MobileNet's taxonomy, which is geared toward model research rather than consumer use, nor does it require transfer learning to retrain to a new taxonomy).
I just replaced my prior comment with a link to a Google spreadsheet to take more suggestions.
Note that CLIP may or may not be able to actually apply these topics.
It seems that CLIP only knows what it was trained to know. Would there be a way to train a local model against CLIP first, and then, when CLIP's confidence is below 70%, have the user tell the local model what's in the photo? Sort of knowledge transfer by proxy, combined with human-led reinforcement training, as sketched below.
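Roughly, in pseudo-Python (every interface here is hypothetical):

```python
# Sketch of "knowledge transfer by proxy": trust CLIP's prediction as a
# pseudo-label when it's confident, otherwise ask the user; either way,
# the (path, label) pair becomes a training example for the local model.
CONFIDENCE_CUTOFF = 0.70  # the 70% threshold mentioned above

def label_for_training(path, clip_model, ask_user):
    tag, confidence = clip_model.predict(path)
    if confidence >= CONFIDENCE_CUTOFF:
        return tag                       # CLIP's pseudo-label
    return ask_user(path, guess=tag)     # human-led fallback
```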
I personally would be willing to share my model with the community whenever PS has this, so if a hybrid approach is possible, then count me in for sharing.
My understanding is that the text corpus CLIP was trained on is gigantic, so its effective vocabulary is huge. You feed the whole taxonomy into the model along with an image and get back likelihoods.
(I haven't played around with it at all, so this is very sketchy.)
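From the openai/clip README, zero-shot scoring against a taxonomy looks roughly like this (an untested sketch; the tag phrases and image path are just examples):

```python
# Untested sketch of CLIP zero-shot classification: score one image
# against every taxonomy term and read back likelihoods.
# Assumes `pip install git+https://github.com/openai/CLIP.git`.
import clip
import torch
from PIL import Image

TAXONOMY = ["a photo of a beach", "a photo of a cat", "a photo of a car"]

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(TAXONOMY).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1).squeeze(0)

for term, p in zip(TAXONOMY, probs.tolist()):
    print(f"{term}: {p:.2%}")
```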
Thanks for the offer to share model-training results! If I enable that, I'd make it opt-in only, of course.
I mean absolutely no disrespect by this statement, but I can't stop laughing at the fact that your animal list has only 9 specific(ish) elements.
I do not envy whoever has to curate such a list, but it does make me curious if it would be practical to set up some kind of wiki that could be publicly curated with some form of moderation.
Just started playing around with PS today and so far it's great. The "tagging" feature would be extremely helpful for people like me with tens of thousands of photos. Currently I use Plex for hosting my images (long and frustrating story; I've had enough). PS seems like the best solution for getting away from Plex managing my libraries.
I was messing around with DeepStack last weekend and thought it did a great job at object, scene, and face detection out of the box. Additionally, the API has hooks to "train" the model on an unknown face, object, or scene. I trained it on my face and wrote a script to go through one of my libraries (about 5k pictures) and identify each object in each picture, as well as my face (if it was there). It ran overnight, and the next morning every photo was tagged in Plex.
Overall, it seems like a powerful open-source tool that's worth looking into. It can all be hosted locally and can run on the CPU or GPU (NVIDIA).
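For reference, the calls are plain HTTP POSTs; here's a sketch with the requests library (the port and image path are assumptions; adjust to however your DeepStack instance is exposed):

```python
# Sketch of the DeepStack calls described above. Assumes DeepStack is
# listening on localhost:5000; the image path is hypothetical.
import requests

DEEPSTACK = "http://localhost:5000"

with open("photo.jpg", "rb") as f:
    image = f.read()

# Object detection: returns {"predictions": [{"label", "confidence", ...}]}
objects = requests.post(f"{DEEPSTACK}/v1/vision/detection",
                        files={"image": image}).json()
for p in objects.get("predictions", []):
    print(f"{p['label']}: {p['confidence']:.2f}")

# Face recognition against faces previously registered via
# /v1/vision/face/register
faces = requests.post(f"{DEEPSTACK}/v1/vision/face/recognize",
                      files={"image": image}).json()
for p in faces.get("predictions", []):
    print(f"face: {p['userid']}")
```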
I appreciate all the work you are putting into PS and will continue to test it out along the way!
I'm really looking forward to seeing how automatic tagging progresses, especially with the use of CLIP.
I found a maintained repository that uses CLIP from the command line to identify photos based on content and appearance, so I was hoping it could be of some use to you.