Automatic image content tagging / classification / annotations: the "What" tag

PhotoStructure should be able to automatically classify image contents and assign those tags.

One of the larger tasks here is to come up with a reasonable default set of terms, where each term fits into a hierarchy, or “taxonomy.” Examples could be what/scene/beach, what/animal/cat, what/object/car.
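To make the path idea concrete, here's a minimal sketch of how hierarchical tags like what/scene/beach might be represented. The `Tag` class and its fields are hypothetical, not PhotoStructure's actual schema:

```python
# Hypothetical model of hierarchical tag paths like "what/animal/cat".
# Not PhotoStructure's real schema -- just an illustration of the idea.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Tag:
    path: str  # e.g. "what/animal/cat"

    @property
    def parts(self):
        return self.path.split("/")

    @property
    def parent(self) -> Optional["Tag"]:
        parts = self.parts
        return Tag("/".join(parts[:-1])) if len(parts) > 1 else None

    def matches(self, query: str) -> bool:
        # A search for "cat" should hit "what/animal/cat".
        return query.lower() in (p.lower() for p in self.parts)

cat = Tag("what/animal/cat")
print(cat.parent.path)     # what/animal
print(cat.matches("Cat"))  # True
```

A nice property of plain slash-delimited paths is that searching any level of the hierarchy (“animal” or “cat”) can hit the same tag.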

If you’ve seen freely licensed taxonomies or ontologies, or have other suggestions, please reply!

ImageNet has a set of terms, but it’s pretty horrible for actual end-user use because it’s both extremely (and oddly) specific for a bunch of species, and glaringly missing very common items.

What would you expect to search for?

“Beach”? “Mountains” or “clouds”?

“Baby”? “Face”?

Dogs/cats/other common animals: GitHub - ypwhs/dogs_vs_cats: Cats vs. Dogs

Recognize facial expressions: GitHub - amineHorseman/facial-expression-recognition-using-cnn

Search for cupcakes vs waffles and other food: GitHub - stratospark/food-101-mobile: Deep Learning Food Classifier for iOS using Keras and Tensorflow

More food classifications: GitHub - abhinavsagar/Grocery-Product-Classification

Find photos that have been edited: GitHub - agusgun/FakeImageDetector: Image Tampering Detection using ELA and CNN

Is someone wearing a hard hat? GitHub - kjaisingh/hardhat-detector

Search for colors: GitHub - mahmoudnafifi/WB_color_augmenter

This might be useful for a feature down the line to auto-blur nude content, hide it, or classify it in the gallery views: nude.js | Nudity detection with JavaScript and HTMLCanvas

Logo and brand recognition for when I want to pull up all my photos of my Ford Fiesta: GitHub - PaddlePaddle/PaddleClas: A treasure chest for visual recognition powered by PaddlePaddle

Find similar photos: GitHub - victorqribeiro/groupImg: A script in python to organize your images by similarity.

Perhaps some inspiration on how to set up your classification process: Image-ATM

Another idea (not free) I’ll throw out there: integrate Google’s Cloud Vision API. This would definitely be for more advanced users (like myself) willing to set up GCP, since each individual user would have to provide their own Google API keys and pay Google for usage. This feature could be in addition to something implemented locally, let’s say for more “advanced” detection.

That may be something I’d be willing to pay for on an initial load, or on demand (on a given file and/or folder), or just leverage the 1000 free images each month. For example, I have vacation pics in front of landmarks, I can’t seem to remember what those landmarks are, and no GPS coordinates are available - Google has an API for landmark detection. I might like to “ask Google” from within PS.


Interesting: I’ve toyed with supporting “external” tag curation: it’d pass in a path to a photo or video (and possibly any metadata that PhotoStructure has already collected), and return JSON that complies with some given schema (at least supporting a set of tags to add to the asset, but other bits may be interesting as well).
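Such a contract could be sketched like this. Everything here is invented for illustration — the field names, the `curate` function, and the fake classifier are not an actual PhotoStructure interface:

```python
# Hypothetical "external curator" contract: the host passes an asset
# path (plus known metadata) as JSON, and gets tags back as JSON.
# The schema and logic below are invented for illustration only.
import json

def curate(request_json: str) -> str:
    req = json.loads(request_json)
    path = req["path"]  # path to the photo or video
    # A real curator would run a model here; we fake a result based
    # on the filename so the sketch is self-contained.
    tags = ["what/animal/cat"] if "cat" in path else []
    return json.dumps({"path": path, "tags": tags})

print(curate(json.dumps({"path": "/photos/cat-nap.jpg"})))
# {"path": "/photos/cat-nap.jpg", "tags": ["what/animal/cat"]}
```

Because the interface is just JSON in, JSON out, a curator could be any executable in any language (including one calling a cloud API).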

That sounds like a potential plugin system :wink:
I’d consider writing a plugin like a Google Vision API curator :slight_smile:

Here’s a first whack at a taxonomy:

Not sure if feasible, but I suggest a generic ‘event’ tag… in case you can’t tell what kind of party.

Same for animals.

A few others:

Places: Home, Building, School
Nature: River, Lake, ocean
Activities: Band, Concert, Dog Show (don’t ask)

As for methodologies, have you considered knowledge-transfer training and sharing? We could keep it all in the PS community where, if we agree, we can contribute to a shared ML model or models.

These all take different approaches, but the common theme is shared knowledge and model tuning. You could take a shortcut where, instead of combining models, you use multiple models and loop through them until one returns a high confidence level. Then have one model update another locally.
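That loop-until-confident idea can be sketched in a few lines. The stand-in models and the 0.7 threshold are placeholders, not anything PS has committed to:

```python
# Sketch of a model cascade: try each classifier in order and stop at
# the first high-confidence answer; otherwise return the best guess.
# The models here are stand-in functions returning (tag, confidence);
# real ones would be mobilenet, CLIP, etc.
def cascade(image, models, threshold=0.7):
    best = (None, 0.0)
    for model in models:
        tag, conf = model(image)
        if conf >= threshold:
            return tag, conf  # confident enough; stop early
        if conf > best[1]:
            best = (tag, conf)
    return best  # fall back to the best low-confidence guess

fast = lambda img: ("what/animal/dog", 0.55)  # cheap but unsure
slow = lambda img: ("what/animal/cat", 0.92)  # expensive but confident
print(cascade("img.jpg", [fast, slow]))  # ('what/animal/cat', 0.92)
```

Ordering the list cheapest-first means the expensive model only runs on the photos where the cheap one wasn't sure — and a "best guess below threshold" result is exactly the kind of thing you could queue for human review.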

I like the idea of a PS central model but you would be doing something paper worthy :slight_smile:

I was going to use mobilenet models, but CLIP looks super promising: it isn’t saddled with mobilenet’s taxonomy (which is geared toward model research rather than consumer use), and it doesn’t require transfer learning to train it to a new taxonomy.

I just replaced my prior comment with a link to a google spreadsheet to take more suggestions.

Note that CLIP may or may not be able to actually apply these topics.

So what gets sent to CLIP in this model?

From what I’ve read, CLIP only knows what they train it to know. Would there be a way to train a local model against CLIP first, and then, whenever CLIP’s confidence is below 70%, have the user go through and tell the local model what it is? Sort of knowledge transfer by proxy, combined with human-led reinforcement training.

I personally would be willing to share my model with the community whenever PS has this so if a hybrid approach is possible then count me in for sharing.

My understanding is that the grammar CLIP was trained on is gigantic. You feed the whole taxonomy into the model along with an image and get back likelihoods.
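A toy illustration of the mechanics (not real CLIP, which needs its image/text encoders and model weights): score each taxonomy term by cosine similarity against the image embedding, then softmax into likelihoods. The 3-element embeddings and the scale factor are fabricated for the example:

```python
# Toy CLIP-style zero-shot scoring: cosine similarity between one
# image embedding and one text embedding per taxonomy term, then
# softmax into likelihoods. Embeddings here are fake 3-vectors; real
# CLIP produces high-dimensional vectors from its encoders.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def zero_shot(image_emb, label_embs):
    sims = {label: cosine(image_emb, emb) for label, emb in label_embs.items()}
    # CLIP scales similarities by a learned temperature before
    # softmax; 100 is a stand-in value here.
    exps = {k: math.exp(100 * v) for k, v in sims.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}

labels = {
    "what/scene/beach": [0.9, 0.1, 0.0],
    "what/animal/cat": [0.1, 0.9, 0.2],
}
scores = zero_shot([0.8, 0.2, 0.1], labels)
print(max(scores, key=scores.get))  # what/scene/beach
```

The upshot for a taxonomy: text embeddings for every term can be computed once up front, so scoring a new photo is one image-encoder pass plus cheap dot products, even against a large term list.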

(I haven’t played around with it at all, so this is very sketchy)

Thanks for the offer for sharing model training results! If I enable that, I’d make that opt-in only, of course.

Not having soccer on your list of activities is borderline offensive! :wink:

There’s “American football” and “football,” but I just added soccer as well. Duplicate topics like this may cause issues, though:

Soccer Fail GIF

I mean absolutely no disrespect by this statement, but I can’t stop laughing at the fact that your animal list has only 9 specific(ish) elements.

I do not envy whoever has to curate such a list, but it does make me curious if it would be practical to set up some kind of wiki that could be publicly curated with some form of moderation.

The taxonomy is not meant to be comprehensive! It needs to balance between:

  • objects that will be fairly common in many photo libraries
  • themes that people would search for

I suspect that the larger the taxonomy, the lower the coverage and accuracy we’ll see, and the slower the model application will be.

I’m expecting it to be user-configurable, as well.

Suggestions are welcome.

In looking at how you have your current taxonomy set up, I had an idea for you.

Some guy at Apple once said that great artists steal, and another guy at Microsoft copied, or maybe it was the other way around. Anyway…

Classifying Images with Vision and Core ML Apple Developer Documentation

Apple Photos Models: Models - Machine Learning - Apple Developer

Google’s crash course: Machine Learning Crash Course  |  Google Developers

Taxonomy Google used in 2019:

TensorFlow official models: models/official at master · tensorflow/models · GitHub

Another program to consider is Machine Box. With it you can tag objects, faces, etc. Incorporating their API might be worth a look.