More settings to control imported keywords

According to the PhotoStructure page "How does PhotoStructure extract keywords from my photos and videos?", PhotoStructure imports keywords from the following metadata tags:

  • CatalogSets
  • Categories (this is typically XML-encoded)
  • HierarchicalSubject
  • Keywords
  • LastKeywordXMP
  • Subject
  • TagsList
  • XPKeywords (these are keywords added by Windows Explorer)

I would like to control which tags PhotoStructure imports. In fact, I only need HierarchicalSubject, as it is the most accurate source of keyword data. I use Adobe Lightroom Classic to organize my asset library; it treats HierarchicalSubject as the primary source and writes flat copies into the Subject and Keywords tags. However, IPTC:Keywords is limited to 64 characters (see IPTC Tags).

For example, if HierarchicalSubject = a | b | c | Xd, where X is a 64-character string, you get the flat values Subject = a, b, c, Xd and Keywords = a, b, c, X (note that the d is trimmed from Xd, because Xd is longer than the 64 characters allowed).
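To make the truncation concrete, here is a small Python sketch of the flattening described above. It is an illustration only, not Lightroom's actual code: the flatten function is hypothetical, and it simply assumes each level of the | path is copied into Subject as-is and into Keywords cut to 64 characters.

```python
# Hypothetical model of the flattening described above (not Lightroom's code).
IPTC_KEYWORD_MAX = 64  # the 64-character IPTC:Keywords limit

def flatten(hierarchical, sep="|"):
    """Split an "a|b|c|leaf" path into (Subject, Keywords) lists."""
    parts = [part.strip() for part in hierarchical.split(sep)]
    subject = list(parts)  # Subject keeps every level intact
    keywords = [part[:IPTC_KEYWORD_MAX] for part in parts]  # Keywords truncates
    return subject, keywords

x = "X" * 64
subject, keywords = flatten("a | b | c | " + x + "d")
print(len(subject[-1]), len(keywords[-1]))  # 65 64
print(keywords[-1] == x)  # True: the trailing "d" is gone
```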

I guess that PhotoStructure tries to deduce hierarchical tags from the flat structure, so when it sees Xd in the Subject tag, it realizes that Xd is part of the hierarchical tag kw:a/b/c/Xd.

But when it sees the trimmed tag X in the Keywords tag, it cannot deduce its place in the hierarchy, so it creates kw:X.
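A rough Python sketch of the deduction I'm guessing at. This is a hypothetical model, not PhotoStructure's actual algorithm: derive_tags assumes a flat keyword is "explained" if it matches any segment of a HierarchicalSubject path, and anything left over becomes a top-level kw: tag.

```python
def derive_tags(flat_keywords, hierarchical_paths, sep="|"):
    """Hypothetical model: combine flat keywords with hierarchical paths."""
    # Each HierarchicalSubject path becomes one kw:a/b/c-style tag
    tags = {"kw:" + path.replace(sep, "/") for path in hierarchical_paths}
    # Every segment of every path is already covered by a hierarchical tag
    known = {seg for path in hierarchical_paths for seg in path.split(sep)}
    for kw in flat_keywords:
        if kw not in known:
            tags.add("kw:" + kw)  # no place in the hierarchy: top-level tag
    return sorted(tags)

x = "X" * 64
subject = ["a", "b", "c", x + "d"]
keywords = ["a", "b", "c", x]  # Xd truncated to X
paths = ["a|b|c|" + x + "d"]
print(len(derive_tags(subject + keywords, paths)))  # 2: includes the stray kw:X
print(len(derive_tags(subject, paths)))  # 1: only kw:a/b/c/Xd
```

In this model, leaving the Keywords values out of the input removes the stray tag, which is why ignoring that tag would help.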

I then see kw:X in PhotoStructure's tag list, which annoys me, as I only want to see my well-structured kw:a/b/c/Xd.

If I could configure PhotoStructure to stop parsing the Keywords tag, it would solve this annoying issue.


This is three lines of code. It’ll be done in alpha.7.

# +---------------+
# |  keywordTags  |
# +---------------+
#
# PhotoStructure should look in the following tags for keywords. Note that
# these values are case-sensitive.
# (env: "PS_KEYWORD_TAGS")
#
keywordTags = [
  "CatalogSets",
  "Categories",
  "HierarchicalSubject",
  "Keywords",
  "LastKeywordXMP",
  "Subject",
  "TagsList",
  "XPKeywords"
]
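To illustrate what this setting controls, here is a hypothetical Python sketch (not PhotoStructure's implementation) that reads keywords only from the listed tags. It assumes metadata arrives as a dict keyed by tag name, similar to exiftool -json output, and uses an exact dictionary lookup, mirroring the note that these values are case-sensitive.

```python
def collect_keywords(metadata, keyword_tags):
    """Gather keyword values from only the tags named in keyword_tags."""
    found = []
    for tag in keyword_tags:
        value = metadata.get(tag)  # exact, case-sensitive lookup
        if value is None:
            continue
        # A tag may hold a single string or a list of strings
        found.extend(value if isinstance(value, list) else [value])
    return found

meta = {"HierarchicalSubject": ["a|b|c|Xd"], "Keywords": ["a", "b", "c", "X"]}
print(collect_keywords(meta, ["HierarchicalSubject"]))  # ['a|b|c|Xd']
print(collect_keywords(meta, ["hierarchicalsubject"]))  # []
```

Restricting the list to HierarchicalSubject, as requested above, means the truncated Keywords values are never read.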

If you can test with alpha.7 and verify that this addresses this concern, that’d be great!


(Oops: I thought closing a topic was just cosmetic; I didn’t realize it prevented subsequent replies!)


The characters used to decide whether a given keyword is actually hierarchical are set by the keywordPathSeparators setting:

# +-------------------------+
# |  keywordPathSeparators  |
# +-------------------------+
#
# PhotoStructure interprets keywords as hierarchical if a path separator
# character is found in a keyword. This allows for tags like
# "Family/Einstein/Albert", "Flora|Fruit|Orange", "Objects⊃Tools⊃Hammer", or
# "Fauna>Oceanic>Pelican". By default, these separators are the forward-slash,
# vertical-bar, greater-than, and superset-of (⊃) characters. If you don't
# want to interpret keywords as hierarchical, change this value to an empty
# string (""). After changing this value, you must force-resync your entire
# library for the changes to take effect.
# (env: "PS_KEYWORD_PATH_SEPARATORS")
#
keywordPathSeparators = "/|>⊃"
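A minimal Python sketch of how a separator set like "/|>⊃" can split a keyword into path segments. It assumes each character in the string is an independent separator and that whitespace around segments is trimmed; split_keyword is a hypothetical helper, not PhotoStructure's code.

```python
import re

def split_keyword(keyword, separators="/|>⊃"):
    """Hypothetical helper: split a keyword on any separator character."""
    if not separators:
        return [keyword]  # empty string: never treat keywords as hierarchical
    # Put every separator character into one regex character class
    pattern = "[" + re.escape(separators) + "]"
    # Trim whitespace around segments and drop empty ones (an assumption)
    return [seg.strip() for seg in re.split(pattern, keyword) if seg.strip()]

print(split_keyword("Flora|Fruit|Orange"))     # ['Flora', 'Fruit', 'Orange']
print(split_keyword("Objects⊃Tools⊃Hammer"))   # ['Objects', 'Tools', 'Hammer']
print(split_keyword("AC/DC"))                  # ['AC', 'DC']
print(split_keyword("AC/DC", separators="|"))  # ['AC/DC']
```

The AC/DC example shows how a keyword containing / becomes a two-level tag with the default separators, and how narrowing the separator set avoids that.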

Yes, it addressed my concerns, thanks. I changed the setting to import only HierarchicalSubject, but then I found some issues with my data.

Some of my keywords had commas in them. It’s impossible to set such keywords via the Adobe Lightroom Classic UI, so I apparently set them with exiftool during some cleanup. I am going to remove those commas from my library so they stop confusing other parsers that split comma-containing keywords into multiple keywords.
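For anyone wondering why commas matter, here is a hypothetical Python sketch of a naive parser that treats a comma-separated string as a list, plus a helper that flags comma-containing keywords before cleaning them up. Neither function comes from PhotoStructure or exiftool.

```python
def naive_split(value):
    """What a comma-splitting parser does to a single keyword string."""
    return [part.strip() for part in value.split(",") if part.strip()]

def keywords_with_commas(keywords):
    """Flag keywords that such a parser would break apart."""
    return [kw for kw in keywords if "," in kw]

print(naive_split("Einstein, Albert"))  # ['Einstein', 'Albert']
print(keywords_with_commas(["Paris", "Einstein, Albert"]))  # ['Einstein, Albert']
```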

I also noticed that PhotoStructure creates an unnecessary hierarchical tag if a keyword contains a /. I am going to fix those keywords in my library as well.

I’ve fixed my tags and ran ./photostructure sync --force --exit-when-done, but the invalid tags are still present in the library. Is there a way to re-sync keywords? Shouldn’t that be the default behavior of sync?

To be more specific: I edited my assets’ keywords, and I expect PhotoStructure's sync process to pick up those changes.

The use of --force should have rebuilt tags. I’ll try to reproduce this tomorrow.

Thanks, as always, for reporting this!


I looked in settings.toml but did not see keywordPathSeparators. I’m using 0.9.1; do I need to add this setting manually? If so, is there a list of possible settings?

Update: I installed 1.0 beta 9 and still didn’t see the setting in the TOML file.

Welcome to PhotoStructure, @FooderZ !

There are actually two settings files to check out: see https://photostructure.com/getting-started/advanced-settings/#-twice-the-fun-with-two-settings-files for where they live and why.

You can also use environment variables if that’s more convenient:

https://photostructure.com/faq/environment-variables/

Thanks! Makes perfect sense.

If this should be a separate thread, let me know and I’ll create it. In the keywordTags setting you show “Keywords”, but my DAM (IdImager PhotoSupreme) writes each keyword to a tag named “Keyword (n)”, where n is the number of the keyword. If there are 8 keywords in total, you get Keyword (1) … Keyword (8) tags.

Does the Keywords entry in keywordTags pull information from the numbered Keyword (n) tags?

No: I’ve never seen that before! Note that you can support arbitrary keyword tags with the keywordTags library setting. It defaults to:

      "CatalogSets",
      "Categories",
      "HierarchicalSubject",
      "Keywords",
      "LastKeywordXMP",
      "Subject",
      "TagsList",
      "XPKeywords"

Can you run exiftool -struct /path/to/tagged/image.jpg | grep -i keyword and DM me the results?
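For the numbered “Keyword (n)” tags described above, here is a hypothetical Python sketch of gathering them in numeric order from a dict of tag names to values (for example, parsed exiftool -json output). The tag-name pattern is an assumption based on that description; the exact names exiftool reports for this DAM are what the exiftool command above would reveal.

```python
import re

# Assumed tag-name shape, based on the description: "Keyword (1)", "Keyword (2)", ...
NUMBERED_KEYWORD = re.compile(r"^Keyword \((\d+)\)$")

def numbered_keywords(metadata):
    """Return values of "Keyword (n)" tags ordered by n (hypothetical helper)."""
    numbered = []
    for tag, value in metadata.items():
        match = NUMBERED_KEYWORD.match(tag)
        if match:
            numbered.append((int(match.group(1)), value))
    # Sort numerically so "Keyword (10)" comes after "Keyword (2)"
    numbered.sort(key=lambda pair: pair[0])
    return [value for _, value in numbered]

meta = {"Keyword (2)": "Pelican", "Keyword (10)": "Ocean",
        "Keyword (1)": "Fauna", "Keywords": "ignored"}
print(numbered_keywords(meta))  # ['Fauna', 'Pelican', 'Ocean']
```

Sorting on the parsed number keeps Keyword (10) after Keyword (2), which a plain string sort would not.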

Emailed support with more information.

I moved my response to a new topic: Keyword parsing from an external DAM