Incorrect handling of keywords with comma

If you have a file with a keyword that contains a comma:

exiftool image.jpg -HierarchicalSubject+="parent|hello, world" -HierarchicalSubject+="parent2|keyword2"

Then, when you try to read them back:

exiftool image.jpg -HierarchicalSubject
Hierarchical Subject            : parent|hello, world, parent2|keyword2

And PhotoStructure incorrectly behaves as if there are three separate keywords: "parent|hello", "world", and "parent2|keyword2".

To handle this accurately, you should use ExifTool's -sep option:

exiftool image.jpg -HierarchicalSubject -sep _MyVeryAwesomeSeparator_
Hierarchical Subject            : parent|hello, world_MyVeryAwesomeSeparator_parent2|keyword2
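To make the difference concrete, here's a minimal POSIX sh sketch that splits both values shown above: the default output on ", " and the -sep output on the custom separator. It assumes you've already captured just the tag value (the text after the colon). split_on is a hypothetical helper written for this example, not something ExifTool or PhotoStructure provides.

```sh
#!/bin/sh
# Hypothetical helper (not part of ExifTool or PhotoStructure):
# print each item of a list value on its own line, splitting on the
# literal separator string SEP.
#   usage: split_on VALUE SEP
split_on() {
  rest=$1
  sep=$2
  # An empty separator can't split anything: print the value unchanged.
  if [ -z "$sep" ]; then
    printf '%s\n' "$rest"
    return
  fi
  while :; do
    case $rest in
      *"$sep"*)
        # Print everything before the first SEP, then drop it and the SEP.
        printf '%s\n' "${rest%%"$sep"*}"
        rest=${rest#*"$sep"}
        ;;
      *)
        printf '%s\n' "$rest"
        break
        ;;
    esac
  done
}

# Value from the default output (items joined with ", ").
default_value='parent|hello, world, parent2|keyword2'
# Value from the output with -sep _MyVeryAwesomeSeparator_.
sep_value='parent|hello, world_MyVeryAwesomeSeparator_parent2|keyword2'

echo "Default output, split on ', ':"
split_on "$default_value" ', '

echo "Output with -sep, split on the custom separator:"
split_on "$sep_value" '_MyVeryAwesomeSeparator_'
```

The first split yields three items ("parent|hello", "world", "parent2|keyword2"); the second yields the two keywords that were actually written.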

This trick might be useful not only for keywords but for any other tags that have list values.

You can probably just remove the comma from the PS_KEYWORD_DELIMITERS setting.

# +---------------------+
# |  keywordDelimiters  |
# +---------------------+
#
# PhotoStructure splits apart keywords, by default, when they are delimited by
# a comma or semicolon. For example, "car, blue, tree" will be interpreted as
# having the keywords "car", "blue", and "tree". After changing this value,
# you must force-resync your library for the changes to take effect.
#
# PS_KEYWORD_DELIMITERS=",;"
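As a rough illustration of what this setting does (and why it trips over "hello, world"), here is a hypothetical POSIX sh/awk sketch that splits a keyword on any of the given single-character delimiters and trims surrounding whitespace. It is not PhotoStructure's actual code: split_keywords is a name made up for this example, and it assumes the delimiters are plain ASCII characters that are safe inside a regex bracket expression (like , and ;).

```sh
#!/bin/sh
# Hypothetical sketch, NOT PhotoStructure's implementation.
#   usage: split_keywords KEYWORD DELIMITERS
# DELIMITERS is a set of single ASCII characters such as ",;". An empty set
# means "don't split". Assumes no delimiter is special inside a bracket
# expression (so avoid ], ^, - and \).
split_keywords() {
  if [ -z "$2" ]; then
    printf '%s\n' "$1"
    return
  fi
  printf '%s\n' "$1" | awk -v delims="$2" '{
    # Split on any one of the delimiter characters.
    n = split($0, parts, "[" delims "]")
    for (i = 1; i <= n; i++) {
      p = parts[i]
      # Trim leading and trailing spaces/tabs; skip empty pieces.
      gsub(/^[ \t]+|[ \t]+$/, "", p)
      if (p != "") print p
    }
  }'
}

split_keywords 'car, blue, tree' ',;'       # car / blue / tree
split_keywords 'parent|hello, world' ',;'   # parent|hello / world
split_keywords 'parent|hello, world' ';'    # kept as one keyword
```

The last call mirrors the suggestion above of removing the comma from PS_KEYWORD_DELIMITERS: with only ; as a delimiter, "parent|hello, world" survives as a single keyword.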


# +-------------------------+
# |  keywordPathSeparators  |
# +-------------------------+
#
# PhotoStructure interprets keywords as hierarchical if a path separator
# character is found in a keyword. This allows for tags like
# "Family/Einstein/Albert", "Flora|Fruit|Orange", "Objects⊃Tools⊃Hammer", or
# "Fauna>Oceanic>Pelican". By default, these separators are the forward-slash,
# vertical-bar, and greater-than characters. If you don't want to interpret
# keywords as hierarchical, change this value to an empty string (""). After
# changing this value, you must force-resync your entire library for the
# changes to take effect.
#
# PS_KEYWORD_PATH_SEPARATORS="/|>⊃"
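A similar hypothetical sketch for path separators: each argument after the keyword is treated as a literal separator string, so multi-byte characters like ⊃ work too, and passing no separators (the equivalent of an empty PS_KEYWORD_PATH_SEPARATORS) leaves the keyword flat. Again, split_path is an illustrative name, not PhotoStructure's code.

```sh
#!/bin/sh
# Hypothetical sketch, NOT PhotoStructure's implementation.
#   usage: split_path KEYWORD [SEPARATOR...]
# Prints one path segment per line. With no separators, the keyword is
# printed unchanged (i.e. not treated as hierarchical).
split_path() {
  sp_path=$1
  shift
  nl='
'
  for sp_sep in "$@"; do
    # Skip empty separators; they would never make progress.
    [ -n "$sp_sep" ] || continue
    # Replace every occurrence of this separator with a newline.
    sp_out=
    while :; do
      case $sp_path in
        *"$sp_sep"*)
          sp_out=$sp_out${sp_path%%"$sp_sep"*}$nl
          sp_path=${sp_path#*"$sp_sep"}
          ;;
        *)
          sp_path=$sp_out$sp_path
          break
          ;;
      esac
    done
  done
  printf '%s\n' "$sp_path"
}

# The default separators from the setting above: / | > ⊃
split_path 'Family/Einstein/Albert' / '|' '>' '⊃'   # Family / Einstein / Albert
split_path 'Objects⊃Tools⊃Hammer' / '|' '>' '⊃'     # Objects / Tools / Hammer
split_path 'Fauna>Oceanic>Pelican'                  # no separators: one flat keyword
```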

Oh man, thanks for that tip! :100:

I didn’t know about that switch! I’ll get that into the next build.

@avdp you’re correct!

I think it would be strictly better to change the default value for that setting to a more obscure Unicode character, and then feed the setting's first character to -sep. Perhaps a variant of the vertical bar? https://en.wikipedia.org/wiki/Vertical_bar#Unicode_code_points


I find that ~ (tilde) is usually a safer separator than a comma, without having to resort to exotic Unicode characters. Not that there is anything wrong with Unicode… it’s just that some of the variants look too similar to the standard |.


I looked at the code, and PhotoStructure (and ExifTool) are actually behaving properly: some software (I believe Windows Explorer and others) incorrectly encodes keyword lists using a comma, which is why I had to add those separators in the first place.

PhotoStructure v1.1.0 didn’t support “empty” values for settings, so setting keywordDelimiters="" would be ignored. I’ve fixed this in v2.0.0-alpha.1, and updated the settings documentation.

# +---------------------+
# |  keywordDelimiters  |
# +---------------------+
#
# PhotoStructure splits apart keywords, by default, when they are delimited by
# a comma or semicolon. For example, "car, blue, tree" will be interpreted as
# having the keywords "car", "blue", and "tree".
#
# Note that some software doesn't encode lists of keywords properly, so we
# have to include the comma and semicolon by default to handle these cases:
# but this makes keywords that contain a comma be split incorrectly. If the
# files in your library don't have this encoding issue, you can replace this
# setting with just an empty string to disable splitting.
#
# See
# <https://forum.photostructure.com/t/incorrect-handling-of-keywords-with-comma/992>
# for more discussion.
#
# After changing this value, you must force-resync your library for the
# changes to take effect.
#
# PS_KEYWORD_DELIMITERS=",;"
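For example, if you've confirmed your files don't have that encoding issue, the override described in the note above would look something like this (assuming v2.0.0-alpha.1 or later, since v1.1.0 ignored empty values):

```sh
# Disable keyword splitting entirely (empty value).
PS_KEYWORD_DELIMITERS=""
```

As with any change to this setting, force-resync your library afterwards.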

(Updated 2021-09-11 to support empty values: the prior solution using -sep caused parsing issues elsewhere, and this solution is less surprising)

Before I realized this was a setting in PS that could be changed, I gave up on using the “Last, First” convention on name tags. As a recovering European (born and raised), it was very hard to give that up. I thought about trying again, since I hate name lists sorted by first name, but I figured it’s gonna cause other problems as well.