The only time that’s reliably encoded in UTC is the GPS Date/Time tag (because the EXIF GPS date and time stamps are specified as UTC).
The encoded time for images is in reference to the local timezone, at least for the vast majority of cameras that I’ve looked at (there are almost certainly exceptions).
Videos, unfortunately, are encoded remarkably inconsistently and almost never have GPS tags, which would otherwise give you a UTC “ground truth” for backing into the correct TZ.
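To make the "backing into the correct TZ" idea concrete, here is a rough sketch of how it could work for a single photo that has both a local capture time and GPS tags. The function name `infer_utc_offset` is hypothetical, and the sketch assumes the camera clock was set correctly, the GPS fix was fresh when the shot was taken, and real-world offsets fall on 15-minute boundaries. Treat it as an illustration, not a robust implementation.

```python
from datetime import datetime, timedelta

# EXIF stores DateTimeOriginal as "YYYY:MM:DD HH:MM:SS" with no timezone.
EXIF_FMT = "%Y:%m:%d %H:%M:%S"


def infer_utc_offset(local_str, gps_date, gps_time, step_minutes=15):
    """Estimate the camera's UTC offset from one photo's tags.

    local_str: DateTimeOriginal value (assumed to be local wall-clock time).
    gps_date:  GPSDateStamp value, "YYYY:MM:DD" (UTC).
    gps_time:  GPSTimeStamp as an (hours, minutes, seconds) tuple (UTC).
    step_minutes: assumed granularity of timezone offsets.
    """
    local = datetime.strptime(local_str, EXIF_FMT)
    hours, minutes, seconds = gps_time
    utc = datetime.strptime(gps_date, "%Y:%m:%d") + timedelta(
        hours=hours, minutes=minutes, seconds=seconds
    )

    # The raw difference includes clock drift and GPS fix lag, so snap it
    # to the nearest plausible offset step.
    step = step_minutes * 60
    snapped = round((local - utc).total_seconds() / step) * step
    return timedelta(seconds=snapped)


# Local 18:30:05 vs GPS 12:45:02 UTC is a raw gap of 5:45:03.
print(infer_utc_offset("2023:07:14 18:30:05", "2023:07:14", (12, 45, 2.0)))
```

Snapping to 15 minutes is what lets fractional-hour zones (+05:45, +05:30, and so on) come out right despite a few seconds of drift. None of this helps with videos that have no GPS tags at all.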
What’s the mimetype of the file we’re looking at here?
Leap seconds + fractional-hour TZ offsets =