PhotoStructure sync reports

mrm · July 22, 2022, 11:47pm

What’s a “sync report”?

PhotoStructure v2.1+ assembles “sync reports” that you can manually review to know exactly what the sync process did as it walked through your scan paths and examined your filesystem for photos and videos.

These reports are in CSV format and can be opened in any spreadsheet application, like LibreOffice, Excel, or Google Sheets.

Where are these sync reports?

Sync reports can be found in the .photostructure/sync-reports sub-directory of your PhotoStructure library.

Instructions for how to see this directory on macOS, Windows and Linux are here.

Rows

As files and directories work their way through the processing pipeline, PhotoStructure appends new rows to the sync report CSV.

Columns

The ts column is the timestamp for the row, in milliseconds since 1970-01-01. Most spreadsheet applications don’t know how to parse these values, though, so we also add the at column.
The at column is ts in ISO format, truncated to one-second resolution, and should be recognized as a date by most spreadsheet software.
The path column is the native path of the directory or file.
The state column explains why that row was added.
The from column specifies which code path added the sync report row.
The elapsedMs column is only added to rows completing a given path, and records how long that process took.
The details column will include information about the path, like why a given file or folder was rejected.
The url column is only added to rows when a file or directory is imported. You may need to adjust the domain name of the URL to make it work correctly (it defaults to localhost).
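
If you would rather script against a report than open it in a spreadsheet, here is a minimal Python sketch of reading the columns above and converting ts into a proper datetime. It assumes the CSV starts with a header row using the column names listed here, and that ts is an integer; the sample row and the `read_rows` and `ts_to_datetime` helpers are hypothetical and only for illustration.

```python
import csv
import io
from datetime import datetime, timezone


def ts_to_datetime(ts_ms: str) -> datetime:
    """Convert a millisecond epoch value (the `ts` column) to an aware UTC datetime."""
    return datetime.fromtimestamp(int(ts_ms) / 1000, tz=timezone.utc)


def read_rows(csv_text: str) -> list[dict]:
    """Parse sync report text into dicts, adding a parsed `when` key to each row."""
    rows = []
    # Assumption: the first line is a header row naming the columns.
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["when"] = ts_to_datetime(row["ts"])
        rows.append(row)
    return rows


# Hypothetical sample data, not copied from a real report.
SAMPLE = """ts,at,path,state,from,elapsedMs,details,url
1658533620000,2022-07-22T23:47:00,/photos/a.jpg,enqueued,example,,,
"""

for row in read_rows(SAMPLE):
    print(row["when"].isoformat(), row["state"], row["path"])
```

To read a real report, pass the contents of a file from your sync-reports directory instead of `SAMPLE`.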

`state` values

For directories

The state column for directories will mostly be:

scanning: the directory contents are about to be read.
skipped: the directory was excluded. The details column will explain why.
scanned: the directory contents were completely processed.

If something goes awry, you may see:

failed: reading the directory contents failed.
timeout: reading the directory contents took too long.
canceled: PhotoStructure was shut down before the directory was processed.

For files

The state column for files will be:

enqueued: the file looks promising, and has been queued for import.
rejected: the file did not pass all import filters. The details column will explain why.
started: the file was dequeued from the work queue, and is now going to be processed.
noop: the current file metadata already matches your library database, so no operation was needed to sync this file.
deleted: the file was determined to be deleted (the prior mountpoint exists, but the file doesn’t exist anymore).
skipped: the file lives on a volume that isn’t currently mounted.
synced: the file was imported.
copied: automatic organization is enabled (copyAssetsToLibrary=true), and the photo or video was copied into your library originals directory. The details column will contain the source file path.
note: sidecars will be referenced here. The details column will specify which source file(s) it will be associated with.

If something goes awry, you may see:

failed: something went wrong. The details column will explain why.
timeout: the file wasn’t processed in a reasonable amount of time.
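
As a rough illustration of how these state values can be used, the Python sketch below counts rows per state and collects the rows whose details column is worth reading (rejected, failed, and timeout). It assumes a header row with the column names described above; the `summarize` helper and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter

# States whose `details` (or absence of progress) usually deserve a look.
PROBLEM_STATES = {"rejected", "failed", "timeout"}


def summarize(csv_text: str):
    """Return (rows-per-state counts, list of (path, state, details) for problem rows)."""
    counts = Counter()
    problems = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        counts[row["state"]] += 1
        if row["state"] in PROBLEM_STATES:
            problems.append((row["path"], row["state"], row["details"]))
    return counts, problems


# Hypothetical sample data, not copied from a real report.
STATE_SAMPLE = """ts,at,path,state,from,elapsedMs,details,url
1,,/p/a.jpg,enqueued,x,,,
2,,/p/b.txt,rejected,x,,hypothetical reason,
3,,/p/a.jpg,started,x,,,
4,,/p/a.jpg,synced,x,12,,
5,,/p/c.jpg,failed,x,,hypothetical error,
"""

counts, problems = summarize(STATE_SAMPLE)
for path, state, details in problems:
    print(f"{state}: {path} ({details})")
```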

This post describes the reports somewhat like Linux’s /proc pseudo filesystem, as if they are an interface to PhotoStructure’s inner workings, in existence any time the system is in operation, and accurate at any time they are read. That would be great! However, I observe on my installation of PhotoStructure that they are normal files on a normal filesystem, and thus I know that they are not “live updated” as I described, providing a window into current operations at any time they are read.

I initially assumed that they would be generated and written to the specified location at the time that a sync run completes, but upon re-reading this post, I found that they will contain information that would seem to change during the course of a sync run, leaving me confused about just what is to happen.

mrm · August 16, 2022, 9:28pm

Good question!

I initially thought I’d have a single row for every file, and update the row as it flows through the processing pipeline, but I realized fairly quickly that I wanted to simplify the code writing the sync report as much as possible (I didn’t want to spend time debugging a tool that I wanted to use to help debug sync!).

The only retained state is which file is currently being appended to. Sync reports are append-only: previously emitted rows are never edited.

Another way to say this is that the “state” of any path is the contents of the last row for that path.
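
In code, that “last row wins” rule could look something like this Python sketch. It assumes the rows are supplied in file order (which the append-only design implies); the `latest_state_by_path` helper and the sample rows are hypothetical.

```python
def latest_state_by_path(rows):
    """Map each path to the state of its last row, given rows in file order."""
    latest = {}
    for row in rows:
        # Later rows overwrite earlier ones, so the final value is the last row seen.
        latest[row["path"]] = row["state"]
    return latest


# Hypothetical rows, in the order they would appear in a report.
ROWS = [
    {"path": "/p/a.jpg", "state": "enqueued"},
    {"path": "/p/b.jpg", "state": "enqueued"},
    {"path": "/p/a.jpg", "state": "started"},
    {"path": "/p/a.jpg", "state": "synced"},
]

print(latest_state_by_path(ROWS))
```

Here /p/a.jpg ends up as synced, while /p/b.jpg is still enqueued because no later row mentions it.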

If you need more details about any given path that has been imported, you can look at the AssetFile database table, either directly, or via the list tool.

You can actually tail the sync report and watch it do work in real time.
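
If you don’t have a tail command handy, a polling loop does roughly the same job. This is only a sketch in Python: the `read_appended` and `follow` helpers are hypothetical, the file is assumed to be UTF-8, and splitting on newlines assumes no CSV field contains an embedded newline.

```python
import time


def read_appended(path: str, offset: int):
    """Return (complete lines appended since byte `offset`, new offset)."""
    with open(path, "rb") as f:
        f.seek(offset)
        chunk = f.read()
    # Stop at the last newline so a half-written row is left for the next poll.
    end = chunk.rfind(b"\n") + 1
    lines = chunk[:end].decode("utf-8").splitlines()
    return lines, offset + end


def follow(path: str, poll_seconds: float = 1.0):
    """Print rows as they are appended. Runs until interrupted (Ctrl-C)."""
    offset = 0
    while True:
        lines, offset = read_appended(path, offset)
        for line in lines:
            print(line)
        time.sleep(poll_seconds)


# Usage (the path is a placeholder for a file in your sync-reports directory):
# follow("/path/to/your/sync-report.csv")
```

Because rows are only ever appended, remembering a byte offset is enough; there is no need to re-read or diff earlier rows.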

I just updated the top post to clarify this.

pmocek · August 23, 2022, 10:03pm

Thanks for the clarification. Given that, I think these might be more clearly identified as log files than as reports.