Sync report lists files as "enqueued" but doesn't import/process

reed · December 28, 2023, 2:08pm

I’m having trouble getting files into a library. Over Christmas break, I spent a few hours trying to get things set up the “right” way, but haven’t been entirely successful!

A few notes on the hardware & software setup that may or may not be relevant:

PhotoStructure is running in Docker on a Synology DS224+ NAS
Version 2023.12.0-alpha.7
I have a 4TB NAS-specific spinning HDD as well as a 512GB SSD. Originals are on the HDD, PhotoStructure is on the SSD following the instructions here.
I mapped two folders from the spinning HDD into PhotoStructure: a “Photos” folder and a “PhotoImport” folder. That was the best way I could come up with to copy files (mainly from my laptop) onto the NAS, and then subsequently have PhotoStructure reorganize the files. (Open to other suggestions!)

Now, after letting PhotoStructure run for several days:

There are ~50,000 originals in “PhotoImport” (~60,000 files including XMP sidecars) totaling 256GB
There are ~31,000 photos/videos that PhotoStructure has imported and copied to “Photos” (~44,000 files) totaling 159GB
Nothing new has processed for 2-3 days

There is nothing in the library/.photostructure/sync-reports folder now. But yesterday, there was a large 10MB CSV file there that I copied to my laptop. It lists at least one status for ~62,000 files - i.e., it looks like everything in the “PhotoImport” folder was scanned. Most files went enqueued -> started -> copied -> synced. It looks like the import / queue was tackled chronologically, and stopped about mid-2022.

Clicking “Restart sync” hasn’t had much effect. I intend to kick off a “Rebuild” and see if that sorts out the issue.

mrm · December 29, 2023, 3:21am

Apologies for the glitch!

I suspect that there are files that are getting “stuck.” Most file operations currently use the taskTimeoutMs setting, which defaults to 2 minutes.

What’s the longest time it can validly take to fetch a file SHA, generate image previews for an asset, compute an image hash, or extract tags for a file?

On a reasonable server and disk, these tasks should take at most a handful of seconds, but the default allows for a (busy!) RPi on a slow, remote disk to not timeout.

Set this to 0 to disable task timeouts

Unfortunately, several higher-level operations, like image preview generation, rendering raw images, and (especially) video transcodes, can validly take orders of magnitude different elapsed time to complete.

I initially estimated timeout values based on measurements I took on slower hardware, but those turned out to be wildly optimistic for some newer high computational expense codecs. A couple of builds ago we’d hit timeouts, retry the operation (which would double-run ffmpeg concurrently), sometimes grind into swap, and slow sync to a crawl. It’d finish eventually, but it wasn’t fun. In frustration, I deleted these higher level timeouts (transcodeTimeoutBps isn’t respected in the current build).

Over the holidays I realized I could do much higher accuracy timeout prediction by using both file size and pixel process count (where videos are durationMs × fps × pixelWidth × pixelHeight). The next build I’m adding a new least-squares estimation for process timeouts that will estimate timeout based on current hardware and storage, with a timeout set to 6 sigmas away from mean (where 6 is a setting that can be bumped up to an even higher number if necessary).

The next build will include these new timeouts: https://photostructure.com/about/2023-release-notes/#timeout-improvements

mrm · December 29, 2023, 5:18am

Rebuild really isn’t the best name–it’s more of a “validate what’s already imported into the library database” process.

reed · December 30, 2023, 2:09pm

Thanks for the quick response & for the diagnosis. I added an environment variable to the Docker container settings, PS_TASK_TIMEOUT_MILLIS = 0, and that seems to have allowed processing to continue.

Most of the files are JPGs but there are a handful of MOV files and RAW files mixed in. I didn’t see an obvious pointer to particular file that was causing problems. There were a lot of timeout errors in a log file I skimmed, as well as some “no such asset” errors.

{"ts":1703939463120,"l":"error","ctx":"sync-file.UpdateAsset(31947)","msg":".throw() no such asset","meta":{"stack":"Error: no such asset\n    at new g (/opt/photostructure/bin/sync.js:9:301762)\n    at t.toWrappedError (/opt/photostructure/bin/sync.js:9:299753)\n    at p.throw (/opt/photostructure/bin/sync.js:9:582701)\n    at /opt/photostru…structure/bin/sync.js:9:1078658)"}}
{"ts":1703940059291,"l":"warn","ctx":"async.Deferred(TIMEOUT)","msg":".reject()","meta":"Timeout TIMEOUT after 24000ms at Timeout._onTimeout (/opt/photostructure/bin/sync.js:9:138955); listOnTimeout (node:internal/timers:573:17); processTimers (node:internal/timers:514:7)"}
{"ts":1703941972872,"l":"warn","ctx":"fs.BaseFile(/ps/library/.photostructure/docker-config/pids/11125.json)","msg":"trap: readJson() failed: Error: ENOENT: no such file or directory, open '/ps/library/.photostructure/docker-config/pids/11125.json'"}

reed · December 31, 2023, 1:00pm

Setting this timeout variable seemed to allow a few assets to be processed, but my library still hovers at around 35,000 assets rather than ~50k.

I don’t see any sync-reports file. The latest sync.log file shows a bunch of timeout messages like this:

{"ts":1704027190690,"l":"warn","ctx":"async.Deferred(TIMEOUT)","msg":".reject()","meta":"Timeout TIMEOUT after 24001ms at Timeout._onTimeout (/opt/photostructure/bin/sync.js:9:138955); listOnTimeout (node:internal/timers:573:17); processTimers (node:internal/timers:514:7)"}

reed · December 31, 2023, 1:16pm

Also, I’m not sure if this is related, but for a lot (half? more?) of my photos, the preview thumbnail looks fine but clicking on it shows a little “?” Super Mario question mark box. Downloading the file works fine.

The web log shows:

{"ts":1704028693915,"l":"warn","ctx":"express.ServerToastHelpers","msg":"sendServerToast()","meta":{"requestedURI":{"$mid":1,"path":"/img/28256/69604/actual","scheme":"http","authority":"family.photos"},"toast":{"text":"Well, darn, that didn't work","uuid":"BatchCluster was given invalid options: taskTimeoutMillis must be greater than or equal to 10","details":"BatchCluster was given invalid options: taskTimeoutMillis must be greater than or equal to 10","type":"warning","button1Text":"Return home","button1Url":"/#dismiss","button1Class":"info","onClickUrl":"#dismiss"},"httpStatusCode":503}}
{"ts":1704028693916,"l":"warn","ctx":"web.RequestLogger","msg":"GET http://family.photos/img/28256/69604/actual HTTP/1.1","meta":{"from_ip":"::ffff:172.17.0.1","protocol":"HTTP/1.1","method":"GET","statusCode":503,"ok":false,"elapsedMs":22,"runState":"ready","userAgent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Safari/605.1.15"}}

mrm · December 31, 2023, 6:35pm

Ah! Batch cluster doesn’t know how to deal with zero for taskTimeoutMillis. I’ll fix that in the next release, but until then, revert that seeing to default.