No sync on Synology Docker

Hello again @mrm

I am currently running the latest PS prealpha (.17) on my DS920+ and no sync is happening; the log tail is very quiet no matter what I try.

I am using the following environment vars:

PUID=1030
PGID=100
TZ=Europe/London
PS_LIBRARY_DIR=/ps/library
PS_COPY_ASSETS_TO_LIBRARY=0
PS_SCAN_ALL_DRIVES=1
PS_FORCE_LOCAL_DB_REPLICA=0
PS_PROCESS_PRIORITY=BelowNormal
PS_CPU_LOAD_PERCENT=100
PS_FFMPEG_THREADS=3
PS_MAX_CONCURRENT_IMPORTS=3
PS_SHARP_THREADS_PER_PROCESS=2
PS_LOG_LEVEL=info
PS_TAIL_LOGS=1

There are two mountpaths bound to the container:

  • /ps/library > /volume1/docker/photostructure/library
  • /Archive > /volume1/Archive

I created a docker-user (id: 1030) on DSM which has access to the above mount paths.

I verified these mount paths are R/W from within a container terminal session with:

sudo docker exec -u photostructure -it photostructure-server-prealpha bash
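
…and then, inside that session, something along these lines to confirm both paths were readable and writable (paraphrasing from memory, not the exact commands):

ls /ps/library /Archive
touch /Archive/.ps-write-test && rm /Archive/.ps-write-test && echo "/Archive is writable"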

I have tried rebuilding, restarting sync, etc., but nothing appears to happen apart from a quick “Syncing” message in the UI for /ps/library (which only contains previews). There is no activity whatsoever on the mount path I actually need synced, /Archive.

Something that sticks out in the logs is:

2023-10-08T12:53:04.553Z sync info_ event.SharedState(/ps/library/.photostructure/cache-1030/shared-state)#783 save() { state:
   { pid: 783,
     paused: false,
     syncing: true,
     events: [ { name: 'updateProgress', args: [], uid: '1jd7pkce8-8dx25zj' } ],
     tasks: [] } }
2023-10-08T12:53:04.681Z sync warn_ Error onError() { event: 'nonFatal',
  error:
   'g: unhandledRejection: invalid clamp(256, 125, 100) at E (/opt/photostructure/bin/sync.js:14:24659); t.isTooBusy (/opt/photostructure/bin/sync.js:9:940173); a (/opt/photostructure/bin/sync.js:9:941886); t.doNotRun (/opt/pho…ostructure/bin/sync.js:9:140744)' }

Otherwise I’m 100% at a loss as to what else I should try…

2.1-alpha.7 would sync, more or less, but it needed the occasional restart and still left a lot of holes even after being left to itself for days.

-prealpha.17 just sits there, apparently doing nothing.

The /Archive folder is a mixed-content folder, not just photos. I used .NoMedia extensively to prevent it from scanning non-applicable subdirs.

Small update, still no joy. I tried starting afresh with a blank library.

All I get when I attach to PS’s Docker is an endless stream of:

2023-10-08T14:01:49.985Z sync warn_ health.HealthCheck.system-load #toErrorResult { result:
   { section: 'System',
     id: 'system-load',
     ts: 1696773709985,
     level: 'warn',
     elapsedMs: 1,
     msg: [ 'Checking system load failed:', 'invalid clamp(256, 125, 75)' ] } }
2023-10-08T14:01:51.529Z sync warn_ Error onError() { event: 'nonFatal',
  error:
   'g: unhandledRejection: invalid clamp(256, 125, 75) at E (/opt/photostructure/bin/sync.js:14:24659); t.isTooBusy (/opt/photostructure/bin/sync.js:9:940173); a (/opt/photostructure/bin/sync.js:9:941886); t.doNotRun (/opt/phot…ostructure/bin/sync.js:9:140744)' }
2023-10-08T14:01:51.533Z main-27 warn_ ChildService(sync) .onStderr() 'This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). The promise rejected with the reason:\nError: invalid clamp(256, 125,…t (node:internal/timers:573:17)\n'
2023-10-08T14:01:53.469Z sync warn_ Error onError() { event: 'nonFatal',
  error:
   'uncaughtException: invalid clamp(256, 125, 75) at E (/opt/photostructure/bin/sync.js:14:24659); t.isTooBusy (/opt/photostructure/bin/sync.js:9:940173); f (/opt/photostructure/bin/sync.js:14:80933); H.c (/opt/photostructure/…ers (node:internal/timers:514:7)' }
2023-10-08T14:01:53.482Z sync warn_ Error onError() { event: 'nonFatal',
  error:
   'g: sentry.onFatalError: invalid clamp(256, 125, 75) at E (/opt/photostructure/bin/sync.js:14:24659); t.isTooBusy (/opt/photostructure/bin/sync.js:9:940173); f (/opt/photostructure/bin/sync.js:14:80933); H.c (/opt/photostruc…ers (node:internal/timers:514:7)' }
2023-10-08T14:01:53.586Z main-27 warn_ ChildService(sync) .onStderr() 'This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). The promise rejected with the reason:\nError: invalid clamp(256, 125,…t (node:internal/timers:573:17)\n'
2023-10-08T14:01:53.601Z main-27 warn_ ChildService(sync) .onStderr() 'Error: invalid clamp(256, 125, 75)\n    at E (/opt/photostructure/bin/sync.js:14:24659)\n    at t.isTooBusy (/opt/photostructure/bin/sync.js:9:940173)\n    at f (/opt/photostructure/bin/sync.js:14:80933)\n    at H.c (/opt/photo…rs (node:internal/timers:514:7)\n'
2023-10-08T14:01:53.614Z main-27 warn_ ChildService(sync) .onStderr() 'Error: invalid clamp(256, 125, 75)\n    at E (/opt/photostructure/bin/sync.js:14:24659)\n    at t.isTooBusy (/opt/photostructure/bin/sync.js:9:940173)\n    at f (/opt/photostructure/bin/sync.js:14:80933)\n    at H.c (/opt/photo…rs (node:internal/timers:514:7)\n'

Could I possibly be the only person running this on Docker on DSM 7.2 on a DS920+ NAS?
Is the error above the show-stopper of the Sync process?

What’s weird is that I can hear the Syno’s drives crunching away while the process is active, but all I get is this log. No previews are generated and nothing gets indexed.

Help!

Oof, sorry for these glitches!

Thanks to your log debugging, it looks like isTooBusy is failing. Here’s the current implementation:

export function isTooBusy(): boolean {
  const busyPct = CpuUsage.instance().busyPct()
  // If people set cpuLoadPercent too low, this results in PhotoStructure
  // never running. Only set this if the setting is reasonable.

  // min use would be 50% of one cpu, so, 100 / (2 * cpuCount):
  const reasonableMaxCpuPct = clamp(
    Math.round(100 / (2 * cpuCount())),
    125,
    Math.round(Settings.cpuBusyPercent.valueOrDefault)
  )
  if (gte(busyPct, reasonableMaxCpuPct)) {
    logTooBusy()
    return true
  } else {
    logTooBusy.unset() // < log the next time we're too high
    return false
  }
}

Somehow cpuCount() on your machine is returning a fractional value: 100/512.
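
A quick arithmetic check of where that 256 in the error comes from:

// with cpuCount() === 100/512, the computed minimum blows past the 125 bound:
Math.round(100 / (2 * (100 / 512)))
// => 256, the first argument in the "invalid clamp(256, 125, …)" errors above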

Here’s cpuCount:


export const cpuCount = lazy(() => {
  return min([sysFsCpu(), availableParallelism(), cpuInfo().length]) ?? 1 // < hope we never need that fallback!
}, minuteMs)

So one of those implementations is returning the fraction. I should Math.ceil the result in any event: everything assumes there’s at least one CPU, and that the CPU count isn’t fractional.
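
In other words, roughly this (just a sketch of the guard, not the final code; it reuses the same helpers shown above):

export const cpuCount = lazy(() => {
  const raw = min([sysFsCpu(), availableParallelism(), cpuInfo().length]) ?? 1
  // guard against fractional or zero values: round up, and never report fewer than 1 CPU
  return Math.max(1, Math.ceil(raw))
}, minuteMs)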

I’ll get this fix out today if possible. Thank you for your assistance here, and your patience!

Cheers.


Hello @mrm, thank you for the reassuring response.

Just wanted to make sure I understand: the next step is that I wait for you to release a fix, and that fix will show up in an updated -prealpha Docker image that I can update to at some point? Or should I switch to a different tag?

Yes. Unfortunately there’s no setting I can tell you to set in order to avoid this bug, so you’ll have to wait for the next build. I’ll try to have it out today.

Thanks for your patience!

Thank you @mrm

In case it is of any help, here is the lscpu -J output from the Docker container on my Synology DS920+ running the latest DSM 7.2.

{
   "lscpu": [
      {
         "field": "Architecture:",
         "data": "x86_64",
         "children": [
            {
               "field": "CPU op-mode(s):",
               "data": "32-bit, 64-bit"
            },{
               "field": "Address sizes:",
               "data": "39 bits physical, 48 bits virtual"
            },{
               "field": "Byte Order:",
               "data": "Little Endian"
            }
         ]
      },{
         "field": "CPU(s):",
         "data": "4",
         "children": [
            {
               "field": "On-line CPU(s) list:",
               "data": "0-3"
            }
         ]
      },{
         "field": "Vendor ID:",
         "data": "GenuineIntel",
         "children": [
            {
               "field": "Model name:",
               "data": "Intel(R) Celeron(R) J4125 CPU @ 2.00GHz",
               "children": [
                  {
                     "field": "CPU family:",
                     "data": "6"
                  },{
                     "field": "Model:",
                     "data": "122"
                  },{
                     "field": "Thread(s) per core:",
                     "data": "1"
                  },{
                     "field": "Core(s) per socket:",
                     "data": "4"
                  },{
                     "field": "Socket(s):",
                     "data": "1"
                  },{
                     "field": "Stepping:",
                     "data": "8"
                  },{
                     "field": "Frequency boost:",
                     "data": "enabled"
                  },{
                     "field": "CPU(s) scaling MHz:",
                     "data": "124%"
                  },{
                     "field": "CPU max MHz:",
                     "data": "2001.0000"
                  },{
                     "field": "CPU min MHz:",
                     "data": "800.0000"
                  },{
                     "field": "BogoMIPS:",
                     "data": "3994.39"
                  },{
                     "field": "Flags:",
                     "data": "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg cx16 xtpr pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave rdrand lahf_lm 3dnowprefetch intel_pt ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust smep erms mpx rdseed smap clflushopt sha_ni xsaveopt xsavec xgetbv1 dtherm ida arat pln pts rdpid md_clear arch_capabilities"
                  }
               ]
            }
         ]
      },{
         "field": "Virtualization features:",
         "data": null,
         "children": [
            {
               "field": "Virtualization:",
               "data": "VT-x"
            }
         ]
      },{
         "field": "Caches (sum of all):",
         "data": null,
         "children": [
            {
               "field": "L1d:",
               "data": "96 KiB (4 instances)"
            },{
               "field": "L1i:",
               "data": "128 KiB (4 instances)"
            },{
               "field": "L2:",
               "data": "4 MiB (1 instance)"
            }
         ]
      },{
         "field": "Vulnerabilities:",
         "data": null,
         "children": [
            {
               "field": "Itlb multihit:",
               "data": "Not affected"
            },{
               "field": "L1tf:",
               "data": "Not affected"
            },{
               "field": "Mds:",
               "data": "Not affected"
            },{
               "field": "Meltdown:",
               "data": "Not affected"
            },{
               "field": "Spec store bypass:",
               "data": "Vulnerable"
            },{
               "field": "Spectre v1:",
               "data": "Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers"
            },{
               "field": "Spectre v2:",
               "data": "Vulnerable, IBPB: disabled, STIBP: disabled"
            },{
               "field": "Srbds:",
               "data": "Not affected"
            },{
               "field": "Tsx async abort:",
               "data": "Not affected"
            }
         ]
      }
   ]
}

I checked my Synology, and it seems the value in /sys/fs/cgroup/cpu/cpu.shares is 1024, which I believe I was interpreting incorrectly (hence the fractional value for your CPU count).

I’ll get this fixed.

photostructure@photostructure-server-prealpha:/opt/photostructure$ grep '^' /sys/fs/cgroup/cpu/cpu.*
50
photostructure@photostructure-server-prealpha:/opt/photostructure$ node -e 'console.log(require("os").availableParallelism())'
4
photostructure@photostructure-server-prealpha:/opt/photostructure$ node -e 'console.log(require("os").cpus().length)'
4

If it’s a hairy situation and you want direct access to the host… let me know and I can arrange somehow. My forum email is the same as the PS licensed one.

P.S. I’m off to bed, GMT over here :slight_smile:

The value of 50 explains the fractional CPU count value! Thanks for the assist!

Should PS simply ignore cpu.shares?
CPU Shares for Docker containers says it’s a relative value.

FYI I did not configure container CPU priority or anything at all; everything is left at Synology’s defaults.

Edit: I tried starting the PS container after ticking “Enable resource limitation” in the DSM Container Manager and dragging the CPU priority slider to Max:

cat /sys/fs/cgroup/cpu/cpu.shares
90

My Container Manager is version 20.10.23-1437

This is exactly the conclusion I’ve just come to as well, after reading the Linux cgroups docs properly: the Docker container may not have visibility into sibling cgroups, so whatever that value is can’t reliably be interpreted.
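
So the likely fix is to simply drop the cpu.shares-derived value from that min. Roughly (again, just a sketch):

export const cpuCount = lazy(() => {
  // ignore cgroup cpu.shares entirely; trust Node's own view of the host CPUs
  return Math.max(1, Math.ceil(min([availableParallelism(), cpuInfo().length]) ?? 1))
}, minuteMs)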

cgroups is a very very odd design.

Anyway, I feel a bit validated by this decision as this was the conclusion that the JRE came to as well.

Thanks again for your help!

You’re welcome, glad I could help a little.

Meanwhile… cheeky question: can I haz docker image? 1/2 jk. I am really excited to throw PS at my archive directory and see the forgotten horrors that await in there.

I’m on it, but I still have a couple failing specs before I can release the next build.

Hop into the AlphaBeta discord, where I give blow-by-blow updates on upcoming builds.

Hey hey @mrm, as I was impatiently waiting for the upcoming build, last night I tried to spin up PS on the very same DS920+, but this time via docker-compose on the NAS’s CLI. I did not configure anything related to CPU priority, just the essential PS env vars for the library, PUID/PGID, etc.
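
Roughly this, reconstructed from memory (the image tag, port, and host paths may not be exact):

version: "3"
services:
  photostructure:
    image: photostructure/server:prealpha
    environment:
      - PUID=1030
      - PGID=100
      - TZ=Europe/London
      - PS_LIBRARY_DIR=/ps/library
    volumes:
      - /volume1/docker/photostructure/library:/ps/library
      - /volume1/Archive:/Archive
    ports:
      - "1787:1787"
    restart: unless-stopped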

Lo and behold, it ran just fine and it still is.

Once the container was created via docker-compose, it also appeared in the Synology GUI, albeit missing the bind mounts. So out of curiosity I added the missing bind mounts from the GUI and tried launching it from there, which led once more to the cpu shares error we knew and loved.

So, my unscientific conclusion: the Synology Docker GUI, and whatever it does behind the scenes, must be initialising containers with some weird flags that likely lead to the unexpected cpu.shares value you discovered.

The next thing I wanted to try was launching via the CLI, but this time passing those newfangled flags for the container’s CPU priority, to understand whether it’s a Synology issue or a Docker one.
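
Something along these lines, presumably (untested; --cpu-shares is a stock Docker flag):

# start a throwaway container with an explicit CPU-shares value
sudo docker run -d --name ps-cpushares-test --cpu-shares=50 photostructure/server:prealpha
# then check what the container actually sees
sudo docker exec ps-cpushares-test cat /sys/fs/cgroup/cpu/cpu.shares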

Anyway, I suppose this update will hopefully be DoA once you ship the updated build.

Thanks for the update!

After I get this release out the door I’ll have time to update my Synology (it’s still on v6) and see what shenanigans they’re doing with their docker containers.