Pictures import hangs on arm64

Hello hello… I am new here and I am testing PhotoStructure (alpha-7) on a Plus trial license, on an Odroid HC4 (aarch64) running Armbian (Debian 11.7), with two external HDDs in BTRFS RAID1.
The problem is that I have never been able to see an import finish (I have quite a big library); I played a bit with the timeouts because the health checks were occasionally reporting errors while checking the DB. After a while the import hangs indefinitely and the sync process is left with a single zombie child (see below).
I would like to help solve these issues, since ARM-based appliances are becoming more and more common.

photost+ 1552050  0.0  0.0   6548   680 ?        Ss    2023   0:00 /bin/bash -e /media/myCloudDrive/home/photostructure/photostructure-for-servers/start.sh --expose
photost+ 1552121  0.0  0.2 734980 10880 ?        Sl    2023   0:04  \_ node ./photostructure --expose
photost+ 1552133  0.2  0.6 1065456 26116 ?       Sl    2023   8:50      \_ phstr-main
photost+ 1552150  1.5  4.9 1481640 191364 ?      Sl    2023  62:13          \_ phstr-web
photost+ 1684556  6.4  5.0 1423748 194220 ?      SNl  Jan01 113:33          \_ phstr-sync
photost+ 1684582  0.0  0.0      0     0 ?        ZN   Jan01   0:00              \_ [sh] <defunct>

Welcome to PhotoStructure, @guandalf!

Apologies for the hangup (I just extended your free trial).

Oh nice! I bought an HC1 ages ago, but that proved a bit too meager to do much more than running pihole. It’d be great to get PhotoStructure running on an HC4.

Oof, this really shouldn’t happen. If you see that in the future, please send me the text of the error so I can look into it.

I wish there were a bit more to go on there; if we had the PID we could check to see what flavor of subprocess became zombified.
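
For reference, even for a defunct process the kernel keeps the name, state, and parent in /proc, so given the PID you can usually tell what flavor of child it was and which process failed to reap it. A rough sketch, using the defunct PID from your listing above:

# Substitute whatever defunct PID you actually see.
ZPID=1684582
# The zombie's command line is gone, but Name/State/PPid survive in /proc:
grep -E '^(Name|State|PPid)' /proc/$ZPID/status
# ...and the PPid tells you which PhotoStructure process left it unreaped:
ps -o pid,stat,etime,cmd -p "$(awk '/^PPid:/ {print $2}' /proc/$ZPID/status)"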

A couple things may help us figure out what’s going on here:

  1. Your sync reports should have some records that have start entries but no corresponding completion/timeout entries.

  2. If you enable debug logging, there may be clues there, but it can be tough to tease useful details out of concurrent process logs unless they’re outright errors/exceptions (which could be the case, given that your platform may not be behaving as expected).

  3. The next build will add timeouts to several additional components. Timeouts, to be clear, are not proper solutions, but they can help prevent a corner-case error from wedging the whole system (see the shell-level illustration just after this list).
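
As a shell-level illustration of that last point (just an analogy, not how PhotoStructure’s internal timeouts are implemented), coreutils’ timeout lets a step fail loudly instead of wedging everything behind it:

# Hypothetical step name; give it five minutes, then treat a timeout as an error.
timeout 300 some-long-running-step
if [ $? -eq 124 ]; then
  echo "step timed out; record it and move on instead of hanging forever" >&2
fi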

No need to apologize, and thank you for the extended trial; I’ll keep testing until it works :slight_smile:

I think I can easily reproduce it by setting everything back to default values. But even without reverting the settings, after activating debug logging and restarting the systemd service, this is what I got:

Unfortunately the sync-reports folder is empty:

photostructure@nextcloudpi:~$ ls -la PhotoStructure/.photostructure/sync-reports/
total 4
drwxr-xr-x 1 photostructure photostructure   62 Jan  2 18:39 .
drwxr-xr-x 1 photostructure photostructure  194 Jan  2 20:09 ..
-rw-r--r-- 1 photostructure photostructure    0 Jan  2 21:02 .metadata_never_index
-rw-r--r-- 1 photostructure photostructure 3091 Jan  2 16:56 README.txt
photostructure@nextcloudpi:~$

I’ve activated debug logging. Now I’m going to wait for the sync to hang and see what gets reported there. In the meantime, in the web log I see several entries like these:

{"ts":1704226709562,"l":"warn","ctx":"db.DbRetries","msg":"db.integrityCheck: Caught db error. Retrying in 7800ms.","meta":{"name":"db.integrityCheck","error":"integrity_check failed for /media/myCloudDrive/home/…/PhotoStructure/.photostructure/models/db.sqlite3: \"unable to validate the inverted index for FTS5 table main.tag_fts: database is locked\" at new g (/media/myCloudDrive/home/photostructure/photostructure-for-servers/bin/web.js:9:298161); t.toWrappedError (/media/myCloudDrive/home/photostructure/photostructure-for-servers/bin/web.js:9:296152); p.throw (/media/myCloudDrive/home/photostructure/photostructure-for-servers/bin/web.js:9:606085); R (/media/myCloudDrive/home/photostructure/photostructure-for-servers/bin/web.js:9:1149543); B (/media/myCloudDrive/home/photostructure/photostructure-for-servers/bin/web.js:9:1149743); /media/myCloudDrive/home/photostructure/photostructure-for-servers/bin/web.js:9:1150431; /media/myCloudDrive/home/photostructure/photostructure-for-servers/bin/web.js:9:253504"}}
{"ts":1704226710667,"l":"error","ctx":"db.SQLite","msg":".throw() integrity_check failed for /media/myCloudDrive/home/…/PhotoStructure/.photostructure/models/db.sqlite3: \"unable to validate the inverted index for FTS5 table main.tag_fts: database is locked\"","meta":{"stack":"Error: integrity_check failed for /media/myCloudDrive/home/…/PhotoStructure/.photostructure/models/db.sqlite3: \"unable to validate the inverted index for FTS5 table main.tag_fts: database is locked\"\n    at new g (/media/myC…for-servers/bin/web.js:9:154382)"}}

Waiting for it.

Thank you very much for your reply. I look forward to solving all the issues and being able to use your fantastic piece of software on my device.

Adding a few more details to the investigation.

I activated debug logging and restarted the process. It seemed that the import was finally working (?!?) when, unfortunately, I had a power outage and it could not finish successfully.
I restarted everything and now the sync has hung again after scanning the library and a small part of one of the three folders I manually selected to be scanned.

I could not find anything I could recognize as useful in the log of the sync process, but if you want I can send it to you for further investigation. LMK if I can send anything else…

So, after a couple of days the situation is as follows: the import has stalled again, this time after importing many more pictures from my biggest folder.

The processes are like:

photost+   80030  0.0  0.0   6548   816 ?        Ss   Jan04   0:00 /bin/bash -e /media/myCloudDrive/home/photostructure/photostructure-for-servers/start.sh --expose
photost+   80100  0.0  0.3 734980 14276 ?        Sl   Jan04   0:03  \_ node ./photostructure --expose
photost+   80113  0.2  1.4 1080220 54984 ?       Sl   Jan04   8:07      \_ phstr-main
photost+   80130  2.8  7.5 1552980 293116 ?      Sl   Jan04  84:26          \_ phstr-web
photost+  284920  0.4  0.4  20328 17092 ?        SN   18:00   0:00          |   \_ /usr/bin/perl -w /media/myCloudDrive/home/photostructure/photostructure-for-servers/node_modules/exiftool-vendored.pl/bin/exiftool -stay_open True -@ -
photost+   80160 27.3 11.4 1578548 441916 ?      SNl  Jan04 809:53          \_ phstr-sync
photost+   80186  0.0  0.0      0     0 ?        Z    Jan04   0:00              \_ [sh] <defunct>
photost+  277960  0.9  4.3 1307796 169352 ?      SNl  09:23   4:57              \_ phstr-worker
photost+  277972  0.8  6.3 1292412 246056 ?      SNl  09:23   4:18              \_ phstr-worker
photost+  278101  0.0  1.0  44080 40504 ?        SN   09:26   0:07              \_ /usr/bin/perl -w /media/myCloudDrive/home/photostructure/photostructure-for-servers/node_modules/exiftool-vendored.pl/bin/exiftool -stay_open True -@ -
photost+  278111  0.0  0.7  31276 28040 ?        SN   09:26   0:03              \_ /usr/bin/perl -w /media/myCloudDrive/home/photostructure/photostructure-for-servers/node_modules/exiftool-vendored.pl/bin/exiftool -stay_open True -@ -

There is a CSV file in the sync-reports folder, but I see no obvious error there, nor in the debug logs.
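
For reference, a quick way to skim both for anything error-shaped (the logs directory name is an assumption; the "l":"warn"/"l":"error" pattern matches the JSON log format shown earlier):

# Paths follow the library layout shown earlier; "logs/" is an assumption.
cd ~/PhotoStructure/.photostructure
grep -iE 'error|fail|timeout' sync-reports/*.csv | tail -n 40
grep -rE '"l":"(warn|error)"' logs/ | tail -n 40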

How to proceed from here?

Thank you again.

I’ve just added a --todo switch to the list tool in the next release, so you’ll be able to keep tabs on what sync is supposed to be working on. Perhaps it will provide more clues as to what’s going awry.
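
From the photostructure-for-servers directory that should look something like the following (treat the exact invocation as tentative until the release notes land):

# Hypothetical invocation of the list tool with the upcoming --todo switch.
./photostructure list --todo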

Until then, you can “force sync” from the nav menu, which will delete the prior work queues and force the sync process to re-walk your scan directories. This isn’t a solution, of course, just a remediation; sync should “just work” and do its job reliably in the background…

We’ll get there…

After a month of testing I have to say that the import stalls almost every time. I have to restart it manually; I did get it to complete successfully once or twice, but after a few days the result is always a stalled import, recoverable only by hand.

:cry: