Scanning never completes

I have tested PhotoStructure since 1.0.0-beta, on both Windows and Docker Compose (not at the same time; I’m currently trying to get it to work with Docker Compose and version 2.0.0-beta.1), but so far I have never been able to get PS to completely scan my photo collection. I have tried rescanning, rebuilding, and completely reinstalling PS.

I have turned off backup software for all of PS’ directories (in the beginning I thought the backup software was the reason that I couldn’t get PS to work reliably).

It seems to start off OK when I start with a clean database, but after a day or two it either stops or moves extremely slowly. When running PS in Docker Compose, it sometimes just deletes the db.sqlite3 file and starts over without any warning or explanation.

At the moment I have lots of errors like this:
2021-11-07T12:05:22.370Z web-5175 error DbRetries Caught db error. Retrying in 4844ms. 'SqliteError: SQLITE_BUSY: database is locked

Throughout the different versions of PS, the logs have consistently shown errors saying that it is unable to write to the database or that the database is locked, though not necessarily with the same error message as above.

It seems for some reason PS is not able to write to the database, but I don’t understand why.

Do you have any suggestions on how to move forward?

Apologies! I know how frustrating this is.

I’m finishing up a large update to PhotoStructure that completely changes how library imports work.

what we’ve got now

Currently, sync is in charge of directory iteration, and then spawns N sync-file processes to actually import files into your library.

https://photostructure.com/server/photostructure-for-servers/#service-architecture

why that’s problematic

The issue with this approach is that each process is reading and writing to your library database. SQLite’s ability to handle concurrent writes drops precipitously as the size of the database increases (especially when disk I/O is slow), so progress eventually stalls completely in SQLITE_BUSY errors and retries.
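To make the failure mode concrete, here is a minimal sketch of two connections competing for the write lock on one SQLite file. It uses Python's standard sqlite3 module rather than PhotoStructure's own code, and the table name, file name, and `demo_busy` function are made up for illustration. The busy timeout is set to zero so the second writer fails immediately instead of waiting.

```python
import os
import sqlite3
import tempfile


def demo_busy() -> str:
    """Return the error a second writer sees while the first holds the write lock."""
    # Hypothetical throwaway database file; not PhotoStructure's schema.
    path = os.path.join(tempfile.mkdtemp(), "demo.sqlite3")

    # isolation_level=None means we manage transactions explicitly.
    # timeout=0 disables waiting, so lock contention surfaces right away.
    writer_a = sqlite3.connect(path, timeout=0, isolation_level=None)
    writer_b = sqlite3.connect(path, timeout=0, isolation_level=None)

    writer_a.execute("CREATE TABLE asset (id INTEGER PRIMARY KEY, path TEXT)")

    # Writer A takes the write lock and keeps its transaction open.
    writer_a.execute("BEGIN IMMEDIATE")
    writer_a.execute("INSERT INTO asset (path) VALUES ('a.jpg')")

    try:
        # Writer B now asks for the same lock and is refused.
        writer_b.execute("BEGIN IMMEDIATE")
        return "no error"
    except sqlite3.OperationalError as err:
        return str(err)
    finally:
        writer_a.execute("ROLLBACK")
        writer_a.close()
        writer_b.close()


if __name__ == "__main__":
    print(demo_busy())
```

SQLite allows only one writer at a time per database file, so the more processes that want to write, and the longer each write takes, the more often the others land in this refused state.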

how does the new stuff work?

The new approach moves database janitorial work from main to sync, and does away with sync-file sub-processes completely, so only sync and web read and write to your library database. File imports are done within sync, with the majority of non-database work offloaded to worker_threads.
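The general shape of that design, a single database writer fed by a pool of workers that never touch the database, can be sketched as follows. This is only an illustration in Python, not PhotoStructure's implementation (which runs on Node.js): the `describe` and `import_files` functions, the `asset` table, and the use of a SHA-256 hash as the stand-in for "non-database work" are all assumptions.

```python
import hashlib
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Tuple


def describe(item: Tuple[str, bytes]) -> Tuple[str, str]:
    """Non-database work done on a worker thread (stand-in: hash the bytes)."""
    name, data = item
    return name, hashlib.sha256(data).hexdigest()


def import_files(db: sqlite3.Connection, files: Dict[str, bytes]) -> int:
    """Import files using many workers but exactly one database writer."""
    db.execute("CREATE TABLE IF NOT EXISTS asset (path TEXT PRIMARY KEY, sha TEXT)")
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Workers compute results in parallel; none of them sees `db`.
        results = pool.map(describe, files.items())
        # Only the calling thread writes, in a single transaction,
        # so there is nobody to contend with for the write lock.
        with db:
            db.executemany(
                "INSERT OR REPLACE INTO asset (path, sha) VALUES (?, ?)",
                results,
            )
    return db.execute("SELECT COUNT(*) FROM asset").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(import_files(conn, {"a.jpg": b"aaa", "b.jpg": b"bbb"}))
```

The point of the pattern is that parallelism is kept for the expensive per-file work, while writes are serialized in one place instead of being arbitrated by file locks.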

why the original design?

The original design was actually predicated on “not getting stuck,” which can happen if a file is corrupt in such a way that it causes one of the native libraries that PhotoStructure uses to wedge or kill the process, but it seems like the recent worker_threads implementation gives us the same process isolation.

when’s this going to be ready?

My development branch works for smaller libraries on Linux, but I’m still chasing down a couple of issues on other platforms, and then need to performance test with larger libraries. I hope to release a new alpha branch in a couple of days.

My setup is that I have a small and fast SSD (C:) and a large, slow HDD (D:).

From what you say, it sounds like it would help if I moved the library directory (.photostructure), which contains the db.sqlite3 file, to the SSD drive, and just leave originalsDir on the large HDD. I already have cache and other system files on the SSD.

Do you agree?

Or should I just wait for your alpha build to be ready?

PhotoStructure certainly supports hybrid library setups, which may help speed up browsing, but I suspect you’ll still bonk against SQLITE_BUSY issues unless you single-thread your imports and extend timeouts (which is decidedly not a reasonable solution, but it lets you limp to completion):

  • maxSyncFileJobs=1
  • dbTimeoutMs=5000
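As a rough illustration of why a longer timeout helps, the sketch below has one thread hold the write lock briefly while a second connection tries to write. It assumes that dbTimeoutMs behaves like SQLite's busy timeout, which this thread does not confirm, and it uses Python's standard sqlite3 module with made-up names (`write_with_timeout`, the `asset` table).

```python
import os
import sqlite3
import tempfile
import threading
import time


def write_with_timeout(timeout_s: float, hold_s: float) -> bool:
    """Return True if a second writer succeeds while another holds the lock for hold_s."""
    # Hypothetical throwaway database file.
    path = os.path.join(tempfile.mkdtemp(), "demo.sqlite3")
    setup = sqlite3.connect(path)
    setup.execute("CREATE TABLE asset (path TEXT)")
    setup.commit()
    setup.close()

    locked = threading.Event()

    def hold_lock() -> None:
        # First writer: take the write lock, keep it for hold_s, then commit.
        conn = sqlite3.connect(path, isolation_level=None)
        conn.execute("BEGIN IMMEDIATE")
        locked.set()
        time.sleep(hold_s)
        conn.execute("COMMIT")
        conn.close()

    holder = threading.Thread(target=hold_lock)
    holder.start()
    locked.wait()  # make sure the lock is held before the second writer starts

    # Second writer: waits up to timeout_s seconds for the lock (assumed analogue
    # of dbTimeoutMs=5000 when timeout_s is 5.0).
    second = sqlite3.connect(path, timeout=timeout_s, isolation_level=None)
    try:
        second.execute("BEGIN IMMEDIATE")
        second.execute("INSERT INTO asset (path) VALUES ('b.jpg')")
        second.execute("COMMIT")
        return True
    except sqlite3.OperationalError:
        return False
    finally:
        second.close()
        holder.join()


if __name__ == "__main__":
    print(write_with_timeout(0, 0.3))    # no waiting: the write is refused
    print(write_with_timeout(5.0, 0.3))  # waits out the lock, then writes
```

A timeout only helps when the lock holder finishes within the window. With a single import job there is far less competition for the lock in the first place, which is why the two settings are suggested together.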

Thanks!

Just to clarify, I have never used Windows and Docker at the same time; I have always started completely from scratch when switching between Windows and Docker Compose.

I have just had similar issues with a locked database on both versions of PS, so I thought I’d mention both in the question.

(Edit: I thought “hybrid solution” meant using two variants of PS on the same database. I now realise “hybrid solution” means exactly my setup, with the library spread across several hard drives.)