Everything You Ever Wanted To Know About PhotoStructure's Cache Directory

mrm · May 17, 2022, 4:57pm

What’s the “cache” or “tmp” directory?

This is where PhotoStructure temporarily stores larger files.

Some other applications call this a “scratch” directory.

Where is this cache directory stored?

It’s different on every platform: PhotoStructure tries to be a “good citizen” for every OS and follow the standard for that OS.

On macOS, it lives in $HOME/Library/Caches/PhotoStructure. Note that the Library directory is hidden in the Finder: hit ⌘-shift-. (command-shift-period) to see it.
On Windows, it lives in %TEMP%, or %LOCALAPPDATA%, or $HOME/AppData/Local/PhotoStructure, whichever is defined first.
On Linux, it lives in $HOME/.cache/photostructure.
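
To make the per-OS defaults above concrete, here is a minimal Python sketch. It is not PhotoStructure's actual code: default_cache_dir is a hypothetical helper, the platform strings are the values Python's sys.platform reports, and the post doesn't say whether a PhotoStructure subdirectory is appended under %TEMP% or %LOCALAPPDATA%, so the sketch returns those values unchanged.

```python
from typing import Mapping


def default_cache_dir(platform: str, env: Mapping[str, str], home: str) -> str:
    """Hypothetical helper mirroring the per-OS defaults described above."""
    if platform == "darwin":
        return f"{home}/Library/Caches/PhotoStructure"
    if platform == "win32":
        # First one that is defined (and non-empty) wins.
        # Assumption: the value is used as-is; the post does not say
        # whether a PhotoStructure subdirectory is appended.
        for name in ("TEMP", "LOCALAPPDATA"):
            if env.get(name):
                return env[name]
        return f"{home}/AppData/Local/PhotoStructure"
    # Everything else is treated as Linux in this sketch.
    return f"{home}/.cache/photostructure"
```

Calling it with sys.platform, os.environ, and your home directory shows where the default would land on the machine you're on.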

Can I override the default cache directory?

Certainly! Just set the PS_CACHE_DIR environment variable.

Read how to set environment variables for every supported OS here.

General guidelines for the cache directory

From the Docker documentation:

If you’ve got an SSD, use that: this directory will see a lot of reads and writes.
If you can, pick (or create) a volume that doesn’t have data integrity protection, on-the-fly compression, or on-the-fly file de-duplication. This will make imports faster as well as reduce system load.
PhotoStructure automatically prunes old and unused files from the scratch directory: it shouldn’t take much space unless an import is running.

What lives in the cache dir?

Your cache dir:

may need space for your library database (which may be several hundred MB, depending on the size of your library), and
will need space for the “sync-state” database (this manages the sync work queue and requires local disk, and doesn’t get very big: 50MB tops), and
will need space for the non-JPEG assets that are imported by your system in 5 minutes. If your system is fast (say, 32 threads), you should see ~100-200 assets imported per minute. A very high-resolution flagship dSLR or mirrorless camera (say, a 40MP sensor) may consume 5-10MB per file conversion (as I convert to TIFF). Assuming all your files are gigantic RAW, that’s 5 minutes * 200 assets per minute * 10MB, or 10GB. And then let’s do the ol’ multiply-by-three Principal Engineering Scheduling trick to be pessimistic, which gives roughly 30GB.
will need space to hold converted images that have been zoomed in, whose original file format isn’t renderable by your current browser. PhotoStructure’s web service will do the conversion in the background, and store the result temporarily in the cache dir.
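
The back-of-the-envelope estimate for import scratch space can be written out as a small Python sketch. The function name and parameter names are made up for illustration; the default numbers are the pessimistic ones from the list above (200 assets per minute, 5 minutes, 10MB per conversion, times three).

```python
def scratch_estimate_mb(
    assets_per_minute: int = 200,
    minutes: int = 5,
    mb_per_conversion: int = 10,
    safety_factor: int = 3,
) -> int:
    """Rough worst-case size, in MB, of in-flight converted files."""
    return assets_per_minute * minutes * mb_per_conversion * safety_factor


if __name__ == "__main__":
    # Without the safety factor: 5 * 200 * 10MB = 10,000MB (about 10GB).
    print(scratch_estimate_mb(safety_factor=1))
    # With the multiply-by-three padding: 30,000MB (about 30GB).
    print(scratch_estimate_mb())
```

Plug in your own import rate and file sizes to see how far below the worst case a slower machine or a mostly-JPEG library would sit. Note that this only covers the conversion term, not the databases.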

How big can it get?

Disk consumption depends on your library and your computer, but, worst-case, the cache dir shouldn’t exceed 32GB.

Does it have to be on local disk?

Yes: the SQLite databases require local disk. See this for details:

Can I delete stuff from the cache dir?

The SQLite dbs (the library replica, if present, and the “sync-state” db) must not be deleted while PhotoStructure is running, but everything else is a temporary file that can be deleted without undue harm.

PhotoStructure should be cleaning things up automatically. readdirCacheMs defaults to 5 minutes, as does the image cache. The next version adds a new imageCacheMs setting so you can control both of these cleanup “cronjobs”.
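
If you're curious what is lingering in there, a dry-run listing along these lines can help. This is only a sketch and not how PhotoStructure's own cleanup works: stale_files and looks_like_sqlite are hypothetical helpers, the ".db"/".sqlite" name check is an assumption about how the SQLite files are named (check your own cache dir), and the 300-second default just echoes the five-minute figure above.

```python
import time
from pathlib import Path
from typing import List, Optional


def looks_like_sqlite(path: Path) -> bool:
    # Assumption: database files (and their -wal/-shm companions)
    # have ".db" or ".sqlite" somewhere in the file name.
    return any(marker in path.name for marker in (".db", ".sqlite"))


def stale_files(
    cache_dir: str, max_age_s: float = 300.0, now: Optional[float] = None
) -> List[Path]:
    """List (but never delete) non-database files older than max_age_s."""
    now = time.time() if now is None else now
    stale = []
    for path in Path(cache_dir).rglob("*"):
        if not path.is_file() or looks_like_sqlite(path):
            continue
        if now - path.stat().st_mtime > max_age_s:
            stale.append(path)
    return sorted(stale)
```

It only returns a list, so nothing is removed until you've reviewed it yourself.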

Manual cleanup

Note also that the info tool can now force-run these cleanup jobs (and some db maintenance jobs, like recounting tags and rebuilding search indexes): ./photostructure info --cleanup (although you really shouldn’t ever need to run that: it happens whenever sync is running).

tkohhh · May 17, 2022, 5:02pm

I think it would be helpful if you would define “cache dir” in this explanation so it’s clear exactly what you’re talking about.

pmocek · May 26, 2022, 9:25pm

@mrm: A “good citizen” on Linux would, in compliance with the Filesystem Hierarchy Standard, put temporary files in /tmp (or in /var/tmp if they must persist after reboot), and put application cache data in /var/cache/photostructure, no?

mrm · May 27, 2022, 2:38am

/tmp is certainly what I’d have gone with when I first started using Linux!

Current Linux apps seem to overwhelmingly write to ~/.cache, though, including inkscape, mozilla, gimp, pip, digikam, darktable, chrome, transmission, shotwell, rhythmbox, gnome utilities, and even git. I have more than 50 app directories in ~/.cache on my ubuntu dev box (!!)

If I had to guess, devs migrated away from /tmp due to privilege escalation issues, information leaks due to incorrectly set umasks, and readonly or missing /var/tmp/ partitions. You can count on $HOME existing (for the most part), though.

Know that the cache directory, along with all other PhotoStructure paths, is easily customizable. In this case, set the PS_CACHE_DIR environment variable to something you prefer (hopefully a subdir in /tmp or /var/tmp that is only writable by the application user running PhotoStructure). I’ll add this to the docs above.
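
For the "only writable by the application user" part, a minimal Python sketch of creating such a directory on a POSIX system might look like this. make_private_cache_dir is a hypothetical helper, and the path you pass in is up to you.

```python
import os
from pathlib import Path


def make_private_cache_dir(path: str) -> Path:
    """Create path (if needed) and restrict it to the current user."""
    p = Path(path)
    p.mkdir(parents=True, exist_ok=True)
    # mkdir's mode argument is filtered by the umask, so set the
    # permissions explicitly: read/write/traverse for the owner only.
    os.chmod(p, 0o700)
    return p
```

Run it as the user that runs PhotoStructure, then point PS_CACHE_DIR at the resulting directory.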

pmocek · May 30, 2022, 4:13pm

That’s a bit disappointing. It felt like we all took a long time to get things into standard (not just de facto standard) locations, and doing so very much simplified software packaging and system administration.

For others interested in this topic: The XDG Base Directory Specification (XDGBDS) and Debian’s XDGBaseDirectorySpecification wiki page seem like good places to start.

This statement from Debian is reassuring, as I have long held in high esteem the Debian maintainers’ direction on matters of stability and standardization:

Debian does not require that packages conform to the XDGBDS but strongly encourages upstreams to do so.

From the spec:

Basics

The XDG Base Directory Specification is based on the following concepts:

There is a single base directory relative to which user-specific data files should be written. This directory is defined by the environment variable $XDG_DATA_HOME.

There is a single base directory relative to which user-specific configuration files should be written. This directory is defined by the environment variable $XDG_CONFIG_HOME.

There is a single base directory relative to which user-specific state data should be written. This directory is defined by the environment variable $XDG_STATE_HOME.

There is a single base directory relative to which user-specific executable files may be written.

There is a set of preference ordered base directories relative to which data files should be searched. This set of directories is defined by the environment variable $XDG_DATA_DIRS.

There is a set of preference ordered base directories relative to which configuration files should be searched. This set of directories is defined by the environment variable $XDG_CONFIG_DIRS.

There is a single base directory relative to which user-specific non-essential (cached) data should be written. This directory is defined by the environment variable $XDG_CACHE_HOME.

There is a single base directory relative to which user-specific runtime files and other file objects should be placed. This directory is defined by the environment variable $XDG_RUNTIME_DIR.

All paths set in these environment variables must be absolute. If an implementation encounters a relative path in any of these variables it should consider the path invalid and ignore it.
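
As an illustration of that last rule applied to $XDG_CACHE_HOME, here is a small Python sketch (not taken from any particular implementation; xdg_cache_home is a hypothetical helper). The spec's default for an unset or empty $XDG_CACHE_HOME is $HOME/.cache, and a relative value is treated as invalid and ignored.

```python
import posixpath
from typing import Mapping


def xdg_cache_home(env: Mapping[str, str], home: str) -> str:
    """Resolve the XDG cache base directory from an environment mapping."""
    value = env.get("XDG_CACHE_HOME", "")
    # Unset, empty, or relative values all fall back to the default.
    if value and posixpath.isabs(value):
        return value
    return posixpath.join(home, ".cache")
```

Passing os.environ and the current user's home directory gives the base directory an XDG-aware application would write its cache under.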

mrm · May 30, 2022, 4:22pm

Oh! I actually already delegate to $XDG_DATA_DIR and $XDG_CONFIG_DIR (first one set wins) for PS_CONFIG_DIR, if PS_CONFIG_DIR is not set.

I wasn’t aware of $XDG_CACHE_HOME! The next build on Linux will first look at $PS_CACHE_DIR, then $XDG_CACHE_HOME, then the above heuristics (~/.cache).

(Docker’s logic is a bit more complicated: it uses PS_CACHE_DIR if set, then uses $XDG_CACHE_HOME if set, then /ps/tmp if it exists, then /ps/cache if it exists, then /ps/library/.photostructure/cache-$UID).
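
The Docker lookup order described above can be sketched in Python as a first-match-wins chain. This is an illustration of the stated order only, not PhotoStructure's real code: docker_cache_dir is a hypothetical helper, the exists parameter is only there so the check can be swapped out, and whether a subdirectory is appended under $XDG_CACHE_HOME isn't stated, so the value is returned unchanged.

```python
import os.path
from typing import Callable, Mapping


def docker_cache_dir(
    env: Mapping[str, str],
    uid: int,
    exists: Callable[[str], bool] = os.path.isdir,
) -> str:
    """Return the first candidate that applies, in the order given above."""
    # 1 and 2: explicit environment variables, if set and non-empty.
    for name in ("PS_CACHE_DIR", "XDG_CACHE_HOME"):
        if env.get(name):
            return env[name]  # assumption: used as-is
    # 3 and 4: well-known bind mount points, if they exist.
    for candidate in ("/ps/tmp", "/ps/cache"):
        if exists(candidate):
            return candidate
    # 5: fall back to a per-UID directory inside the library.
    return f"/ps/library/.photostructure/cache-{uid}"
```

The practical upshot is that bind-mounting a local directory at /ps/tmp keeps the cache out of the library volume without setting any environment variables.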

bdillahu · May 30, 2022, 8:48pm

Just to add to the discussion, the spec further says:

$XDG_CONFIG_HOME defines the base directory relative to which user-specific configuration files should be stored. If $XDG_CONFIG_HOME is either not set or empty, a default equal to $HOME/.config should be used.

Which would explain to me why most things are using $HOME/.config

pmocek · August 3, 2022, 10:02pm

Re-reading this, now that I’ve set up PhotoStructure several times and am more familiar with its operational requirements, I note that the XDG_* directories are explicitly intended for user-specific data. When I arrange to run a web application headless, I don’t expect much, if anything, that it writes to be specific to any one user of the machine where it runs.

I recently set up PhotoStructure to run in a Docker container, with config, logs, and library in NFS-mounted volumes, and when I needed to provide a cache dir from local storage, I reflexively made it /var/cache/photostructure on the host machine–because that is where cache files belong.

I remain convinced that storing anything not related to a specific user’s interactive work (i.e., not configuration of or data generated by a program like “inkscape, mozilla, gimp, pip, digikam, darktable, chrome, transmission, shotwell, rhythmbox, gnome utilities, and even git”) in any user’s $HOME directory is awkward at best.