Ratarmount in a Docker container on Unraid

Feel free to improve on this, as I have little experience deploying Docker containers via the CLI, or really from anything other than the CA plugin in Unraid.

The challenge: I wanted a 'python3' environment from which to run ratarmount, so that I could mount Google Takeout archives for PhotoStructure import.
Related reading:

As noted in that writeup, make sure you grab the tgz format from Google Takeout. The archive sizes can be larger, and ratarmount consumes them well (zip may not be supported yet).

I have a separate Docker container that presents the mounted contents of a series of Google Takeout tars. The trick is to specify the bind mount such that the base OS (or the PhotoStructure container) can see the files.

Also: enable dockerHub search in Unraid first (Apps tab -> CA Settings along the left nav).

Here's the docker run line that got me there (using the standard Docker Hub 'python' image), executed from an Unraid console:

docker run -d \
  -it \
  --name <containerName> \
  --mount type=bind,source=<src-host>,target=<dst-container>,bind-propagation=rshared \
  --cap-add SYS_ADMIN --device /dev/fuse \
  python
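
To make that concrete, a filled-in version might look something like this. The container name and host path are just illustrative assumptions on my part; the source should be whichever Unraid share holds your takeout tarballs, and the target is the path you'll use inside the container (mine was /duplicates):

docker run -d \
  -it \
  --name takeout-mounter \
  --mount type=bind,source=/mnt/user/duplicates,target=/duplicates,bind-propagation=rshared \
  --cap-add SYS_ADMIN --device /dev/fuse \
  python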

Then exec in with bash:
docker exec -it <containerName-or-ID> bash

Install fuse tools
apt update; apt-get install python3-fusepy fuse

Install ratarmount
pip install ratarmount

Make your mount point and run ratarmount against your tarballs (mine is only an example):

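If the mount point directory doesn't already exist, create it first from inside the container (I'm not sure whether ratarmount will create it for you); the path here just matches the example below:

mkdir -p /duplicates/20210908_takeout/takeoutmountpoint
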
root@ae94450dd366:/duplicates# ratarmount *.tgz 20210908_takeout/takeoutmountpoint
Loading offset dictionary from /duplicates/takeout-20210908T180750Z-001.tgz.index.sqlite took 0.00s
Loading offset dictionary from /duplicates/takeout-20210908T180750Z-002.tgz.index.sqlite took 0.00s
Loading offset dictionary from /duplicates/takeout-20210908T180750Z-003.tgz.index.sqlite took 0.00s

Remember that whatever your ratarmount mount point is, it will be read-only. So you may want to host-map the directory above it; that way you can create the .uuid file for PhotoStructure tracking.
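
If it helps, creating the .uuid file by hand can be as simple as the line below, run from the Unraid console against the host side of the bind mount (the <src-host> placeholder from the docker run line). This is just my assumption of what PhotoStructure expects, based on the file contents shown further down, and it assumes uuidgen is available on your box:

uuidgen > <src-host>/.uuid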

Also note that it'll take some time to initially create the sqlite index files for each of your tars. For ~110 GB of takeout tars, this took about an hour for me. But the cool thing is that each index only needs to be created once (which is why loading took 0.00s in the example above).

Verify that you can see/copy the contents of your ratarmount mount point from the Unraid console. If that looks good, you can have some confidence in the path mapping to your PhotoStructure container, done the same way you mapped your regular photos.
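
For example, something like this from the Unraid console (same <src-host> placeholder as before; the Takeout/Google Photos path is just what my archives contained):

ls "<src-host>/20210908_takeout/takeoutmountpoint/Takeout/Google Photos"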

Where the above approach falls short:
The above mount only stays alive while your bash session/console to the python container is up, and it will certainly be lost if the container is ever stopped. I need to understand the 'Dockerfile' stuff more (or something similar) to make this persist. Ultimately, I want PhotoStructure to import JSON data (face tagging) from Google Takeout, as almost all of the photos themselves are already in my normal library. So if the mount point stays up long enough for that to happen, it works for me.
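
For what it's worth, a rough (untested) sketch of a Dockerfile that bakes the dependencies into the image might look like this; everything here is an assumption on my part, and the CMD is just a placeholder to keep the container alive so you can exec in and run ratarmount:

FROM python
# Same bits installed by hand above, baked into the image instead
RUN apt-get update && \
    apt-get install -y fuse python3-fusepy && \
    pip install ratarmount
# Keep the container running; exec in and run ratarmount manually,
# or replace this with your own startup script
CMD ["sleep", "infinity"]

You'd build it with something like docker build -t ratarmount-python . and then substitute that image name for python in the docker run line above.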


I’m not a docker expert by any means, but have you tried restarting the container after installing fuse tools and ratarmount? I believe a restart will undo anything you installed directly into the container. If it survives a restart, I am pretty certain that it will NOT survive an update to the container.

I hope I'm wrong! But I've been bitten before by installing things directly into a container, only to have them disappear on me.

You're right; it absolutely reverts to its original state. Very transitory. More work needs to be done to make it persist, or else to script the retrieval of the fuse + ratarmount bits each time it fires up.
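
For the "script it each time" route, something along these lines might work whenever the container comes back up (untested on my side; <containerName> and <dst-container> are the same placeholders as in the original docker run, and the mount-point path matches my earlier example):

docker exec <containerName> bash -c '
  apt-get update && apt-get install -y fuse python3-fusepy &&
  pip install ratarmount &&
  cd <dst-container> &&
  ratarmount *.tgz 20210908_takeout/takeoutmountpoint
'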

It’s not terribly painful, though; the sqlite files that are built for each of the massive tarballs are stored on the persistent mountpoint, outside of the container. So you only incur that build time once.

It's still a bit of an experiment to see whether it'll meet the need of getting Google Photos tag information into the PhotoStructure library; I think it should. PhotoStructure sees the mount point but doesn't see it as a volume for some reason. (I created the .uuid file, but maybe there's another piece to it.)

I've been running a library rebuild for the last 24 hours to see if that might help, but so far I don't think it has scanned the takeout directory.

I've verified that I can see the takeout files from PhotoStructure by 'console'-ing into the PS container and poking around:

/ps/app # du -h --max-depth=1 /takeout
157G    /takeout/takeoutmountpoint
157G    /takeout
/ps/app # ls -lsa /takeout
total 52052811
       0 drwxr-xr-x 1 root root           85 Sep 12 16:12 .
       0 drwxr-xr-x 1 root root          160 Sep 12 16:00 ..
       4 -rw-r--r-- 1 root root           37 Sep 12 16:12 .uuid
52052807 dr-xr-xr-x 1 node users 53302072927 Sep  9 01:33 takeoutmountpoint
/ps/app # cat /takeout/.uuid
30e9a172-13e4-11ec-bc40-0242ac120009
/ps/app # cat /takeout/takeoutmountpoint/Takeout/Google\ Photos/Photos\ from\ 2021/IMG_2885.HEIC.json 
{
  "title": "IMG_2885.HEIC",
  "description": "",
  "imageViews": "5",
  "creationTime": {
    "timestamp": "1623111111",
    "formatted": "Jun 8, 2021, 12:11:51 AM UTC"
...
  "people": [{
    "name": "----"
  }, {
    "name": "----"
  }, {
    "name": "----"
  }],
...
}

What's the overall workflow here? Do I infer that you are relying on Google Photos to back up photos from your phone, as well as for face tagging?

Yep, there's a fair amount of upload history currently in Google Photos, mainly from cell phone cameras during the years that the kids have been alive. So the face tagging is the main thing I'm looking to extract, as I should already have the original assets in my normal archives. It'll be a cool test for the PhotoStructure dedupe, as it should see the Google Photos versions as 'lesser' assets but take advantage of the extra sidecar data.

@heavyd thanks for taking the time to write this up :star:

It seems that the volume parser isn’t seeing your ratarmount folder. Can you send me ./photostructure info --volumes --debug?

If you find any deduplication errors, please send me an email with the variations so I can make it better!

Wanted to send an update after conversing with @mrm.
Ultimately, all that was needed to kick-start things was:
./photostructure sync --force /takeout
executed from the /ps/app directory of the PS console. (more here)
I now see the processing effort in the UI.

One thing we might explore is that the ratarmount mount point doesn't show up in 'df' output, which @mrm advised me is parsed to query for volumes.

/ps/app # df -kP
Filesystem     1024-blocks       Used  Available Capacity Mounted on
/dev/loop2        26214400    9182120   16560920      36% /
tmpfs                65536          0      65536       0% /dev
tmpfs             65928716          0   65928716       0% /sys/fs/cgroup
shm                  65536          0      65536       0% /dev/shm
shfs            5858435620 1164493180 4693942440      20% /pics
tmpfs             65928716     100072   65828644       1% /ps/tmp
/dev/loop2        26214400    9182120   16560920      36% /etc/hosts
tmpfs             65928716          0   65928716       0% /proc/acpi
tmpfs             65928716          0   65928716       0% /sys/firmware

Ultimately, I don't plan on depending much on my 'takeout'; since I have the originals in their own location, I'm really only interested in the initial (or perhaps quarterly?) import of JSON/tag data. So while there's a bit of manual effort here, it's not unreasonable for my wants.


Thanks a bunch for the updates!

I’ll look into how I can best get sshfs and other FUSE mountpoints to be visible to PhotoStructure soon.