Today I learned: A broken Collabora container can cause seemingly random timeouts in Nextcloud.
I was working on my Nextcloud server, shifting some files back and forth, doing some basic maintenance, the usual stuff, you know? Then I suddenly noticed that every few minutes (or even seconds!) the page would not load properly. It would just load and load for at least 20 to 30 seconds before actually finishing. Sometimes I even needed to trigger a reload manually. At first I was unsure whether this was a temporary issue or something persistent, but after an hour I was certain it was persistent, and it was driving me nuts!
I was flabbergasted at first, as I had no idea what could be causing this. I started thinking about all the possible reasons outside my own domain: Network issues, my hosting provider having problems, my internet provider having problems or interfering somehow, the almighty spaghetti monster punishing me for something... The list kept growing.
Until I found a hint in the go-to troubleshooting solution of every seasoned admin: log files! I was looking at my nextcloud.log, chasing another issue, when I saw something along the lines of "Failed to fetch the Collabora capabilities endpoint: cURL error 28: Operation timed out after 45000 milliseconds with 0 bytes received". And that's when it came to me: "Check the darn Collabora container!", which I did. And would you know it? There it was, with a nasty error in its logs that prevented Collabora from running but did not cause the container to crash. Had it crashed, I would have realized earlier that something was off.
First, I improved my monitoring so that I will know in the future if Collabora misbehaves again. Next, I reverted the container to an older image, as Watchtower updates my containers automatically and had pulled in the current one. After I started the container, I double-checked my Nextcloud: the functionality was restored and the timeouts were gone. I still have to understand what the problem with the current image is, but I am at least up and running again.
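For anyone wondering what such a monitoring check could look like, here is a minimal sketch in Python. It is not my actual monitoring setup, just an illustration of the idea: ask Collabora for its capabilities, the same thing Nextcloud was timing out on, and treat a timeout, a non-200 status, or a non-JSON answer as unhealthy. The hostname collabora.example.com and the function name check_collabora are made up for this example, and the /hosting/capabilities path and JSON response are assumptions about a typical Collabora Online setup, so adjust them to yours.

```python
import json
import urllib.request

# Hypothetical address: replace with wherever your Collabora container is reachable.
# The /hosting/capabilities path is an assumption about a typical Collabora setup.
COLLABORA_URL = "https://collabora.example.com/hosting/capabilities"


def check_collabora(url=COLLABORA_URL, timeout=5.0, opener=urllib.request.urlopen):
    """Return (healthy, detail) for the Collabora capabilities endpoint.

    A running container is not enough: the service has to answer in time
    with HTTP 200 and a JSON body to count as healthy.
    """
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
            body = response.read()
    except OSError as exc:
        # Covers connection errors, HTTP error statuses, and timeouts
        # (URLError, HTTPError, and TimeoutError are all OSError subclasses).
        return False, f"request failed: {exc}"

    if status != 200:
        return False, f"unexpected HTTP status {status}"

    try:
        json.loads(body)
    except ValueError:
        return False, "response is not valid JSON"

    return True, "ok"
```

Calling check_collabora() every minute or so from a cron job or from your monitoring tool of choice, and alerting when the first return value is False, would have flagged my problem right away. The short timeout is deliberate: the whole point is to notice a hanging service long before Nextcloud's 45 seconds are up.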
There are several takeaways here: