Summary
The user reports that their free-tier satellite is unhealthy: only 3 of the 9 builds started that day succeeded, and the rest failed with network errors such as EOF and connection timeouts. All four builds on Friday failed the same way, each taking around 2 hours to fail where a successful build normally takes under 30 minutes. brandon suspects the satellite is crashing from resource exhaustion: it maxes out on memory, swaps heavily, and then hits its disk IOPS limit. The user also received an email explaining that the exhaustion was caused by many renovate PRs updating dependencies at the same time, which overwhelmed the satellite. The user suggests a "max concurrent builds" feature that queues builds to prevent resource exhaustion. They would also like an automated system that detects and restarts a "dead" satellite, noting that no new builds were started over the weekend.
mortenjo
I think maybe renovate can do that, but I haven't looked much at it. Started using it on Friday, and am still tweaking the config :slightly_smiling_face:.
brandon
Yeah I do wish we handled those things better but they haven’t been easy to do. I wonder if you can limit concurrency etc from your CI or renovate?
mortenjo
Yeah, I got a reply via email a little while ago saying the same thing. The exhaustion was a result of a "spam" of renovate PRs updating dependencies in a bunch of repos at the same time, all of which kicked off a build on the same small, free satellite. So I kind of understand how it got into the unhealthy state.
It would have been nice if there was some way to set "max concurrent builds" and queue builds instead of starting all at once and dying of resource exhaustion.
Also, no new builds were started during the weekend, it would have been nice if some automated system could detect a "dead" satellite and force restart it, but I guess that's complicated to do :thinking_face:
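For illustration, the "max concurrent builds" idea above can be sketched as a counting semaphore: every build is accepted, but only a fixed number run at once and the rest wait in a queue. This is a minimal sketch, not how the satellite works; `run_builds`, `max_concurrent`, `build_seconds`, and the `renovate-pr-N` names are all hypothetical, and the sleep stands in for a real build.

```python
import asyncio


async def run_builds(build_names, max_concurrent=2, build_seconds=0.01):
    """Run hypothetical builds, never more than max_concurrent at a time."""
    slots = asyncio.Semaphore(max_concurrent)
    running = 0  # builds currently holding a slot
    peak = 0     # highest number of builds seen running together

    async def run_one(name):
        nonlocal running, peak
        async with slots:  # builds beyond the limit wait here, i.e. are queued
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(build_seconds)  # stand-in for the real build
            running -= 1
        return name

    # All builds are submitted at once, like a burst of renovate PRs.
    finished = await asyncio.gather(*(run_one(n) for n in build_names))
    return finished, peak


finished, peak = asyncio.run(
    run_builds([f"renovate-pr-{i}" for i in range(9)])
)
print(f"{len(finished)} builds finished, at most {peak} running at once")
```

With a limit like this, a burst of nine builds takes longer overall but never asks for more than two builds' worth of memory at a time.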
brandon
The free tier instances are pretty small, it might be worth trying a large instance in the paid tier or self-hosting a bigger one for free
brandon
Hey Morten, I think that satellite has been crashing due to resource exhaustion.
It looks like it maxes out on memory, then uses swap so heavily that it maxes out the allowed disk IOPS and becomes unresponsive
mortenjo
My (free tier) satellite seems to be unhealthy. Of the 9 builds started today, only 3 have succeeded, the rest have failed for what looks like various network issues (EOF, connection timed out, use of closed network connection). On Friday, all four builds failed for the same kinds of reasons. These are also spending around 2 hours before failing properly, in a build that normally takes less than 30 minutes to complete successfully (which is weirdly slow to begin with, takes less than half that on my laptop :shrug:). Anyone able to kick some tyres or something to get it healthy again? (org: mortenjo, satellite: ibidem)