Hi,
I am having issues with the Arch Linux build environment provided by
builds.sr.ht. The build fails at the same point in the build process
every time, with the following message:
> Connection to localhost closed by remote host
An example build is https://builds.sr.ht/~ineptattech/job/927081.
Unfortunately I do not have a simple example to reproduce this issue. I
wonder whether the environment is low on memory and killing processes
such as sshd, in which case I can reduce the parallelism of the build.
Or perhaps there is another issue? This started happening recently when
I updated the version of an AUR package I am building. Note that the
package builds fine on my personal computer. In addition, I am able to
build with manifests for other packages with the Arch Linux image. So
this seems specific to the manifest that I linked to.
Thanks,
Tim
I could not really say what's going on here without a more minimal
reproduction. I would be surprised if it was OOM killing sshd, but that
does seem... somewhat plausible. sshd doesn't use a lot of RAM so I
don't think Linux would put it high up on the chopping block.
Could be that the whole VM was OOM killed on the host side, too, which
might happen in the case of high disk usage.
I may drop this as I am not sure what a useful minimum reproduction
would be. However, I am curious whether you can publish the memory and
cpu resources allocated to each build environment. I am interested in
this because the number of cpu threads dictates the number of jobs
in my build. Perhaps this would be useful for others to know.
I have another ask. My understanding is that if the OOM killer starts
killing various processes, there is no logical way to set the environment
back to normal without restarting. With that said, would it be possible
to set up the sshd service to restart if it is killed? This would be
useful for debugging. Alternatively, could you provide some feedback in
the build failure message that indicates whether OOM killer interfered
with the build? Or maybe a more useful message would be a general
indication of whether there was some issue with the environment instead
of the build itself.
On Fri Jan 27, 2023 at 9:33 AM CET, Tim Lagnese wrote:
> I may drop this as I am not sure what a useful minimum reproduction
> would be. However, I am curious whether you can publish the memory and
> cpu resources allocated to each build environment. I am interested in
> this because the number of cpu threads dictates the number of jobs
> in my build. Perhaps this would be useful for others to know.
You can easily ascertain these from within the build environment. These
specs are not set in stone and we may adjust them later, so it's best if
users rely on observed specs rather than documented ones.
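
As a rough sketch of what "observed specs" could look like in practice,
a build task might read the CPU count and available memory at run time
and cap its job count accordingly. This is not an official builds.sr.ht
tool. The function names and the 2 GiB-per-job figure are assumptions
made up for illustration and would need tuning for the package in
question.

```python
import os


def read_meminfo_kib(text):
    """Parse /proc/meminfo-style text into a dict of field name -> value.

    Most fields are reported in kB; fields without a unit are kept as
    plain integers.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def pick_jobs(cpus, mem_available_kib, kib_per_job):
    """Cap parallel jobs by CPU count and by memory, never below 1."""
    by_memory = mem_available_kib // kib_per_job
    return max(1, min(cpus, by_memory))


def suggest_jobs(meminfo_path="/proc/meminfo", kib_per_job=2 * 1024 * 1024):
    """Return (cpus, available KiB, suggested job count) for this machine."""
    cpus = os.cpu_count() or 1
    try:
        with open(meminfo_path) as f:
            mem = read_meminfo_kib(f.read())
    except OSError:
        # No readable meminfo (e.g. not Linux): fall back to a single job.
        mem = {}
    available = mem.get("MemAvailable", mem.get("MemFree", 0))
    return cpus, available, pick_jobs(cpus, available, kib_per_job)


if __name__ == "__main__":
    cpus, available, jobs = suggest_jobs()
    print(f"cpus={cpus} available={available} KiB -> make -j{jobs}")
```

Running this as an early task in the manifest and passing the result to
the build (for example via MAKEFLAGS) would keep the job count tied to
whatever the VM actually provides at the time.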
> I have another ask. My understanding is that if the OOM killer starts
> killing various processes, there is no logical way to set the environment
> back to normal without restarting. With that said, would it be possible
> to set up the sshd service to restart if it is killed? This would be
> useful for debugging. Alternatively, could you provide some feedback in
> the build failure message that indicates whether OOM killer interfered
> with the build? Or maybe a more useful message would be a general
> indication of whether there was some issue with the environment instead
> of the build itself.
The OOM killer prefers to kill processes using a large amount of memory
first, so it's unlikely that sshd was killed. Couldn't speculate further
until we have a reproduction to work with. And no, it's not really
feasible to determine if the OOM killer was responsible for a build
failure automatically.
I may have something that could provide additional information:
When connecting to the runner from Arch Linux (OpenSSH version 9.1p1-3)
the connection is closed. The same thing also happens when using hut as
a wrapper. Full log attached.
Log:
Connected to build job #929530 (failed): https://builds.sr.ht/~poldi1405/job/929530
Your VM will be terminated 4 hours from now, or when you log out.
debug3: receive packet: type 96
debug2: channel 0: rcvd eof
debug2: channel 0: output open -> drain
debug2: channel 0: obuf empty
debug2: chan_shutdown_write: channel 0: (i0 o1 sock -1 wfd 5 efd 6 [write])
debug2: channel 0: output drain -> closed
debug3: receive packet: type 98
debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
debug3: receive packet: type 98
debug1: client_input_channel_req: channel 0 rtype eow@openssh.com reply 0
debug2: channel 0: rcvd eow
debug2: chan_shutdown_read: channel 0: (i0 o3 sock -1 wfd 4 efd 6 [write])
debug2: channel 0: input open -> closed
debug3: receive packet: type 97
debug2: channel 0: rcvd close
debug3: channel 0: will not send data after close
debug2: channel 0: almost dead
debug2: channel 0: gc: notify user
debug2: channel 0: gc: user detached
debug2: channel 0: send close
debug3: send packet: type 97
debug2: channel 0: is dead
debug2: channel 0: garbage collecting
debug1: channel 0: free: client-session, nchannels 1
debug3: channel 0: status: The following connections are open:
#0 client-session (t4 r0 i3/0 o3/0 e[write]/0 fd -1/-1/6 sock -1 cc -1 io 0x00/0x00)
debug3: send packet: type 1
Connection to yui.runners.sr.ht closed.
Transferred: sent 2496, received 2888 bytes, in 1.0 seconds
Bytes per second: sent 2591.3, received 2998.3
debug1: Exit status 0
--
Moritz Poldrack
https://moritz.sh