~sircmpwn/sr.ht-discuss

Half of my build jobs are stuck in pending state

Message ID
<a89cc6d4-1771-d8be-23c9-4246fd039bb7@open-music-kontrollers.ch>
This is on a self-maintained sr.ht installation. The build worker runs on a
separate machine, connected directly to the master's Redis and PostgreSQL over
a VPN.

I am running the newest versions on Alpine 3.17:
- meta.sr.ht-0.64.8
- builds.sr.ht-0.86.1
- git.sr.ht-0.83.9
- lists.sr.ht-0.57.7

When I push to git, build jobs always show up correctly in the DB and the GUI, 
but half of them never exit their 'pending' state.

Every stuck build is accompanied by an error in the lists.sr.ht log. Maybe it
has something to do with this? But I cannot figure out why lists.sr.ht should
be involved at all when I push commits via git.

Can someone enlighten me as to what may be going wrong here?

# Log from a successful build

     # tail -f /var/log/messages /var/log/*.log /var/log/postgresql/*.log

     ==> messages <==
     Apr 25 12:00:24 open-music-kontrollers auth.info sshd[32647]: Accepted publickey for git from 10.123.123.101 port 45928 ssh2: RSA SHA256:e+cW/IK/yuem6U8opVcOCOwpaZ8IGGmlZ9ptxcl5B0s
     Apr 25 12:00:25 open-music-kontrollers auth.info sshd[32668]: Received disconnect from 10.123.123.101 port 45928:11: disconnected by user
     Apr 25 12:00:25 open-music-kontrollers auth.info sshd[32668]: Disconnected from user git 10.123.123.101 port 45928

     ==> builds.sr.ht-api.log <==
     2023/04/25 12:00:25 "POST http://builds.open-music-kontrollers.ch/query HTTP/1.0" from ::1 - 200 30B in 7.301337ms

     ==> git.sr.ht-api.log <==
     2023/04/25 12:00:24 "POST http://git.open-music-kontrollers.ch/query HTTP/1.0" from ::1 - 200 39B in 30.264194ms
     2023/04/25 12:00:24 Enqueued user repo:update webhook delivery for 1 subscriptions
     2023/04/25 12:00:24 93553e94-3e3f-4880-8c16-d72b7128441f: webhook delivery complete after 1 attempts

     ==> postgresql/postmaster.log <==
     2023-04-25 10:00:24.821 GMT [32674] LOG:  could not receive data from client: Connection reset by peer
     2023-04-25 10:00:25.033 GMT [32699] LOG:  could not receive data from client: Connection reset by peer

# Log from a failed build (stuck in 'pending' state)

     ==> messages <==
     Apr 25 11:43:46 open-music-kontrollers auth.info sshd[31849]: Accepted publickey for git from 10.123.123.101 port 46484 ssh2: RSA SHA256:e+cW/IK/yuem6U8opVcOCOwpaZ8IGGmlZ9ptxcl5B0s
     Apr 25 11:43:46 open-music-kontrollers auth.info sshd[31870]: Received disconnect from 10.123.123.101 port 46484:11: disconnected by user
     Apr 25 11:43:46 open-music-kontrollers auth.info sshd[31870]: Disconnected from user git 10.123.123.101 port 46484

     ==> builds.sr.ht-api.log <==
     2023/04/25 11:43:46 "POST http://builds.open-music-kontrollers.ch/query HTTP/1.0" from ::1 - 200 30B in 14.82308ms

     ==> git.sr.ht-api.log <==
     2023/04/25 11:43:46 "POST http://git.open-music-kontrollers.ch/query HTTP/1.0" from ::1 - 200 39B in 10.450941ms
     2023/04/25 11:43:46 Enqueued user repo:update webhook delivery for 1 subscriptions
     2023/04/25 11:43:46 800fcbcf-3043-4e9f-8a63-64e882363dda: webhook delivery complete after 1 attempts

     ==> lists.sr.ht-process.log <==
     [2023-04-25 11:43:46,575: ERROR/MainProcess] Received unregistered task of type 'buildsrht.runner.run_build'.
     The message has been ignored and discarded.

     Did you remember to import the module containing this task?
     Or maybe you're using relative imports?

     Please see
     http://docs.celeryq.org/en/latest/internals/protocol.html
     for more information.

     The full contents of the message body was:
     b'{"id":"a60b7e25-834d-4db5-aabb-0450848eb3f3","task":"buildsrht.runner.run_build","args":[109,{"Arch":null,"Artifacts":[],"Environment":{"BUILD_SUBMITTER":"git.sr.ht","project":"netatom.lv2"},"Image":"alpine/latest","Packages":["reuse","meson","lv2-dev","serd-dev","sord-dev","sratom-dev"],"Repositories":{},"Secrets":[],"Shell":false,"Sources":["https://git.open-music-kontrollers.ch/~hp/netatom.lv2#ca5d4945b63667412e95576227c2cdc44f3d8b22"],"Tasks":[{"setup":"cd \\"${project}\\"\\nmeson setup build \\\\\\n  -Dbuildtype=release \\\\\\n  -Dbuild-tests=true\\n"},{"build":"cd \\"${project}\\"\\nninja -C build install\\n"},{"test":"cd \\"${project}\\"\\nninja -C build test\\n"}],"Triggers":[{"Action":"email","Condition":"failure","To":"\\u003cdev@open-music-kontrollers.ch\\u003e","Cc":null,"InReplyTo":null,"Url":null}],"OAuth":""}],"kwargs":{},"retries":0,"eta":null,"expires":null}' (867b)

     Thw full contents of the message headers:
     {}

     The delivery info for this task is:
     {'priority': 0, 'routing_key': 'celery', 'exchange': 'celery'}
     Traceback (most recent call last):
       File "/usr/lib/python3.10/site-packages/celery/worker/consumer/consumer.py", line 591, in on_task_received
         strategy = strategies[type_]
     KeyError: 'buildsrht.runner.run_build'

     ==> postgresql/postmaster.log <==
     2023-04-25 09:43:46.390 GMT [31876] LOG:  could not receive data from client: Connection reset by peer
     2023-04-25 09:43:46.585 GMT [31899] LOG:  could not receive data from client: Connection reset by peer
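The discarded message is the task that builds.sr.ht enqueued for the job, logged as a Python bytes literal wrapping a JSON document. As a diagnostic aid, the Python sketch below decodes such a log line to show which task it carried. The helper `decode_logged_body` is hypothetical (not part of sr.ht or Celery), and the body is abridged to a few fields. The first positional argument, 109, is presumably the builds.sr.ht job ID, which would let you match each discarded message to one of the stuck jobs.

```python
import ast
import json

# Abridged copy of the logged message body: the job definition is trimmed to
# a few fields, but the escaping follows the log line verbatim.
LOGGED_BODY = r"""b'{"id":"a60b7e25-834d-4db5-aabb-0450848eb3f3","task":"buildsrht.runner.run_build","args":[109,{"Environment":{"BUILD_SUBMITTER":"git.sr.ht","project":"netatom.lv2"},"Image":"alpine/latest","Tasks":[{"build":"cd \\"${project}\\"\\nninja -C build install\\n"}]}],"kwargs":{},"retries":0,"eta":null,"expires":null}'"""


def decode_logged_body(logged: str) -> dict:
    """Turn the bytes literal printed by Celery back into the message dict."""
    raw = ast.literal_eval(logged)  # the b'...' text becomes a bytes object
    return json.loads(raw)          # json.loads accepts UTF-8 encoded bytes


msg = decode_logged_body(LOGGED_BODY)
job_id, job = msg["args"]  # positional args passed to buildsrht.runner.run_build
print(msg["task"], job_id, job["Image"])
# prints: buildsrht.runner.run_build 109 alpine/latest
```

The same two calls should also work on the full, unabridged line once the trailing `(867b)` size note is stripped, since only standard-library parsing is involved.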
Message ID
<CS60HIXKFX0Z.TGNM46UDJZFB@maolood.com>
In-Reply-To
<a89cc6d4-1771-d8be-23c9-4246fd039bb7@open-music-kontrollers.ch>
Looks like your build tasks are being received by lists.sr.ht's celery
worker. You should configure separate redis databases for builds.sr.ht
and lists.sr.ht.
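Both messages in the log carry Celery's default routing (`'routing_key': 'celery'`), so when two services' workers consume from the same Redis database they compete for the same queue; that would plausibly explain why roughly half of the build jobs were lost. The Python sketch below checks a shared config.ini for any Redis (host, port, db) used by more than one service section. It assumes each service sets its Celery broker with a `redis` key in its own section, modelled on the shipped example configs; the key name, host, and database numbers are assumptions to verify against the config.example.ini of your installed versions.

```python
import configparser
from urllib.parse import urlparse

# Hypothetical config.ini excerpt. The per-service "redis" key and the
# database numbers are assumptions for illustration only.
CONFIG = """
[builds.sr.ht]
redis = redis://localhost:6379/1

[lists.sr.ht]
redis = redis://localhost:6379/2
"""


def broker_address(url: str) -> tuple[str, int, int]:
    """Return (host, port, db) for a redis:// URL; a missing db means 0."""
    parsed = urlparse(url)
    db_path = parsed.path.lstrip("/")  # only the "/N" path form is handled
    db = int(db_path) if db_path else 0
    return (parsed.hostname or "localhost", parsed.port or 6379, db)


def find_shared_brokers(config_text: str, key: str = "redis") -> dict:
    """Map every (host, port, db) used by more than one section to those sections."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    users: dict[tuple[str, int, int], list[str]] = {}
    for section in parser.sections():
        if parser.has_option(section, key):
            address = broker_address(parser.get(section, key))
            users.setdefault(address, []).append(section)
    return {addr: names for addr, names in users.items() if len(names) > 1}


if __name__ == "__main__":
    clashes = find_shared_brokers(CONFIG)
    for (host, port, db), names in clashes.items():
        print(f"{', '.join(names)} share redis://{host}:{port}/{db}")
    if not clashes:
        print("each service uses its own Redis database")
```

Note that `redis://localhost:6379` and `redis://localhost:6379/0` are treated as the same database, since Redis defaults to database 0 when none is given.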
Message ID
<97adf292-e470-c27d-aea3-67ca5d1d9837@open-music-kontrollers.ch>
In-Reply-To
<CS60HIXKFX0Z.TGNM46UDJZFB@maolood.com>
Thanks, that did the trick, indeed.