~sircmpwn/sr.ht-discuss

Fedora 35 image is always broken

Details
Message ID
<47b7f19e61204e893a364d24780a8dce81f5e3ce.camel@zrythm.org>
DKIM signature
missing
Hi, just a report about the Fedora 35 build image.

I have been having this issue for about a year now I think. Fedora 35
occasionally breaks (has no network connectivity so it can't even clone
the sources) and stays broken for many days/weeks (like now - it has
been broken for at least 2 weeks).

It should be reproducible with any repository's .build configuration.

The issue seems to be that /etc/resolv.conf is empty/missing. I can fix
the connectivity issues in the build machine after I ssh and put
`nameserver 8.8.8.8` in that file.
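
A minimal sketch of that manual fix as a reusable step, assuming it is acceptable to fall back to a public resolver. The function name `ensure_nameserver` is hypothetical, and writing to /etc/resolv.conf needs root:

```shell
# Hypothetical helper: write a fallback nameserver only when the
# resolver config is missing or empty. Arguments: path, nameserver.
ensure_nameserver() {
    conf=$1
    ns=$2
    # -s is true only if the file exists and is non-empty
    if [ ! -s "$conf" ]; then
        printf 'nameserver %s\n' "$ns" > "$conf"
    fi
}

# Example (as root): ensure_nameserver /etc/resolv.conf 8.8.8.8
```

An existing, non-empty file is left untouched, so the step should be safe to run on images that are not affected.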

Thanks,
Alex
Details
Message ID
<d9beb9af-fb69-a20c-866b-8ebd72f91216@lunacd.com>
In-Reply-To
<47b7f19e61204e893a364d24780a8dce81f5e3ce.camel@zrythm.org> (view parent)
DKIM signature
pass
I don't think Fedora 35 is doing anything different from other Fedoras. 
I'm randomly guessing here, but could it be running on a different 
machine in the data center, which has a different network configuration?

Further, I think it could be beneficial to add a test script to crontab.
Basically, every several days or every week, just run the sanity-check 
script on every image.

Haowen
Details
Message ID
<9c35af65-de53-2f1c-6916-690a99670a4d@bitfehler.net>
In-Reply-To
<47b7f19e61204e893a364d24780a8dce81f5e3ce.camel@zrythm.org> (view parent)
DKIM signature
pass
Hey,

On 7/30/22 02:47, Alexandros Theodotou wrote:
> It should be reproducible with any repository's .build configuration.

I just tried a simple build and it works fine. Can you tell me if it's 
still broken for you, and if yes, provide a link to the build (either 
the build itself or the build.yml)?

> The issue seems to be that /etc/resolv.conf is empty/missing. I can fix
> the connectivity issues in the build machine after I ssh and put
> `nameserver 8.8.8.8` in that file.

That is very strange. This file is written during image building. The 
Fedora 35 image gets updated on the 4th of each month, so I must be 
using the same image as you when you reported, which rules out the image 
as source of the problem.

I am wondering if installing a certain package might cause this 
(something something systemd-resolved)?

Conrad
Details
Message ID
<0cbfa21a-0c1f-dfd8-b814-85fbb072929b@bitfehler.net>
In-Reply-To
<d9beb9af-fb69-a20c-866b-8ebd72f91216@lunacd.com> (view parent)
DKIM signature
pass
On 7/30/22 02:53, Haowen Liu wrote:
> I don't think Fedora 35 is doing anything different from other Fedoras. 
> I'm randomly guessing here, but could it be running on a different 
> machine in the data center, which has a different network configuration?

I'd say the problem description (empty/missing resolv.conf) points more 
towards something inside the image itself. I'll keep investigating, though.

> Further, I think it could be beneficial to add a test script to crontab.
> Basically, every several days or every week, just run the sanity-check 
> script on every image.

I don't think this adds much value. The sanity check is run before 
deploying an image, which should be sufficient.

Speaking of the crontab, however: I noticed that the rebuild frequency 
of the Fedora images documented on man.sr.ht [1] does not match the 
crontab (man: daily/daily/weekly, cron: daily/weekly/monthly). Probably 
something to be updated?

[1] https://man.sr.ht/builds.sr.ht/compatibility.md#fedora

Conrad
Details
Message ID
<6c9994ed-8d09-80b3-42aa-cec6bbe00c3d@lunacd.com>
In-Reply-To
<0cbfa21a-0c1f-dfd8-b814-85fbb072929b@bitfehler.net> (view parent)
DKIM signature
pass
On 8/2/22 05:25, Conrad Hoffmann wrote:
> Speaking of the crontab, however: I noticed that the rebuild frequency 
> of the Fedora images documented on man.sr.ht [1] does not match the 
> crontab (man: daily/daily/weekly, cron: daily/weekly/monthly). Probably 
> something to be updated?

You are right. This is a mistake on my part. I'll send a patch soon.

Haowen
Details
Message ID
<46db070f604c19c5e502dd5d30235057e65f5288.camel@zrythm.org>
In-Reply-To
<9c35af65-de53-2f1c-6916-690a99670a4d@bitfehler.net> (view parent)
DKIM signature
missing
Hi Conrad,

On Tue, 2022-08-02 at 14:18 +0200, Conrad Hoffmann wrote:
> 
> On 7/30/22 02:47, Alexandros Theodotou wrote:
> > It should be reproducible with any repository's .build
> > configuration.
> 
> I just tried a simple build and it works fine. Can you tell me if
> it's still broken for you, and if yes, provide a link to the build
> (either the build itself or the build.yml)?

Here is a simple build that fails:
https://builds.sr.ht/~alextee/job/816482

If you look at the manifest all it does is install some packages and
then attempt to clone a repository from sourcehut.

> > The issue seems to be that /etc/resolv.conf is empty/missing. I can
> > fix the connectivity issues in the build machine after I ssh and put
> > `nameserver 8.8.8.8` in that file.
> 
> That is very strange. This file is written during image building. The
> Fedora 35 image gets updated on the 4th of each month, so I must be
> using the same image as you when you reported, which rules out the
> image as source of the problem.
> 
> I am wondering if installing a certain package might cause this
> (something something systemd-resolved)?

Maybe one of the packages is overwriting /etc/resolv.conf, as you say.
I see systemd-networkd getting installed; maybe it's that?


Thanks,
Alex
Details
Message ID
<8c7d8509-fa2f-302b-7757-d1c0367e9ac1@bitfehler.net>
In-Reply-To
<46db070f604c19c5e502dd5d30235057e65f5288.camel@zrythm.org> (view parent)
DKIM signature
pass
Thanks for the update, mystery solved: the problem is indeed the 
installation of the systemd-resolved package. I am able to reproduce 
with a minimal test case [1].

The exact issue is that upon installation, the systemd-resolved package 
replaces /etc/resolv.conf with a symlink:

[build@build ~]$ ls -ln /etc/resolv.conf
lrwxrwxrwx 1 0 0 39 Aug  4 12:06 /etc/resolv.conf -> ../run/systemd/resolve/stub-resolv.conf

However, the link target does not exist **until systemd-resolved 
actually gets started** (which doesn't happen automatically).
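
A small sketch of how a build task could detect this state before trying to use the network. The helper name `is_dangling_symlink` is made up for illustration; it only relies on the standard `test` operators `-L` (is a symlink) and `-e` (target exists):

```shell
# Returns 0 if the path is a symlink whose target does not exist.
is_dangling_symlink() {
    [ -L "$1" ] && [ ! -e "$1" ]
}

if is_dangling_symlink /etc/resolv.conf; then
    echo "resolv.conf points at a missing target: $(readlink /etc/resolv.conf)"
fi
```

On an image in the broken state described above, this should report the stub-resolv.conf target; on a healthy image it prints nothing.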

To me, stuff seems wrong on multiple levels here. First, I wonder what 
would need to pull in systemd-resolved as a dependency in the first 
place. I see in your logs that it gets installed as a "weak dependency". 
I have no idea what that is, but one work-around might be to simply 
remove the offending package from your list and manually install it 
without "weak dependencies" as the first thing in your build script.
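
A hedged sketch of what that work-around could look like in a build script. It assumes DNF's `install_weak_deps` configuration option, passed via `--setopt`; the wrapper name and the overridable `DNF` variable are hypothetical, and which package actually pulls in systemd-resolved is not confirmed here:

```shell
# Install packages while skipping weak dependencies.
# DNF can be overridden (e.g. DNF=echo) to preview the command.
install_without_weak_deps() {
    ${DNF:-sudo dnf} install -y --setopt=install_weak_deps=False "$@"
}

# Example: install_without_weak_deps systemd-networkd
```

Running this first, before the rest of the package list, should keep the weakly-depended-on package from being pulled in by that install.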

The other obvious deficiency here is that the system breaks upon package 
installation. I don't know anything about how we build the images; maybe 
the package is intended to auto-start the daemon on install? Otherwise, 
this would be something to report upstream.

Given that things are what they are and you would probably like to just 
use the image, one possible work-around on our side would be to simply 
install a second copy of our /etc/resolv.conf at 
/run/systemd/resolve/stub-resolv.conf. This copy would then get picked 
up after the package installation changed /etc/resolv.conf to be a 
symlink. Not pretty, but hey...
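
Roughly, that work-around could be sketched as below. The function name is hypothetical, and since /run is typically a tmpfs, this would probably have to run at boot (before any package installation) rather than at image build time:

```shell
# Copy the image's resolver config to the path the symlink will
# later point at, creating the parent directory if needed.
seed_stub_resolv() {
    src=$1   # the image's real resolv.conf
    dst=$2   # where the package's symlink will point
    mkdir -p "$(dirname "$dst")"
    cp "$src" "$dst"
}

# Example (as root):
# seed_stub_resolv /etc/resolv.conf /run/systemd/resolve/stub-resolv.conf
```

It has to run while /etc/resolv.conf is still a regular file; once the package has replaced it with the symlink, there is nothing left to copy.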

Haowen, maybe you can shed some more light on this? I know nothing about 
the Fedora package manager, so I can't tell whether we are holding it 
wrong or whether this is really how it is intended to behave.

Cheers,
Conrad

[1] https://builds.sr.ht/~bitfehler/job/816518
Details
Message ID
<0764dd5c-216a-353a-7785-38ae8b169fcd@lunacd.com>
In-Reply-To
<8c7d8509-fa2f-302b-7757-d1c0367e9ac1@bitfehler.net> (view parent)
DKIM signature
pass
My schedule is kinda tight recently, and afaik fedora/35 refreshes have 
been failing pretty randomly in the recent several months.

See: https://builds.sr.ht/~sircmpwn/job/816721. It would help greatly if 
somebody knows what's the issue with those builds.

Back to the systemd-resolved issue, I think it's installed as a weak 
dependency of systemd-networkd (maybe?). Anyway it isn't supposed to 
break network connectivity in any case. I'm pretty sure it's installed 
on all the desktop Fedora images by default. In short, I'm not sure what 
is wrong with it or how I should fix it. Your suggested workaround 
definitely works, but I guess sticking to the newer Fedoras, which get 
refreshed much more frequently and get more attention both upstream and 
on this platform, might be another choice as well.

Haowen
Details
Message ID
<CLYTZF8ST3VS.2BKTO1YYEQRK@Archetype>
In-Reply-To
<0764dd5c-216a-353a-7785-38ae8b169fcd@lunacd.com> (view parent)
DKIM signature
pass
On Sat Aug 6, 2022 at 6:43 AM CEST, Haowen Liu wrote:
> See: https://builds.sr.ht/~sircmpwn/job/816721. It would help greatly if 
> somebody knows what's the issue with those builds.

It seems RedHat's servers have some issues. Downloading the metadata
using the link worked just fine every time I tried it with my browser,
but apparently dnf did not have the same experience[0]. I assume
RedHat's servers are outside your control, so maybe drop the Fedora
mailinglist a message about this (including the times when it fails),
maybe they can tell you more.

I don't really see any way to "fix" this other than waiting for
Fedora/RedHat. The only "solution" for the time being I see is: retry :/
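
If retrying is the route taken, a generic wrapper along these lines might do. This is only a sketch: the `retry` function and its arguments (attempt count, delay in seconds) are made up here and are not part of the image build scripts:

```shell
# Run a command up to $1 times, sleeping $2 seconds between tries.
# Returns 0 on the first success, 1 if every attempt fails.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    until "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
}

# Example: retry 5 30 dnf makecache
```

This only papers over transient download failures; it would not help if the failure is deterministic.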

Hope that helped at least a tiny bit :)

[0]: https://builds.sr.ht/~sircmpwn/job/816721#task-genimg-1030
-- 
Moritz Poldrack
https://moritz.sh
Details
Message ID
<c2959732-4f82-ddd2-52b1-69d107a9d92c@lunacd.com>
In-Reply-To
<CLYTZF8ST3VS.2BKTO1YYEQRK@Archetype> (view parent)
DKIM signature
pass
Thanks for looking into it. Actually, found what's wrong: 
https://builds.sr.ht/~sircmpwn/job/816721#task-genimg-1027.

It seems ALL DNF commands need a releasever specified, which I don't 
understand. I thought after installing the F35 base system, DNF would be 
able to figure that out from there, but it seems that feature is a 
recent addition in F36 because F36 isn't affected by this bug.
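
For illustration only, pinning the release on every invocation could look like the sketch below. It uses DNF's `--releasever` option; the wrapper name and the overridable `DNF` variable are hypothetical and this is not necessarily how the image build script does it:

```shell
# Always pass an explicit release version to DNF.
# DNF can be overridden (e.g. DNF=echo) to preview the command.
dnf_f35() {
    ${DNF:-dnf} --releasever=35 "$@"
}

# Example: dnf_f35 install -y git
```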

Haowen
Details
Message ID
<67ba6090-df6a-9730-dc20-ccb1c4b50798@lunacd.com>
In-Reply-To
<CLYTZF8ST3VS.2BKTO1YYEQRK@Archetype> (view parent)
DKIM signature
pass
Alright, patch sent and that will hopefully solve the build issue.
See: https://lists.sr.ht/~sircmpwn/sr.ht-dev/patches/34492

As for the broken resolved thing, I think (not sure) it's just one 
refresh being defective and that all the subsequent refreshes are stuck 
because of the above bug. Hopefully after the patch is applied and next 
refresh happens, everything will be fine.

Haowen