Hi, just a report about the Fedora 35 build image.
I think I have been having this issue for about a year now. Fedora 35
occasionally breaks (it has no network connectivity, so it can't even
clone the sources) and stays broken for days or weeks at a time (like
now: it has been broken for at least 2 weeks).
It should be reproducible with any repository's .build configuration.
The issue seems to be that /etc/resolv.conf is empty/missing. I can fix
the connectivity issues in the build machine after I ssh and put
`nameserver 8.8.8.8` in that file.
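For illustration, the manual fix above could be scripted. This is a minimal Python sketch under stated assumptions, not anything builds.sr.ht provides. `ensure_nameserver` is a hypothetical helper, and 8.8.8.8 is only the resolver used above. It treats three cases as broken: a missing file, an empty file, or a symlink pointing at a nonexistent target. In the last case it removes the link first, because writing through a link whose target directory does not exist would fail.

```python
import os


def ensure_nameserver(path="/etc/resolv.conf", nameserver="8.8.8.8"):
    """Write a single 'nameserver' line if `path` is missing, empty, or a dangling symlink.

    Returns True if the file was (re)written, False if it already had content.
    """
    # os.path.exists() follows symlinks, so it is False for a link whose target is gone.
    if os.path.islink(path) and not os.path.exists(path):
        os.unlink(path)  # remove the dangling link so a regular file is written instead
    if os.path.exists(path) and os.path.getsize(path) > 0:
        return False  # leave any existing, non-empty configuration alone
    with open(path, "w") as f:
        f.write(f"nameserver {nameserver}\n")
    return True
```

Running it as root on an affected VM would have the same effect as editing the file by hand. Any existing non-empty file is left untouched.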
Thanks,
Alex
I don't think Fedora 35 is doing anything different from other Fedoras.
I'm randomly guessing here, but could it be running on a different
machine in the data center, which has a different network configuration?
Further, I think it could be beneficial to add a test script to crontab.
Basically, every several days or every week, just run the sanity-check
script on every image.
Haowen
Hey,
On 7/30/22 02:47, Alexandros Theodotou wrote:
> It should be reproducible with any repository's .build configuration.
I just tried a simple build and it works fine. Can you tell me if it's
still broken for you, and if yes, provide a link to the build (either
the build itself or the build.yml)?
> The issue seems to be that /etc/resolv.conf is empty/missing. I can fix
> the connectivity issues in the build machine after I ssh and put
> `nameserver 8.8.8.8` in that file.
That is very strange. This file is written during image building. The
Fedora 35 image gets updated on the 4th of each month, so I must be
using the same image as you when you reported, which rules out the image
as the source of the problem.
I am wondering if installing a certain package might cause this
(something something systemd-resolved)?
Conrad
On 7/30/22 02:53, Haowen Liu wrote:
> I don't think Fedora 35 is doing anything different from other Fedoras.
> I'm randomly guessing here, but could it be running on a different
> machine in the data center, which has a different network configuration?
I'd say the problem description (empty/missing resolv.conf) points more
towards something inside the image itself. I'll keep investigating, though.
> Further, I think it could be beneficial to add a test script to crontab.
> Basically, every several days or every week, just run the sanity-check
> script on every image.
I don't think this adds much value. The sanity check is run before
deploying an image, which should be sufficient.
Speaking of the crontab, however: I noticed that the rebuild frequency
of the Fedora images documented on man.sr.ht [1] does not match the
crontab (man: daily/daily/weekly, cron: daily/weekly/monthly). Probably
something to be updated?
[1] https://man.sr.ht/builds.sr.ht/compatibility.md#fedora
Conrad
On 8/2/22 05:25, Conrad Hoffmann wrote:
> Speaking of the crontab, however: I noticed that the rebuild frequency
> of the Fedora images documented on man.sr.ht [1] does not match the
> crontab (man: daily/daily/weekly, cron: daily/weekly/monthly). Probably
> something to be updated?
You are right. This is a mistake on my part. I'll send a patch soon.
Haowen
Hi Conrad,
On Tue, 2022-08-02 at 14:18 +0200, Conrad Hoffmann wrote:
> On 7/30/22 02:47, Alexandros Theodotou wrote:
> > It should be reproducible with any repository's .build
> > configuration.
>
> I just tried a simple build and it works fine. Can you tell me if
> it's still broken for you, and if yes, provide a link to the build
> (either the build itself or the build.yml)?
Here is a simple build that fails:
https://builds.sr.ht/~alextee/job/816482
If you look at the manifest, all it does is install some packages and
then attempt to clone a repository from sourcehut.
> > The issue seems to be that /etc/resolv.conf is empty/missing. I can
> > fix the connectivity issues in the build machine after I ssh and put
> > `nameserver 8.8.8.8` in that file.
>
> That is very strange. This file is written during image building. The
> Fedora 35 image gets updated on the 4th of each month, so I must be
> using the same image as you when you reported, which rules out the
> image as the source of the problem.
>
> I am wondering if installing a certain package might cause this
> (something something systemd-resolved)?
Maybe one of the packages is overwriting /etc/resolv.conf, as you say.
I see systemd-networkd getting installed; maybe it's that?
Thanks,
Alex
Thanks for the update, mystery solved: the problem is indeed the
installation of the systemd-resolved package. I am able to reproduce
with a minimal test case [1].
The exact issue is that upon installation, the systemd-resolved package
replaces /etc/resolv.conf with a symlink:
[build@build ~]$ ls -ln /etc/resolv.conf
lrwxrwxrwx 1 0 0 39 Aug 4 12:06 /etc/resolv.conf -> ../run/systemd/resolve/stub-resolv.conf
However, the link target does not exist **until systemd-resolved
actually gets started** (which doesn't happen automatically).
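One way to confirm this state on a build VM is to resolve the link target and check whether it exists. The following is a minimal Python sketch. `describe_resolv_conf` is a hypothetical name and the returned labels are arbitrary.

```python
import os


def describe_resolv_conf(path="/etc/resolv.conf"):
    """Classify `path` as 'missing', 'empty', 'ok', or 'dangling symlink -> <target>'."""
    if os.path.islink(path):
        # readlink() returns the raw target, e.g. '../run/systemd/resolve/stub-resolv.conf';
        # a relative target is interpreted relative to the directory holding the link.
        target = os.path.join(os.path.dirname(path), os.readlink(path))
        if not os.path.exists(target):
            return "dangling symlink -> " + os.path.normpath(target)
    if not os.path.exists(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"
    return "ok"
```

On an image in the state shown above, this should report a dangling symlink to /run/systemd/resolve/stub-resolv.conf.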
To me, stuff seems wrong on multiple levels here. First, I wonder what
would need to pull in systemd-resolved as a dependency in the first
place. I see in your logs that it gets installed as a "weak dependency".
I have no idea what that is, but one work-around might be to simply
remove the offending package from your list and manually install it
without "weak dependencies" as the first thing in your build script.
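Here is a sketch of the "without weak dependencies" variant. It relies on dnf's `install_weak_deps` configuration option, which can be overridden for a single command with `--setopt`. The helper names are hypothetical. The package list is a placeholder, since the thread does not say which package pulls in systemd-resolved.

```python
import subprocess


def dnf_install_argv(packages, weak_deps=False):
    """Build a non-interactive `dnf install` command line.

    install_weak_deps is a dnf configuration option; --setopt overrides it for
    this single invocation, so weak dependencies are skipped unless weak_deps=True.
    """
    return [
        "dnf", "install", "-y",
        f"--setopt=install_weak_deps={weak_deps}",
        *packages,
    ]


def dnf_install(packages, weak_deps=False):
    # Needs root on a real Fedora system; raises CalledProcessError if dnf fails.
    subprocess.run(dnf_install_argv(packages, weak_deps), check=True)
```

The command is passed to `subprocess.run` as a list rather than a shell string, which avoids quoting issues with package names.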
The other obvious deficiency here is that the system breaks upon package
installation. I don't know anything about how we build the images; maybe
the package is intended to auto-start the daemon on install? Otherwise,
this would be something to report upstream.
Given that things are what they are and you would probably like to just
use the image, one possible work-around on our side would be to simply
install a second copy of our /etc/resolv.conf at
/run/systemd/resolve/stub-resolv.conf. This copy would then get picked
up after the package installation changed /etc/resolv.conf to be a
symlink. Not pretty, but hey...
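Below is a minimal Python sketch of that second-copy workaround. The helper name is hypothetical, and this is not how the images are actually built. One caveat: on systemd-based Fedora, /run is normally a tmpfs, so a copy placed there at image-build time would not survive a reboot. The sketch therefore assumes it runs early in each boot, before any packages are installed.

```python
import os
import shutil


def install_stub_copy(resolv_conf="/etc/resolv.conf",
                      stub="/run/systemd/resolve/stub-resolv.conf"):
    """Copy the image's resolv.conf to the path systemd-resolved's symlink points at.

    Does nothing (returns False) if resolv_conf is already a symlink or not a
    regular file, or if the stub already exists (e.g. systemd-resolved is running).
    """
    if os.path.islink(resolv_conf) or not os.path.isfile(resolv_conf):
        return False
    if os.path.exists(stub):
        return False
    os.makedirs(os.path.dirname(stub), exist_ok=True)
    shutil.copyfile(resolv_conf, stub)
    return True
```

Once the copy is in place, the package can replace /etc/resolv.conf with the `../run/systemd/resolve/stub-resolv.conf` symlink and the original nameserver line will still resolve.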
Haowen, maybe you can shed some more light on this? I know nothing about
the Fedora package manager: is it us who are holding it wrong, or is
this really how it is intended to behave?
Cheers,
Conrad
[1] https://builds.sr.ht/~bitfehler/job/816518
My schedule has been kind of tight recently, and AFAIK fedora/35
refreshes have been failing pretty randomly for the past several months.
See: https://builds.sr.ht/~sircmpwn/job/816721. It would help greatly if
somebody knows what's the issue with those builds.
Back to the systemd-resolved issue, I think it's installed as a weak
dependency of systemd-networkd (maybe?). Either way, it isn't supposed to
break network connectivity. I'm pretty sure it's installed by default on
all the desktop Fedora images. In short, I'm not sure what is wrong with
it or how I should fix it. Your suggested workaround definitely works,
but sticking to the newer Fedoras, which get refreshed much more
frequently and get more attention both upstream and on this platform,
might be another option.
Haowen
On Sat Aug 6, 2022 at 6:43 AM CEST, Haowen Liu wrote:
> See: https://builds.sr.ht/~sircmpwn/job/816721. It would help greatly if
> somebody knows what's the issue with those builds.
It seems Red Hat's servers have some issues. Downloading the metadata
using the link worked just fine every time I tried it in my browser,
but apparently dnf did not have the same experience[0]. I assume
Red Hat's servers are outside your control, so maybe drop the Fedora
mailing list a message about this (including the times when it fails);
maybe they can tell you more.
I don't really see any way to "fix" this other than waiting for
Fedora/Red Hat. For the time being, the only "solution" I see is: retry :/
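If retrying is the only option for now, the retries can at least be automated. Below is a generic Python sketch with exponential backoff. Nothing in it is specific to dnf, and the parameter defaults are arbitrary.

```python
import time


def retry(action, attempts=5, delay=2.0, backoff=2.0, retry_on=(Exception,)):
    """Call action() until it succeeds; re-raise the last error after `attempts` tries."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: propagate the last failure
            time.sleep(delay)
            delay *= backoff  # wait longer before each subsequent attempt
```

For example, `retry(lambda: subprocess.run(["dnf", "makecache"], check=True))` would re-run a metadata refresh until it succeeds or five attempts have failed.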
Hope that helped at least a tiny bit :)
[0]: https://builds.sr.ht/~sircmpwn/job/816721#task-genimg-1030
--
Moritz Poldrack
https://moritz.sh
Thanks for looking into it. Actually, I found what's wrong:
https://builds.sr.ht/~sircmpwn/job/816721#task-genimg-1027.
It seems ALL DNF commands need a releasever specified, which I don't
understand. I thought that after installing the F35 base system, DNF would
be able to figure it out from there, but that feature seems to be a
recent addition in F36, because F36 isn't affected by this bug.
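Since dnf on F35 apparently needs the release version spelled out, a script can pass it explicitly. The following is a hedged Python sketch. It assumes the `VERSION_ID` field of the target system's /etc/os-release is the value wanted. It is not necessarily what the patch mentioned in this thread does, and the helper names are hypothetical.

```python
def os_release_version_id(text):
    """Return VERSION_ID from os-release content (e.g. '35'), or None if absent."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments, and malformed lines
        key, _, value = line.partition("=")
        if key == "VERSION_ID":
            # Values may be quoted, e.g. VERSION_ID="35"; strip surrounding quotes.
            return value.strip('"\'')
    return None


def dnf_argv(args, releasever):
    """Prefix a dnf invocation with an explicit --releasever."""
    return ["dnf", f"--releasever={releasever}"] + list(args)
```

With an F35 os-release file, this yields a command like `dnf --releasever=35 makecache`.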
Haowen
Alright, patch sent and that will hopefully solve the build issue.
See: https://lists.sr.ht/~sircmpwn/sr.ht-dev/patches/34492
As for the broken resolved thing, I think (not sure) that just one
refresh was defective and all subsequent refreshes got stuck because of
the above bug. Hopefully everything will be fine once the patch is
applied and the next refresh happens.
Haowen