~sircmpwn/sr.ht-dev

Proposed new design for builds.sr.ht

Message ID: <20181230161951.GA4488@homura.localdomain>
Sender timestamp: 1546186791

Greetings! As I've been working on multi-arch support, handling
roadblocks in the current design, and considering the feedback of the
community, I'm thinking about what the next generation of builds.sr.ht
looks like. Here are the goals:

- Better management & distribution of build images
- Each worker advertising what image+arch combos it supports, rather
  than assuming every worker supports everything
- Ability for third parties to run build boxes which are slaved to
  builds.sr.ht upstream
- Diversification of build drivers
- Support external ownership over secrets
- Improved communication between the build environment and the host/master

The first change will be to switch from Celery to RabbitMQ for build
distribution. I intend to manage build distribution over one or more
exchanges, with the primary exchange used for the shared build boxes
that run on my infrastructure, and allowing users to set up secondary
exchanges that they can run their build boxes against.
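
As a rough illustration of that layout (a sketch only, not the actual
builds.sr.ht code), publishing a job to such an exchange with pika
might look like this; the exchange name, routing key shape, and job
payload are all assumptions:

    # Hypothetical sketch: publish a build job to a "builds" topic exchange.
    # A third party could declare a secondary exchange of their own and
    # point their workers at it instead.
    import json
    import pika

    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    ch = conn.channel()
    ch.exchange_declare(exchange="builds", exchange_type="topic", durable=True)

    job = {"job_id": 1234, "manifest": "..."}
    ch.basic_publish(
        exchange="builds",
        routing_key="alpine.edge.x86_64.kvm",  # image.version.arch.driver
        body=json.dumps(job),
        properties=pika.BasicProperties(delivery_mode=2),  # persist the job
    )
    conn.close()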

To support each worker advertising a different set of
images/arches/drivers/etc, I intend to use AMQP routing keys. A worker's
build capabilities will be expressed as "image/version:arch+driver",
which appears in your build manifest like this:

base: alpine/edge:x86_64+kvm

Naturally this could be shortened to "base: alpine/edge", the rest being
assumed as defaults. But you could also do:

base: alpine/edge:aarch64+qemu

to use qemu software emulation of aarch64, or alpine/edge:riscv64+native
once I set up my RISC-V system.
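
For illustration, the "image/version:arch+driver" string is simple to
take apart mechanically; a minimal sketch (the default values here are
assumptions, not the real defaults):

    # Illustrative only: split "image/version:arch+driver" into its parts,
    # filling in assumed defaults when the arch/driver suffix is omitted.
    def parse_base(base, default_arch="x86_64", default_driver="kvm"):
        image_version, _, rest = base.partition(":")
        image, _, version = image_version.partition("/")
        if rest:
            arch, _, driver = rest.partition("+")
        else:
            arch, driver = default_arch, default_driver
        return image, version, arch or default_arch, driver or default_driver

    print(parse_base("alpine/edge"))               # ('alpine', 'edge', 'x86_64', 'kvm')
    print(parse_base("alpine/edge:aarch64+qemu"))  # ('alpine', 'edge', 'aarch64', 'qemu')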

For better management & distribution of build images, I intend to use
bittorrent, managed entirely in code (rather than by your typical
end-user bittorrent daemon). Then, rather than the current system, where
each image refresh pushes the new images out to every build slave, we can
just push them to a central repository and let the boxes sync themselves
up. We can also use this system to distribute pre-built images to third
parties.

The diversification of build drivers will be necessary to support more
than just KVM. Today's QEMU-based driver can support KVM and software
emulation of targets, but I want to codify that distinction in the build
manifest and build exchange for when there are build slaves in the
future running with a greater variety of KVM-supported architectures.
The full list of build drivers I have planned is:

- kvm, qemu: one qemu-based driver supporting two flavors of builds
- docker: does what it says on the tin. Note that I am unlikely to offer
  docker support on the shared builders for security reasons, though I
  might consider it if anyone figures out this puzzler[0]
- chroot: runs builds in a chroot with overlayfs+tmpfs, so you can
  basically do whatever you want if you control the build hardware
- riscv64: the RISC-V builder has some special needs which will require
  a custom driver, so this'll have to exist. I can go into detail if
  anyone is curious but it's not important to the overall builds.sr.ht
  design.
- designed in a way that users can write their own build drivers and
  plug them into the exchange, for example with windows+powershell or
  something (a rough sketch of such a driver interface follows below)

[0] https://github.com/moby/moby/issues/37575
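
As for that last point, here is a minimal sketch of what a pluggable
driver interface could look like; the class and method names are
hypothetical, not existing builds.sr.ht code:

    # Hypothetical driver interface; each worker would register the drivers
    # it supports and advertise the matching image/arch/driver combinations.
    import abc

    class BuildDriver(abc.ABC):
        name = "abstract"

        @abc.abstractmethod
        def boot(self, image, arch):
            """Prepare and start the build environment (VM, container, chroot...)."""

        @abc.abstractmethod
        def run(self, command):
            """Run one manifest step inside the environment; return its exit code."""

        @abc.abstractmethod
        def teardown(self):
            """Destroy the build environment and clean up temporary state."""

    class ChrootDriver(BuildDriver):
        name = "chroot"

        def boot(self, image, arch):
            ...  # mount overlayfs + tmpfs over the base image

        def run(self, command):
            ...  # chroot into the overlay and execute the command

        def teardown(self):
            ...  # unmount the overlay and discard the upper layer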

If you run your own build box then you might also want to manage your
own secrets, so that you needn't give them over to sr.ht.  This one is
pretty simple, it'd just take the form of:

    secret-provider: https://secrets.example.org

to specify where the secrets should be fetched from. Some combination of APIs
will allow you to confirm that the build slave asking for the secrets is
running a build that uses them, and each build box will have an
asymmetric key for signing these requests.
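
A hedged sketch of what a signed secrets request could look like,
assuming ed25519 keys and Python's cryptography/requests libraries; the
URL, header name, and payload here are invented for illustration:

    # Illustrative only: a build box signs its request to an external
    # secret-provider, so the provider can verify the request came from a
    # trusted box and (via the job id) that the job actually uses these
    # secrets. Endpoint and header names are hypothetical.
    import base64
    import requests
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    signing_key = Ed25519PrivateKey.generate()  # in reality: loaded from disk

    job_id = "1234"
    signature = base64.b64encode(signing_key.sign(job_id.encode())).decode()

    resp = requests.get(
        "https://secrets.example.org/api/secrets",
        params={"job": job_id},
        headers={"X-Build-Signature": signature},
    )
    secrets = resp.json()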

Lastly, I want to improve the way that the build environment
communicates with the host. This will probably take the form of an HTTP
API which communicates to the build host on a TCP port set up when the
driver is initialized. Today I just intend to use this for ending builds
early, to replace today's fragile exit-code-based hack, but there are
probably more use-cases for this in the future.
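
A minimal sketch of what such a host-side endpoint might look like,
using Flask; the route, port, and semantics are assumptions, not the
eventual builds.sr.ht interface:

    # Hypothetical host-side control API. The driver would start this on a
    # TCP port reachable from inside the build environment, so a build step
    # could end the build early with something like (the address depends on
    # the driver's networking):
    #   curl -X POST "http://host-address:8080/end?result=success"
    from flask import Flask, request

    app = Flask(__name__)
    build_status = {"done": False, "result": None}

    @app.route("/end", methods=["POST"])
    def end_build():
        # Mark the build finished so the worker tears down the environment,
        # instead of relying on a magic exit code from the last step.
        build_status["done"] = True
        build_status["result"] = request.args.get("result", "success")
        return "", 204

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=8080)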

A final note, I also want to make it possible to obtain an interactive
shell in the build environment. Basically this will just take the form
of something like this in your build manifest:

    shell: true

Then after all of your steps run, instead of tearing down the build
environment it'll print an SSH connection string into the build log and
wait for you to log in. This feature could be a target of abuse, so
it'll require some finesse to get right.

So that's what's in store. Any feedback or better ideas?

Message ID: <e601d9f6-996c-5a30-f9aa-10971bd7752f@mnus.de>
In-Reply-To: <20181230161951.GA4488@homura.localdomain>
Sender timestamp: 1546192622

Hi there,

I'm not using builds.sr.ht, but I can offer some feedback, partly based
on experience with similar projects.


> The first change will be to switch from Celery to RabbitMQ for build
> distribution.

Workers polling an API may be a simpler alternative offering more
control over distribution (since you don't push jobs into a queue but
have them pulled by the workers). That also avoids having to maintain
and know RabbitMQ (or Celery for that matter). At least that's my
experience from a cancelled AMQP project.


> To support each worker advertising a different set of
> images/arches/drivers/etc

That sounds like a good idea. Maybe it can be extended by filtering by
hardware capabilities, like lots of cores, much RAM or a 10G network
connection?


> For better management & distribution of build images, I intend to use
> bittorrent, managed entirely in code (rather than by your typical
> end-user bittorrent daemon). Then, rather than the current system, where
> each image refresh pushes the new images out to every build slave, we can
> just push them to a central repository and let the boxes sync themselves
> up. We can also use this system to distribute pre-built images to third
> parties.

While that sounds incredibly cool, it also seems quite overkill. I'd
imagine a central repo on a decent server (e.g. 1gbit/s network) to do the
job just fine for dozens of build servers.

When there are a lot of custom images (e.g. software baked in for the
specific build types a particular user runs), a central server probably
falls flat rather quickly. Using the central repo only to track image
metadata could work neatly for custom images: no central storage is
occupied, and distribution happens purely between build servers. Maybe
you already had something like this in mind.

A somewhat unrelated thought: Do some images have to be private and
should thus be encrypted if transferred over BitTorrent?


> - docker: does what it says on the tin. Note that I am unlikely to offer
>   docker support on the shared builders for security reasons, though I
>   might consider it if anyone figures out this puzzler[0]

For security reasons it makes sense to run Docker in a VM, but why does
the VM have to run in Docker?


> A final note, I also want to make it possible to obtain an interactive
> shell in the build environment. Basically this will just take the form
> of something like this in your build manifest:
> 
>     shell: true
> 
> Then after all of your steps run, instead of tearing down the build
> environment it'll print an SSH connection string into the build log and
> wait for you to log in. This feature could be a target of abuse, so
> it'll require some finesse to get right.

Maybe that should be an option when triggering the build instead, so you
can debug any build configuration at any time?

Message ID: <20181230184332.GA24653@homura.localdomain>
In-Reply-To: <e601d9f6-996c-5a30-f9aa-10971bd7752f@mnus.de>
Sender timestamp: 1546195412

On 2018-12-30  6:57 PM, minus wrote:
> Workers polling an API may be a simpler alternative offering more
> control over distribution (since you don't push jobs into a queue but
> have them pulled by the workers). That also avoids having to maintain
> and know RabbitMQ (or Celery for that matter). At least that's my
> experience from a cancelled AMQP project.

I'm allergic to polling. One thing builds.sr.ht has over the competition
is the boot-up speed - you submit a job and it's running by the time you
take your next breath. I want to get that even faster still, not slower.

> That sounds like a good idea. Maybe it can be extended by filtering by
> hardware capabilities, like lots of cores, much RAM or a 10G network
> connection?

Seems like overkill, though this might eventually be useful to address
the needs of users with high performance requirements... I think that it
would easily be stapled onto the system as-designed though so no need to
worry about it now.

> While that sounds incredibly cool, it also seems quite overkill. I'd
> imagine a central repo on a decent server (e.g. 1gbit/s network) to do the
> job just fine for dozens of build servers.

Today I use a push-based system, and I want to switch to a pull-based
system. Something simple like you described would probably work at first
and might be an easier/faster path with bittorrent as a reasonable
future upgrade.
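
A hedged sketch of that simple pull-based approach (the repository URL
and manifest format are invented; nothing here reflects the actual
image refresh code):

    # Illustrative only: workers periodically pull a JSON manifest of
    # {image_name: sha256} entries and download only what changed.
    import hashlib
    import os
    import requests

    REPO = "https://images.example.org"

    def sha256_of(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def sync(dest_dir):
        manifest = requests.get(REPO + "/manifest.json").json()
        for name, digest in manifest.items():
            path = os.path.join(dest_dir, name)
            if os.path.exists(path) and sha256_of(path) == digest:
                continue  # already up to date
            with requests.get(REPO + "/" + name, stream=True) as r:
                r.raise_for_status()
                with open(path, "wb") as f:
                    for chunk in r.iter_content(1 << 20):
                        f.write(chunk)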

> A somewhat unrelated thought: Do some images have to be private and
> should thus be encrypted if transferred over BitTorrent?

At the moment the design is not concerned with private images. Adding
encryption later wouldn't be too hard though.

> For security reasons it makes sense to run Docker in a VM, but why does
> the VM have to run in Docker?

It doesn't, and this might not be a concern when I rewrite the qemu
driver. I definitely need to do something _like_ Docker, a chroot at the
minimum but probably more than that. It's possible that I'll run into
whatever the source of the docker issue is while implementing the
sandbox. We'll see how it shakes out.

> > A final note, I also want to make it possible to obtain an interactive
> > shell in the build environment. Basically this will just take the form
> > of something like this in your build manifest:
> > 
> >     shell: true
>
> Maybe that should be an option when triggering the build instead, so you
> can debug any build configuration at any time?

Since you submit the manifest with the API request, and it's in a well
supported machine readable format (YAML), I generally consider "modify
the manifest" to be the better choice. If you want to add a shell at
submit time, edit the YAML. dispatch.sr.ht does this to insert its
webhook and update the commit to check out, lists.sr.ht will do this to
apply patches, etc. In the future I will likely want to move features
_out_ of the API request and into the manifest.

Message ID: <f43b04b9-a287-890c-5930-a3eeedc5c369@mnus.de>
In-Reply-To: <20181230184332.GA24653@homura.localdomain>
Sender timestamp: 1546196928

On 30/12/2018 19.43, Drew DeVault wrote:
> I'm allergic to polling. One thing builds.sr.ht has over the competetion
> is the boot-up speed - you submit a job and it's running by the time you
> take your next breath. I want to get that even faster still, not slower.

I see. I suggested polling because it's simpler and works reasonably
fast with a polling frequency of a couple of seconds. In that case, I'd
still prefer a custom subscription system over HTTP (WebSockets or SSE)
or TCP, where you connect, send your desired subscription/advertise your
capabilities and then wait for new build requests on that connection.
This allows for complex filtering and lets the coordinator directly see
which worker is responsible for the build. I'm not sure how this would
work with AMQP with fixed queues.


>> That sounds like a good idea. Maybe it can be extended by filtering by
>> hardware capabilities, like lots of cores, much RAM or a 10G network
>> connection?
> 
> Seems like overkill, though this might eventually be useful to address
> the needs of users with high performance requirements... I think that it
> would easily be stapled onto the system as-designed though so no need to
> worry about it now.

Same as above; I can't see that working with AMQP with fixed queues


>>> A final note, I also want to make it possible to obtain an interactive
>>> shell in the build environment. Basically this will just take the form
>>> of something like this in your build manifest:
>>>
>>>     shell: true
>>
>> Maybe that should be an option when triggering the build instead, so you
>> can debug any build configuration at any time?
> 
> Since you submit the manifest with the API request, and it's in a well
> supported machine readable format (YAML), I generally consider "modify
> the manifest" to be the better choice. If you want to add a shell at
> submit time, edit the YAML. dispatch.sr.ht does this to insert its
> webhook and update the commit to check out, lists.sr.ht will do this to
> apply patches, etc. In the future I will likely want to move features
> _out_ of the API request and into the manifest.

I see; it makes sense to be in the YAML then, of course. I guess what I
was thinking of was just having a button on the builds.sr.ht build log
page that resubmits the build and adds the debug shell to the manifest
on the fly.

Message ID: <20181230191142.GC1919@homura.localdomain>
In-Reply-To: <f43b04b9-a287-890c-5930-a3eeedc5c369@mnus.de>
Sender timestamp: 1546197102

On 2018-12-30  8:08 PM, minus wrote:
> I see. I suggested polling because it's simpler and works reasonably
> fast with a polling frequency of a couple of seconds. In that case, I'd
> still prefer a custom subscription system over HTTP (WebSockets or SSE)
> or TCP, where you connect, send your desired subscription/advertise your
> capabilities and then wait for new build requests on that connection.
> This allows for complex filtering and lets the coordinator directly see
> which worker is responsible for the build. I'm not sure how this would
> work with AMQP with fixed queues.

There'd basically be N queues where N = number of combinations of
image+arch+driver. Or maybe sorted into exchanges. Not sure. I don't
think there's any issue with having many queues. I'd rather use a
service like RabbitMQ which is well supported and has things like HA
working out-of-the-box, if I can. If it becomes clear it's not going to
work then naturally I'd go for something more custom.
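
For illustration, a worker with two capabilities might declare and
consume one queue per combination, roughly like this with pika; the
queue names and message format are assumptions:

    # Illustrative sketch only: one queue per image+arch+driver combination
    # this worker supports.
    import pika

    CAPABILITIES = ["alpine.edge.x86_64.kvm", "alpine.edge.aarch64.qemu"]

    def run_build(body):
        print("running job:", body)  # stand-in for booting the image and running steps

    def on_job(channel, method, properties, body):
        run_build(body)
        channel.basic_ack(delivery_tag=method.delivery_tag)

    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    ch = conn.channel()
    ch.exchange_declare(exchange="builds", exchange_type="topic", durable=True)
    ch.basic_qos(prefetch_count=1)  # one job at a time per worker

    for cap in CAPABILITIES:
        ch.queue_declare(queue=cap, durable=True)
        ch.queue_bind(exchange="builds", queue=cap, routing_key=cap)
        ch.basic_consume(queue=cap, on_message_callback=on_job)

    ch.start_consuming()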

> > Seems like overkill, though this might eventually be useful to address
> > the needs of users with high performance requirements... I think that it
> > would easily be stapled onto the system as-designed though so no need to
> > worry about it now.
> 
> Same as above; I can't see that working with AMQP with fixed queues

Yeah, I'll have to think about this some more...

> I see; it makes sense to be in the YAML then, of course. I guess what I
> was thinking of was just having a button on the builds.sr.ht build log
> page that resubmits the build and adds the debug shell to the manifest
> on the fly.

There already is; it's "Edit & resubmit".

Message ID: <CAAOQYs1GH1ONvcbk6ZX9-A5JRumYXmGsqUERSj=926B2gXCT1A@mail.gmail.com>
In-Reply-To: <20181230161951.GA4488@homura.localdomain>
Sender timestamp: 1546204302

> - riscv64: the RISC-V builder has some special needs which will require
>  a custom driver, so this'll have to exist. I can go into detail if
>  anyone is curious but it's not important to the overall builds.sr.ht
>  design.

Please do!

Message ID: <20181230212201.GA32041@homura.localdomain>
In-Reply-To: <CAAOQYs1GH1ONvcbk6ZX9-A5JRumYXmGsqUERSj=926B2gXCT1A@mail.gmail.com>
Sender timestamp: 1546204921

Well, the crux of the issue is that the HiFive Unleashed doesn't support
KVM (the necessary RISC-V extensions aren't even fully specified yet),
so I need to have some other way of running builds on it. I intend to
set up a management system (probably a raspberry pi) which is connected
to each RISC-V in the cluster by serial, and also connected to a set of
relays which cycle the power on the HiFive boards. When a job comes in,
it'll power on one of the units, which will have a custom kernel that
does some work to set up a restricted environment on the board where
users can run builds normally but not make any permanent changes to the
board (e.g. install a backdoor which siphons out secrets in subsequent
builds). The serial interface will also be useful for remote maintenance
on each board.

The raspberry pi will have to run some kind of custom software which
uses the build exchange API to accept and run builds on this setup.
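
For the curious, a purely illustrative sketch of that pi-side software;
the GPIO pin, serial device, and the shape of a job are all invented:

    # Hypothetical controller loop: power-cycle a HiFive board via a relay
    # on a GPIO pin, then drive the build over its serial console.
    import time
    import serial           # pyserial
    import RPi.GPIO as GPIO

    RELAY_PIN = 17
    SERIAL_DEV = "/dev/ttyUSB0"

    GPIO.setmode(GPIO.BCM)
    GPIO.setup(RELAY_PIN, GPIO.OUT, initial=GPIO.LOW)

    def run_job(job):
        GPIO.output(RELAY_PIN, GPIO.HIGH)        # close the relay: power on
        with serial.Serial(SERIAL_DEV, 115200, timeout=60) as console:
            while b"login:" not in console.readline():
                pass                             # wait for the custom kernel to boot
            for step in job["steps"]:
                console.write((step + "\n").encode())
                print(console.readline().decode(errors="replace"), end="")
        GPIO.output(RELAY_PIN, GPIO.LOW)         # cut power: nothing persists
        time.sleep(5)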

Message ID: <CAAOQYs3NO3gfRPieM7K=3SSz-xPGyNrDU_CiWzoGpBBsj8RTJg@mail.gmail.com>
In-Reply-To: <20181230212201.GA32041@homura.localdomain>
Sender timestamp: 1546209198

Cool, thanks!

Message ID: <1549842719.2778341.1655107504.76EC0554@webmail.messagingengine.com>
In-Reply-To: <20181230161951.GA4488@homura.localdomain>
Sender timestamp: 1549842719

On Sun, 30 Dec 2018, at 17:19, Drew DeVault wrote:

Sorry about the belated response, it took a while to collect feedback over
holidays - hopefully not too late.

> So that's what's in store. Any feedback or better ideas?

I think rabbitmq is a good move. I've been experimenting with similar
things (automated build agents) in the last 6 months and have a few
thoughts on this.

TLDR: I don't think that a topic exchange does what you want, and
you're better off having a private queue per builder and your own
(python based) "router" that ensures jobs are directed appropriately.
You can use username/password and queue names to secure private
queues from public ones if needed.

Details below for the curious.

A+
Dave

Presumably there are:

- multiple builders running concurrently for various combinations
  of os/version/arch/driver
- but each builder is only handling 1 job at a time
- each job is expected to be handled only once
- some jobs may only be for private builders

This implies that:

- you need a single queue (only 1 builder handles a given job)
- user/password combos will allow securing private builders
- this negates a key advantage of the topic exchange, namely that
  consumers have their own queue that receives only the bindings
  that they're interested in
  
Does each builder accept multiple types?

>  To support each worker advertising a different set of
> images/arches/drivers/etc, I intend to use AMQP routing keys. A worker's
> build capabilities will be expressed as "image/version:arch+driver",
> which appears in your build manifest like this:
> 
> base: alpine/edge:x86_64+kvm

Expanding this a bit further, you'd have a topic exchange[0] to which build
jobs get published with routing keys like this (they have to be "."-separated),
all routed to a single queue of inbound jobs. A single queue is appropriate, as
you want to ensure that only one builder attempts a given job.

os.version.arch.driver

Wildcards for amqp routing are either [1]:

- # (matches zero or more words, roughly regex .*)
- * (matches exactly one word, roughly regex \w+)

e.g. say we end up with the following example routing keys in the future:

alpine.edge.x86_64.kvm
alpine.edge.aarch64.kvm
ubuntu.18.x86_64.kvm
ubuntu.18.x86_64.qemu
freebsd.12.aarch64.qemu
freebsd.11.amd64.qemu
freebsd.current.amd64.jail
openbsd.6.amd64.vmm

Would you expect each builder combo above to be a separate agent? Or
would alpine.* be able to do all the alpine things using some sort of emulation?

[0]: https://www.cloudamqp.com/blog/2015-09-03-part4-rabbitmq-for-beginners-exchanges-routing-keys-bindings.html
[1]: https://www.rabbitmq.com/tutorials/tutorial-five-python.html

I don't see that working in practice. I followed up with this post to the
rabbitmq-users list, viz https://groups.google.com/forum/#!topic/rabbitmq-users/VFsMwNID9I4
(also in non-js form below):
--------------------
hi

I'm looking for some guidance on whether this scenario is appropriate for RabbitMQ with routing & exchanges alone.

The environment is the typical git commit builder, where a given high level user request is translated into multiple jobs, one for each os/version/architecture/... combination, and delivered to an exchange as a separate amqp message.
 
1. a given builder may be capable of handling a variety of combinations
2. each job/message should only be run once (but not forgotten!)
3. jobs should be spread across all appropriate builders
4. as builders may have access to privileged info we'll need user/password + to restrict queues

# topic exchange

At first glance this sounds an ideal match.

1. each worker binds its exclusive queue repeatedly with all the topic combinations it can support
2. additional external locking is required to prevent multiple builders consuming an identical message 

Alas, most builders will simply be declining jobs that are either not suitable, or already being built elsewhere, and this will become worse as the message throughput rises. It doesn't appear to be possible to identify unconsumable jobs without using the external locking as well.

# header exchange

1. each builder binds its exclusive queue using multiple ALL matches (broadly similar to the topic exchange)
2. as a message will only be routed to a single queue, we can detect and handle unroutable messages

The headers exchange handles all requirements well, *but* what ended up happening in practice is that jobs were not distributed evenly across suitable queues; instead, only the first worker that matches the characteristics ends up with all the jobs :-( let's hope it's the fastest one! Is there a way to distribute the jobs roughly evenly across suitable queues?
 
Are there any alternatives that I might have missed?
...
---------------

and response from Michael Klishin

Your observations are correct. The headers exchange routes to a single queue IIRC.

I'd try the following:

 * Use exchange-to-exchange bindings and a "frontend" headers (or topic) exchange [1]
 * …that would route messages to a consistent-hash-exchange with N bound queues [2]

In recent versions the latter provides very good distribution assuming that your routing keys are distributed
evenly enough.

And RabbitMQ will require certain "cooperation" on the application side, namely it won't coordinate a single
active consumer when there are competing ones. 3.8 will have basically that feature [3]
but it is months away from shipping.

I think your understanding of the problems involved is solid. Even though extensibility is a strong point
of RabbitMQ, I usually suggest against developing custom plugins:

 * Upgrades of the plugin will require a cluster-wide redeployment. Compared to topology changes by apps this is always a lot more involved.
 * You will have to keep track of internal API changes which are not common for exchange types (in fact, we only had one potentially breaking
   change in many years in 3.7.0) but nonetheless may be too much to ask for
 * Exchanges are the simplest plugin types there are; you can use Elixir instead of Erlang (I know I would). Nonetheless it requires
   a certain amount of expertise on your team.

So if you can avoid developing a custom exchange and simply introduce an external locking service and get away with E2E and one or two exchange
types, I'd do that, even if it seems messy on the surface.

HTH.

1. https://www.rabbitmq.com/e2e.html
2. https://github.com/rabbitmq/rabbitmq-consistent-hash-exchange
3. https://github.com/rabbitmq/rabbitmq-server/pull/1802
-----------------------------

A+
Dave