~sircmpwn/public-inbox

22 2

[PULL REQUEST gddo] Support for Go Modules

Details
Message ID
<CAAXO9H7WUP7.05FRWQRU5L2W@nitro>
DKIM signature
pass
Download raw message
I've finally got support for Go modules working. In the meantime I've
also removed a dependency on Redis and moved the entire database to
PostgreSQL. I would appreciate any assistance with testing the
implementation.

To test the changes, initialize the PostgreSQL database with the schema
found in schema.sql. Then pass the database URL to gddo-server with the
--pg-server flag.

Nearly all funtionality has been kept intact with the exception of the
following TODOs.

Remaining TODOs:
- Remove all matching packages from the database upon blocking an import
  path. Should be easy to implement. Alternatively, leave this up to the
  admin to do manually.
- Implement crawling of modules in the background. Modules can be
  checked once per day. Needs a function in the database to retrieve the
  module with the oldest LastCrawl time. Should probably also crawl
  modules' imports.
- Go Subrepo index. The Go subrepo is made up of multiple modules and we
  cannot simply retrieve all of its modules with one query. This may
  require keeping an up-to-date list of all the modules in the subrepo.
  An alternative is to simply perform a search for "golang.org/x".
- Store module go-source meta tags in the database. Should we go ahead
  with this or wait for the VCS RFC to be adopted?
- Using the API search feature does not trigger a crawl. As the API is
  currently offline, this is not really an issue.
- Improve the error handling in certain cases.

Note that crawling the standard library can take lots of time, so the
first request to the documentation for a package in the standard library
may time out.

Also note that currently this will only display the latest version of a
package. Later on we can implement support for allowing the user to
choose which version of a package to display.

----------------------------------------------------------------
The following changes since commit 1227dec19e531d568c36b9fe59a1bef4c781de00:

  gddo-admin: Remove stats command (2021-03-24 08:05:13 -0400)

are available in the Git repository at:

  https://git.sr.ht/~adnano/gddo

for you to fetch changes up to 7ab21c4444247e08b9089903a2bf0d6bc8aa8e29:

  httputil: Remove AuthTransport (2021-03-30 15:54:03 -0400)

----------------------------------------------------------------
Adnan Maolood (14):
      Add internal/proxy package
      Add internal/version package
      Add internal/stdlib package
      Add internal/source package
      Add internal/doc package
      Add internal/database package
      Add database schema
      Run go mod tidy
      gddo-admin: Implement support for Go modules
      gddo-server: Implement support for Go modules
      gosrc: Remove
      database: Remove
      doc: Remove
      httputil: Remove AuthTransport
Details
Message ID
<CACHL43DLX39.1LMK3UL4X909N@taiga>
In-Reply-To
<CAAXO9H7WUP7.05FRWQRU5L2W@nitro> (view parent)
DKIM signature
fail
Download raw message
DKIM signature: fail
Some feedback from testing:

- The first time a package is fetched, it returns an empty search result
  instead of the package page. Attempting the search again shows the
  now-indexed package.
- I tested it with github.com/go-git/go-git and it did not show the
  latest module version. We should also definitely get mulitple versions
  supported before we ship this.

On Tue Mar 30, 2021 at 3:30 PM EDT, Adnan Maolood wrote:
> - Remove all matching packages from the database upon blocking an import
> path. Should be easy to implement. Alternatively, leave this up to the
> admin to do manually.

Would prefer to see this implemented before we ship

> - Store module go-source meta tags in the database. Should we go ahead
> with this or wait for the VCS RFC to be adopted?

Even with the VCS RFC, we will still need to provide some legacy support
at least for some time.
Details
Message ID
<CACJMRVW57ST.243OQO9DPVTQQ@galliumos>
In-Reply-To
<CACHL43DLX39.1LMK3UL4X909N@taiga> (view parent)
DKIM signature
fail
Download raw message
DKIM signature: fail
On Thu Apr 1, 2021 at 11:19 AM EDT, Drew DeVault wrote:
> Some feedback from testing:
>
> - The first time a package is fetched, it returns an empty search result
> instead of the package page. Attempting the search again shows the
> now-indexed package.
> - I tested it with github.com/go-git/go-git and it did not show the
> latest module version. We should also definitely get mulitple versions
> supported before we ship this.

In this case we probably need to try retrieving every major version of a
module until we run into an error. This brings up a good question:
should we consider github.com/go-git/go-git the same as
github.com/go-git/go-git/v5 or whatever the latest version is? It seems
that pkg.go.dev does not work this way: go-git and go-git/v5 are treated
as separate modules.
Details
Message ID
<CACJN4Y094O4.3SJ9MG1UUUJDB@taiga>
In-Reply-To
<CACJMRVW57ST.243OQO9DPVTQQ@galliumos> (view parent)
DKIM signature
fail
Download raw message
DKIM signature: fail
On Thu Apr 1, 2021 at 12:55 PM EDT, Adnan Maolood wrote:
> On Thu Apr 1, 2021 at 11:19 AM EDT, Drew DeVault wrote:
> > Some feedback from testing:
> >
> > - The first time a package is fetched, it returns an empty search result
> > instead of the package page. Attempting the search again shows the
> > now-indexed package.
> > - I tested it with github.com/go-git/go-git and it did not show the
> > latest module version. We should also definitely get mulitple versions
> > supported before we ship this.
>
> In this case we probably need to try retrieving every major version of a
> module until we run into an error. This brings up a good question:
> should we consider github.com/go-git/go-git the same as
> github.com/go-git/go-git/v5 or whatever the latest version is? It seems
> that pkg.go.dev does not work this way: go-git and go-git/v5 are treated
> as separate modules.

Hm, we should probably base our behavior on what Go itself does when you
try to import each version.
Details
Message ID
<CACJQBMTEFQY.3EN6J5JSEDQGW@galliumos>
In-Reply-To
<CACJN4Y094O4.3SJ9MG1UUUJDB@taiga> (view parent)
DKIM signature
fail
Download raw message
DKIM signature: fail
On Thu Apr 1, 2021 at 12:56 PM EDT, Drew DeVault wrote:
> Hm, we should probably base our behavior on what Go itself does when you
> try to import each version.

That would mean treating each version as a separate module, so the
current behavior is correct. Though in the future we may want to fetch
later versions and allow the user to switch between them.
Details
Message ID
<CACJRJD01GJ0.B1EU2QYXBMJQ@taiga>
In-Reply-To
<CACJQBMTEFQY.3EN6J5JSEDQGW@galliumos> (view parent)
DKIM signature
fail
Download raw message
DKIM signature: fail
On Thu Apr 1, 2021 at 1:00 PM EDT, Adnan Maolood wrote:
> On Thu Apr 1, 2021 at 12:56 PM EDT, Drew DeVault wrote:
> > Hm, we should probably base our behavior on what Go itself does when you
> > try to import each version.
>
> That would mean treating each version as a separate module, so the
> current behavior is correct. Though in the future we may want to fetch
> later versions and allow the user to switch between them.

Well, there may be some happy medium. Is it possible to enumerate the
versions, given any of them as an input?
Details
Message ID
<CACJXD4XEOE7.3S5CLT8ADLM01@galliumos>
In-Reply-To
<CACJRJD01GJ0.B1EU2QYXBMJQ@taiga> (view parent)
DKIM signature
fail
Download raw message
DKIM signature: fail
On Thu Apr 1, 2021 at 1:01 PM EDT, Drew DeVault wrote:
> On Thu Apr 1, 2021 at 1:00 PM EDT, Adnan Maolood wrote:
> > On Thu Apr 1, 2021 at 12:56 PM EDT, Drew DeVault wrote:
> > > Hm, we should probably base our behavior on what Go itself does when you
> > > try to import each version.
> >
> > That would mean treating each version as a separate module, so the
> > current behavior is correct. Though in the future we may want to fetch
> > later versions and allow the user to switch between them.
>
> Well, there may be some happy medium. Is it possible to enumerate the
> versions, given any of them as an input?

Unfortunately, not really. The Go module proxy will only list versions
for the current major version (i.e. requesting v2 will return v2.*.*, v3
will return v3.*.*, etc). The only way to enumerate the versions is to
try fetching every major version and stop when the module proxy returns
an error. We could probably store this information in the database,
though.
Details
Message ID
<CACJXVB6WYXK.2RHN74OY0JFAI@taiga>
In-Reply-To
<CACJXD4XEOE7.3S5CLT8ADLM01@galliumos> (view parent)
DKIM signature
fail
Download raw message
DKIM signature: fail
On Thu Apr 1, 2021 at 1:09 PM EDT, Adnan Maolood wrote:
> Unfortunately, not really. The Go module proxy will only list versions
> for the current major version (i.e. requesting v2 will return v2.*.*, v3
> will return v3.*.*, etc). The only way to enumerate the versions is to
> try fetching every major version and stop when the module proxy returns
> an error. We could probably store this information in the database,
> though.

Yeah, fetching like that seems gross to me. Storing every known version
in the database and enumerating the ones we know is a better idea.
Details
Message ID
<CAE7T7P9OPLS.1XTASOU984XJF@yoga>
In-Reply-To
<CACHL43DLX39.1LMK3UL4X909N@taiga> (view parent)
DKIM signature
pass
Download raw message
On Thu Apr 1, 2021 at 11:19 AM EDT, Drew DeVault wrote:
> Some feedback from testing:
>
> - The first time a package is fetched, it returns an empty search result
> instead of the package page. Attempting the search again shows the
> now-indexed package.

It seems like this happens when fetching the module times out. The logs
will report "Serving package ... as not found after timeout getting
doc". Perhaps the default timeout should be increased? Or perhaps we
should display an error message along the lines of: "This module is
taking a while to fetch, please check back later".

> On Tue Mar 30, 2021 at 3:30 PM EDT, Adnan Maolood wrote:
> > - Remove all matching packages from the database upon blocking an import
> > path. Should be easy to implement. Alternatively, leave this up to the
> > admin to do manually.
>
> Would prefer to see this implemented before we ship

I've now implemented this.

> > - Store module go-source meta tags in the database. Should we go ahead
> > with this or wait for the VCS RFC to be adopted?
>
> Even with the VCS RFC, we will still need to provide some legacy support
> at least for some time.

I'm thinking that go-source metadata should be stored in its own table
in the database so that we can drop that table later on. One caveat with
go-source metadata is that it will only be valid for the latest version,
as it will always point towards the latest commit on the repository. So
I'm not sure that go-source is a good fit for Go modules, especially if
we allow multiple versions.
Details
Message ID
<CAE8IL5MILCK.12J3167NU8FO@taiga>
In-Reply-To
<CAE7T7P9OPLS.1XTASOU984XJF@yoga> (view parent)
DKIM signature
fail
Download raw message
DKIM signature: fail
On Sat Apr 3, 2021 at 12:05 PM EDT, Adnan Maolood wrote:
> It seems like this happens when fetching the module times out. The logs
> will report "Serving package ... as not found after timeout getting
> doc". Perhaps the default timeout should be increased? Or perhaps we
> should display an error message along the lines of: "This module is
> taking a while to fetch, please check back later".

Yeah, something like that. Or maybe we should just return quickly and
display a "This module is being fetched in the background. Feel free to
refresh while we're working on it.", with a meta tag which refreshes
automatically every 10 seconds or something.

> I'm thinking that go-source metadata should be stored in its own table
> in the database so that we can drop that table later on. One caveat with
> go-source metadata is that it will only be valid for the latest version,
> as it will always point towards the latest commit on the repository. So
> I'm not sure that go-source is a good fit for Go modules, especially if
> we allow multiple versions.

I'm okay with putting it into a separate table temporarily.
Details
Message ID
<CAE9A69KU931.1PGUV0UHJIKU9@yoga>
In-Reply-To
<CAE8IL5MILCK.12J3167NU8FO@taiga> (view parent)
DKIM signature
pass
Download raw message
On Sat Apr 3, 2021 at 12:38 PM EDT, Drew DeVault wrote:
> Yeah, something like that. Or maybe we should just return quickly and
> display a "This module is being fetched in the background. Feel free to
> refresh while we're working on it.", with a meta tag which refreshes
> automatically every 10 seconds or something.

Perhaps we should separate the action of searching for a package and
requesting to add it to the database? We can show a message on the
bottom of the search results like "Can't find what you're looking for?
You might need to [add a module to the database]". Then we can have a
dedicated page for adding modules which shows the status of fetching or
something.
Details
Message ID
<CAEAUEGJUWH5.1F2BB48E1M5WP@yoga>
In-Reply-To
<CAE9A69KU931.1PGUV0UHJIKU9@yoga> (view parent)
DKIM signature
pass
Download raw message
Another question: since there is no clean way to keep track of all
golang.org/x modules without maintaining a list of them, shall we remove
the /-/subrepo endpoint? That endpoint is currently not advertised
anywhere at the moment.
Details
Message ID
<CAEJ7203M574.1N59EI1TAGVBJ@yoga>
In-Reply-To
<CAEAUEGJUWH5.1F2BB48E1M5WP@yoga> (view parent)
DKIM signature
pass
Download raw message
I've pushed an update which stores go-source meta tags in the database
upon crawling. I've also added a message that is displayed when crawling
a package times out, and fixed a few issues related to import counts.
I've also removed the Go Subrepo index endpoint as there is no easy way
to implement it.

I believe that these changes can be accepted upstream now. I would like
to send some patches to clean up various gddo-server code before I work
on implementing the ability to choose which version of a package to
display.

Later on I would like to implement the ability to choose which GOOS to
display documentation for. Perhaps I should ensure that the database is
set up to support this before these changes are accepted, otherwise we
might have to change the database schema again.
Details
Message ID
<CAEJNDIQ6UJP.BKSBQDED2I46@yoga>
In-Reply-To
<CAEJ7203M574.1N59EI1TAGVBJ@yoga> (view parent)
DKIM signature
pass
Download raw message
On Sat Apr 3, 2021 at 9:00 PM EDT, Adnan Maolood wrote:
> I believe that these changes can be accepted upstream now. I would like
> to send some patches to clean up various gddo-server code before I work
> on implementing the ability to choose which version of a package to
> display.

To clarify: all that needs to be done is to present available versions
to the user in the frontend. The backend support is complete. We might
also want to store a list of versions for each module in the database
independent of which versions have documentation stored in the database.
Otherwise, only the versions which have documentation stored will be
shown.
Details
Message ID
<CAEY8AFVRLLR.KO19RL73PGL1@taiga>
In-Reply-To
<CAEAUEGJUWH5.1F2BB48E1M5WP@yoga> (view parent)
DKIM signature
fail
Download raw message
DKIM signature: fail
On Sat Apr 3, 2021 at 2:27 PM EDT, Adnan Maolood wrote:
> Another question: since there is no clean way to keep track of all
> golang.org/x modules without maintaining a list of them, shall we remove
> the /-/subrepo endpoint? That endpoint is currently not advertised
> anywhere at the moment.

Yeah this can be removed
Details
Message ID
<CAEY8NXYA8D6.2TNY05DJM9GMY@taiga>
In-Reply-To
<CAE9A69KU931.1PGUV0UHJIKU9@yoga> (view parent)
DKIM signature
fail
Download raw message
DKIM signature: fail
I would prefer not to separate these actions, I think it makes for a
worse user experience for a workflow which will be very common.
Details
Message ID
<CAF1U9ODUMD8.BDMZRZ8966EM@galliumos>
In-Reply-To
<CAEY8NXYA8D6.2TNY05DJM9GMY@taiga> (view parent)
DKIM signature
fail
Download raw message
DKIM signature: fail
I've pushed another round of bug fixes and also changed the database
schema to store the GOOS and GOARCH of the documentation so that we can
allow the user to choose the GOOS in the future. The database now stores
a list of the module's versions as well as the module's v1 path so that
we can easily retrieve all the versions present in the database.
Details
Message ID
<CAMW41IYOONR.2TUGKHE11I9C7@taiga>
In-Reply-To
<CAF1U9ODUMD8.BDMZRZ8966EM@galliumos> (view parent)
DKIM signature
fail
Download raw message
DKIM signature: fail
This is still on my radar, sorry for the delay. I will have it reviewed
again soon.
Details
Message ID
<CATO70PR50O1.1DRMV793H3IIJ@nitro>
In-Reply-To
<CAMW41IYOONR.2TUGKHE11I9C7@taiga> (view parent)
DKIM signature
pass
Download raw message
Any update on this?
Details
Message ID
<CAUCWESQHDPG.RDYYVL1WMPIL@taiga>
In-Reply-To
<CATO70PR50O1.1DRMV793H3IIJ@nitro> (view parent)
DKIM signature
fail
Download raw message
DKIM signature: fail
On Wed Apr 21, 2021 at 4:05 PM EDT, Adnan Maolood wrote:
> Any update on this?

Thanks for the ping. Just gave this a fresh review and everything looks
good to me. Nice work! I pushed the code to master, is there any reason
to wait before deploying it to production as well?
Details
Message ID
<CAUD38KY34NK.8E05DIJETJEF@nitro>
In-Reply-To
<CAUCWESQHDPG.RDYYVL1WMPIL@taiga> (view parent)
DKIM signature
pass
Download raw message
On Thu Apr 22, 2021 at 11:27 AM EDT, Drew DeVault wrote:
> Thanks for the ping. Just gave this a fresh review and everything looks
> good to me. Nice work! I pushed the code to master, is there any reason
> to wait before deploying it to production as well?

I think that it is ready to be deployed to production. Some more testing
wouldn't hurt, though.

A few notes:

One known issue is that the server currently returns "not found" for the
module golang.org/x/crypto and similar modules which only contain
submodules and no documentation. Ideally the submodules should be
displayed instead of an error. This can probably be fixed easily.

Once you deploy, make a request to https://godocs.io/std to fetch the
standard library, otherwise links to the standard library won't work.
This may take a while and will likely time out. This is an area for
future improvement.

Also note that one of my commits removed the hard-coded block of
code.jquery.com, so you might want to block it with gddo-admin.

And one final thing that may be problematic: more than one request for
the same module at the same time can result in multiple fetches for that
module. The extra fetches will fail to add the module to the database,
but otherwise they have no effect. I believe this issue predates module
support. I guess it might not be that big of a deal, but it would be
nice if there was a way we could prevent that. Maybe we could maintain
an in-memory list of modules that are currently being fetched.
Details
Message ID
<CAUDI5RU27JW.BP05PSIA90J9@nitro>
In-Reply-To
<CAUCWESQHDPG.RDYYVL1WMPIL@taiga> (view parent)
DKIM signature
pass
Download raw message
On Thu Apr 22, 2021 at 11:27 AM EDT, Drew DeVault wrote:
> Thanks for the ping. Just gave this a fresh review and everything looks
> good to me. Nice work! I pushed the code to master, is there any reason
> to wait before deploying it to production as well?

I believe that this is ready to be deployed to production.

A few notes:

The server currently returns "not found" for modules like
golang.org/x/crypto which contain only submodules and no documentation.
Ideally the submodules would be displayed instead of an error. This can
probably be fixed easily.

After deploying, make a request to https://godocs.io/std to fetch the
stdlib, otherwise links to it will not work. This will likely take a
while and the request may time out. This is an area for future
improvement.

Making more than one request for the same module at the same time can
result in redundant fetches for that module. This isn't really a big
deal, but we might be able to avoid this by maintaining an in-memory
list of currently pending fetches to avoid redundant fetches.
Details
Message ID
<CAUE4K6AU7IS.2P5Y29FRF6FES@nitro>
In-Reply-To
<CAUDI5RU27JW.BP05PSIA90J9@nitro> (view parent)
DKIM signature
pass
Download raw message
Sorry for the duplicate emails, I though the first one failed to send.
Reply to thread Export thread (mbox)