~sircmpwn/sr.ht-discuss


Mercurial tag tarball download gives different checksum every time

Message ID
<af46b926-510e-c337-7cb0-eb12df47ac88@gmail.com>
When I download the tarball generated for a Mercurial tag (for example
https://hg.sr.ht/~scoopta/wofi/archive/v1.0.tar.gz), it has a different
checksum every time. Is there something that can be done about this, so
that the checksum stays consistent, as it does for GitHub downloads?
Message ID
<02fca37d-5108-8437-e3ba-14a306422f85@gmail.com>
In-Reply-To
<af46b926-510e-c337-7cb0-eb12df47ac88@gmail.com>
> When I download the tarball generated for Mercurial tag (for example
> https://hg.sr.ht/~scoopta/wofi/archive/v1.0.tar.gz), it has different
> checksum every time I download it. Is there something that can be done
> with this, so that the checksum would be consistent, like GitHub
> downloads for example?
Is anyone interested in this?
Marcin Cieslak
Message ID
<nycvar.OFS.7.76.44444.807.2003070251310.72225@z.fncre.vasb>
In-Reply-To
<02fca37d-5108-8437-e3ba-14a306422f85@gmail.com>
On Sat, 7 Mar 2020, Eternal Sorrow wrote:

>> When I download the tarball generated for Mercurial tag (for example
>> https://hg.sr.ht/~scoopta/wofi/archive/v1.0.tar.gz), it has different
>> checksum every time I download it. Is there something that can be done
>> with this, so that the checksum would be consistent, like GitHub
>> downloads for example?
> Is anyone interested in this?
>

Looks like "-n" needs to be added to gzip(1):

      -n, --no-name     This option stops the filename and timestamp from being
                        stored in the output file.

radziecki> file one/dist/*
one/dist/v1.0.tar.gz: gzip compressed data, was "/tmp/v1.0b'e03aa2d8d1ce0830'.tar", last modified: Sat Dec 21 06:01:56 2019, max compression, original size modulo 2^32 184320
radziecki> file two/dist/*
two/dist/v1.0.tar.gz: gzip compressed data, was "/tmp/v1.0b'5dd8a136007d0768'.tar", last modified: Sat Dec 21 06:01:56 2019, max compression, original size modulo 2^32 184320
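
To illustrate the point, here is a minimal Python sketch (an editorial addition, not hg.sr.ht code) that compresses the same bytes with the standard-library `gzip` module. It assumes Python 3.8+ and uses a hypothetical helper `gzip_bytes`: an empty `name` leaves the stored file name out of the header, which is what `gzip -n` does, and `mtime` pins the header timestamp instead of using the current time.

```python
import gzip
import io


def gzip_bytes(data: bytes, name: str = "", mtime: int = 0) -> bytes:
    """gzip `data` in memory with explicit control over the header.

    A non-empty `name` is stored in the FNAME field (its basename, minus
    any .gz suffix); an empty name omits the field, as `gzip -n` does.
    `mtime` is written verbatim instead of the current time.
    """
    buf = io.BytesIO()
    with gzip.GzipFile(filename=name, mode="wb", fileobj=buf, mtime=mtime) as gz:
        gz.write(data)
    return buf.getvalue()


payload = b"identical tar contents\n" * 100
stamp = 1_576_908_116  # an arbitrary fixed timestamp for the demo

# Random temporary names, as in the two downloads above: the bytes differ.
one = gzip_bytes(payload, "/tmp/v1.0b'e03aa2d8d1ce0830'.tar.gz", stamp)
two = gzip_bytes(payload, "/tmp/v1.0b'5dd8a136007d0768'.tar.gz", stamp)
print(one == two)  # False

# No stored name and a pinned mtime: the bytes are identical.
print(gzip_bytes(payload, mtime=stamp) == gzip_bytes(payload, mtime=stamp))  # True
```

Both files decompress to the same payload; only the header differs, which is enough to change the checksum.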
Message ID
<867dzwe3gv.fsf@acsl.se>
In-Reply-To
<02fca37d-5108-8437-e3ba-14a306422f85@gmail.com>
Eternal Sorrow <lynx1534@gmail.com> writes:

>> When I download the tarball generated for Mercurial tag (for example
>> https://hg.sr.ht/~scoopta/wofi/archive/v1.0.tar.gz), it has different
>> checksum every time I download it. Is there something that can be done
>> with this, so that the checksum would be consistent, like GitHub
>> downloads for example?
> Is anyone interested in this?

I don't believe hg archive guarantees the output will match between
runs, although you might get lucky.

I have a build job that runs the archive and uploads the result to S3,
and I download it from there.

-- 
  Malcolm Matalka
  Abiogenesis Computer Systems Lab
Message ID
<e2f87769-ca0f-2fdb-1272-a66ebb512458@gmail.com>
In-Reply-To
<867dzwe3gv.fsf@acsl.se>
On 3/7/20 10:47 PM, Malcolm Matalka wrote:
> I don't believe hg archive guarantees the output will match between
> runs, although you might get lucky.
>
> I have a build job that performs an archive and tosses the result on S3
> which I pull down from.
>
Is it absolutely necessary to use hg archive instead of just calling the 
archive tools directly?
Message ID
<865zfge2je.fsf@acsl.se>
In-Reply-To
<e2f87769-ca0f-2fdb-1272-a66ebb512458@gmail.com>
Eternal Sorrow <lynx1534@gmail.com> writes:

> Is it absolutely necessary to use hg archive instead of just calling the archive
> tools directly?

Depends on what you want, but hg archive strips all the hg metadata and
history and just gives you a snapshot of the tracked files at that
commit. This is generally what people want if they just want to build
from source (for example, as input to package managers' build systems).

-- 
  Malcolm Matalka
  Abiogenesis Computer Systems Lab
Message ID
<f8d7415e-a044-a128-c174-7b3fe9881352@gmail.com>
In-Reply-To
<865zfge2je.fsf@acsl.se>
On 3/7/20 11:07 PM, Malcolm Matalka wrote:
> Depends on what you want, but hg archive removes all the hg metadata and
> historic data and just gives you a snapshot of the tracked files at that
> commit.  This is generally what people want if they just want to build
> from source (such as input to build engines for package managers).
>
But with `hg archive` we cannot control the arguments the archive tools 
are run with?
Message ID
<016d645c-3388-40e3-acc2-1e8f2eadd0dc@www.fastmail.com>
In-Reply-To
<f8d7415e-a044-a128-c174-7b3fe9881352@gmail.com>
I do not know. I also do not know if the archive tools guarantee deterministic output.

-- 
  Malcolm Matalka
  Abiogenesis Computer Systems Lab

On Sat, Mar 7, 2020, at 14:16, Eternal Sorrow wrote:
> But with `hg archive` we cannot control the arguments the archive tools
> are run with?
Marcin Cieslak
Message ID
<nycvar.OFS.7.76.44444.807.2003071328390.72225@z.fncre.vasb>
In-Reply-To
<016d645c-3388-40e3-acc2-1e8f2eadd0dc@www.fastmail.com>
On Sat, 7 Mar 2020, Malcolm Matalka wrote:

> I do not know. I also do not know if the archive tools guarantee deterministic output.

It is not quite hg's fault... With BSD tar, setting TAR_WRITER_OPTIONS=gzip:!timestamp helps:

> env TAR_WRITER_OPTIONS=gzip:!timestamp tar acf p1.tar.gz file.pdf

...

> env TAR_WRITER_OPTIONS=gzip:!timestamp tar acf p.tar.gz file.pdf

> md5 p*.tar.gz
MD5 (p.tar.gz) = 7f101421bafb9e89c1490b90703b00cf
MD5 (p1.tar.gz) = 7f101421bafb9e89c1490b90703b00cf

I can't find how to do this with GNU tar right now.
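
Where neither tar flavour is convenient, the same idea can be sketched with Python's `tarfile` and `gzip` modules. This is a hedged illustration, not what hg or hg.sr.ht does: the helper name `deterministic_tgz`, the fixed owner and mode values, and the `wofi-v1.0` prefix in the usage are assumptions, and the `dict[str, bytes]` annotation needs Python 3.9+. Member order, per-file metadata and the gzip header are all pinned, so the output depends only on the inputs.

```python
import gzip
import io
import tarfile


def deterministic_tgz(files: dict[str, bytes], mtime: int, prefix: str) -> bytes:
    """Build a .tar.gz whose bytes depend only on the arguments.

    files maps archive-relative paths to contents; mtime is a fixed
    timestamp (e.g. the tagged commit's date); prefix is the top-level
    directory name inside the archive.
    """
    tar_buf = io.BytesIO()
    # Write an uncompressed tar first so the gzip header is controlled separately.
    with tarfile.open(fileobj=tar_buf, mode="w") as tar:
        for path in sorted(files):  # stable member order
            data = files[path]
            info = tarfile.TarInfo(name=f"{prefix}/{path}")
            info.size = len(data)
            info.mtime = mtime       # no wall-clock time in member headers
            info.mode = 0o644
            info.uid = info.gid = 0  # no build-machine owner information
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(data))

    gz_buf = io.BytesIO()
    # filename="" omits the stored name (like gzip -n); mtime pins the header.
    with gzip.GzipFile(filename="", mode="wb", fileobj=gz_buf, mtime=mtime) as gz:
        gz.write(tar_buf.getvalue())
    return gz_buf.getvalue()


files = {"README": b"hello\n", "src/main.c": b"int main(void) { return 0; }\n"}
first = deterministic_tgz(files, 1_576_908_116, "wofi-v1.0")
second = deterministic_tgz(dict(reversed(list(files.items()))), 1_576_908_116, "wofi-v1.0")
print(first == second)  # True: input ordering does not matter
```

Sorting the paths matters as much as the timestamps: two tarballs with the same members in a different order have different checksums.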

Marcin
Marcin Cieslak
Message ID
<nycvar.OFS.7.76.44444.807.2003071628150.72225@z.fncre.vasb>
In-Reply-To
<nycvar.OFS.7.76.44444.807.2003071328390.72225@z.fncre.vasb>
On Sat, 7 Mar 2020, Marcin Cieslak wrote:

> On Sat, 7 Mar 2020, Malcolm Matalka wrote:
>
>> I do not know. I also do not know if the archive tools guarantee 
>> deterministic output.
>
> It is not quite hg's fault... in BSD tar setting 
> TAR_WRITER_OPTIONS=gzip:!timestamp helps:

Please excuse my babble: hg does not use the command-line tar or gzip at all.

The problem seems to be here:

https://hg.sr.ht/~sircmpwn/hg.sr.ht/browse/a8754d389234a312fe624468700039b497a871df/hgsrht/blueprints/repo.py#L690

          path = f"/tmp/{rev_escaped}{binascii.hexlify(os.urandom(8))}.tar.gz"

Both the Mercurial command line and the python-hglib client library require
a destination path for the archive.

However, Mercurial itself is smarter than this: if the Mercurial code is
invoked directly, it can write the archive to a stream, and then no file name
is stored in the gzip file.

If no timestamp is passed to the low-level archive command, the commit
timestamp is used, and that part seems to work:

one/dist/v1.0.tar.gz: gzip compressed data, was "/tmp/v1.0b'e03aa2d8d1ce0830'.tar", last modified: Sat Dec 21 06:01:56 2019, max compression, original size modulo 2^32 184320
two/dist/v1.0.tar.gz: gzip compressed data, was "/tmp/v1.0b'5dd8a136007d0768'.tar", last modified: Sat Dec 21 06:01:56 2019, max compression, original size modulo 2^32 184320

Sat Dec 21 06:01:56 2019 is the timestamp of the 1.0 release.

The e03aa2d8d1ce0830 value is generated randomly by hg.sr.ht (note also the extra, probably unwanted, quotes).
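
The quotes come from interpolating a `bytes` object into an f-string: `binascii.hexlify()` returns `bytes`, and formatting it yields its `b'...'` representation. A small sketch, with a fixed value standing in for `os.urandom(8)` so the output is reproducible. Decoding only removes the quotes; the random part still ends up in the gzip header unless the name is dropped or made deterministic.

```python
import binascii

rev_escaped = "v1.0"
# Fixed stand-in for binascii.hexlify(os.urandom(8)) so the demo is reproducible.
token = binascii.hexlify(b"\xe0\x3a\xa2\xd8\xd1\xce\x08\x30")

# Interpolating bytes into an f-string uses str(bytes), i.e. its repr:
print(f"/tmp/{rev_escaped}{token}.tar.gz")
# /tmp/v1.0b'e03aa2d8d1ce0830'.tar.gz

# Decoding first gives a plain hex string:
print(f"/tmp/{rev_escaped}{token.decode('ascii')}.tar.gz")
# /tmp/v1.0e03aa2d8d1ce0830.tar.gz
```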

The manual says the destination can be a "format string", so maybe a simple
"%H" (full commit hash) or "%b-%H" would work and produce deterministic results?

In the hg.sr.ht repo:

radziecki> hg archive -r 0.25.1 -t tgz "/tmp/%H.tgz"
radziecki> ls -l /tmp/*.tgz
-rw-r--r--   1 saper    wheel       43861 mar  7 17:19 /tmp/a4b1258437f98cfc837d373fddacf9332e9b3089.tgz
radziecki> hg log -r 0.25.1
changeset:   226:a4b1258437f9
tag:         0.25.1
user:        Peter Sanchez <peter@netlandish.com>
date:        Sun Feb 16 12:28:31 2020 -0800
summary:     Adding updated link_prefix and blob_prefix for relative url support on summary pages

Of course, this opens another can of worms, including some security issues
(for example, two exports overwriting each other). On the other hand, it may
also enable some form of artifact caching, with 304 Not Modified responses,
ETag or Range headers saving bandwidth.
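
If archive bytes were a pure function of the commit and the archive type, the caching idea could key an ETag on the commit hash. A hypothetical sketch: the names `archive_etag` and `conditional_status` and the ETag layout are illustrative only, not part of hg.sr.ht, and only the If-None-Match side is handled. It is only valid if the archive output really is deterministic.

```python
from typing import Optional


def archive_etag(node: str, archive_type: str) -> str:
    """Strong ETag for an archive, ASSUMING its bytes depend only on
    the commit hash and the archive type (deterministic output)."""
    return f'"{node}-{archive_type}"'


def conditional_status(node: str, archive_type: str, if_none_match: Optional[str]) -> int:
    """Return 304 if the client's If-None-Match already lists our ETag, else 200."""
    if if_none_match is None:
        return 200
    etag = archive_etag(node, archive_type)
    candidates = [tag.strip() for tag in if_none_match.split(",")]
    # If-None-Match uses weak comparison, so W/"..." also matches.
    if "*" in candidates or etag in candidates or f"W/{etag}" in candidates:
        return 304
    return 200


node = "a4b1258437f98cfc837d373fddacf9332e9b3089"
print(conditional_status(node, "tgz", None))              # 200
print(conditional_status(node, "tgz", f'"{node}-tgz"'))   # 304
```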

Marcin