~skeeto/public-inbox

3 2

Question on "Where's all the code?"

Steadman Dillroper <dillstead@gmail.com>
Details
Message ID
<CAOJH3v4v-qLhmFxH=+fb+kcjuO-TJrDQqKKaOYJqz=0cbBpXWg@mail.gmail.com>
DKIM signature
pass
Download raw message
Hi Chris,

In the following code snippet:

for (int32_t i = 1; i < ndirs; i++) {
    for (int32_t j = dirs[i].link; j >= 0; j = dirs[j].link) {
        dirs[j].nbytes += dirs[i].nbytes;
        dirs[j].nlines += dirs[i].nlines;
    }
}

I'm having trouble understanding how this code snippet doesn't double
count.  E.g., if you have a root directory which contains dir A and
dir A contains directory B they are laid out as follows in dirs:
  root, A, B
The code walks the list from left to right starting with root.  When A
is reached, its nbytes and nlines are added to root's count.  When B
is reached, its nbytes and nlines are added to A's count which are
then added to root's count. Isn't A being double counted in this case?

I must be missing something obvious here.

Thanks,

Stedman
Details
Message ID
<20220602033117.yebu6uhtkq635imf@nullprogram.com>
In-Reply-To
<CAOJH3v4v-qLhmFxH=+fb+kcjuO-TJrDQqKKaOYJqz=0cbBpXWg@mail.gmail.com> (view parent)
DKIM signature
missing
Download raw message
The right-hand side of += is dirs[i], which, when the outer loop is on B, 
is B. This count does not change while walking the links since j is never 
equal to i. When i points to B, first B is added to A, then B (not A) is 
also added to root. If it instead went as you had described then, yes, it 
would double count. At one point I had tried counting from the end rather 
than the beginning (i.e. running the outer loop in reverse), which results 
in double counting. On deep source trees it had a dramatic effect and the 
output was obviously bogus!

I hope that clears it up, Stedman. Questions like this are welcome, so 
feel free to ask about anything else you find. Especially since it's quite 
possible I've simply made an error somewhere.
Steadman Dillroper <dillstead@gmail.com>
Details
Message ID
<CAOJH3v6-ZEMBjeFDrmmxURKOAoYBPWuih94bTFRf7t8Mvo5+-g@mail.gmail.com>
In-Reply-To
<20220602033117.yebu6uhtkq635imf@nullprogram.com> (view parent)
DKIM signature
pass
Download raw message
Ah, right I missed that small detail :-) Thanks for the reply.  When I
have some time later today I'll finish up the article.  I also plan on
getting to the lock free queue.  I always look forward to your blog
posts and perusing your code.

Best,

Stedman

On Wed, Jun 1, 2022 at 11:31 PM Christopher Wellons
<wellons@nullprogram.com> wrote:
>
> The right-hand side of += is dirs[i], which, when the outer loop is on B,
> is B. This count does not change while walking the links since j is never
> equal to i. When i points to B, first B is added to A, then B (not A) is
> also added to root. If it instead went as you had described then, yes, it
> would double count. At one point I had tried counting from the end rather
> than the beginning (i.e. running the outer loop in reverse), which results
> in double counting. On deep source trees it had a dramatic effect and the
> output was obviously bogus!
>
> I hope that clears it up, Stedman. Questions like this are welcome, so
> feel free to ask about anything else you find. Especially since it's quite
> possible I've simply made an error somewhere.
Details
Message ID
<20220606224727.mzrackkui5xbv46a@nullprogram.com>
In-Reply-To
<CAOJH3v6-ZEMBjeFDrmmxURKOAoYBPWuih94bTFRf7t8Mvo5+-g@mail.gmail.com> (view parent)
DKIM signature
missing
Download raw message
> How did you measure the performance on Windows?

Nothing fancy. I wrapped suspected portions in QueryPerformanceCounter 
calls. This included the newline-counting loop, and for Win32 calls I made 
X-prefixed wrappers that measure the total time spent in each system call 
(XCreateFile, etc.), then called through these X-wrappers. Counter totals 
were printed at the end of the program. I also ran under the "time" 
command in w64devkit as a sanity check on the totals. In the original, 
single-threaded version, the CreateFile, ReadFile, and CloseHandle sum 
matched the "time" command output so I know I didn't miss anything. The 
loop time matched the ReadFile time, so I knew it was spending all its 
time waiting on that system call, and practically none on scanning bytes.

Example wrapper (untested, just rewrote it quickly):

static uint64_t time_ReadFile;
static BOOL XReadFile(HANDLE h, void *buf, DWORD len, DWORD *p, void *o)
{
    LARGE_INTEGER a, b;
    QueryPerformanceCounter(&a);
    BOOL r = ReadFile(h, buf, len, p, o);
    QueryPerformanceCounter(&b);
    time_ReadFile += b.QuadPart - a.QuadPart;
    return r;
}

Then I print `time_ReadFile` scaled by QueryPerformanceFrequency() at the 
end of the program. I've seen others accomplish the same with RAII in C++, 
where they initialize a performance counter instance in the scope they 
want to measure, and QueryPerformanceCounter is called in the ctor+dtor.

(To compare: On Linux I use clock_gettime() with CLOCK_MONOTONIC_RAW, and 
everything else is basically the same.)

I know there are specialized profiling tools, but the ones I've tried have 
let me down often enough that I don't trust their outputs. Modern compiler 
optimization too often confuses statistical sampling, and unlike debugging 
it's not useful to turn it off. In a long term project I'd probably leave 
my custom profile scaffolding in place so that I could reuse it, and 
probably also make it thread-safe (atomically increment the counter).
Reply to thread Export thread (mbox)