Hi,
I've noticed that the web interface can be really slow to bring up the
git log of individual files/directories. It's not so noticeable on small
repos, but for example, on this repo/directory [1] I measured it with a
stopwatch a few times and it takes ~40 seconds for the log to appear.
Viewing the entire log for a branch is as fast as expected, as is
viewing the log for a directory with enough entries for it to be split
into multiple pages [2].
Is it perhaps waiting until there's a pageful of log entries or the
entire log has been traversed (whichever comes first) before displaying
any results?
- Oskari
[1]: https://git.sr.ht/~xxc3nsoredxx/portage/tree/master/item/.builds
[2]: https://git.sr.ht/~xxc3nsoredxx/portage/tree/master/item/bin
On Sat, Jul 15, 2023 at 14:04:24 +0200, Drew DeVault wrote:
> This is just a matter of this git operation being expensive. It's slow
> when you run git on localhost, too.
Fair enough.
I suppose something like GitHub is able to precompute/cache results
like that, which makes it much faster. Or maybe it's JavaScript
trickery to make it look faster by continuously feeding data. Maybe a
bit of both.
I'll take "usable in links(1)" over "bogged down by JavaScript" any day
though ;)
- Oskari
On Sun, Jul 16, 2023, at 00:54, Oskari Pirhonen wrote:
> On Sat, Jul 15, 2023 at 14:04:24 +0200, Drew DeVault wrote:
>> This is just a matter of this git operation being expensive. It's slow
>> when you run git on localhost, too.
>
> Fair enough.
>
> I suppose something like GitHub is able to precompute/cache results
> like that, which makes it much faster. Or maybe it's JavaScript
> trickery to make it look faster by continuously feeding data. Maybe a
> bit of both.
They cache. The reason Git's slow here is that it doesn't store which files changed in a given commit, so you have to repeatedly load a commit, check whether the file was modified by loading all the intermediate trees and comparing SHAs, note the commit in the history if the file differs, and then repeat for each parent commit, until the file finally drops out of the history. There are tons of no-ops in that sequence, and lots of unpredictable disk seeks and reads.
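
Just to make that walk concrete, here's a rough Python sketch of it,
shelling out to git plumbing. It follows first parents only and does no
rename detection, so it's illustrative rather than a faithful copy of
git's real history simplification; the ".builds" path is just the
example directory from earlier in the thread:

    import subprocess

    def blob_sha(commit, path):
        """Object ID of `path` at `commit`, or None if it's absent there."""
        r = subprocess.run(["git", "rev-parse", f"{commit}:{path}"],
                           capture_output=True, text=True)
        return r.stdout.strip() if r.returncode == 0 else None

    def parents(commit):
        """Parent IDs of `commit` (rev-list's first field is the commit
        itself, so skip it)."""
        out = subprocess.run(["git", "rev-list", "--parents", "-n", "1",
                              commit], capture_output=True, text=True,
                             check=True).stdout
        return out.split()[1:]

    def file_history(start, path):
        """Commits (first-parent only) where `path` changed, newest first."""
        history = []
        commit, here = start, blob_sha(start, path)
        while True:
            ps = parents(commit)
            before = blob_sha(ps[0], path) if ps else None
            if here != before:
                history.append(commit)    # added, removed, or modified here
            if not ps:
                return history            # reached a root commit
            commit, here = ps[0], before  # keep walking the first parent

    print(file_history("HEAD", ".builds"))

Every step forks git a few times and pulls more objects off disk,
which is where all those seeks come from.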
What e.g. Kiln used to do was just log, during push, what files were touched by what commit, and then generate the file history via database query (rather than Git). GitHub does something similar. There's definitely nuance to this (file history is head-specific, force pushes can *delete* history, etc.), but it's tractable, fairly straightforward, and worth it if you're dealing with large repos or lots of file history views (especially if you want to have quick access to blame, which otherwise is everything I just said, *plus* repeated diff checks).
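
As a toy version of that index, here's a Python/SQLite sketch; the
schema and names are all invented for illustration, not anything
git.sr.ht or GitHub actually uses:

    import sqlite3

    db = sqlite3.connect("filehistory.db")
    db.executescript("""
        CREATE TABLE IF NOT EXISTS commit_files (
            ref     TEXT NOT NULL,     -- file history is head-specific
            ordinal INTEGER NOT NULL,  -- position along the ref, newest first
            sha     TEXT NOT NULL,
            path    TEXT NOT NULL
        );
        CREATE INDEX IF NOT EXISTS by_path
            ON commit_files (ref, path, ordinal);
    """)

    def record_push(ref, commits):
        """Called on push, e.g. from a post-receive hook. `commits` is
        [(sha, [touched paths])], newest first. Rebuilding the ref's
        rows wholesale also covers force pushes that rewrote history."""
        db.execute("DELETE FROM commit_files WHERE ref = ?", (ref,))
        db.executemany(
            "INSERT INTO commit_files VALUES (?, ?, ?, ?)",
            [(ref, i, sha, path)
             for i, (sha, paths) in enumerate(commits)
             for path in paths])
        db.commit()

    def file_history(ref, path, page=0, per_page=20):
        """One page of a file's log, straight out of the index."""
        rows = db.execute(
            """SELECT sha FROM commit_files
               WHERE ref = ? AND path = ?
               ORDER BY ordinal LIMIT ? OFFSET ?""",
            (ref, path, per_page, page * per_page))
        return [sha for (sha,) in rows]

The expensive walk happens once per push instead of once per page
view, and a paginated file log becomes a single indexed query.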
I'm not volunteering to do any of that, but there are pretty big wins to that approach if someone is motivated, and it's the reason your GitHub experience is so much faster. No JavaScript trickery; just plain old caching.