As probably the heaviest Offpunk user, I have seen my cache grow to
40 GB (most likely because I experimented with deeper download levels
during the 2.0 development).
Besides that, I’m really happy with the cache’s simple design: the
whole cache can be backed up with a simple "cp -pr" and, if you want
to merge two caches, you can simply do "cp -pru". According to my
tests, it works!
Nevertheless, I think it is time to add a "trim cache" feature to
Offpunk.
## A. Straightforward trimming
My first idea was to do the following:
1. Browse every list, including history and archives, and make a list of
every link in them and of every link in the linked pages: $TOKEEP_LIST
2. Now, go into the cache and delete everything which is older than
$CACHE_TTL and which is not in $TOKEEP_LIST.
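The two steps above could be sketched in Python, Offpunk's own language. This is a minimal sketch, not Offpunk code, and it rests on assumptions: lists are gemtext files made of `=> URL` lines (for example under ~/.local/share/offpunk/lists/), a URL is cached at <cache>/<scheme>/<host>/<path> as the paths quoted later in this thread suggest, and a file's mtime tells when it was last fetched. `url_to_cache_path`, `build_keep_set` and `trim_by_ttl` are hypothetical names, and relative links inside pages are simply skipped.

```python
import os
import time
from pathlib import Path
from urllib.parse import urlparse


def url_to_cache_path(cache_root: Path, url: str) -> Path:
    """Hypothetical URL -> file mapping: <cache>/<scheme>/<host>/<path>.

    Offpunk's real mapping (index files, query strings, ...) may differ.
    """
    parsed = urlparse(url)
    return cache_root / parsed.scheme / parsed.netloc / parsed.path.lstrip("/")


def gemtext_links(text: str) -> list[str]:
    """Return the target of every '=> URL [label]' line of a gemtext document."""
    links = []
    for line in text.splitlines():
        if line.startswith("=>"):
            parts = line[2:].split()
            if parts:
                links.append(parts[0])
    return links


def build_keep_set(list_files: list[Path], cache_root: Path) -> set[Path]:
    """Step 1: every URL in the lists, plus every absolute link in those pages."""
    keep: set[Path] = set()
    for list_file in list_files:  # assumption: lists are gemtext files
        for url in gemtext_links(list_file.read_text(encoding="utf-8")):
            page = url_to_cache_path(cache_root, url)
            keep.add(page)
            if page.is_file():
                text = page.read_text(encoding="utf-8", errors="replace")
                for link in gemtext_links(text):
                    if "://" in link:  # relative links are skipped in this sketch
                        keep.add(url_to_cache_path(cache_root, link))
    return keep


def trim_by_ttl(cache_root: Path, keep: set[Path], ttl_seconds: float) -> list[Path]:
    """Step 2: delete cached files older than the TTL that are not kept."""
    cutoff = time.time() - ttl_seconds
    removed = []
    for dirpath, _dirnames, filenames in os.walk(cache_root):
        for name in filenames:
            path = Path(dirpath) / name
            # assumption: mtime approximates "last fetched"
            if path not in keep and path.stat().st_mtime < cutoff:
                path.unlink()
                removed.append(path)
    return removed
```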
Another possibility is to trim the oldest items for as long as the cache
is bigger than $MAX_SIZE, ignoring $CACHE_TTL.
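The $MAX_SIZE variant only changes step 2: instead of a TTL cutoff, delete the oldest files outside $TOKEEP_LIST until the total fits. A hedged sketch, assuming `keep` holds the cache paths of $TOKEEP_LIST and that a file's mtime approximates its age; `trim_by_size` is a hypothetical name.

```python
import os
from pathlib import Path


def trim_by_size(cache_root: Path, keep: set[Path], max_size: int) -> list[Path]:
    """Delete the oldest files not in `keep` until the cache fits in max_size bytes."""
    candidates = []
    total = 0
    for dirpath, _dirnames, filenames in os.walk(cache_root):
        for name in filenames:
            path = Path(dirpath) / name
            st = path.stat()
            total += st.st_size  # kept files still count toward the total
            if path not in keep:
                candidates.append((st.st_mtime, st.st_size, path))
    candidates.sort()  # oldest first (assumption: mtime approximates age)
    removed = []
    for _mtime, size, path in candidates:
        if total <= max_size:
            break
        path.unlink()
        total -= size
        removed.append(path)
    return removed
```

Because kept files count toward the total, a cache whose kept files alone exceed $MAX_SIZE stays over the limit even after everything else is deleted.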
But I have another idea:
## B. Separated caches
I was wondering about having two caches: one in .cache ($CACHE1), the
other in .local/share ($CACHE2).
As soon as you add an item to a list, its cache is duplicated from
.cache to .local/share. When browsing, offpunk always checks both and,
if an item is present in both, takes the newest (and updates the other).
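A minimal sketch of that two-cache rule, assuming both caches share the same relative layout and that "newest" means the most recent mtime. `promote` and `lookup` are hypothetical helpers, not existing Offpunk functions.

```python
import shutil
from pathlib import Path
from typing import Optional


def promote(rel_path: Path, cache1: Path, cache2: Path) -> None:
    """On "add to list": duplicate the cached file from $CACHE1 into $CACHE2."""
    src = cache1 / rel_path
    dst = cache2 / rel_path
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)  # copy2 preserves the mtime, like `cp -p`


def lookup(rel_path: Path, cache1: Path, cache2: Path) -> Optional[Path]:
    """On browse: check both caches, return the newest copy, refresh the older one."""
    copies = [root / rel_path for root in (cache1, cache2) if (root / rel_path).is_file()]
    if not copies:
        return None
    # Assumption: "newest" means the most recent modification time.
    newest = max(copies, key=lambda p: p.stat().st_mtime)
    for other in copies:
        if other != newest and other.stat().st_mtime < newest.stat().st_mtime:
            shutil.copy2(newest, other)  # only items present in both get updated
    return newest
```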
The advantage of this method is that you could easily back up the parts
of the cache which are important to you (even putting them on a shared
disk or something similar). Removing ~/.cache would have very little
impact on you.
Disadvantage: it will be hard to know when to remove items from $CACHE2.
Do you have any ideas or thoughts on the matter?
Regards,
Ploum
--
Ploum - Lionel Dricot
Blog: https://www.ploum.net
Livres: https://ploum.net/livres.html
I'm currently synchronizing my offpunk cache between computers using the
following setup:
* The ~/Sync/Netcache directory is synchronized between computers using
Syncthing.
* ~/.cache/offpunk/gemini is a symbolic link to ~/Sync/Netcache/gemini.
Same for the gopher and finger caches. The gemini/gopher/finger caches
are small enough that I don't mind synchronizing everything. I remove
some large binary files occasionally.
* The ~/.cache/offpunk/https directory contains symbolic links only for
the domains I want to synchronize. For example
~/.cache/offpunk/https/www.rfc-editor.org is a symbolic link to
~/Sync/Netcache/https/www.rfc-editor.org. Similarly for the http
cache. This way I only synchronize what I need from the larger
http/https caches.
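For reference, the per-domain symlink part of that setup could be scripted like this. A hedged sketch: `link_domain` is a hypothetical helper, and it refuses to touch a domain directory that already exists in the cache rather than guessing how to merge it into the synced tree.

```python
from pathlib import Path


def link_domain(cache_dir: Path, sync_dir: Path, scheme: str, domain: str) -> Path:
    """Make <cache_dir>/<scheme>/<domain> a symlink to <sync_dir>/<scheme>/<domain>."""
    target = sync_dir / scheme / domain
    link = cache_dir / scheme / domain
    target.mkdir(parents=True, exist_ok=True)
    link.parent.mkdir(parents=True, exist_ok=True)
    if link.is_symlink():
        return link  # already set up
    if link.exists():
        # Moving existing content is left to the user in this sketch.
        raise FileExistsError(f"{link} exists; move its content to {target} first")
    link.symlink_to(target, target_is_directory=True)
    return link
```

With the paths from this message, that would be `link_domain(Path.home() / ".cache/offpunk", Path.home() / "Sync/Netcache", "https", "www.rfc-editor.org")`.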
So it's already possible to synchronize part of the cache but it's
definitely more involved than adding a URL to a list.
The double cache can make it easier to keep a separate cache of desired
domains but doesn't make it easier to prune files based on age, size or
type. It probably makes it a little more difficult for external tools to
work with the offpunk cache, for example to search through it.
At this point I favor the simpler solution, having a single cache, but I
don't have a super-strong opinion on it.
All the best,
Sotiris
On 24 jun 08 10:17, Sotiris Papatheodorou wrote:
>[...]
>At this point I favor the simpler solution, having a single cache, but I
>don't have a super-strong opinion on it.
Thanks a lot for that use case. Very interesting. I tend to agree with
you: having two separate caches will probably be a recipe for problems.
Offering a way to trim the cache based on "last seen but not in any
list" seems the best and most intuitive way forward.
But it doesn’t help your own use case much and, TBH, I’m not sure how it
could be helped. I need to think a bit about the problem.
Any ideas are welcome!
On 2024-06-09, Ploum wrote:
>Offering a way to trim the cache based on "last-seen but not in any
>list" seems the best and more intuitive way to go forward.
>
>But it doesn’t help much your own usecase and, TBH, I’m not sure how it
>could be helped. I need to think a bit about the problem.
I imagine the cache trimming would be something initiated manually by
the user, in which case I could just ignore it and use a different
trimming strategy.
There are quite a few parameters the cache trimming could be based on:
* File age.
* File size.
* File media type.
* Domain name.
* URL pattern.
and probably more, plus combinations of the above. A fully-featured trim
command starts looking like a variant of the Unix find command. But "old
but not in any list" is simple and might be enough for most use cases.
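To give an idea of how such a find-like command could combine those parameters, here is a sketch that only lists matching files and leaves deletion to the caller (as `find` does without `-delete`). Assumptions: the <scheme>/<host>/<path> cache layout, media type guessed from the file extension with Python's `mimetypes` rather than from the stored response type, and `fnmatch` globs, where `*` also matches `/`. `TrimRule` and `matching_files` are hypothetical names.

```python
import fnmatch
import mimetypes
import os
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator, Optional


@dataclass
class TrimRule:
    """Hypothetical trim criteria; a file must match every field that is set."""
    older_than_days: Optional[float] = None
    larger_than_bytes: Optional[int] = None
    media_type_prefix: Optional[str] = None  # e.g. "image/", guessed from extension
    domain: Optional[str] = None
    path_glob: Optional[str] = None  # matched against "<scheme>/<host>/<path>"


def matching_files(cache_root: Path, rule: TrimRule) -> Iterator[Path]:
    """Yield cached files matching all criteria of `rule` (nothing is deleted)."""
    now = time.time()
    for dirpath, _dirnames, filenames in os.walk(cache_root):
        for name in filenames:
            path = Path(dirpath) / name
            rel = path.relative_to(cache_root)
            st = path.stat()
            if rule.older_than_days is not None and now - st.st_mtime < rule.older_than_days * 86400:
                continue
            if rule.larger_than_bytes is not None and st.st_size <= rule.larger_than_bytes:
                continue
            if rule.media_type_prefix is not None:
                guessed, _encoding = mimetypes.guess_type(name)
                if not (guessed or "").startswith(rule.media_type_prefix):
                    continue
            # Assumption: the host is the second path component.
            if rule.domain is not None and (len(rel.parts) < 2 or rel.parts[1] != rule.domain):
                continue
            if rule.path_glob is not None and not fnmatch.fnmatch(rel.as_posix(), rule.path_glob):
                continue
            yield path
```

"Old but not in any list" then becomes one rule (`older_than_days`) plus filtering out the paths of the keep-list before deleting.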
All the best,
Sotiris