I started writing this email as a response to another thread, but it'd
be better to have it separated. So, here it goes:
I'll try to write my "feelings"/"confusion", let's see if I can make
sense of my thoughts :)
If you're reading this, read to the end before trying to comment,
because it's getting a bit chaotic. Sorry.
What I have observed, and puzzles me a bit, is that offpunk seems to
download, or at least "visit" (let's call it "visit" for now. Fetch?):
- every item in my "bookmarks" list
- AND every link present in those items (to what depth? I think only
"one level deep"?)
- same as above for my "check_later" list
- every item in subscriptions (and, yeah, new links go into tour)
- every item in my "forums" list (which I have as "frozen" after I saw
this happening, trying to make it not happen)
- AND it seems it "visits" (all?) links inside those pages.
(this has the funny side effect that every time I run sync, I end
up having a new "draft" post in bbs.geminispace.org :D)
While "sync" is happening, I see offpunk "visiting" hundreds of URLs
with messages such as these (I'll try to copy-paste real quick):
* * * 25 to fetch in subscribed * * *
--> [2/357] Fetch gemini://skyjake.fi/~Cosmos/reply-feed
it fetches things in youtube (which I assume are links within my
subscriptions/bookmarks/etc), and other domains that I barely have
time to see (I remember reading about a data limit, I'll have to
search and read about that again)
I understand this makes sense in the context of "working offline". But
it might be a bit overzealous for my needs.
I'm also not sure why it would make sense to "fetch" all the things in
bookmarks (for example), and then not put new things in the tour, or
use any other way of letting the user know if there are changes.
Or why it also fetches my "frozen" forums list.
This way of operating kind of assumes that every time the user goes
"offline", they'll read all their bookmarks (and first-level linked
pages) to see if there were any changes?
I also am not 100% sure about what difference passing a number
parameter to "sync" makes exactly :D, but this is not as important as
"getting" how sync works.
Maybe, now that I'm writing all this, I think what I'd like/need is either:
- a list that never gets fetched/updated/anything (just "classic
bookmarks" that I might choose to visit while online. Just a list of
links)
- this should be, in my mental model, what "frozen" would be, but
items seem to get fetched every time anyway
- and, maybe bookmarks should work this way, after being fetched
once, and then marked as "fetched" ?
- OR, a way to "sync only_this_list" (or, something like "add new
items in subscriptions to tour" only?)
Mostly, in my "mental model", it doesn't make sense to fetch something
I called "check later", or "frozen" (and every link inside) every time
I sync.
It feels a bit like a waste of time/network. Maybe a command to
"fetch only new things in subscriptions" could be useful? It seems
like it would be time/network/cpu/disc efficient, at least.
Anyhow, sorry for the disordered list of thoughts :D
Regards
--
Saúde,
J. M. Castroagudín
On 13 Mar 2025 02:04, José Manuel Castroagudín Silva wrote:
> I started writing this email as a response to another thread, but it'd
> be better to have it separated. So, here it goes:
>
> I'll try to write my "feelings"/"confusion", let's see if I can make
> sense of my thoughts :)
Hi José,
Thanks for the email. You are right, it is worth clarifying how sync
operates.
So, here’s the theory, with the rationale, for every sync with a cache
validity of X seconds. Let’s say that X is 43200 (which is 12 hours).
The command is: offpunk --sync --cache-validity 43200
When you do a sync, offpunk will go in each list and, for each link in a
list, will do the following:
1. normal lists: if the cached version is older than 43200 seconds, fetch
a new version. This is particularly useful for pages you want to check
manually but don’t want to subscribe to (it can be a status page,
but I use it for Gemini Antenna and several RSS feeds for which I don’t
want to subscribe to every single post)
2. subscribed lists: if the cached version is older than 43200 seconds,
fetch a new version. In that new version, check if there are any links
for which we don’t have a cache yet. Download those links and put them
in the tour.
3. frozen lists: neither check nor fetch any version.
Now, and here is the catch: it is assumed that, whatever the status of
the list, you may want to read those pages offline. So, at each sync,
offpunk reads every single page and downloads the target of every single
link.
This allows you to 1. have images on the page and 2. follow a link
while offline. That’s why even the content of frozen lists is parsed.
The default depth is 1. More is crazy, but you can launch a shallow sync
with:
offpunk --sync --depth 0
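The rules above can be condensed into a small model of one sync pass. This is a hypothetical Python sketch, not offpunk's code. The names `plan_sync`, `is_stale` and `SyncPlan` are made up. `validity=None` stands for a sync without a cache-validity. The sketch also assumes that link targets which already have a cached copy are not downloaded again, which the explanation above does not state.

```python
from dataclasses import dataclass, field
from typing import Optional

FROZEN, SUBSCRIBED = "frozen", "subscribed"  # the other kind used here is "normal"


@dataclass
class SyncPlan:
    refresh: set = field(default_factory=set)       # list items to download again
    link_targets: set = field(default_factory=set)  # linked pages cached for offline use
    tour: list = field(default_factory=list)        # new links found in subscribed pages


def is_stale(cached_at: Optional[float], now: float, validity: Optional[float]) -> bool:
    """True if a URL was never cached, or its copy is older than `validity` seconds.

    validity=None models a sync without a cache-validity: any existing cache is good.
    """
    if cached_at is None:
        return True
    if validity is None:
        return False
    return now - cached_at > validity


def plan_sync(lists, cache, links, now, validity=None) -> SyncPlan:
    """Decide what one sync pass would download (hypothetical model).

    lists: {list_name: (kind, [url, ...])}, kind is "normal", "subscribed" or "frozen"
    cache: {url: timestamp of the last fetch}
    links: {url: [urls linked from that page]}
    """
    plan = SyncPlan()
    kinds_by_url = {}
    for kind, urls in lists.values():
        for url in urls:
            kinds_by_url.setdefault(url, set()).add(kind)

    for url, kinds in kinds_by_url.items():
        # Frozen items are never refreshed, unless a non-frozen list also holds them.
        if kinds != {FROZEN} and is_stale(cache.get(url), now, validity):
            plan.refresh.add(url)
        # Every item, frozen or not, is scanned so its links work offline (depth 1).
        for target in links.get(url, []):
            if target in cache:  # assumption: cached link targets are left as they are
                continue
            plan.link_targets.add(target)
            # New links found in a subscribed page also go to the tour.
            if SUBSCRIBED in kinds and target not in plan.tour:
                plan.tour.append(target)
    return plan
```

With `validity=43200`, three things happen. A bookmark fetched 50000 seconds ago is refreshed. A frozen page is only scanned for links. An uncached link found in a subscribed page is downloaded and added to the tour.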
> What I have observed, and puzzles me a bit, is that offpunk seems to
> download, or at least "visit" (let's call it "visit" for now. Fetch?):
> - every item in my "bookmarks" list
>  - AND every link present in those items (to what depth? I think only
> "one level deep"?)
> - same as above for my "check_later" list
> - every item in subscriptions (and, yeah, new links go into tour)
> - every item in my "forums" list (which I have as "frozen" after I saw
> this happening, trying to make it not happen)
>  - AND it seems it "visits" (all?) links inside those pages.
>  (this has the funny side effect that every time I run sync, I end
> up having a new "draft" post in bbs.geminispace.org :D)
I hope my explanation above is enough. The BBS draft is indeed a strange
bug. I’m not using that myself; it may be worth investigating.
> While "sync" is happenning, I see offpunk "visiting" hundreds of urls
> with messages such as, and I'll try to copy paste real quick:
>  * * * 25 to fetch in subscribed * * *
>  --> [2/357] Fetch gemini://skyjake.fi/~Cosmos/reply-feed
> it fetches things in youtube (which I assume are links within my
> subscriptions/bookmarks/etc), and other domains that I barely have
> time to see (I remember reading about a data limit, I'll have to
> search and read about that again)
This is perfectly normal. YouTube URLs are very visible because, by
default, youtube is redirected to yewtu.be, but yewtu.be has been down.
So there’s no cached version and offpunk tries to make one at each sync.
In current HEAD, I’ve completely blocked youtube.com to save those
unneeded connection attempts. I hope to have some integration with
yt-dlp one day, but it is very low on my priority list.
So, to be clear: no bandwidth is actually wasted. Time is wasted because
offpunk tries to connect to yewtu.be at each sync. But it is not a very
big deal.
> I understand this makes sense in the context of "working offline". But
> it might be a bit overzealous for my needs.
> I'm also not sure why it would make sense to "fetch" all the things in
> bookmarks (for example), and then not put new things in the tour, or
> any other way of letting the user know if there are changes.
> or, how it also fetches my "frozen" forums list.
I hope my explanation above is clear enough in that regard.
> This way of operating kind of assumes that every time the user goes
> "offline", they'll read all their bookmarks (and first-level linked
> pages) to see if there were any changes?
Yes. Because that’s the goal.
> I also am not 100% sure about what difference passing a number
> parameter to "sync" makes exactly :D, but this is not as important as
> "getting" how sync works.
This is the cache-validity parameter I explained above, and it is a very
important one. If you set it too low, you redownload everything all the
time.
> Maybe, now that I'm writing all this, I think what I'd like/need is either:
> - a list that never gets fetched/updated/anything (just "classic
> bookmarks" that I might choose to visit while online. Just a list of
> links)
I’m currently working on this with the concept of "notes", which would
be like lists but never fetched, never modified by "archives".
>  - this should be, in my mental model, what "frozen" would be, but
> items seem to get fetched everytime anyway
As explained, frozen items are not touched. They are scanned so the
target of their links can be downloaded.
My "toread" list is frozen (because I don’t want stuff to be modified in
case an article I marked as toread is deleted) but I want to see images
and be able to follow links in those toread items.
>  - and, maybe bookmarks should work this way, after being fetched
> once, and then marked as "fetched" ?
> - OR, a way to "sync only_this_list" (or, something like "add new
> items in subscriptions to tour" only?)
It is also possible to only sync some lists to make a shorter sync.
Every morning, I do a full sync with:
offpunk --sync --assume-yes --cache-validity 51840
This may take between 3 and 10 minutes every morning (I have a lot of RSS
feeds).
But, during the day, I may want to do a quick sync. Then I do the
following:
offpunk --sync tour to_fetch --assume-yes --cache-validity 51840
This means that only the lists "tour" and "to_fetch" will be synced. I
could probably use a shorter cache-validity for those.
Those two commands are in a bash script (but you can do aliases) so
I can run them easily.
> Mostly, in my "mental model", it doesn't make sense to fetch something
> I called "check later", or "frozen" (and every link inside) every time
> I sync.
> It feels a bit like a waste of time/network . Maybe a command to
> "fetch only new things in subscriptions" could be useful? it seems
> like it would be time/network/cpu/disc efficient, at least
I believe the main issue is that you missed the "cache-validity".
Offpunk knows only one thing: the date at which it fetched the latest
version of a URL.
I hope this email helps you better understand.
But don’t worry too much about the network/cpu/disc waste. I’ve learned
by writing offpunk that we, humans, have a very bad intuition about what
is wasted or not. Your brain saw the "youtube.com" link and thought "what
a waste", while the reason you saw it is that nothing was downloaded
at all.
Just run "du -sh ~/.cache/offpunk/" and you will see everything offpunk
has downloaded. It is probably surprisingly small if you are not a
hardcore user like me.
Also, if you visit the same websites regularly, most resources will be
downloaded only once, then kept in the cache nearly forever (as long as
you don’t ask for a reload of that particular resource).
--
Ploum - Lionel Dricot
Blog: https://www.ploum.net
Newsletters: https://listes.ploum.net/mailman3/lists/
Bikepunk: https://bikepunk.fr/
On Thu, 13 Mar 2025 at 17:17, Ploum (<sourcehut24@ploum.eu>) wrote:
> > (this has the funny side effect that every time I run sync, I end
> > up having a new "draft" post in bbs.geminispace.org :D)
>
> I hope my explanation above is enough. The BBS draft is indeed a strange
> bug. I’m not using that myself, that may worth investigating.
This happens because the page has a "create a new draft" link, and
visiting it creates an empty draft right away, taking you to another
page to add text, links and other things. Not important, I just delete
them every now and then. Don't think about it.
> I’m currently working on this with the concept of "notes", which would
> be like lists but never fetched, never modified by "archives".
Ah, OK, I'll keep an eye out for that. This might be the "Zettelkasten"
thing I read about in the devel list (I had never heard that word
before, to be honest).
> offpunk --sync tour to_fetch --assume-yes --cache-validity 51840
I wasn't aware you could sync only certain lists. Good to know. But with
everything you wrote, I might not even need it.
> This means that only the lists "tour" and "to_fetch" will be synced. I
> colud probably put a shorter cache-validity for those.
>
> I hope this email helps you better understand.
Thanks! This has been very clarifying; thanks for taking the time to write it.
After reading it, I think I could say that my "problem"/"confusion"
was that I was not "thinking offline enough"
For example: it's not that every time I "sync" I *will* read all my
bookmarks. The key is that every time I "go online and sync", I *want
to* have an offline (recent) version of all my bookmarks available for
later, in case I want to read them before I next want (or, in a
different setting, am able) to go online again. This is a great idea
that, since I use offpunk mostly online, I was not really fully
grasping. For my "usage pattern", sync simply translated to "get the
new things", and that's why fetching everything every time didn't make
sense.
And, if you allow me, a couple of notes/questions:
- So, if I run "sync", without seconds, it won't check the date of the
local cached data, it'll download only "resources/links that don't
exist yet locally" <- is that correct?
- If I pass a "seconds" param, it will also "renew" (fetch a new
version of) anything local that's older than "seconds"
(for some reason, in my mind this meant "download just the 'activity'
that has happened since 'seconds', which doesn't really make sense...)
So, if in the future I move to a charming off-grid cabin in the woods,
and come to the library in town once a week, I should connect to the
internet and:
- open offpunk
- type "online"
- type "sync 604800" ## (60*60*24*7), seconds in a week
- type "offline"
so I get an updated version of my lists, to read during the week, plus
all the new stuff. Does this make sense?
>> Just run "du -sh ~/.cache/offpunk/" and you will see everything offunk> has downloaded. It is probably surprizingly small if you are not a> hardcore user like me.
I tried this, mostly out of curiosity:
~/.cache$ du -sh offpunk/
1002M offpunk/
(the biggest offender is https, it turns out :D )
and then I checked, with "stat", when it was created (I guess that's
the most "official" date for when I started using offpunk):
~/.cache$ stat offpunk/
[...]
Birth: 2025-01-15 11:06:28.955938453 +0100
Almost 1 GB in a couple of months. I have no idea if this is a lot or
not. I wonder how it compares to other users :)
>> Also, if you visit the same websites regularly, most ressources will be> downloaded only once then kept in the cache nearly forever (as long as> you don’t ask for a reload of that particular ressource).>
Thanks again, this really helped me understand some key concepts/ideas
Regards!
--
Saúde,
J. M. Castroagudín
On 13 Mar 2025 08:07, José Manuel Castroagudín Silva wrote:
> And, if you allow me, a couple notes/questions:
> - So, if I run "sync", without seconds, it won't check the date of the
> local cached data, it'll download only "resources/links that don't
> exist yet locally" <- is that correct?
Yes. It means infinite cache-validity so any existing cache is good.
> - If I pass a "seconds" param, it will also "renew" (fetch a new
> version) anything local that's older that "seconds"
Yes. Except for stuff in frozen lists. But if something is both in a
normal list and a frozen list, it will be updated when the normal list
is parsed.
> (for some reason, in my mind this meant "download just the 'activity'
> that has happened since 'seconds', which doesn't really make sense...)
>
> So, if in the future I move to a charming off-grid cabin in the woods,
> and come to the library in town once a week, I should connect to the
> internet and:
> - open offpunk
> - type "online"
> - type "sync 604800" ## (60*60*24*7), seconds in a week
> - type "offline"
>
> so I get an updated version of my lists, to read during the week, plus
> all the new stuff. Does this make sense?
The cache-validity is not the time you will spend offline. It’s the
minimum age a cached copy must reach before you want it updated. Mine is
around 15h because I want a new version every day, but if I used 24h,
the refresh would not happen on a day when I sync a bit earlier.
For example, let’s say I launch my refresh at 1 PM because I was busy in
the morning but, the next morning, I launch my refresh at 9 AM. I still
want new things.
If you are in a cabin in the woods, you don’t need a one-week validity.
Even a simple 3600 is enough because, anyway, you will only be able to
sync once a week.
The 604800 is useful if you want to create a "virtual cabin in the woods"
where you get news once a week but, for whatever reason, you sometimes
launch a sync during that week.
That’s what happens to me during the day: sometimes, I have stuff I want
to read in my to_fetch list. So I launch a second sync during the day to
get that stuff. But I don’t want new RSS things, etc.
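The 1 PM / 9 AM example and the two cabin cases can be checked with a little arithmetic. The following minimal Python sketch does that. `refresh_happens` is a hypothetical helper that only compares the age of the cached copy with the cache-validity. It assumes the copy was fetched at the previous sync.

```python
from datetime import datetime, timedelta


def refresh_happens(last_fetch: datetime, this_sync: datetime, validity: timedelta) -> bool:
    """A copy fetched at the previous sync is refreshed only if it is older than validity.

    Whether the comparison is strict is an assumption; the cases below are clear-cut.
    """
    return this_sync - last_fetch > validity


late_day1 = datetime(2025, 3, 13, 13, 0)   # sync at 1 PM
early_day2 = datetime(2025, 3, 14, 9, 0)   # next sync at 9 AM: only 20 hours later

print(refresh_happens(late_day1, early_day2, timedelta(hours=24)))       # False: no new things
print(refresh_happens(late_day1, early_day2, timedelta(seconds=51840)))  # True: 51840 s is 14.4 h

# Real cabin: you can only sync weekly, so a one-hour validity is enough.
print(refresh_happens(late_day1, late_day1 + timedelta(days=7), timedelta(hours=1)))  # True

# "Virtual cabin": a mid-week sync with validity 604800 (one week) refreshes nothing.
print(refresh_happens(late_day1, late_day1 + timedelta(days=3), timedelta(seconds=604800)))  # False
```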
> I tried this, mostly out of curiosity:
> ~/.cache$ du -sh offpunk/
> 1002M offpunk/
>
> (the biggest offender is https, it turns out :D )
>
> and then I checked, with "stat", when it was created (I guess that's
> the most "official" date for when I started using offpunk:
> ~/.cache$ stat offpunk/
>  [...]
>  Birth: 2025-01-15 11:06:28.955938453 +0100
>
> Almost 1gb in a couple months. I have no idea if this is a lot, or
> not. I wonder how it compares to other users :)
That’s quite impressive. You use Offpunk quite a lot. But, compared
with a normal browser, 1 GB for nearly two months of use is not
really much. You may also have a look at the biggest offenders in
/https/.
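A short Python sketch can rank the subdirectories by size to find those offenders. It assumes, without having checked, that offpunk stores one subdirectory per host under `~/.cache/offpunk/https/`. The function names are made up for illustration.

```python
import os
from pathlib import Path


def dir_size(path: Path) -> int:
    """Total size in bytes of the regular files under `path` (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def biggest_subdirs(base: Path, top: int = 10) -> list:
    """(name, bytes) for the `top` largest direct subdirectories of `base`, largest first."""
    sizes = [(p.name, dir_size(p)) for p in base.iterdir() if p.is_dir()]
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sizes[:top]


if __name__ == "__main__":
    # Assumption: one subdirectory per host under ~/.cache/offpunk/https/
    cache_dir = Path.home() / ".cache" / "offpunk" / "https"
    if cache_dir.is_dir():
        for name, size in biggest_subdirs(cache_dir):
            print(f"{size / 1_000_000:8.1f} MB  {name}")
```

From the shell, `du -sh ~/.cache/offpunk/https/* | sort -h` gives a similar ranking. GNU `sort -h` understands size suffixes such as M and G. Plain `sort -n` does not, so it would put 1.2G before 50M.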
I’m still thinking about how the cache could be kept under a certain
size by removing its oldest unused parts.
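The following Python sketch shows one possible shape for such a cleanup. It is a hypothetical illustration, not anything offpunk currently does. It uses each file's last access time as "last used" and lists the least recently accessed files until the cache fits a byte budget. It is a dry run and deletes nothing. It also assumes access times are meaningful, which is not the case on filesystems mounted with `noatime` and only roughly so with `relatime`.

```python
import os
from pathlib import Path


def files_to_prune(base: Path, max_bytes: int) -> list:
    """Dry run of a least-recently-used cleanup: nothing is deleted.

    Returns the files that would be removed, oldest access time first,
    until the remaining files under `base` fit in `max_bytes`.
    """
    entries = []
    total = 0
    for root, _dirs, files in os.walk(base):
        for name in files:
            path = Path(root) / name
            if path.is_symlink():
                continue
            info = path.stat()
            entries.append((info.st_atime, info.st_size, path))
            total += info.st_size

    entries.sort(key=lambda entry: entry[0])  # least recently accessed first
    victims = []
    for _atime, size, path in entries:
        if total <= max_bytes:
            break
        victims.append(path)
        total -= size
    return victims
```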
> Thanks again, this really helped me understand some key concepts/ideas
I understand that you need to adopt a specific mindset to use offpunk. I
hope this helps.
--
Ploum - Lionel Dricot
On Sat, 15 Mar 2025 at 01:12, Ploum (<sourcehut24@ploum.eu>) wrote:
> The cache-validity is not the time you will spend offline. It’s the
> minimal time for which you don’t want update. Mine is around 15h because
> I want a new version every day but If I did 24h, the refresh would not
> happen if I’m a bit sooner one day.
I truly don't understand why I'm having so much trouble wrapping my
head around the cache validity parameter, hehe.
But you are right, if I'm 'sync'ing every week, I can fetch a new
version of everything that's older than an hour, for example, and not
a week. If I used "a week", I'd risk missing updates if I do it half
an hour early next time, for example.
I think I got it now.
Maybe it would be a good idea to treat subscribed lists differently,
and check for new items every time a 'sync' is done? (Like the
to_fetch list?)
Or, is there a way to set a 'zero/minimal cache' for a list, maybe? (I
know I could do 'sync 1', but that would also update bookmarks, for
example, right?) (this is kind of the opposite to "freeze", I think
:D)
Or, would it make sense for anyone to have a command to 'check
subscriptions', basically? (And I know this is practically the
opposite of what offpunk is supposed to be great at, keeping us
offline... :D )
I'm testing adding this to my .bashrc:
alias upfeeds='offpunk --sync subscribed --assume-yes --cache-validity 1'
(I have been considering several names for this alias: get-subs ,
subpunk, and finally, I think it will stay as upfeeds. Any
suggestions? :D)
Now, if I run "upfeeds" ("update my feeds"), offpunk will get a (very)
updated version of my subscription list and add anything new to the
tour.
Every time I finish going through my 'tour', I can run this and get
any last-minute update to read before closing offpunk. It only fetches
the subscribed list, so no need to wait for all bookmarks and other
lists to be updated.
> That’s quite impressive. You use Offpunk quite a lot. But, if you
> compare with a normal browser, 1Gb for nearly two months of use is not
> really much. You may also have a look to see what are the biggest
> offenders in /https/.
I did some du -sh | grep M | sort -n. The biggest one was around 50 MB
from CNBC.com, then 30 MB from CNN.com, 18 MB from GitHub of all places,
15 MB from finance.yahoo.com...
Mostly places I didn't directly visit, so I guess links in content I
actually looked at. In the end, it will add up, logically. Not to
worry, really. Kinda shows how awfully bloated websites keep
getting...
I suspect a great deal of this was downloaded when I "imported" my old
web feeds.
> I understand that you need to adopt a specific mindset to use offpunk. I
> hope this helps.
I don't think totally adopting the same workflow/mindset is needed.
Even using it with a different mindset, you have made a piece of
software that is a delight to use. Just the way it cleans up html,
alone, is wonderfully useful. And, on top of that, I get to keep
reading gemtext and discovering stuff...
But, understanding the "optimal", or "intended" use of its features,
directly from you, is great.
Thanks again for your time!
--
Saúde,
J. M. Castroagudín