~lioploum/offpunk-users


Fwd: broken urls / redirects [wikipedia]

José Manuel Castroagudín Silva <chavescesures@gmail.com>
Message ID
<CAJjuAbh1k7NEfvh6iHwAb1ZZE7fxPHB2owZW4VF3DnaTX4D8iw@mail.gmail.com>
Sender timestamp
1740775831
Resending this to the mailing list for completeness' sake and because
I'm an idiot who doesn't check the address I'm "reply"ing to in Gmail.

---------- Forwarded message ---------
From: José Manuel Castroagudín Silva <chavescesures@gmail.com>
Date: Fri, 28 Feb 2025 at 20:48
Subject: Re: broken urls / redirects [wikipedia]
To: Ploum <sourcehut24@ploum.eu>


On Fri, 28 Feb 2025 at 17:46, Ploum (<sourcehut24@ploum.eu>) wrote:
>
> On 25 Feb 28 02:26, Lionel Dricot - Ploum wrote:
> >On 25 Feb 28 12:43, José Manuel Castroagudín Silva wrote:
> >>I've found a possible replacement for wikipedia:
> >>gemini://gemi.dev/cgi-bin/wp.cgi/
> >>but for some reason the search results are interpreted as
> >>"message/news" by offpunk (or maybe the server is returning a wrong
> >>mime type? but the response seems to start with "20
> >>text/gemini;lang=XX", checked in a different client )
> >
> >The bug appears because of several factors:
> >
> >1. Due to the URL construction, there’s no extension to the file.
> >
> >2. As there’s no extension, ansicat relies on the Unix utility "file"
> >to guess the format. "file" doesn’t know about "gemtext" but ansicat
> >is smart enough to consider that any kind of text should, by default,
> >be gemtext if it is on the gemini protocol.
> >
> >3. For whatever reason, "file" sometimes considers some text files to be
> >"message/news". Note that this doesn’t happen for all queries on
> >gemi.dev, so it really seems to depend on the content.
> >
> >4. For ansicat, "message/news" is not a "text/*" format; it’s kind of
> >a binary one. So ansicat doesn’t try to open it and considers that
> >"file" was correct.
>
> By going through the source code of "file", I’ve identified that any file
> which starts with "Article" is identified as a message/news item.

huh, that's interesting/weird :D
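
A rough sketch of that chain of factors, in case it helps to see it laid
out. The names guess_mime and sniff_like_file and the extension table are
made up for illustration; this is not ansicat's actual code, and the
stand-in for "file" only implements the one "Article" rule mentioned above:

```python
import os.path

# Hypothetical extension table; the real ansicat mapping is not shown here.
EXTENSIONS = {".gmi": "text/gemini", ".html": "text/html", ".txt": "text/plain"}


def sniff_like_file(content: str) -> str:
    """Stand-in for the "file" utility, reduced to the one rule discussed."""
    if content.startswith("Article"):
        return "message/news"
    return "text/plain"


def guess_mime(path: str, content: str, scheme: str) -> str:
    # Factor 1: use the extension when the path has a known one.
    ext = os.path.splitext(path)[1].lower()
    if ext in EXTENSIONS:
        return EXTENSIONS[ext]
    # Factor 2: no usable extension, so fall back to sniffing the content.
    sniffed = sniff_like_file(content)
    # Any text/* guess on the gemini protocol is treated as gemtext.
    if sniffed.startswith("text/") and scheme == "gemini":
        return "text/gemini"
    # Factors 3 and 4: a non-text guess such as message/news is kept as-is.
    return sniffed
```

With this, guess_mime("/cgi-bin/wp.cgi/search", "Article about cats", "gemini")
comes back as "message/news", while the same path with content starting
with "# Cats" comes back as "text/gemini".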

I wonder if, when dealing with gemini:// documents, offpunk could rely
on the server's response. Of course, this would depend on the
server including the correct "content-type". Then again, that's how
gemini (should) work(s): the file type should be included in the
response.
Maybe it turns out there's no simple magic trick :D
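
As a minimal sketch of what reading the type from the server's response
could involve, assuming a header line shaped like the one I saw
("20 text/gemini;lang=XX"). parse_gemini_header is a made-up helper, not
something that exists in offpunk:

```python
def parse_gemini_header(line: str):
    """Split a Gemini response header into (status, mime, params)."""
    status, _, meta = line.rstrip("\r\n").partition(" ")
    if len(status) != 2 or not status.isdigit():
        raise ValueError(f"malformed status: {status!r}")
    if not status.startswith("2"):
        # Only 2x (success) responses carry a MIME type in the meta field.
        return int(status), None, {}
    # The meta field is a MIME type optionally followed by ;key=value pairs.
    mime, *raw_params = (part.strip() for part in meta.split(";"))
    params = {}
    for item in raw_params:
        key, _, value = item.partition("=")
        params[key.strip().lower()] = value.strip()
    return int(status), mime.lower(), params
```

For the header above this returns (20, "text/gemini", {"lang": "XX"}), so
the client would not need to guess the type from the body at all.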

For the time being, I've applied that one-line "patch" in my local
ansicat.py. I might configure the 'wikipedia' command as well.

I'm going to quote your comment from a different email:
> For wikipedia, instead of relying on an unreliable gemini server,
> offpunk should simply interpret the HTML itself.

This could actually be a good idea. Maybe a "redirect" to their
"mobile view" could help, since it offers a simpler layout.

>
> One fix would be for gemi.dev to return a gmi file
> starting with a "# Title". I guess that would already help a lot.
>
>


--
Saúde,

J. M. Castroagudín


Message ID
<174077400961.7.11611877457725837763.618926243@ploum.eu>
In-Reply-To
<CAJjuAbh1k7NEfvh6iHwAb1ZZE7fxPHB2owZW4VF3DnaTX4D8iw@mail.gmail.com>
Sender timestamp
1740774002
On 25 Feb 28 08:50, José Manuel Castroagudín Silva wrote:
>>
>> By going through the source code of "file", I’ve identified that any file
>> which starts with "Article" is identified as a message/news item.
>
>huh, that's interesting/weird :D

I’ve reported the problem to the file mailing list. It definitely looks
like overzealous behaviour. But file is a 30-year-old piece of software
used everywhere, and I think you cannot change its behaviour just like
that ;-)

EDIT: the maintainer just told me that he fixed the bug! 
Open-source is awesome ;-)
>
>I wonder if, when dealing with gemini:// documents, offpunk could rely
>on the server's response. Of course, this would depend on the
>server including the correct "content-type". Then again, that's how
>gemini (should) work(s): the file type should be included in the
>response.
>Maybe it turns out there's no simple magic trick :D


The reason Offpunk doesn’t use the server response is that you don’t
have the server response while offline. The response would need to be
saved somewhere and kept in sync with the actual cached resource.

This is something I wanted to avoid by design: the cache should be 
simple files, nothing else. When I started offpunk, I wondered if it 
could work.

It turns out it works in 99.9% of cases. You hit one of the 0.1%. But
even that case will be solved by the next release ;-)

That’s one of the reasons I believe Gemini missed a huge opportunity:
instead of returning a MIME type, it could have enforced the .gmi file
extension and relied on the extension to determine the type. That would
also allow clients to anticipate the kind of file just by looking at
the URL.

It seems I’m not alone in thinking that:

gemini://blekksprut.net/loopback/reflections.gmi
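
A minimal sketch of what extension-based typing could look like, under the
assumption that the type is read from the URL path alone. The table and
type_from_url are hypothetical, not Offpunk code:

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical table for illustration only.
TYPES_BY_EXTENSION = {
    ".gmi": "text/gemini",
    ".txt": "text/plain",
    ".html": "text/html",
    ".png": "image/png",
}


def type_from_url(url: str):
    """Return the type implied by the URL's extension, or None if unknown."""
    path = urlsplit(url).path
    # Only the last path component counts, so "wp.cgi/" yields no extension.
    ext = posixpath.splitext(path)[1].lower()
    return TYPES_BY_EXTENSION.get(ext)
```

The link above would map to "text/gemini" without any request being made,
while gemini://gemi.dev/cgi-bin/wp.cgi/ gives None because its path has no
extension.
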
>
>For the time being, I've applied that one-line "patch" in my local
>ansicat.py. I might configure the 'wikipedia' command as well.
>
>I'm going to quote your comment from a different email:
>> For wikipedia, instead of relying on an unreliable gemini server,
>> offpunk should simply interpret the HTML itself.
>
>This could actually be a good idea. Maybe a "redirect" to their
>"mobile view" could help, since it offers a simpler layout.
>
That’s a very cool idea worth investigating.