Hi there,
Following some recent discussions on the mailing list, I took the time
to integrate `unmerdify` into offpunk.
`unmerdify` https://codeberg.org/vjousse/unmerdify is a Python library
that uses the rules from https://github.com/fivefilters/ftr-site-config
to extract the content of a webpage (the rules are mainly used by
wallabag at the moment https://github.com/wallabag/wallabag/ ).
The idea is to use `unmerdify` if it is installed and if a rule for the
current URL exists. It will fall back to the current behavior of using
Readability if `unmerdify` can't find a rule for the current URL.
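To make the fallback concrete, here is a rough sketch of that logic. It is not the code from the branch: the function names are made up for illustration, the two extractors are passed in as plain callables instead of calling the real unmerdify or Readability APIs, and it assumes the rule for a host lives in a file named `<host>.txt` inside the local ftr-site-config copy.

```python
from pathlib import Path
from urllib.parse import urlparse


def find_rule_file(url, ftr_dir):
    """Return the rule file for the URL's host, or None if there is none.

    Assumption: rules are stored as '<host>.txt' in ftr_dir.
    """
    host = urlparse(url).hostname or ""
    if host.startswith("www."):
        host = host[4:]  # look up 'ploum.net' for 'www.ploum.net'
    candidate = Path(ftr_dir) / f"{host}.txt"
    return candidate if candidate.is_file() else None


def extract_content(html, url, ftr_dir, rule_extractor, readability_extractor):
    """Use the rule-based extractor when a rule exists, else fall back.

    rule_extractor is a hypothetical callable (html, rule_file) -> content;
    pass None when the rule-based library is not installed.
    readability_extractor is a hypothetical callable (html) -> content.
    """
    rule_file = find_rule_file(url, ftr_dir) if rule_extractor else None
    if rule_file is not None:
        return rule_extractor(html, rule_file)
    return readability_extractor(html)
```

With this shape, a missing library and a missing rule both end up on the same Readability path, so the current behavior is untouched for sites without a rule.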
You can find the current integration here:
https://codeberg.org/vjousse/offpunk/pulls/1
You can git clone the project locally, switch to the feat/add-unmerdify
branch and test the integration.
git clone git@codeberg.org:vjousse/offpunk.git
cd offpunk
git checkout feat/add-unmerdify
pip install -r feat/add-unmerdify  # or: pip install git+ssh://git@codeberg.org/vjousse/unmerdify.git
You will then need to have a local copy of the rules available here:
https://github.com/fivefilters/ftr-site-config
After that, calling offpunk with the --ftr-site-config flag pointing to
that local copy should do the trick:
./offpunk.py --ftr-site-config /home/vjousse/usr/projects/unmerdify/ftr-site-config
You can try to go to https://ploum.net and check that the page is
rendered as expected and not truncated, as is the case with the
current version of offpunk.
Don't hesitate to provide feedback. I will later send a patch to this
mailing list to integrate it into the sr.ht master branch.
Regards,
Vince
Hi Vincent,
I’ve downloaded unmerdify and ftr-site-config. I’ve checked out your
offpunk branch, but there are tons of changes which seem to be unrelated
to unmerdify.
Would you mind sending a minimal patch for it? Or maybe we can work on
it together.
The idea would be to add unmerdify support as an alternative to
python-readability.
There would be no arguments at launch but simply two options in offpunk
itself: unmerdify_path and ftr_path. When both are set to valid paths,
offpunk would transparently use unmerdify instead of readability.
(offpunk could also automatically check $PATH for unmerdify in case it
is installed through a package manager)
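Something like the following minimal sketch is what I have in mind for the check. The function name and the options dictionary are hypothetical, and the $PATH lookup assumes unmerdify would be installed as an executable named `unmerdify`, which may not match how it is actually packaged.

```python
import os
import shutil


def resolve_unmerdify(options):
    """Return (unmerdify_path, ftr_path) when both are usable, else None.

    options is a hypothetical dict holding the two offpunk options.
    Returning None means: keep using readability.
    """
    ftr_path = options.get("ftr_path")
    if not ftr_path or not os.path.isdir(ftr_path):
        return None
    # Explicit option first, then a lookup in $PATH (assumed executable name).
    unmerdify_path = options.get("unmerdify_path") or shutil.which("unmerdify")
    if not unmerdify_path or not os.path.exists(unmerdify_path):
        return None
    return unmerdify_path, ftr_path
```

If either path is missing or invalid, nothing changes for the user, which is what would make this first step releasable on its own.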
That would be a perfect "first" step, and it could be 100% releasable
as is.
The next step would be to automatically fetch the ftr config for a
visited site and cache it, but we will keep that for further development
once we are sure unmerdify is working well.
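For that later step, a fetch-and-cache helper could look roughly like this. It is only a sketch: the function name, the `<host>.txt` cache layout and the `fetch` callable are all assumptions, and how the rule is actually downloaded is deliberately left out.

```python
from pathlib import Path


def get_rule(host, cache_dir, fetch):
    """Return the rule text for host, fetching and caching it on first use.

    fetch is a hypothetical callable (host) -> str, or None when no rule
    exists for that host. Nothing is cached in the None case.
    """
    cache_file = Path(cache_dir) / f"{host}.txt"
    if cache_file.is_file():
        # Already cached: no network access needed.
        return cache_file.read_text(encoding="utf-8")
    text = fetch(host)
    if text is None:
        return None
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(text, encoding="utf-8")
    return text
```

Since a cached rule never triggers a second fetch, a site visited once would keep working offline.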
For me to progress on this, could you simply send me the patch with what
you did (and without all the added cruft)?
On 25 Jan 05 04:41, Vincent Jousse wrote:
--
Ploum - Lionel Dricot
Blog: https://www.ploum.net
Bikepunk: https://bikepunk.fr/
Hi,
I’ll have a look. The changes are due to automatic linting by ruff; I’ll
have to find a way to deactivate it for offpunk, I guess ^^
I’ll keep you posted.
On 1/27/25 16:09, Ploum wrote:
On 25 Jan 27 04:36, Vincent Jousse wrote:
>Hi,
>
>I’ll have a look, the changes are due to automatic linting by ruff, I’ll
>have to find a way to deactivate it for offpunk I guess ^^
I’m not against patches that do some cleanup, but these should be kept
separate from functional patches.