Hi there,
Following some recent discussions on the mailing list, I took the time
to integrate `unmerdify` into offpunk.
`unmerdify` (https://codeberg.org/vjousse/unmerdify) is a Python library
that uses the rules from https://github.com/fivefilters/ftr-site-config
to extract the content of a web page. At the moment, these rules are
mainly used by wallabag (https://github.com/wallabag/wallabag/).
The idea is to use `unmerdify` if it is installed and a rule exists for
the current URL. If `unmerdify` can't find a rule for the current URL,
offpunk falls back to its current behavior of using Readability.
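To make the fallback logic concrete, here is a minimal sketch of the
decision. It is not the code from the pull request. The function names
are hypothetical, and it assumes that rules are stored as
`<hostname>.txt` files at the top level of the ftr-site-config
directory. The real rule lookup is done by `unmerdify` and may differ.

```python
import importlib.util
from pathlib import Path
from urllib.parse import urlparse


def unmerdify_installed():
    """Return True if the unmerdify package can be imported."""
    return importlib.util.find_spec("unmerdify") is not None


def find_rule_file(url, rules_dir):
    """Return the rule file matching the URL's host, or None.

    Assumption: one <hostname>.txt file per site, with an optional
    leading "www." ignored. This is a simplification for illustration.
    """
    host = urlparse(url).hostname or ""
    candidates = [host]
    if host.startswith("www."):
        candidates.append(host[4:])
    for name in candidates:
        path = Path(rules_dir) / f"{name}.txt"
        if path.is_file():
            return path
    return None


def choose_extractor(url, rules_dir, installed=None):
    """Pick "unmerdify" when it is usable for this URL, else "readability"."""
    if installed is None:
        installed = unmerdify_installed()
    if installed and rules_dir and find_rule_file(url, rules_dir) is not None:
        return "unmerdify"
    # No package, no rules directory, or no rule: keep the current behavior.
    return "readability"
```

The `installed` parameter only exists so the decision can be exercised
without having `unmerdify` installed.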
You can find the current integration here:
https://codeberg.org/vjousse/offpunk/pulls/1
You can clone the project locally, switch to the feat/add-unmerdify
branch and test the integration:
git clone git@codeberg.org:vjousse/offpunk.git
cd offpunk
git checkout feat/add-unmerdify
pip install -r feat/add-unmerdify # or pip install git+ssh://git@codeberg.org/vjousse/unmerdify.git
You will then need a local copy of the rules, which are available here:
https://github.com/fivefilters/ftr-site-config
After that, calling offpunk with the --ftr-site-config flag set to the
path of your local copy should do the trick:
./offpunk.py --ftr-site-config /home/vjousse/usr/projects/unmerdify/ftr-site-config
You can then go to https://ploum.net and check that the page is
rendered as expected and not truncated, as is the case with the
current version of offpunk.
Don't hesitate to provide feedback. Later on, I will send a patch to
this mailing list to integrate it into the master branch on sr.ht.
Regards,
Vince