~sircmpwn/sr.ht-dev

core.sr.ht: Do not inject dir=auto using BeautifulSoup v1 PROPOSED

Ersei Saggi: 1
 Do not inject dir=auto using BeautifulSoup

 1 files changed, 1 insertions(+), 19 deletions(-)
#1107223 alpine.yml success
#1107224 archlinux.yml failed
#1107225 debian.yml success
I'll go and send v2 patches for those eventually (after break). As for
the files, that's certainly going to be interesting. I think that the
files should remain as LTR? If we use dir=auto on the file display view,
then it would mess with the column limit bar.

I don't think there's an elegant way to do this, as W3C didn't envision
a way to have dir=auto propagate to every element. Running the auto
direction code would be best done client-side, but I doubt that is in
line with Sourcehut's ethos.

I'll have to identify all elements in the templates that render
user-controlled text and insert dir=auto for those. As for the files, I
think they should remain LTR for now until a solution can be found for
the column limits.

Either that, or we can preserve previous processes and pass dir=auto to
the <pre> file rendering element as well.

Happy holidays,
Ersei
Export patchset (mbox)
How do I use this?

Copy & paste the following snippet into your terminal to import this patchset into git:

curl -s https://lists.sr.ht/~sircmpwn/sr.ht-dev/patches/47341/mbox | git am -3
Learn more about email & git

[PATCH core.sr.ht] Do not inject dir=auto using BeautifulSoup Export this patch

---
Instead we will modify the templates to explicitly have dir=auto where
needed (for performance reasons).

This injection was incredibly slow when rendering large files.
 srht/flask.py | 20 +-------------------
 1 file changed, 1 insertion(+), 19 deletions(-)

diff --git a/srht/flask.py b/srht/flask.py
index e8412ce..f1f0acf 100644
--- a/srht/flask.py
+++ b/srht/flask.py
@@ -1,5 +1,4 @@
DATE_FORMAT = "%Y-%m-%dT%H:%M:%S+00:00"
from bs4 import BeautifulSoup
from flask import Flask, Response, request, url_for, render_template, redirect
from flask import Blueprint, current_app, g, abort, session as flask_session
from flask import make_response
@@ -169,23 +168,6 @@ def paginate_query(query, results_per_page=15):
        "total_results": total_results
    }

def inject_rtl_direction(resp):
    if resp.mimetype == 'text/html':
        html_doc = resp.data.decode('utf8')
        soup = BeautifulSoup(html_doc, 'html.parser')
        if not soup.body:
            return resp
        for el in soup.body.find_all():
            if el.name == 'input' or el.name == 'textarea':
                el.attrs['dir'] = "auto"
                continue
            for ch in el.text:
                if unicodedata.bidirectional(ch) in ('R', 'AL'):
                    el.attrs['dir'] = "auto"
                    break
        resp.data = soup.encode('utf8')
    return resp

class ModifiedUnicodeConverter(UnicodeConverter):
    """Added ~ and ^ to safe URL characters, otherwise no changes."""
    def to_url(self, value):
@@ -414,7 +396,7 @@ class SrhtFlask(Flask):
                route=request.endpoint,
                status=resp.status_code,
            ).observe(max(default_timer() - request._srht_start_time, 0))
            return inject_rtl_direction(resp)
            return resp

    def make_response(self, rv):
        # Converts responses from dicts to JSON response objects
-- 
2.42.0
core.sr.ht/patches: FAILED in 4m26s

[Do not inject dir=auto using BeautifulSoup][0] from [Ersei Saggi][1]

[0]: https://lists.sr.ht/~sircmpwn/sr.ht-dev/patches/47341
[1]: mailto:vcs@ersei.net

✓ #1107223 SUCCESS core.sr.ht/patches/alpine.yml    https://builds.sr.ht/~sircmpwn/job/1107223
✗ #1107224 FAILED  core.sr.ht/patches/archlinux.yml https://builds.sr.ht/~sircmpwn/job/1107224
✓ #1107225 SUCCESS core.sr.ht/patches/debian.yml    https://builds.sr.ht/~sircmpwn/job/1107225
Thanks for the patch, and sorry for delay, it took me a while to 
investigate this properly. Unfortunately, there is an issue with this, 
at least as far as I can tell.

I do think the approach of adding dir=auto to templates would be 
preferrable, but it is unfortunately not fully addressed in your patchset.

If you look closely at the injection, you will see that it not only sets 
dir=auto for inputs, but, crucially, for _every element which has text 
that contains a unicode character with inherent directionality_ (as 
determined by unicodeclass.bidirectional[1]).

You can easily test this: if you put some RTL content in your bio, it 
will correctly render RTL, because even though the bio is just a <p>, it 
will get a dir=auto slapped onto it. This, AFAICT, is not addressed in 
your patches.

What this means is that we'd have to put dir=auto into _every_ element 
that displays user generated content, including, I guess, files. I am 
not even sure how feasible that is?

[1] 
https://docs.python.org/3/library/unicodedata.html#unicodedata.bidirectional

Cheers,
Conrad