~sircmpwn/sr.ht-dev


[PATCH core.sr.ht v4] Replace misaka (hoedown) with mistletoe

Details
Message ID
<20200208205612.22406-1-araspik@protonmail.com>
DKIM signature
missing
Patch: +46 -46
Using hoedown (through misaka), which is largely unmaintained, appears
to be causing a number of issues (see [0]). This patch replaces misaka
with mistletoe [1], hopefully without losing any functionality.
Others have tested the patch successfully, so it should work.

[0]: https://todo.sr.ht/~sircmpwn/sr.ht/20
[1]: https://github.com/miyuchina/mistletoe
---
Added `import html` as per Benjamin Lowry. Thanks!

 setup.py         |  2 +-
 srht/markdown.py | 90 ++++++++++++++++++++++++------------------------
 2 files changed, 46 insertions(+), 46 deletions(-)

diff --git a/setup.py b/setup.py
index 4834607..58ddebf 100755
--- a/setup.py
+++ b/setup.py
@@ -32,11 +32,11 @@ setup(
       'sqlalchemy-utils',
       'psycopg2',
       'markdown',
+      'mistletoe',
       'bleach',
       'requests',
       'BeautifulSoup4',
       'pgpy',
-      'misaka',
       'pygments',
       'cryptography',
   ],
diff --git a/srht/markdown.py b/srht/markdown.py
index cf9677c..e1f5175 100644
--- a/srht/markdown.py
+++ b/srht/markdown.py
@@ -7,52 +7,56 @@ from pygments.formatters import HtmlFormatter, ClassNotFound
 from pygments.lexers import get_lexer_by_name
 from urllib.parse import urlparse, urlunparse
 import bleach
-import misaka as m
+import html
+import mistletoe as m
 import re

-class RelativeLinkPrefixRenderer(m.HtmlRenderer):
-    def __init__(self, *args, link_prefix=None, **kwargs):
-        super().__init__(args, **kwargs)
+class SrhtRenderer(m.HTMLRenderer):
+    def __init__(self, link_prefix=None, baselevel=1):
+        super().__init__()
         self.link_prefix = link_prefix
+        self.baselevel = baselevel
+    
+    def render_link(self, token):
+        template = '<a href="{target}"{title}>{inner}</a>'
+        url = token.target
+        if not url.startswith("#"):
+            p = urlparse(url)
+            if not p.netloc and not p.scheme and self.link_prefix:
+                path = join(self.link_prefix, p.path)
+                url = urlunparse(('', '', path, p.params, p.query, p.fragment))
+        target = self.escape_url(url)
+        if token.title:
+            title = ' title="{}"'.format(self.escape_html(token.title))
+        else:
+            title = ''
+        inner = self.render_inner(token)
+        return template.format(target=target, title=title, inner=inner)

-    def link(self, content, url, title=''):
-        maybe_title = f' title="{m.escape_html(title)}"' if title else ''
-        if url.startswith("#"):
-            return f'<a href="{url}"{maybe_title}>{content}</a>'
-        p = urlparse(url)
-        if not p.netloc and not p.scheme and self.link_prefix:
-            path = join(self.link_prefix, p.path)
-            url = urlunparse(('', '', path, p.params, p.query, p.fragment))
-        return f'<a href="{url}"{maybe_title}>{content}</a>'
+    def render_block_code(self, token):
+        template = '<pre><code{attr}>{inner}</code></pre>'
+        if token.language:
+            try:
+                lexer = get_lexer_by_name(lang, stripall=True)
+            except ClassNotFound:
+                lexer = None
+            if lexer:
+                formatter = HtmlFormatter()
+                return highlight(token.children[0].content, lexer, formatter)
+            else:
+                attr = ' class="{}"'.format('language-{}'.format(self.escape_html(token.language)))
+        else:
+            attr = ''
+        inner = html.escape(token.children[0].content)
+        return template.format(attr=attr, inner=inner)

-class HighlighterRenderer(m.HtmlRenderer):
-    def __init__(self, *args, baselevel=1, **kwargs):
-        super().__init__(*args, **kwargs)
-        self.baselevel = 1
-
-    def blockcode(self, text, lang):
-        try:
-            lexer = get_lexer_by_name(lang, stripall=True)
-        except ClassNotFound:
-            lexer = None
-        if lexer:
-            formatter = HtmlFormatter()
-            return highlight(text, lexer, formatter)
-        # default
-        return '\n<pre><code>{}</code></pre>\n'.format(
-                escape(text.strip()))
-
-    def header(self, content, level):
-        level += self.baselevel
+    def render_heading(self, token):
+        template = '<h{level}>{inner}</h{level}>'
+        level = token.level + self.baselevel
         if level > 6:
             level = 6
-        _id = re.sub(r'[^a-z0-9-_]', '', content.lower().replace(" ", "-"))
-        return f'''\n<h{str(level)} id="{_id}">
-            {content}
-        </h{str(level)}>\n'''
-
-class CustomRenderer(RelativeLinkPrefixRenderer, HighlighterRenderer):
-    pass
+        inner = self.render_inner(token)
+        return template.format(level=level, inner=inner)

 urlregex = re.compile(r'(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?\xab\xbb\u201c\u201d\u2018\u2019]))')

@@ -107,12 +111,8 @@ def markdown(text, tags=[], baselevel=1, link_prefix=None):
         + ["padding-{}".format(p) for p in ["left", "right", "bottom", "top"]]
         + ["margin-{}".format(p) for p in ["left", "right", "bottom", "top"]],
         strip=True)
-    renderer = md = m.Markdown(
-        CustomRenderer(baselevel=baselevel, link_prefix=link_prefix),
-        extensions=(
-            'tables', 'fenced-code', 'footnotes', 'strikethrough', 'highlight',
-            'quote', 'autolink'))
-    html = renderer(text)
+    with SrhtRenderer() as renderer:
+        html = renderer.render(m.Document(text))
     html = cleaner.clean(html)
     formatter = HtmlFormatter()
     style = formatter.get_style_defs('.highlight') + " .highlight { background: inherit; }"
-- 
2.25.0
Details
Message ID
<C0K9BZPCFU0Q.2C16JOE3OGP1V@homura>
In-Reply-To
<20200208205612.22406-1-araspik@protonmail.com> (view parent)
This doesn't seem to work with code blocks, like so:

```c
int main(int argc, char *argv[]) {
	return 0;
}
```

It raises `NameError: name 'lang' is not defined` on line 40 of
render_block_code.

Unrelated: do you reckon it's feasible to implement GitHub-style
checkbox lists based on mistletoe? The main thing we need to do is find
the line/col or index of the [ ]/[x] so that we can emit a checkbox
which encodes the appropriate character in the markdown source to be
updated when the checkbox is ticked or unticked.
ARaspiK
Details
Message ID
<CbZqPscdSHUVlMuJQ_db2LdOYSob_-vOtzWU_RyyW91HMaB-qSiLL3IVba2GgGHfZpi_yiqTqBQL0-s3P8tt3XZjzd03ATEqJCbtAVyCu8A=@pm.me>
In-Reply-To
<C0K9BZPCFU0Q.2C16JOE3OGP1V@homura> (view parent)
> It raises `NameError: name 'lang' is not defined` on line 40 of
> render_block_code.

Oh my god, I am so stupid. It should be `token.language`, not `lang`.
Sorry about that. I have not had the time to set up sourcehut on my own
yet; I'll hopefully be able to do that soon.

> Unrelated: do you reckon it's feasible to implement GitHub-style
> checkbox lists based on mistletoe? The main thing we need to do is find
> the line/col or index of the [ ]/[x] so that we can emit a checkbox
> which encodes the appropriate character in the markdown source to be
> updated when the checkbox is ticked or unticked.

Hmm, shouldn't be too difficult. It is possible to override
`render_list` in a similar manner as `render_block_code` in order to do
whatever we want (see `mistletoe/html_renderer.py` within `mistletoe`).
Looks like a good output format would be thusly:

    <input type="checkbox" [checked] readonly> list text blah blah

Checking for this would simply entail checking each list item (which
seems to be accessible as `token.children` within `render_list`), which
looks very easy. I can submit a patch for that, if you would like.
However, you might need to style the resulting checkboxes, which seems
to not be straightforward. How you would do that is up to you, but it
will be trivial to use a different output format.
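
To make the proposed output format concrete, here is a minimal sketch of
the read-only case. It works on a list item's plain text rather than on
mistletoe tokens, so it makes no claim about how `render_list` exposes
its children; `TASK_RE` and `render_task_item` are hypothetical names
that exist only for this illustration. It emits `disabled` instead of
`readonly`, on the assumption that browsers ignore `readonly` on
checkboxes.

```python
import html
import re

# Matches a GitHub-style task marker at the start of a list item's text.
TASK_RE = re.compile(r'^\[([ xX])\]\s+')

def render_task_item(text):
    """Render one list item, turning a leading [ ] / [x] into a checkbox.

    Hypothetical helper: a real renderer would receive a parsed token,
    not a bare string.
    """
    match = TASK_RE.match(text)
    if not match:
        # Ordinary list item: escape and wrap, nothing else.
        return '<li>{}</li>'.format(html.escape(text))
    checked = ' checked' if match.group(1) in 'xX' else ''
    rest = html.escape(text[match.end():])
    return '<li><input type="checkbox"{} disabled> {}</li>'.format(checked, rest)

print(render_task_item('[x] write the patch'))
print(render_task_item('[ ] test the patch'))
```

Because the marker is stripped before escaping, the remaining item text
can still be passed through whatever inline rendering the real renderer
applies.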


Glad to help!

P.S.: I am sending this manually with `msmtp`; please let me know if
there are any issues with it.
Details
Message ID
<C0KGNC40UT57.1FXD5JRXCCWBM@homura>
In-Reply-To
<CbZqPscdSHUVlMuJQ_db2LdOYSob_-vOtzWU_RyyW91HMaB-qSiLL3IVba2GgGHfZpi_yiqTqBQL0-s3P8tt3XZjzd03ATEqJCbtAVyCu8A=@pm.me> (view parent)
On Wed Feb 12, 2020 at 8:03 PM, ARaspiK wrote:
> Hmm, shouldn't be too difficult. It is possible to override
> `render_list` in a similar manner as `render_block_code` in order to do
> whatever we want (see `mistletoe/html_renderer.py` within `mistletoe`).
> Looks like a good output format would be thusly:
>
> <input type="checkbox" [checked] readonly> list text blah blah

Not quite: the input name (<input ... name="...">) needs to encode the
index in the markdown source string which needs to be changed when the
checkbox is updated. This has been the hangup with other markdown
implementations.
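
As a rough sketch of what encoding the index could look like, the
following assumes the renderer somehow knows the offset of the state
character (the space or `x` between the brackets) in the markdown
source. That offset is exactly the piece the parser would have to
supply. The `checkbox-<offset>` naming scheme and both function names
are assumptions made for this example, not an existing sr.ht convention.

```python
def checkbox_input(offset, checked):
    """Emit an <input> whose name encodes the source offset (hypothetical scheme)."""
    state = ' checked' if checked else ''
    return '<input type="checkbox" name="checkbox-{}"{}>'.format(offset, state)

def toggle_checkbox(source, offset):
    """Flip the [ ] / [x] whose state character sits at `offset` in `source`.

    The offset comes from the client, so it is validated before use.
    """
    if offset < 1 or offset + 1 >= len(source):
        raise ValueError('offset out of range')
    if source[offset - 1] != '[' or source[offset + 1] != ']':
        raise ValueError('offset does not point inside a checkbox')
    current = source[offset]
    if current == ' ':
        flipped = 'x'
    elif current in ('x', 'X'):
        flipped = ' '
    else:
        raise ValueError('unexpected checkbox state')
    return source[:offset] + flipped + source[offset + 1:]

source = '- [ ] Hello\n- [x] World\n'
print(checkbox_input(3, False))
print(toggle_checkbox(source, 3))
```

The bracket check only guards against obviously wrong offsets; it cannot
tell a real task item from the same three characters inside a code
block, which is why the offsets need to come from the parser.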
ARaspiK
Details
Message ID
<81hnVpf3RFm5__KUzncFE_9pqoVmfNQNUAriO4lnHg5Z5pQl83kDe8tFDdHhA_E3YKfUkJ83RjN3blcWypdgQUSywpjCrBJY_lvuZjbbO78=@pm.me>
In-Reply-To
<C0KGNC40UT57.1FXD5JRXCCWBM@homura> (view parent)
> Not quite: the input name (<input ... name="...">) needs to encode the
> index in the markdown source string which needs to be changed when the
> checkbox is updated. This has been the hangup with other markdown
> implementations.

Oh! I did not realise you intended to have editing support for the
checkboxes. In that case, yes, that information will have to be encoded.
Unfortunately, however, mistletoe does not seem to provide that
information, as far as I could tell. In addition, the amount of extra
work that would have to go into this process (a separate form for every
checkbox, matched with a page that parses the input, verifies the user,
then finds and updates the checkbox in the markdown, along with
potential user confusion when checkboxes only work in this one context)
is, in my opinion, too much to justify the feature.
Details
Message ID
<hAx36CC8zIL2xzekZ-1kcQ9OSselegR4E2SQDXJcRB1CqW5ofYxidbP6RCmY8t2scKs1rir3cVhg0Edqsp21FTrZ4wucDUXr9ug84ncpeko=@emersion.fr>
In-Reply-To
<81hnVpf3RFm5__KUzncFE_9pqoVmfNQNUAriO4lnHg5Z5pQl83kDe8tFDdHhA_E3YKfUkJ83RjN3blcWypdgQUSywpjCrBJY_lvuZjbbO78=@pm.me> (view parent)
Maybe having a simple counter (e.g. "this is the 7th checkbox") would
be enough?
ARaspiK
Details
Message ID
<_RzW7UDvhdrsoMcyR3zag10i5QK1dUu76qNiEEtObqvdkrgUYng_ITjq_L2DWYWkqudUTYAoFNZ382nDdMJ5Y5b0cOxc_egLgN-g6mzHcQw=@pm.me>
In-Reply-To
<hAx36CC8zIL2xzekZ-1kcQ9OSselegR4E2SQDXJcRB1CqW5ofYxidbP6RCmY8t2scKs1rir3cVhg0Edqsp21FTrZ4wucDUXr9ug84ncpeko=@emersion.fr> (view parent)
> Maybe having a simple counter (e.g. "this is the 7th checkbox") would
> be enough?

Crap, it really is that easy. In that case, it could certainly be
implemented, but the disadvantages I gave previously still stand, so
it's up to sircmpwn if he wants the feature.
Details
Message ID
<C0L4GTWJEDK5.236D6GZUY9GPU@homura>
In-Reply-To
<hAx36CC8zIL2xzekZ-1kcQ9OSselegR4E2SQDXJcRB1CqW5ofYxidbP6RCmY8t2scKs1rir3cVhg0Edqsp21FTrZ4wucDUXr9ug84ncpeko=@emersion.fr> (view parent)
On Wed Feb 12, 2020 at 8:49 PM, Simon Ser wrote:
> Maybe having a simple counter (e.g. "this is the 7th checkbox") would
> be enough?

Probably not, because you'd still have to parse the markdown to figure
out where the 7th checkbox is. For example:

- [ ] Hello

```
- [ ] World
```

- [ ] Foo bar baz
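
The mismatch can be shown with a small sketch, using the example above
as input. A plain line-by-line count finds three markers, while a scan
that tracks fenced code blocks finds the two that would actually render
as checkboxes. Both function names are hypothetical, and the fence
tracking is deliberately simplistic: it ignores indented code blocks,
nested lists, and mismatched fences, so it only illustrates why real
parsing is needed and is not a substitute for it.

```python
import re

TASK_RE = re.compile(r'^\s*[-*+]\s+\[([ xX])\]')
# Opening or closing fence made of at least three backticks or tildes.
FENCE_RE = re.compile(r'^\s*(`{3,}|~{3,})')

def naive_checkbox_lines(source):
    """Line numbers of everything that looks like a task item."""
    return [i for i, line in enumerate(source.splitlines())
            if TASK_RE.match(line)]

def fence_aware_checkbox_lines(source):
    """Same scan, but skipping lines inside fenced code blocks."""
    found = []
    in_fence = False
    for i, line in enumerate(source.splitlines()):
        if FENCE_RE.match(line):
            in_fence = not in_fence
            continue
        if not in_fence and TASK_RE.match(line):
            found.append(i)
    return found

fence = '`' * 3
example = '\n'.join([
    '- [ ] Hello',
    '',
    fence,
    '- [ ] World',
    fence,
    '',
    '- [ ] Foo bar baz',
])

print(naive_checkbox_lines(example))        # [0, 3, 6]
print(fence_aware_checkbox_lines(example))  # [0, 6]
```

With the naive count, "the 2nd checkbox" would point at the line inside
the code block rather than at "Foo bar baz".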
ARaspiK
Details
Message ID
<42QkQ3pG_nmK6ucpEBfur7m1806XYSHpCkGujiFWFGvOPNjTCGdEihuqtz4lDSbKV2CUkDoaF-OKbohhWvnpU-f8ZceCQMGFIfWbD3MUI2c=@pm.me>
In-Reply-To
<C0L4GTWJEDK5.236D6GZUY9GPU@homura> (view parent)
On Thu, 13 Feb 2020 09:56:48 -0500, "Drew DeVault" <sir@cmpwn.com> wrote:
> 
> Probably not, because you'd still have to parse the markdown to figure
> out where the 7th checkbox is. For example:
> ...

Oops. In that case, there probably isn't a straightforward way to
implement this. Even mistune [0] doesn't support this yet.

[0]: https://github.com/lepture/mistune/issues/218
Details
Message ID
<C0LYXKZ7ZCM3.1U33FQE8MGDLN@homura>
In-Reply-To
<42QkQ3pG_nmK6ucpEBfur7m1806XYSHpCkGujiFWFGvOPNjTCGdEihuqtz4lDSbKV2CUkDoaF-OKbohhWvnpU-f8ZceCQMGFIfWbD3MUI2c=@pm.me> (view parent)
Alright, no worries for now. Let's just ship v5 once the other issues
are addressed.
ARaspiK
Details
Message ID
<NXQdegS06BAXmZutdNEWGO20H1pvDfOwGX3IuJblr_N0mKkMoc2YxNH8dyGw5gHTS_5I-MFSvJIyr5JJV9O79nHWJcHdVu4Pmik6anX0las=@pm.me>
In-Reply-To
<C0LYXKZ7ZCM3.1U33FQE8MGDLN@homura> (view parent)
On Fri, 14 Feb 2020 09:49:14 -0500, "Drew DeVault" <sir@cmpwn.com> wrote:
> 
> Alright, no worries for now. Let's just ship v5 once the other issues
> are addressed.

Would it be possible for you to correct the `s/lang/token.language/`
issue within srht/markdown.py and check if anything else is broken?
Easier to get everything done in one go.

Thanks!