This is a followup to d3e4af1f0a33849b36196b035fddf64037e0c282 [1] and an old
issue of mine [2].
The patch restores the behavior from before 711447a [3], but is
Unicode-aware, at least on my system with GNU utilities.
Since the existing awk(1) implementations apparently support Unicode more
commonly, use awk(1) to split the word completions instead of tr(1) and our
handpicked delimiter characters.
I have tested this with my initial example and the Polish text from [2]:

    def foo(bar):
    ba<Ctrl-n>

    Jest dostępnych wiele różnych wersji Lorem Ipsum, ale większość
    zmieniła się pod wpływem dodanego humoru czy przypadkowych słów,
    które nawet w najmniejszym stopniu nie przypominają istniejących.
    dost<Ctrl-n>

[1]: https://git.sr.ht/~martanne/vis/commit/d3e4af1f0a33849b36196b035fddf64037e0c282
[2]: https://github.com/martanne/vis/issues/1132
[3]: https://git.sr.ht/~martanne/vis/commit/711447afd4a7857d1ecb173893009924b07d4993
[PATCH vis] vis-complete: split words using awk(1) instead of tr(1)
Florian Fischer <florian.fischer@muhq.space> wrote:
> Similar to d3e4af1f0a33849b36196b035fddf64037e0c282 the
> lack of unicode support in GNU's tr(1) implementation prevents
> word completions for word containing unicode glyphs.
>
> Apparently, most awk implementations support unicode.
Thanks for the patch!
I'm a bit confused because your patch description mentions "awk"
but your patch uses "sed" ...
I have tried both with and without your patch, and in both
cases I could autocomplete the Polish text in your cover letter.
The behavior with your patch is slightly different for the following
Japanese text, however.
そうじゃないですか?そうじゃないですか?
With your patch the completion of
そ|
(| is the cursor)
results in
そうじゃないですか
while without your patch on my machine I get
そうじゃないですか?そうじゃないですか?
which indicates that the full-width question mark is not recognised as
a word delimiter.
I think the completion for the Japanese text after applying your patch
is preferable, so I think we should apply it!
We should clear up the awk/sed confusion (on my side?) first though :)
Cheers,
Silvan
>
> Signed-off-by: Florian Fischer <florian.fischer@muhq.space>
> ---
>  vis-complete | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/vis-complete b/vis-complete
> index db925f69..a6829edc 100755
> --- a/vis-complete
> +++ b/vis-complete
> @@ -48,7 +48,7 @@ fi
>  PATTERN="$1"
>
>  if [ $COMPLETE_WORD = 1 ]; then
> -	tr -s '\t {}()[],<>%^&.\\' '\n' |
> +	sed -E 's/([^[:alnum:]_])/\n/g' |
>  	grep "^$(basic_regex_quote "$PATTERN")." |
>  	sort -u
>  else
Re: [PATCH vis] vis-complete: split words using awk(1) instead of tr(1)
On Thu Feb 15, 2024 at 8:29 AM CET, Florian Fischer wrote:
> Apparently, most awk implementations support unicode.

> +	sed -E 's/([^[:alnum:]_])/\n/g' |
Aside from the discrepancy between the commit message and the actual
patch, we had a long discussion (see
https://lists.sr.ht/~martanne/devel/%3C20240205215719.17993-1-mcepl%40cepl.eu%3E
and the previous versions of this patch), where we came to the
conclusion that GNU sed is perfectly Unicode-aware, but other
implementations (e.g., the BSD ones) are not, so it is better to
use awk, because it should be Unicode-friendly everywhere.
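
For illustration only, here is a minimal sketch of what an awk(1)-based
splitter could look like. The function name split_words is made up for
this example and is not part of vis-complete. Whether [[:alnum:]] matches
non-ASCII letters depends on the awk implementation and the active locale,
so that part is an assumption to verify on each target system, not a
guarantee.

```shell
#!/bin/sh
# Hypothetical helper, not taken from vis-complete.
# Reads text on stdin and prints one word per line, treating every run of
# characters that are neither alphanumeric nor '_' as a delimiter.
split_words() {
	awk '{
		# split() accepts an extended regular expression as separator.
		n = split($0, words, /[^[:alnum:]_]+/)
		for (i = 1; i <= n; i++)
			if (words[i] != "")	# skip empty fields at the line edges
				print words[i]
	}'
}

# Example: the candidate words for the Python snippet from the cover letter.
printf 'def foo(bar):\n' | split_words | sort -u
```

A side effect of this approach is that it does not rely on "\n" in the
replacement part of a sed(1) s command, which POSIX leaves unspecified.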
Best,
Matěj
--
http://matej.ceplovi.cz/blog/, @mcepl@floss.social
GPG Finger: 3C76 A027 CA45 AD70 98B5 BC1D 7920 5802 880B C9D8
A woman without a man is like a fish without a bicycle.
Therefore, a man without a woman is like a bicycle without
a fish.