~martanne/devel


[PATCH vis] vis-complete: complete words containing unicode

Details
Message ID
<20240215073929.182891-1-florian.fischer@muhq.space>
This is a follow-up to d3e4af1f0a33849b36196b035fddf64037e0c282 [1] and an old
issue of mine [2].

The patch restores the original behavior from before 711447a [3], but is Unicode
aware, at least on my system using GNU utils.

Since existing awk(1) implementations apparently support Unicode more
commonly, use awk(1) to split the word completions instead of tr(1) and our
handpicked delimiter characters.

I have tested this with my initial example and the Polish text from [2]:
	def foo(bar):
	  ba<Ctrl-n>

	Jest dostępnych wiele różnych wersji Lorem Ipsum, ale większość
	zmieniła się pod wpływem dodanego humoru czy przypadkowych słów,
	które nawet w najmniejszym stopniu nie przypominają istniejących.

	dost<Ctrl-n>

[1]: https://git.sr.ht/~martanne/vis/commit/d3e4af1f0a33849b36196b035fddf64037e0c282
[2]: https://github.com/martanne/vis/issues/1132
[3]: https://git.sr.ht/~martanne/vis/commit/711447afd4a7857d1ecb173893009924b07d4993

[PATCH vis] vis-complete: split words using awk(1) instead of tr(1)

Details
Message ID
<20240215073929.182891-2-florian.fischer@muhq.space>
In-Reply-To
<20240215073929.182891-1-florian.fischer@muhq.space> (view parent)
Patch: +1 -1
Similar to d3e4af1f0a33849b36196b035fddf64037e0c282, the
lack of Unicode support in GNU's tr(1) implementation prevents
word completions for words containing Unicode glyphs.

Apparently, most awk implementations support unicode.

Signed-off-by: Florian Fischer <florian.fischer@muhq.space>
---
 vis-complete | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/vis-complete b/vis-complete
index db925f69..a6829edc 100755
--- a/vis-complete
+++ b/vis-complete
@@ -48,7 +48,7 @@ fi
 PATTERN="$1"
 
 if [ $COMPLETE_WORD = 1 ]; then
-	tr -s '\t {}()[],<>%^&.\\' '\n' |
+	sed -E 's/([^[:alnum:]_])/\n/g' |
 		grep "^$(basic_regex_quote "$PATTERN")." |
 		sort -u
 else
-- 
2.43.2

[vis/patches] build success

builds.sr.ht <builds@sr.ht>
Details
Message ID
<CZ5HE5L9M4LI.1WLAHWH8JIEN3@fra02>
In-Reply-To
<20240215073929.182891-2-florian.fischer@muhq.space> (view parent)
vis/patches: SUCCESS in 1m35s

[vis-complete: complete words containing unicode][0] from [Florian Fischer][1]

[0]: https://lists.sr.ht/~martanne/devel/patches/49527
[1]: florian.fischer@muhq.space

✓ #1150478 SUCCESS vis/patches/debian.yml  https://builds.sr.ht/~martanne/job/1150478
✓ #1150477 SUCCESS vis/patches/alpine.yml  https://builds.sr.ht/~martanne/job/1150477
✓ #1150479 SUCCESS vis/patches/freebsd.yml https://builds.sr.ht/~martanne/job/1150479
✓ #1150480 SUCCESS vis/patches/openbsd.yml https://builds.sr.ht/~martanne/job/1150480

Re: [PATCH vis] vis-complete: split words using awk(1) instead of tr(1)

Details
Message ID
<3R5DQTEE36V0M.3QXGMCGVOHPY5@homearch.localdomain>
In-Reply-To
<20240215073929.182891-2-florian.fischer@muhq.space> (view parent)
Florian Fischer <florian.fischer@muhq.space> wrote:
> Similar to d3e4af1f0a33849b36196b035fddf64037e0c282, the
> lack of Unicode support in GNU's tr(1) implementation prevents
> word completions for words containing Unicode glyphs.
> 
> Apparently, most awk implementations support unicode.

Thanks for the patch!

I'm a bit confused because your patch description mentions "awk"
but your patch uses "sed" ...

I have tried versions both with and without your patch, and in both
cases I could autocomplete the Polish text from your cover letter.

The behavior with your patch is slightly different for the following
Japanese text, however.

そうじゃないですか？そうじゃないですか？

With your patch the completion of

そ|

(| is the cursor)

results in

そうじゃないですか

while without your patch on my machine I get

そうじゃないですか？そうじゃないですか？

which indicates that the full-width question mark is not recognised as
a word delimiter.

I think the completion for the Japanese text after applying your patch
is preferable, so I think we should apply it!

We should clear up the awk/sed confusion (on my side?) first though :)

Cheers,
Silvan



> 
> Signed-off-by: Florian Fischer <florian.fischer@muhq.space>
> ---
>  vis-complete | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/vis-complete b/vis-complete
> index db925f69..a6829edc 100755
> --- a/vis-complete
> +++ b/vis-complete
> @@ -48,7 +48,7 @@ fi
>  PATTERN="$1"
>  
>  if [ $COMPLETE_WORD = 1 ]; then
> -	tr -s '\t {}()[],<>%^&.\\' '\n' |
> +	sed -E 's/([^[:alnum:]_])/\n/g' |
>  		grep "^$(basic_regex_quote "$PATTERN")." |
>  		sort -u
>  else
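
The Japanese comparison above can be reproduced outside of vis with a small sketch like the following. It assumes the sample uses the full-width question mark (U+FF1F) that the message describes; `split_old` and `split_new` are hypothetical names for the splitting stage before and after the patch. The result of the sed stage depends on the locale and on the sed implementation, so no output is claimed here beyond what the message reports.

```shell
#!/bin/sh
# Sample line from the message; assumes full-width question marks (U+FF1F).
sample='そうじゃないですか？そうじゃないですか？'

# Splitting stage before the patch (fixed ASCII delimiter list).
split_old() {
	tr -s '\t {}()[],<>%^&.\\' '\n'
}

# Splitting stage after the patch (assumes GNU sed for "\n" in the replacement).
split_new() {
	sed -E 's/([^[:alnum:]_])/\n/g'
}

# Print the distinct non-empty candidates each stage produces.
echo 'without the patch:'
printf '%s\n' "$sample" | split_old | sed '/^$/d' | sort -u
echo 'with the patch:'
printf '%s\n' "$sample" | split_new | sed '/^$/d' | sort -u
```

None of the characters in the sample are in the tr(1) delimiter list, so the first stage passes the line through unchanged. In a UTF-8 locale with GNU sed, the report above suggests the second stage yields the sentence without the question mark.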

Re: [PATCH vis] vis-complete: split words using awk(1) instead of tr(1)

Details
Message ID
<CZ6FQ8OJW95U.1KPZW701XME11@cepl.eu>
In-Reply-To
<20240215073929.182891-2-florian.fischer@muhq.space> (view parent)
On Thu Feb 15, 2024 at 8:29 AM CET, Florian Fischer wrote:
> Apparently, most awk implementations support unicode.

> +	sed -E 's/([^[:alnum:]_])/\n/g' |

Aside from the discrepancy between the commit message and the actual patch, we had
a long discussion (see
https://lists.sr.ht/~martanne/devel/%3C20240205215719.17993-1-mcepl%40cepl.eu%3E
and the previous versions of that patch), where we came to the
conclusion that GNU sed is perfectly Unicode aware, but other
implementations (e.g., the BSD ones) are not, so it is better
to use awk, because it should be Unicode friendly everywhere.

Best,

Matěj

-- 
http://matej.ceplovi.cz/blog/, @mcepl@floss.social
GPG Finger: 3C76 A027 CA45 AD70 98B5  BC1D 7920 5802 880B C9D8
 
A woman without a man is like a fish without a bicycle.
Therefore, a man without a woman is like a bicycle without
a fish.
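
For reference, an awk(1)-based splitting stage along the lines suggested in this thread could look like the sketch below. This is not the code from the earlier commit or from any version of the patch; `split_awk` is a hypothetical name, and whether `[[:alnum:]]` matches non-ASCII letters depends on the awk implementation and the active locale.

```shell
#!/bin/sh
# Hypothetical awk-based replacement for the tr/sed splitting stage:
# split each line on runs of characters that are neither alphanumeric
# nor an underscore, and print every non-empty piece on its own line.
split_awk() {
	awk '{
		n = split($0, words, /[^[:alnum:]_]+/)
		for (i = 1; i <= n; i++)
			if (words[i] != "")
				print words[i]
	}'
}

printf 'def foo(bar):\n  ba\n' | split_awk
```

Because empty pieces are dropped inside awk, the stage can feed the existing `grep | sort -u` part of the pipeline without changes.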