~emersion/soju-dev

This thread contains a patchset. You're looking at the original emails, but you may wish to use the patch review UI. Review patch
3 2

[PATCH] Make sure that WebSocket messages are valid UTF-8

Details
Message ID
<20200902145510.118734-1-hubert@hirtzfr.eu>
DKIM signature
pass
Download raw message
Patch: +2 -1
... by replacing invalid bytes with the REPLACEMENT CHARACTER U+FFFD

This is better than:
- discarding the whole message, since the user would not see it...
- removing invalid bytes, since the user would not see their presence,
- converting the encoding (this is actually not possible).

Contrary to its documentation, strings.ToValidUTF8 doesn't copy the
string if it's valid UTF-8:
<https://golang.org/src/strings/strings.go?s=15815:15861#L623>
---
 conn.go | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/conn.go b/conn.go
index 6914179..701e8f5 100644
--- a/conn.go
+++ b/conn.go
@@ -5,6 +5,7 @@ import (
	"fmt"
	"io"
	"net"
	"strings"
	"sync"
	"time"

@@ -62,7 +63,7 @@ func (wic websocketIRCConn) ReadMessage() (*irc.Message, error) {
}

func (wic websocketIRCConn) WriteMessage(msg *irc.Message) error {
	b := []byte(msg.String())
	b := []byte(strings.ToValidUTF8(msg.String(), "\ufffd"))
	ctx := context.Background()
	if !wic.writeDeadline.IsZero() {
		var cancel context.CancelFunc
-- 
2.28.0
Details
Message ID
<ug9ecp_yUvAh2R4mjN-bh65eCZuroADt1IiCSwRq64-9d0t8LDWby8yt4wxtnbRyGngi90O14I1tBczPAWdtMtbu9Jova1GZ4IMmOO4wn_E=@emersion.fr>
In-Reply-To
<20200902145510.118734-1-hubert@hirtzfr.eu> (view parent)
DKIM signature
fail
Download raw message
DKIM signature: fail
Nice!

Can we use the unicode.ReplacementChar [1] constant instead of
hardcoding the value?

There's also bytes.ToValidUTF8, but not sure it's better than
strings.ToValidUTF8.

[1]: https://golang.org/pkg/unicode/#pkg-constants
Details
Message ID
<4301c206-9acd-e79a-9327-38964dde982c@hirtzfr.eu>
In-Reply-To
<ug9ecp_yUvAh2R4mjN-bh65eCZuroADt1IiCSwRq64-9d0t8LDWby8yt4wxtnbRyGngi90O14I1tBczPAWdtMtbu9Jova1GZ4IMmOO4wn_E=@emersion.fr> (view parent)
DKIM signature
pass
Download raw message
Le 02/09/2020 à 16:57, Simon Ser a écrit :
> Nice!
> 
> Can we use the unicode.ReplacementChar [1] constant instead of
> hardcoding the value?

The second argument is a string though, not a rune.

> There's also bytes.ToValidUTF8, but not sure it's better than
> strings.ToValidUTF8.
> 
> [1]: https://golang.org/pkg/unicode/#pkg-constants

It seems the difference is that bytes.ToValidUTF8 always copy its input:
<https://golang.org/src/bytes/bytes.go?s=17630:17676#L670>
Details
Message ID
<30e3__OUumdXCN-i3_IFbifjNeWKOISFVqc6AgqhqrpYcP073IOjk6dg3yeW6AquvISqmTKnCn258-tOSN9Hs26QqMUGtqJwD33GTKf2gxE=@emersion.fr>
In-Reply-To
<4301c206-9acd-e79a-9327-38964dde982c@hirtzfr.eu> (view parent)
DKIM signature
fail
Download raw message
DKIM signature: fail
Hubert Hirtz <hubert@hirtzfr.eu> wrote:

> Le 02/09/2020 à 16:57, Simon Ser a écrit :
>
> > Nice!
> > Can we use the unicode.ReplacementChar 1 constant instead of
> > hardcoding the value?
>
> The second argument is a string though, not a rune.

string(unicode.ReplacementChar) should do it.

> > There's also bytes.ToValidUTF8, but not sure it's better than
> > strings.ToValidUTF8.
>
> It seems the difference is that bytes.ToValidUTF8 always copy its input:
> https://golang.org/src/bytes/bytes.go?s=17630:17676#L670

OK, that makes sense since bytes are mutable, don't want to mutate the
input and end up mutating the output too.
Review patch Export thread (mbox)