[PATCH] Parse arithmetic expressions with shifts

Cristian Adrián Ontivero
Message ID
<20190120134953.20052-1-cristianontivero@gmail.com>
Fixes #51
---
 parser/word.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/parser/word.c b/parser/word.c
index 3abede3..7cc23ba 100644
--- a/parser/word.c
+++ b/parser/word.c
@@ -673,7 +673,13 @@ struct mrsh_word *word(struct mrsh_parser *state, char end) {
 				continue;
 			}
 		} else if (is_operator_start(c) || isblank(c)) {
-			break;
+			char next[3] = {0};
+			parser_peek(state, next, sizeof(*next) * 2);
+			if (strcmp(next, "<<") && strcmp(next, ">>")) {
+				break;
+			}
+			parser_read_char(state);
+			mrsh_buffer_append_char(&buf, c);
 		}
 
 		parser_read_char(state);
-- 
2.20.1
Message ID
<XsGlZl4ySoDGWIBYJcNrkJI0Oz8XkA0aj1evjgfp51FEAkCk9teINOaVCDmI0NjtY4ZHULIRFD-t3IQj9prJEEyNON4mPGqTkwGLwAsD0T4=@emersion.fr>
In-Reply-To
<20190120134953.20052-1-cristianontivero@gmail.com>
Thanks for your patch! However, I found an issue with it: << is also
used as a redirection operator. This breaks commands like this:

  cat a<<EOF

In dash, this requests a here-document and then runs `cat a`. In
mrsh with this patch applied, it runs `cat 'a<<EOF'` (<< becomes part
of the word and is not parsed as an operator).

The problem is that word() is used in a variety of contexts (a
consequence of this is the `end` parameter). word() should probably be
more aware of where it's called. For instance if called in an
arithmetic expression it probably makes sense to only accept ")" if
it's immediately followed by another ")". Also this currently fails to
parse:

  $((1+(1+1)+2))

Because the first ")" is interpreted as end-of-arithmetic-expression.
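
To illustrate the nested case, here is a minimal sketch that finds the end
of an arithmetic expression. It is not mrsh code: arithm_end() is a
hypothetical helper that works on a plain string instead of the parser
state, and it ignores quoting and nested expansions. It also goes one step
beyond the "followed by another )" check by counting parenthesis depth, so
that input such as `(1+1)))` is not cut short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of mrsh. `s` is the text that follows
 * "$((". Returns the offset of the closing "))", or -1 if there is none.
 * Quoting and nested expansions are deliberately ignored. */
static long arithm_end(const char *s) {
	assert(s != NULL);
	int depth = 0; /* number of currently open nested "(" */
	for (size_t i = 0; s[i] != '\0'; ++i) {
		if (s[i] == '(') {
			++depth;
		} else if (s[i] == ')') {
			if (depth > 0) {
				--depth; /* closes a nested group */
			} else if (s[i + 1] == ')') {
				return (long)i; /* "))" at depth 0 ends the expression */
			} else {
				return -1; /* stray ")" that closes nothing */
			}
		}
	}
	return -1; /* input ended before "))" */
}
```

With this approach, the ")" that closes `(1+1)` only decrements the depth,
and the expression ends at the final "))".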

Maybe it's worth it to duplicate word() for arithmetic words. Or maybe
it could be done with a "word type" parameter for word(). What do you
think?
Cristian Ontivero
Message ID
<CALvFPyusvj9kGWY53Orc54-hokqNxPnP3RUKziFYOs8ryfmbMQ@mail.gmail.com>
In-Reply-To
<XsGlZl4ySoDGWIBYJcNrkJI0Oz8XkA0aj1evjgfp51FEAkCk9teINOaVCDmI0NjtY4ZHULIRFD-t3IQj9prJEEyNON4mPGqTkwGLwAsD0T4=@emersion.fr>
> The problem is that word() is used in a variety of contexts (a
> consequence of this is the `end` parameter). word() should probably be
> more aware of where it's called. For instance if called in an
> arithmetic expression it probably makes sense to only accept ")" if
> it's immediately followed by another ")".

It might make sense to change the `end` parameter from `char` to `char *`;
that way, specifying `))` as the end of a word would be trivial. What do you
think?
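
A rough sketch of what a string terminator check could look like. The
at_end() helper is hypothetical and compares against a plain string; in the
real parser the lookahead would presumably come from parser_peek() instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not part of mrsh: reports whether the unread
 * input starts with the terminator `end`. An empty terminator never
 * matches, so a caller can pass "" to mean "no terminator". */
static bool at_end(const char *input, const char *end) {
	assert(input != NULL && end != NULL);
	size_t n = strlen(end);
	return n > 0 && strncmp(input, end, n) == 0;
}
```

With `end` set to "))", a single ")" followed by more of the expression
would not terminate the word, while a caller that still wants the old
behaviour could pass ")".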

> Maybe it's worth it to duplicate word() for arithmetic words. Or maybe
> it could be done with a "word type" parameter for word(). What do you
> think?

I'd go for having a specific arithm_word() function for words inside
$(( )) unless the majority of the code is shared, in which case a new
`mrsh_word_type` parameter might make more sense. That would imply adding the
parameter at every point where the function is currently called, though. I
guess I need to play a bit more with the code and dig deeper to decide on one
or the other. I'll see if I can come up with a new patch that handles these
issues. Thanks!
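
For reference, a minimal sketch of the "word type" variant, under the
assumption that the caller tells the word parser which context it is in.
The enum and function names below are made up for illustration and do not
exist in mrsh; the input is a plain string, not the parser state.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical context tag passed down by the caller. */
enum word_context {
	WORD_CONTEXT_COMMAND,    /* ordinary command word: << is a redirection */
	WORD_CONTEXT_ARITHMETIC, /* inside $(( )): << and >> are shifts */
};

/* Hypothetical helper: returns true if the operator at the start of
 * `input` is a shift that should stay part of the current word. */
static bool shift_continues_word(enum word_context ctx, const char *input) {
	assert(input != NULL);
	if (ctx != WORD_CONTEXT_ARITHMETIC) {
		return false; /* leave << and >> to the redirection parser */
	}
	return strncmp(input, "<<", 2) == 0 || strncmp(input, ">>", 2) == 0;
}
```

This keeps `cat a<<EOF` parsing as a here-document, since the command
context never treats << as part of a word, while `$((1<<2))` can keep the
shift inside the arithmetic word.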