From: Cristian Adrián Ontivero
To: ~emersion/mrsh-dev@lists.sr.ht
Cc: Cristian Adrián Ontivero
Subject: [PATCH v2] Parse arithmetic expressions with shifts
Date: Fri, 25 Jan 2019 09:57:55 +0100
Message-Id: <20190125085755.5290-1-cristianontivero@gmail.com>

We introduce the function arithmetic_word() to parse arithmetic
expressions instead of reusing the general word(), and generalize
word_list() to receive a function pointer, so that word_list() can parse
a list of whatever type of word we need.

This fixes #51 and enables proper parsing of parenthesized expressions
inside arithmetic expressions, e.g. $(((2+1)-1)).
---
This is mostly a working proof of concept. As might be expected, there is
quite a bit of similarity between word() and arithmetic_word(), but I
think that in the long term it would be better to remove the "end"
parameter and have a couple of different *_word() functions that are
called where appropriate.

As discussed, the alternative would be adding a mrsh_word_type parameter
to word() and word_list(). I think that would eventually lead to a more
complex word() function, with a lot of if-statements for each distinct
context (although it would avoid the repetition intrinsic to the
*_word() alternative). What do you think?

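To make the trade-off concrete, here is a minimal standalone sketch of
the function-pointer approach. It is not mrsh code: scan_fn, is_op(),
scan_plain(), scan_arith() and count_words() are hypothetical stand-ins
that work on plain C strings, and the operator set is an assumption. The
only point is that one generic driver can serve several word rules, with
the arithmetic rule keeping "<<" and ">>" inside a word.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for word_fn: scans one word starting at s[pos]
 * and returns the number of bytes consumed (0 if no word starts here). */
typedef size_t (*scan_fn)(const char *s, size_t pos, char end);

/* Assumed operator set, for illustration only. */
static int is_op(char c) {
	return c == '<' || c == '>' || c == '|' || c == '&' || c == ';';
}

/* General rule: a word ends at a blank, an operator character or `end`. */
static size_t scan_plain(const char *s, size_t pos, char end) {
	size_t n = 0;
	while (s[pos + n] != '\0' && s[pos + n] != end &&
			!is_op(s[pos + n]) &&
			!isblank((unsigned char)s[pos + n])) {
		n++;
	}
	return n;
}

/* Arithmetic rule: same as above, but "<<" and ">>" stay in the word. */
static size_t scan_arith(const char *s, size_t pos, char end) {
	size_t n = 0;
	while (s[pos + n] != '\0' && s[pos + n] != end) {
		char c = s[pos + n];
		if ((c == '<' || c == '>') && s[pos + n + 1] == c) {
			n += 2; // shift operator: consume both characters
			continue;
		}
		if (is_op(c) || isblank((unsigned char)c)) {
			break;
		}
		n++;
	}
	return n;
}

/* Generic driver: counts words using whichever rule the caller passes,
 * skipping blanks between words and stopping when the rule finds none. */
static size_t count_words(const char *s, char end, scan_fn fn) {
	assert(fn != NULL);
	size_t pos = 0, count = 0;
	for (;;) {
		while (isblank((unsigned char)s[pos])) {
			pos++;
		}
		size_t n = fn(s, pos, end);
		if (n == 0) {
			break;
		}
		pos += n;
		count++;
	}
	return count;
}
```

With these definitions, count_words("1 << 2", ')', scan_arith) sees three
words, while the same call with scan_plain stops at the first '<' and
sees only one. A type-parameter variant would fold both rules into a
single scanner with a branch per context.
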
 include/parser.h |   3 ++
 parser/word.c    | 120 +++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 118 insertions(+), 5 deletions(-)

diff --git a/include/parser.h b/include/parser.h
index 56e549f..d4b32b9 100644
--- a/include/parser.h
+++ b/include/parser.h
@@ -63,6 +63,8 @@ struct mrsh_parser {
 	void *alias_user_data;
 };
 
+typedef struct mrsh_word * (*word_fn)(struct mrsh_parser *, char end);
+
 size_t parser_peek(struct mrsh_parser *state, char *buf, size_t size);
 char parser_peek_char(struct mrsh_parser *state);
 size_t parser_read(struct mrsh_parser *state, char *buf, size_t size);
@@ -90,5 +92,6 @@ size_t peek_word(struct mrsh_parser *state, char end);
 struct mrsh_word *expect_dollar(struct mrsh_parser *state);
 struct mrsh_word *back_quotes(struct mrsh_parser *state);
 struct mrsh_word *word(struct mrsh_parser *state, char end);
+struct mrsh_word *arithmetic_word(struct mrsh_parser *state, char end);
 
 #endif
diff --git a/parser/word.c b/parser/word.c
index 3abede3..17b327a 100644
--- a/parser/word.c
+++ b/parser/word.c
@@ -173,7 +173,7 @@ char *read_token(struct mrsh_parser *state, size_t len,
 	return tok;
 }
 
-static struct mrsh_word *word_list(struct mrsh_parser *state, char end) {
+static struct mrsh_word *word_list(struct mrsh_parser *state, char end, word_fn fn) {
 	struct mrsh_array children = {0};
 
 	while (true) {
@@ -181,7 +181,7 @@ static struct mrsh_word *word_list(struct mrsh_parser *state, char end) {
 			break;
 		}
 
-		struct mrsh_word *child = word(state, end);
+		struct mrsh_word *child = fn(state, end);
 		if (child == NULL) {
 			break;
 		}
@@ -309,7 +309,7 @@ static struct mrsh_word_parameter *expect_parameter_expression(
 			return NULL;
 		}
 		op_range.end = state->pos;
-		arg = word_list(state, '}');
+		arg = word_list(state, '}', word);
 	}
 
 	struct mrsh_position rbrace_pos = state->pos;
@@ -355,7 +355,7 @@ static struct mrsh_word_arithmetic *expect_word_arithmetic(
 	c = parser_read_char(state);
 	assert(c == '(');
 
-	struct mrsh_word *body = word_list(state, ')');
+	struct mrsh_word *body = word_list(state, ')', arithmetic_word);
 	if (body == NULL) {
 		if (!mrsh_parser_error(state, NULL)) {
 			parser_set_error(state, "expected an arithmetic expression");
@@ -695,7 +695,117 @@ struct mrsh_word *word(struct mrsh_parser *state, char end) {
 	}
 }
 
+/* TODO remove end parameter when no *_word function takes it */
+struct mrsh_word *arithmetic_word(struct mrsh_parser *state, char end) {
+	if (!symbol(state, TOKEN)) {
+		return NULL;
+	}
+
+	char c = parser_peek_char(state);
+	if (is_operator_start(c)) {
+		return NULL;
+	}
+
+	char next[3] = {0};
+	if (c == ')') {
+		parser_peek(state, next, sizeof(*next) * 2);
+		if (!strcmp(next, "))")) {
+			return NULL;
+		}
+	}
+
+	struct mrsh_array children = {0};
+	struct mrsh_buffer buf = {0};
+	struct mrsh_position child_begin = {0};
+
+	while (true) {
+		if (!mrsh_position_valid(&child_begin)) {
+			child_begin = state->pos;
+		}
+
+		parser_peek(state, next, sizeof(*next) * 2);
+		c = next[0];
+		if (c == '\0' || c == '\n' || !strcmp(next, "))")) {
+			break;
+		}
+
+		if (c == '$') {
+			push_buffer_word_string(state, &children, &buf, &child_begin);
+			struct mrsh_word *t = expect_dollar(state);
+			if (t == NULL) {
+				return NULL;
+			}
+			mrsh_array_add(&children, t);
+			continue;
+		}
+
+		if (c == '`') {
+			push_buffer_word_string(state, &children, &buf, &child_begin);
+			struct mrsh_word *t = back_quotes(state);
+			if (t == NULL) {
+				return NULL;
+			}
+			mrsh_array_add(&children, t);
+			continue;
+		}
+
+		// Quoting
+		if (c == '\'') {
+			push_buffer_word_string(state, &children, &buf, &child_begin);
+			struct mrsh_word *t = single_quotes(state);
+			if (t == NULL) {
+				return NULL;
+			}
+			mrsh_array_add(&children, t);
+			continue;
+		}
+		if (c == '"') {
+			push_buffer_word_string(state, &children, &buf, &child_begin);
+			struct mrsh_word *t = double_quotes(state);
+			if (t == NULL) {
+				return NULL;
+			}
+			mrsh_array_add(&children, t);
+			continue;
+		}
+
+		if (c == '\\') {
+			// Unquoted backslash
+			parser_read_char(state);
+			c = parser_peek_char(state);
+			if (c == '\n') {
+				// Continuation line
+				read_continuation_line(state);
+				continue;
+			}
+		} else if (is_operator_start(c) || isblank(c)) {
+			if (strcmp(next, "<<") && strcmp(next, ">>")) {
+				break;
+			}
+			parser_read_char(state);
+			mrsh_buffer_append_char(&buf, c);
+		}
+
+		parser_read_char(state);
+		mrsh_buffer_append_char(&buf, c);
+	}
+
+	push_buffer_word_string(state, &children, &buf, &child_begin);
+	mrsh_buffer_finish(&buf);
+
+	consume_symbol(state);
+
+	if (children.len == 1) {
+		struct mrsh_word *word = children.data[0];
+		mrsh_array_finish(&children); // TODO: don't allocate this array
+		return word;
+	} else {
+		struct mrsh_word_list *wl = mrsh_word_list_create(&children, false);
+		return &wl->word;
+	}
+}
+
 struct mrsh_word *mrsh_parse_word(struct mrsh_parser *state) {
 	parser_begin(state);
-	return word_list(state, 0);
+	return word_list(state, 0, word);
 }
-- 
2.20.1