~technomancy/fennel

12 4

Improve pattern matching syntax

Rudolf Adamkovič <salutis@me.com>
Details
Message ID
<m2edkgthd4.fsf@me.com>
DKIM signature
missing
Download raw message
Hello friends!

I would like to suggest a pattern matching
syntax that

- looks like the rest of Fennel;
- does not overload 'or' to confuse the reader;
- does not overload 'or' to confuse 'grep';
- does not add unnecessary complexity [*]; and
- is backward compatible.

[*] For example, a simple disjunction is
worse in Fennel than in modern non-Lisp
languages like Swift:

Fennel:

  case x
    (where (or 1 2)) ...
  
Swift:

  switch x
    case 1, 2: ...

### SYNTAX ###

In this section,

  - P stands for a pattern,
  - B stands for a boolean guard, and
  - + stands for more than one.

The OLD syntax looks as follows:

  1. P
  2. (where P B+)
  3. (where (or P+))
  4. (where (or P+) B)
  5. (where (or P+) B+)

The NEW syntax looks as follows:

  1. P
  2. P &where B
  3. &in [P+]
  4. &in [P+] &where B

The 5th form for is not needed, as the user
can use standard operators, like 'and', to
combine multiple boolean expressions.

### EXAMPLES ###

- OLD: (where (or 1 2 3)) ; UGH!
- NEW: &in [1 2 3]

- OLD: (where 3 (> a b))
- NEW: 3 &where (> a b)

- OLD: (where x (> x y))
- NEW: x &where (> x y)

- OLD: (where (or 1 2 3) (> a b))
- NEW: &in [1 2 3] &where (> a b)

### REAL-WORLD EXAMPLES ###

(Taken from Egghead Games' code base.)

OLD and NEW:

  (case difficulty
    (where (or 1 2)) color.green
    (where (or 3 4)) color.orange
    5 color.red)
  
  (case difficulty
    &in [1 2] color.green
    &in [3 4] color.orange
    5 color.red)

OLD and NEW:

  (case [titles subtitles]
    [:3] "Casual"
    (where (or [:4 :4] [:4 :5])) "Challenging"
    (where (or [:4 :6] [:4 :7])) "Extreme"
    _ (error))
  
  (case [titles subtitles]
    [:3] "Casual"
    &in [[:4 :4] [:4 :5]] "Challenging"
    &in [[:4 :6] [:4 :7]] "Extreme"
    _ (error))

### ISSUES ###

The Fennel compiler cannot assert on an EVEN
NUMBER of pattern-body pairs.  If that is an
issue, then consider:

Possible solution 1: Underscores

  _ &in [1 2 3]
  x &in [1 2 3] &where (> x 1)

Possible solution 2: Brackets

  [&in [1 2 3]]
  [&in [1 2 3] &where (= a b)]

### NOTES ###

Instead of '&where', we could also use '&if'
or '&when'.  For example:

  &in [1 2 3] &if editing ; NICE!
  &in [1 2 3] &if (= a b) ; NICE!
  
  &in [1 2 3] &when editing
  &in [1 2 3] &when (= a b)

Note that '&if' and '&when' always read well,
unlike '&where'.  For example, consider

  &where editing ; MEH.

### QUESTIONS ###

What do you think?

Rudy
-- 
"Great minds discuss ideas; average minds discuss events; small minds discuss
people."
--- Anna Eleanor Roosevelt (1884-1962)

Rudolf Adamkovič <salutis@me.com> [he/him]
Studenohorská 25
84103 Bratislava
Slovakia
Details
Message ID
<87zg342fq6.fsf@gmail.com>
In-Reply-To
<m2edkgthd4.fsf@me.com> (view parent)
DKIM signature
missing
Download raw message
> What do you think?

Hi!

This looks a lot like Common Lisp's LOOP macro which has its own
non-lispy language inside of it. Which IMO, in case of Common Lisp is a
bad design.

I'm not sure on this new syntax, but I'm all for better
alternatives. After some time I wish I made it so the `where` is only
needed if we're supplying tests, and or could be used outside of where.
Simply because it is too many typing. I.e.:

(case x
  (or :a :b) ...
  ;; instead of
  (where (or :a :b)) ...)

Note, that this isn't the first iteration of the pattern-matching
syntax, previously match had the infix ? operator, which is still
supported, but it doesn't support altering patterns:

(case (foo)
  ;; old syntax
  (x ? (> x 10)) ...
  ;; new syntax
  (where x (> x 10)) ...)

Speaking of your variant I fear it will be quite complex to parse all of
this, and later it will be hard to make out what is happening in the
macro source code once we want to extend it with new features again.  Or
it will be simply too hard to add anything because of permutations of
positional arguments this style creates.

--
Andrey Listopadov
Details
Message ID
<878rao114q.fsf@hagelb.org>
In-Reply-To
<m2edkgthd4.fsf@me.com> (view parent)
DKIM signature
missing
Download raw message
Rudolf Adamkovič <salutis@me.com> writes:

> I would like to suggest a pattern matching
> syntax that
>
> - looks like the rest of Fennel;
> - does not overload 'or' to confuse the reader;
> - does not overload 'or' to confuse 'grep';

I have to express some skepticism here at this third point--I am a
frequent user of grep to find where some of my functions are being
called, but the idea of using it to find all the calls to a special like
`or` seems pretty far-fetched! I can't think of a question for which
such a query would be useful.

But of course; the first bullet point is a good enough reason on its own
to discuss alternatives here; it would be better if there were no such
inconsistency at all.

> For example, a simple disjunction is worse in Fennel than in modern
> non-Lisp languages like Swift:
>
> Fennel:
>
>   case x
>     (where (or 1 2)) ...
>
> Swift:
>
>   switch x
>     case 1, 2: ...

There is no technical reason that `or` could not be called from outside
a `where` clause, but the `where` clause in this case serves only to
delimit how far the overloaded meaning of `or` can be used. Given that
`or` is only one token away from `where`, the opportunities for
confusion are limited. But if `or` were replaced altogether with another
non-overloaded form, we could discuss dropping the requirement for it to
be in a `where` clause, and then the Fennel style and the Swift style
would be roughly equivalent.

> The NEW syntax looks as follows:
>
>   1. P
>   2. P &where B
>   3. &in [P+]
>   4. &in [P+] &where B

The biggest problem here is that you can no longer look at a piece of
code and say "what form is the clause here?" You would have to deal with
there being an undetermined number of forms making up a clause, which
means a clause is no longer a form, it's a list of forms. This means
that the implementation must represent it as something different from
the way the user sees it, similarly to how multivalues work in Lua.
It also overloads the square brackets to mean something that's not a
sequential table in a position where square brackets already mean
something important, which in my opinion is more confusing than
overloading `or`.

Even the fact that clauses and their accompanying body are split up
(unlike classic lisps like Scheme) causes some headache in contexts like
fnlfmt. This proposal exacerbates that significantly, because A) there
is no such regularity (at least with clause/body it's always 1:1) and B)
with clause/body splitting it mirrors the way that `let` binding clauses
work, but there is no such justification for splitting apart clauses
themselves.

But taking a step back, this doesn't really have anything to do with the
original problem it was introduced to address. The proposal solves the
overloading of `or` by replacing it with `&in` but then goes on to split
up the clause form. But once you've replaced `or` with `&in`, you've
already solved the problem; why keep going?

-Phil
Rudolf Adamkovič <salutis@me.com>
Details
Message ID
<m27cq7u3pr.fsf@me.com>
In-Reply-To
<m2edkgthd4.fsf@me.com> (view parent)
DKIM signature
missing
Download raw message
Thank you for the initial replies!

To scale the discussion a bit, I basically
see two problems with the current syntax:

P1. Simple disjunctions, which are common,
are not simple:

    (where (or ...))

P2. The 'or' and 'where' forms look like
ordinary function or macro calls, yet they
are something else entirely:

    (or ...)
    (where ...)

[Yes, this is like the 'loop' DSL in CL, but
also unlike everything else in Fennel.]

Thus, I have two questions:

Q1. Could we solve P1 by simply not requiring
'where' when there are no guards?

Q2. Could we solve P2 by either prefixing the
'or' and 'where' with '&' or by doing
something else?

Rudy
-- 
"The introduction of suitable abstractions is our only mental aid to
organize and master complexity."
-- Edsger Wybe Dijkstra, 1930-2002

Rudolf Adamkovič <salutis@me.com> [he/him]
Studenohorská 25
84103 Bratislava
Slovakia
Rudolf Adamkovič <salutis@me.com>
Details
Message ID
<m2350vu2hd.fsf@me.com>
In-Reply-To
<878rao114q.fsf@hagelb.org> (view parent)
DKIM signature
missing
Download raw message
Phil Hagelberg <phil@hagelb.org> writes:

> [...]

Gotcha!

> The proposal solves the overloading of `or` by replacing it with `&in`
> but then goes on to split up the clause form. But once you've replaced
> `or` with `&in`, you've already solved the problem; why keep going?

So, that would give us, if I understand correctly:

(&in PATTERN+)
(&where (&in PATTERN+) GUARD+)

Then, yet another option would be to NOT use
[] for &in (as you suggest) but wrap the
clause in [] as:

[&in PATTERN+]
[&in PATTERN+ &where GUARD]

That would be a tad simpler, more regular,
and also "more like Fennel" such as:

[... &until BREAK &into TABLE]

Rudy
-- 
"'Obvious' is all too often a synonym for 'wrong'."
-- Jeff Erickson, Algorithms, 2019

Rudolf Adamkovič <salutis@me.com> [he/him]
Studenohorská 25
84103 Bratislava
Slovakia
Details
Message ID
<176C0990-61AE-4E42-9422-DB90620D4FD8@gmail.com>
In-Reply-To
<m27cq7u3pr.fsf@me.com> (view parent)
DKIM signature
missing
Download raw message
>Q1. Could we solve P1 by simply not requiring
>'where' when there are no guards?

I think yes. Curious what others think.

>Q2. Could we solve P2 by either prefixing the
>'or' and 'where' with '&' or by doing
>something else?

I don't know but for some reason I view & as 
positional thing, e.g. you can't use & in arbitrary
places of the syntax. It is something /extra/ in the 
already established syntax, like iterator binding 
tables [k v iterator &until extra-stuff], or destructuring
(let [[a b &as extra-stuff] x] ...).

So imo, & is out of need here - we can replace or by in 
and allow it appear outside where, and it would be 
better than what we have now already.


-- 
Andrey Listopadov
Details
Message ID
<871qgb1qv6.fsf@hagelb.org>
In-Reply-To
<176C0990-61AE-4E42-9422-DB90620D4FD8@gmail.com> (view parent)
DKIM signature
missing
Download raw message
Andrey <andreyorst@gmail.com> writes:

>>Q1. Could we solve P1 by simply not requiring
>>'where' when there are no guards?
>
> I think yes. Curious what others think.

On its own, no, because the `or` patterns look too much like calls to
the `or` special form. But if we switch to an unambiguous symbol for
it instead, then it becomes viable.

>>Q2. Could we solve P2 by either prefixing the
>>'or' and 'where' with '&' or by doing
>>something else?
>
> I don't know but for some reason I view & as 
> positional thing, e.g. you can't use & in arbitrary
> places of the syntax. It is something /extra/ in the 
> already established syntax, like iterator binding 
> tables [k v iterator &until extra-stuff], or destructuring
> (let [[a b &as extra-stuff] x] ...).

Well, that's the place it's currently used, but I don't know if we
should limit it to that in the long run. The real reason that we use it
there is that & is a valid character in a symbol but not a valid
character in an identifier; essentially it "reserves" it as a piece of
syntax that *cannot* already be bound to a value in that scope.

It's more or less a coincidence that it's only been used so far in
binding vectors.

> So imo, & is out of need here - we can replace or by in 
> and allow it appear outside where, and it would be 
> better than what we have now already.

If we use a bare `in` symbol then there is a potential for conflict with
existing identifiers. For instance:

    (let [in 9]
      (match (my-function)
        (in 33 92) (print :something)
        _ (error "bad values")))

This code would work fine in 1.3.1 but break in 1.4.0. That's why it's
so important that the & character is reserved. Using a symbol without &
here is roughly equivalent to adding a new special form; we could do it,
but only if we use a symbol like `faccumulate` which is extremely
unlikely to conflict with existing code. If we were introducing `where`
today we would have the same problem; the only reason we could do it is
that it was added before 1.0.0.

One way to think about it is that currently `where` is not a special
form of Fennel, but it is a special form of `case` and `match`. That's
part of why it feels so weird; it looks like a normal special form, but
you can't call it on its own independently.

I believe that introducing macro-specific special forms based on symbols
like `where` is considered by Schemers to be a macro hygiene issue for
more or less this reason. It's not as bad as the accidental symbol
capture hygiene issues that we solve with auto-gensym, but it's still
considered anaphoric and banned in Scheme since it introduces an
identifier "out of thin air". Using a symbol which is impossible to use
as an identifier makes it evident that this is not a function call or a
macro call; it's a "directive".

-Phil
Rudolf Adamkovič <salutis@me.com>
Details
Message ID
<m2a5uylahq.fsf@me.com>
In-Reply-To
<871qgb1qv6.fsf@hagelb.org> (view parent)
DKIM signature
missing
Download raw message
Phil Hagelberg <phil@hagelb.org> writes:

> If we use a bare `in` symbol then there is a potential for conflict with
> existing identifiers. [...] a "directive".

That makes perfect sense!

So, &-prefixing would instantly solve 3
problems:

  1. reader confusion,
  2. macro hygiene, and
  3. basic disjunctions.

The pinning construct '(= ...)'  should
become &-prefixed as well, if only for
syntactic consistency.

I think we all agree up to this point! <MARK>

Next up for consideration:

We could further improve the readability of

  (&where (&in ...) ...)

by flattening the syntax to

  (&in ...)
  (... &where ...)
  (&in ... &where ...)

or

  [&in ...]
  [... &where ...]
  [&in ... &where ...]

In Fennel, () and [] match multiple values
and sequential tables, respectively, so
either way, we overload something.  The []
form looks like the rest of Fennel, which is
good.

WDYT?

Rudy
-- 
"Logic is a science of the necessary laws of thought, without which no
employment of the understanding and the reason takes place."
-- Immanuel Kant, 1785

Rudolf Adamkovič <salutis@me.com> [he/him]
Studenohorská 25
84103 Bratislava
Slovakia
Details
Message ID
<87r0o9450p.fsf@gmail.com>
In-Reply-To
<m2a5uylahq.fsf@me.com> (view parent)
DKIM signature
missing
Download raw message
> by flattening the syntax to
>
>   (&in ...)
>   (... &where ...)
>   (&in ... &where ...)

Agree. Although I have some thoughts below.

>   [&in ...]
>   [... &where ...]
>   [&in ... &where ...]
>
> In Fennel, () and [] match multiple values
> and sequential tables, respectively, so
> either way, we overload something.  The []
> form looks like the rest of Fennel, which is
> good.

I don't think this makes sense tbh. We're matching on a data structure
here, so using &in in the data structure is weird, because it implies it
matches this or that IN the data structure.  In the previous example
parens are kinda like a calling convention, not a multiple value
convention.

E.g. what this should match to [&in [a 1] [2 b] c]? I would guess it
would mean that either [[_ 1]] or [[2 _]] or [_] would match, but it is
way too much overloading IMO.  It also implies that &in can be used at
any level in the data structure, which it is not, and which will make
pattern matching way too complex both in implementtion and in usage.

I think It's better written as (&in [[a 1]] [[2 b]] [c])

So a complete syntax would be:

(case x
  (&in pat1 pat2) :a
  (pat3 &where test) :b
  (&in pat4 pat5 &where test1 test2) :c
  _ :d)

Though I don't really like that &where is now a prefix operator
again. Originally I was motivated to introduce where to avoid using
prefix ? in pattern matching which felt unnatural.  So maybe the syntax
should be:

(case x
  (&in pat1 pat2) :a
  (&where pat3 test) :b
  (&where (&in pat4 pat5) test1 test2) :c
  _ :d)

It's closer to what we already have, and basically just allows using &in
outside of &where and renaming where to &where.

--
Andrey Listopadov
Rudolf Adamkovič <salutis@me.com>
Details
Message ID
<m2o7j85dl8.fsf@me.com>
In-Reply-To
<87r0o9450p.fsf@gmail.com> (view parent)
DKIM signature
missing
Download raw message
Andrey Listopadov <andreyorst@gmail.com> writes:

> We're matching on a data structure here, so
> using &in in the data structure is weird,
> because it implies it matches this or that
> IN the data structure.  In the previous
> example parens are kinda like a calling
> convention, not a multiple value
> convention.

Sorry for not being clearer.  I see both
multiple values and tables as "data
structures".  Consider

(case (values 1 2)
  (1 2) :hit
  _ :miss)         ;; => hit

(case [1 2]
  [1 2] :hit
  _ :miss)         ;; => hit

The proposed (&or ...) would overload the
multiple values "data structure" syntax the
same way [&or ...] would overload the table
"data structure" syntax.

That said, I get that () is also used for
calling functions and macros, and I address
that aspect in the paragraphs below.

> It also implies that &in can be used at any
> level in the data structure, which it is
> not, and which will make pattern matching
> way too complex both in implementtion and
> in usage.

[&in ... &where ...] would be usable only in
the outermost balanced pair.  Is that not the
case with 'where' and 'or' already?  Neither
[&in ...] nor (&in ...) would be composable.
They are both magical in that they change the
meaning of the outermost balanced pair, be it
() or [], as shown above.

But then, why do we pretend composability and
"calling things"?  In other words, why do
'case' and 'match' need, unlike the rest of
Fennel, this unique "2-command DSL" that
makes code harder to read, compared to a
simple [&in ... &where ...] list?

Rudy
-- 
"It is far better to have a question that
can't be answered than an answer that can't
be questioned."  --- Carl Sagan

Rudolf Adamkovič <salutis@me.com> [he/him]
Studenohorská 25
84103 Bratislava
Slovakia
Details
Message ID
<878racnl91.fsf@gmail.com>
In-Reply-To
<m2o7j85dl8.fsf@me.com> (view parent)
DKIM signature
missing
Download raw message
> Sorry for not being clearer.  I see both
> multiple values and tables as "data
> structures".  Consider
>
> (case (values 1 2)
>   (1 2) :hit
>   _ :miss)         ;; => hit
>
> (case [1 2]
>   [1 2] :hit
>   _ :miss)         ;; => hit

Multiple values aren't a data structure though.  They act like one in
case of pattern matching, yes, but it's not a proper way to think about
them, as it gives false assumptions. I get what you mean here though, it
doesn't apply to the use of &where and &or.

> The proposed (&or ...) would overload the
> multiple values "data structure" syntax the
> same way [&or ...] would overload the table
> "data structure" syntax.

Yes, but then you will still need to use the multiple-value binding
syntax inside the clause: (&or (1 2) (3 4)) for case to match on either
1 and 2 or 3 and 4 values, so what's the point of using []?  Is there
any semantic difference between (&or ...) and [&or ...]?

I would agree, clauses in case are acting like a binder tables, much
like in let, and if that's the only the reason driving the use of []
then I don't think it is really appropriate, unless we introduce
explicit scoping, like:

(case (foo)
  ([&where [a b] (> a 10)]
   body
   goes
   here)
  ([&or (x y) (a b)]
   body
   goes
   here))

But it's too much nesting, and we can already use (do) in cases where we
need multple expressions in the bodyform of a specific branch.

---

In pattern matching we act on patterns that depict data structures, but
we're acting on them with special operators introduced by the case
macro, and using a calling-like syntax here is more appropriate in my
opinion.

But I don't want to gate-keep here, it's just what I think of as
appropriate syntax, but if other's decide that the suggested new syntax
is clear and concise, I will have no objections.  For now, I will stay
out of this discussion as much as possible, as I feel that I've said
everything I had to say.

--
Andrey Listopadov
Rudolf Adamkovič <salutis@me.com>
Details
Message ID
<m2jztw5aub.fsf@me.com>
In-Reply-To
<878racnl91.fsf@gmail.com> (view parent)
DKIM signature
missing
Download raw message
Andrey Listopadov <andreyorst@gmail.com> writes:

> Yes, but then you will still need to use
> the multiple-value binding syntax inside
> the clause: (&or (1 2) (3 4)) for case to
> match on either 1 and 2 or 3 and 4 values,
> [...]

Correct.

> [...] so what's the point of using []?  Is
> there any semantic difference between (&or
> ...) and [&or ...]?

The reason for [] is exactly the point you
raised previously: "Though I don't really
like that &where is now a prefix operator
again."  Neither me.

> I would agree, clauses in case are acting
> like a binder tables, much like in let, and
> if that's the only the reason driving the
> use of [] then I don't think it is really
> appropriate, unless we introduce explicit
> scoping, like:

I meant the syntax to mimic [... &as ...],
[... &until ...], and the like.  Basically
all of &-prefixed Fennel.

Are those called binder tables?  If so, why
do you think it is not appropriate to use
such tables without the additional ()
scoping?

> For now, I will stay out of this discussion
> as much as possible, as I feel that I've
> said everything I had to say.

Please note that I remain thankful for the
time you have invested into this discussion,
and I appreciate all your insights!

Rudy
-- 
"All you have to do is write one true
sentence.  Write the truest sentence that you
know."  --- Ernest Miller Hemingway
(1899-1961)

Rudolf Adamkovič <salutis@me.com> [he/him]
Studenohorská 25
84103 Bratislava
Slovakia
Details
Message ID
<87fs4fzy6m.fsf@hagelb.org>
In-Reply-To
<m2jztw5aub.fsf@me.com> (view parent)
DKIM signature
missing
Download raw message
Andrey Listopadov <andreyorst@gmail.com> writes:

>> Sorry for not being clearer.  I see both
>> multiple values and tables as "data structures".
>
> Multiple values aren't a data structure though

You're both right, and that's why multiple values are the worst feature
of Lua! =) If I had to describe it, I would probably use the term
"second-class data structure" but in any case it's confusing!

Rudolf Adamkovič <salutis@me.com> writes:

> In other words, why do 'case' and 'match' need, unlike the rest of
> Fennel, this unique "2-command DSL" that makes code harder to read,
> compared to a simple [&in ... &where ...] list?

Let's try to avoid value judgments like this and stick to facts; the
point of the whole conversation is to try to determine *which* style is
easier to read, so an assertion that one is "simple" while the other is
"hard" to read is begging the question and as such is counterproductive.
Both alternatives are unique DSLs that don't have existing precedent;
the question at hand is which is less disruptive and more consistent.

The current design is definitely implying you should read `where` and
`or` as special forms. They are like little anaphoric macros introduced
by `case` which expand to pattern clauses that work differently from
normal pattern clauses. We're all in agreement that this is not good
design, but I think we're not necessarily in agreement as to why. Is it
bad because it's anaphoric, or is it bad because it's a special form?
Switching to `&or` helps with the anaphoric problem, but it's still a
special form.

> Are those called binder tables?  If so, why
> do you think it is not appropriate to use
> such tables without the additional () scoping?

One important fact about [] which I don't think has come up yet is that
the way that [] is used in `let`, `fn`, `each` etc is fundamentally
about *grouping*. Without brackets, we wouldn't know where the
name/value binding pairs ended and the body of the `let` begins. But
even if you have a function with only one argument, you still use the
brackets to declare the argument list.

There really isn't anywhere we currently use grouping brackets where you
can just leave the brackets out when you only have a single form. We
can add & clauses into the existing  grouping brackets to make it behave
differently, but in all the existing forms, the & clauses can be removed
without completely changing the meaning of a form; for instance:

    (each [k v (pairs x) &into x2] ...)
    (each [k v (pairs x)] ...)

These are roughly the same, but this doesn't work at all:

    (each k v (pairs x) ...)

So the idea that you could take an existing pattern clause and wrap it
with a new layer of brackets without changing the nesting doesn't really fit.

Introducing special forms that only work within a macro is certainly
unique to `case` so far, but it's not really a huge stretch; special
forms exist, it's easy to imagine calling macro which could expand into
a pattern clause. The only thing that's unique about it is that it's
scoped to `case`/`match`, which doesn't seem like as big of an
inconsistency.

-Phil
Reply to thread Export thread (mbox)