~technomancy/fennel

Require autogensym for identifiers inside backticked macros v1 PROPOSED

Interested in folks' thoughts on this.

It's basically borrowing the Clojure solution to macro hygiene, which is
not nearly as strict as Scheme's solution, but I believe is much easier
to understand. If folks feel strongly about exploring a stricter-hygiene
approach we can talk about that, but I was surprised by how little effort
this particular approach took while still avoiding 99% of unintentional
symbol captures.

Basically, all local names introduced inside a backtick must now be
gensym'd in order to avoid symbol capture, but you can do this
automatically by appending # to the symbol:

    `(let [x# 1] ,body)

This way locals introduced by macros have no chance of colliding with
locals in the body code, which is a common problem in defmacro-style systems.
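The capture problem can be seen outside Fennel too. Below is a minimal Lua sketch that "expands" a macro by string templating and runs the result with `load`. The helpers `gensym`, `expand_naive`, `expand_gensym` and `run` are hypothetical illustrations, not Fennel internals. The naive template hard-codes the local `x`, so a body that mentions the caller's `x` silently picks up the macro's binding instead.

```lua
-- Hypothetical illustration only: macros modeled as Lua source templates.
local load_chunk = loadstring or load  -- Lua 5.1 / LuaJIT vs. 5.2+

local counter = 0
-- Produce a fresh name that the caller's code cannot already be using.
local function gensym(base)
  counter = counter + 1
  return base .. "_" .. counter .. "_"
end

-- Unhygienic: the template binds a literal `x`, shadowing the caller's `x`.
local function expand_naive(body)
  return "local x = 1; return x + (" .. body .. ")"
end

-- Gensym'd: the template's local gets a generated name instead.
local function expand_gensym(body)
  local x = gensym("x")
  return ("local %s = 1; return %s + (%s)"):format(x, x, body)
end

-- Run an expansion where the caller has its own local `x` in scope.
local function run(caller_x, src)
  local chunk = "local x = " .. caller_x
    .. "; return (function() " .. src .. " end)()"
  return assert(load_chunk(chunk))()
end

print(run(10, expand_naive("x")))   -- body captured by the macro's x
print(run(10, expand_gensym("x")))  -- body sees the caller's x
```

With the caller's `x` set to 10, the naive expansion evaluates to 2 (the body saw the macro's `x = 1`), while the gensym'd one evaluates to 11.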

It is unfortunately a breaking change, but the macro system has been
marked as "preliminary and subject to change over time", and the next
release has some other breaking things in it.

-Phil

Phil Hagelberg <phil@hagelb.org> writes:
I think it's a good idea, and the backwards incompatibility probably
isn't much of a problem. It's definitely a useful feature for macros
and I can't really think of any big problems with it. The patch looks
good!

As for stricter hygiene, I don't think that it's worth it. This is
partially personal taste, but Scheme's hygienic macros are under-powered
and over-complicated. This is apparent in the multiple macro systems
created to replace syntax-rules. I also don't think that a more complicated
system meshes with Fennel ideologically.

- Calvin

Copy & paste the following snippet into your terminal to import this patchset into git:

curl -s https://lists.sr.ht/~technomancy/fennel/%3C20190712200152.28699-1-phil%40hagelb.org%3E/mbox | git am -3

[PATCH] Require autogensym for identifiers inside backticked macros

When using backtick, bare symbols may not be used as identifiers, in
order to avoid symbol capture; gensyms must be used instead. To make
this more palatable, we support "auto-gensym" by appending # to the
end of your locals; within a single scope this will create a gensym
beginning with your local name.

This is similar to the method Clojure uses to prevent accidental
symbol capture; however, Clojure does this by auto-qualifying all
symbols inside a backtick with the current namespace. Since we don't
have namespaces, we simply set a "quoted" flag on the symbol and refuse
to bind any symbol that carries it.

This also reverts the change which prevented # from being a valid
symbol constituent character.
---
 fennel.lua | 27 +++++++++++++++++++++++----
 test.lua   | 15 ++++++++++++++-
 2 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/fennel.lua b/fennel.lua
index 0c31627..33926d5 100644
--- a/fennel.lua
+++ b/fennel.lua
@@ -185,7 +185,6 @@ local function issymbolchar(b)
         b ~= 126 and -- "~"
         b ~= 59 and -- ";"
         b ~= 44 and -- ","
-        b ~= 35 and -- "#"
         b ~= 64 and -- "@"
         b ~= 96 -- "`"
 end
@@ -430,6 +429,7 @@ local function makeScope(parent)
         includes = setmetatable({}, {
             __index = parent and parent.includes
         }),
+        autogensyms = {},
         parent = parent,
         vararg = parent and parent.vararg,
         depth = parent and ((parent.depth or 0) + 1) or 0,
@@ -505,6 +505,8 @@ local function isMultiSym(str)
     parts
 end
 
+local function isQuoted(symbol) return symbol.quoted end
+
 -- Mangler for global symbols. Does not protect against collisions,
 -- but makes them unlikely. This is the mangling that is exposed to
 -- to the world.
@@ -575,23 +577,36 @@ local function combineParts(parts, scope)
 end
 
 -- Generates a unique symbol in the scope.
-local function gensym(scope)
+local function gensym(scope, base)
     local mangling
     local append = 0
     repeat
-        mangling = '_' .. append .. '_'
+        mangling = (base or '') .. '_' .. append .. '_'
         append = append + 1
     until not scope.unmanglings[mangling]
     scope.unmanglings[mangling] = true
     return mangling
 end
 
+-- Generates a unique symbol in the scope based on the base name. Calling
+-- repeatedly with the same base and same scope will return existing symbol
+-- rather than generating new one.
+local function autogensym(base, scope)
+    if scope.autogensyms[base] then return scope.autogensyms[base] end
+    local mangling = gensym(scope, base)
+    scope.autogensyms[base] = mangling
+    return mangling
+end
+
 -- Check if a binding is valid
 local function checkBindingValid(symbol, scope, ast)
     -- Check if symbol will be over shadowed by special
     local name = symbol[1]
     assertCompile(not scope.specials[name],
     ("symbol %s may be overshadowed by a special form or macro"):format(name), ast)
+    assertCompile(not isQuoted(symbol), 'macro tried to bind ' .. name ..
+                      ' without gensym; try ' .. name .. '# instead', ast)
+
 end
 
 -- Declare a local symbol
@@ -1795,7 +1810,11 @@ local function doQuote (form, scope, parent, runtime)
     -- symbol
     elseif isSym(form) then
         assertCompile(not runtime, "symbols may only be used at compile time", form)
-        return ("sym('%s')"):format(deref(form))
+        if deref(form):find("#$") then -- autogensym
+            return ("sym('%s')"):format(autogensym(deref(form), scope))
+        else -- prevent non-gensymmed symbols from being bound as an identifier
+            return ("sym('%s', nil, {quoted=true})"):format(deref(form))
+        end
     -- unquote
     elseif isList(form) and isSym(form[1]) and (deref(form[1]) == 'unquote') then
         local payload = form[2]
diff --git a/test.lua b/test.lua
index 4cce51e..7e7316f 100644
--- a/test.lua
+++ b/test.lua
@@ -159,6 +159,8 @@ local cases = {
         ["(do (var a nil) (var b nil) (local ret (fn [] a)) (set (a b) (values 4 5)) (ret))"]=4,
         -- Tset doesn't screw up with table literal
         ["(do (tset {} :a 1) 1)"]=1,
+        -- # is valid symbol constituent character
+        ["(local x#x# 90) x#x#"]=90,
     },
 
     ifforms = {
@@ -282,7 +284,9 @@ local cases = {
         ["(macros {:x (fn [] `(fn [...] (+ 1 1)))}) ((x))"]=2,
         -- Threading macro with single function, with and without parens
         ["(-> 1234 (string.reverse) (string.upper))"]="4321",
-        ["(-> 1234 string.reverse string.upper)"]="4321"
+        ["(-> 1234 string.reverse string.upper)"]="4321",
+        -- Auto-gensym
+        ["(macros {:m (fn [y] `(let [xa# 1] (+ xa# ,y)))}) (m 4)"]=5,
     },
     hashfn = {
         -- Basic hashfn
@@ -479,12 +483,21 @@ local compile_failures = {
     ["(let [t {:a 1}] (+ t.a BAD))"]="BAD",
     ["(each [k v (pairs {})] (BAD k v))"]="BAD",
     ["(global good (fn [] nil)) (good) (BAD)"]="BAD",
+    -- shadowing built-ins
     ["(global + 1)"]="overshadowed",
     ["(global // 1)"]="overshadowed",
     ["(global let 1)"]="overshadowed",
     ["(global - 1)"]="overshadowed",
     ["(let [global 1] 1)"]="overshadowed",
     ["(fn global [] 1)"]="overshadowed",
+    -- symbol capture detection
+    ["(macros {:m (fn [y] `(let [x 1] (+ x ,y)))}) (m 4)"]=
+        "tried to bind x without gensym",
+    ["(macros {:m (fn [t] `(fn [xabc] (+ xabc 9)))}) ((m 4))"]=
+        "tried to bind xabc without gensym",
+    ["(macros {:m (fn [t] `(each [mykey (pairs ,t)] (print mykey)))}) (m [])"]=
+        "tried to bind mykey without gensym",
+    -- other
     ["(match [1 2 3] [a & b c] nil)"]="rest argument in final position",
     ["(x(y))"]="expected whitespace before opening delimiter %(",
     ["(x[1 2])"]="expected whitespace before opening delimiter %[",
-- 
2.11.0
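The two rules in the commit message can be modeled in a few lines of plain Lua. This is a standalone sketch, assuming symbols are simplified to tables of the form `{name, quoted = true}`. `make_scope`, `quote_symbol` and `check_binding` are hypothetical stand-ins for `makeScope`, the symbol branch of `doQuote` and `checkBindingValid`; `gensym` and `autogensym` follow the patch's logic. (The real `doQuote` emits Lua source such as `sym('x', nil, {quoted=true})` rather than building tables directly.)

```lua
-- Standalone model of the patch's rules; not the real fennel.lua code.
local function make_scope()
  return { unmanglings = {}, autogensyms = {} }
end

-- Pick the first "<base>_N_" name not yet used in this scope.
local function gensym(scope, base)
  local append, mangling = 0, nil
  repeat
    mangling = (base or "") .. "_" .. append .. "_"
    append = append + 1
  until not scope.unmanglings[mangling]
  scope.unmanglings[mangling] = true
  return mangling
end

-- Same base in the same scope always maps to the same generated name.
local function autogensym(base, scope)
  if scope.autogensyms[base] then return scope.autogensyms[base] end
  local mangling = gensym(scope, base)
  scope.autogensyms[base] = mangling
  return mangling
end

-- Hypothetical: what a symbol inside a backtick turns into.
local function quote_symbol(name, scope)
  if name:find("#$") then
    return { autogensym(name, scope) }  -- trailing #: auto-gensym
  else
    return { name, quoted = true }      -- bare symbol: flag it
  end
end

-- Hypothetical: the new assertion added to the binding check.
local function check_binding(sym)
  if sym.quoted then
    error("macro tried to bind " .. sym[1] ..
          " without gensym; try " .. sym[1] .. "# instead")
  end
  return true
end

local scope = make_scope()
print(quote_symbol("x#", scope)[1])  -- generated name for x#
print(pcall(check_binding, quote_symbol("x", scope)))
```

Because the cache lives on the scope, every `x#` within one scope resolves to the same generated name (here `x#_0_`), which is what lets `(let [xa# 1] (+ xa# ,y))` refer to its own binding, while a bare `x` in a binding position is rejected.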