Re: Tree-sitter api

It’s been a while and no one provided further comments on the indent and font-lock integration of tree-sitter, so I finished the manuals for indent and font-lock integration. They are under 24.6 Font Lock Mode and 24.7 Automatic Indentation of code. Once the author of tree-sitter allows tree-sitter to change malloc implementation at runtime, tree-sitter integration will be ready. (Though I suspect that won’t come soon. The author is still actively developing tree-sitter but he didn’t reply to my request.)

As before, the code is at https://github.com/casouri/emacs on ts branch.

Yuan

Re: Tree-sitter api

Eli Zaretskii <eliz@gnu.org>
> From: Yuan Fu <casouri@gmail.com>
> Date: Sun, 12 Dec 2021 22:54:59 -0800
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,
>  ubolonton@gmail.com,
>  theo@thornhill.no,
>  cpitclaudel@gmail.com,
>  emacs-devel@gnu.org,
>  stephen_leake@stephe-leake.org,
>  john@yates-sheets.org
> 
> It’s been a while and no one provided further comments on the indent and font-lock integration of tree-sitter, so I finished the manuals for indent and font-lock integration. They are under 24.6 Font Lock Mode and 24.7 Automatic Indentation of code. Once the author of tree-sitter allows tree-sitter to change malloc implementation at runtime, tree-sitter integration will be ready. (Though I suspect that won’t come soon. The author is still actively developing tree-sitter but he didn’t reply to my request.)

Would you please ping the authors and tell them that this single issue
prevents us from integrating TS into Emacs?  Maybe that would change
their priorities.  I cannot imagine that the feature we are asking for is
hard to implement.

> As before, the code is at https://github.com/casouri/emacs on ts branch.

Thanks.  Perhaps people could try testing the branch and providing
feedback?

Re: Tree-sitter api

>> 
>> It’s been a while and no one provided further comments on the indent and font-lock integration of tree-sitter, so I finished the manuals for indent and font-lock integration. They are under 24.6 Font Lock Mode and 24.7 Automatic Indentation of code. Once the author of tree-sitter allows tree-sitter to change malloc implementation at runtime, tree-sitter integration will be ready. (Though I suspect that won’t come soon. The author is still actively developing tree-sitter but he didn’t reply to my request.)
> 
> Would you please ping the authors and tell them that this single issue
> prevents us from integrating TS into Emacs?  Maybe that would change
> their priorities.  I cannot imagine that the feature we are asking for is
> hard to implement.

Done.

>> As before, the code is at https://github.com/casouri/emacs on ts branch.
> 
> Thanks.  Perhaps people could try testing the branch and providing
> feedback?

Yes. Now that the manual is complete, people are welcome to try it out and see what they like and don’t like. It would be even better if someone wants to implement some major modes with the new tree-sitter features.

Yuan

Re: Tree-sitter api


> On Dec 13, 2021, at 11:19 PM, Yuan Fu <casouri@gmail.com> wrote:
> 
>>> 
>>> It’s been a while and no one provided further comments on the indent and font-lock integration of tree-sitter, so I finished the manuals for indent and font-lock integration. They are under 24.6 Font Lock Mode and 24.7 Automatic Indentation of code. Once the author of tree-sitter allows tree-sitter to change malloc implementation at runtime, tree-sitter integration will be ready. (Though I suspect that won’t come soon. The author is still actively developing tree-sitter but he didn’t reply to my request.)
>> 
>> Would you please ping the authors and tell them that this single issue
>> prevents us from integrating TS into Emacs?  Maybe that would change
>> their priorities.  I cannot imagine that the feature we are asking for is
>> hard to implement.
> 
> Done.

Someone commented on my request saying

> Had this issue as well, but thought was too niche to open an issue. The standard way to change the allocator at runtime is with the  LD_PRELOAD envvar (see mimalloc or any allocator doc).

IIUC it is more of a user-facing feature, right? That is, you would run LD_PRELOAD=xxx program rather than change the environment programmatically from within the program? Could Emacs do this if tree-sitter doesn’t want to change?

BTW the conversation is at https://github.com/tree-sitter/tree-sitter/issues/1535

The author suggested implementing runtime switching of malloc on top of the current macros, but I think he missed the point (we don’t want to maintain our own version of tree-sitter).
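
To make the request concrete, here is a minimal sketch of what runtime switching on top of compile-time macros could look like. Every name in it (demo_set_allocator, demo_lib_malloc, and so on) is hypothetical and chosen for illustration only; it is not tree-sitter's API, and it only assumes the library routes its allocations through macros.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is not tree-sitter's actual API. */
typedef void *(*demo_malloc_fn)(size_t);
typedef void (*demo_free_fn)(void *);

/* Allocator currently in use, defaulting to the C library. */
static demo_malloc_fn demo_current_malloc = malloc;
static demo_free_fn demo_current_free = free;

/* Macros of the kind a library might already use internally.  Here they
   dispatch through the pointers above, so the host needs no rebuild. */
#define demo_lib_malloc(size) (demo_current_malloc(size))
#define demo_lib_free(ptr) (demo_current_free(ptr))

/* Install a replacement allocator at runtime.  NULL restores the default. */
void demo_set_allocator(demo_malloc_fn new_malloc, demo_free_fn new_free)
{
    demo_current_malloc = new_malloc ? new_malloc : malloc;
    demo_current_free = new_free ? new_free : free;
}

/* Example host-side allocator that merely counts allocations. */
static size_t demo_alloc_count = 0;

void *demo_counting_malloc(size_t size)
{
    demo_alloc_count++;
    return malloc(size);
}

/* Stand-in for library-internal work that allocates and frees a node. */
int demo_library_work(void)
{
    void *node = demo_lib_malloc(64);
    if (node == NULL)
        return -1;
    demo_lib_free(node);
    return 0;
}
```

A host would call demo_set_allocator once at startup, before the library allocates anything, so that memory is never obtained from one allocator and released through another.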

Yuan

Re: Tree-sitter api

Eli Zaretskii <eliz@gnu.org>
> From: Yuan Fu <casouri@gmail.com>
> Date: Thu, 16 Dec 2021 16:14:52 -0800
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,
>  Tuấn-Anh Nguyễn <ubolonton@gmail.com>,
>  Theodor Thornhill <theo@thornhill.no>,
>  Clément Pit-Claudel <cpitclaudel@gmail.com>,
>  Emacs developers <emacs-devel@gnu.org>,
>  Stephen Leake <stephen_leake@stephe-leake.org>,
>  john@yates-sheets.org
> 
> Someone commented on my request saying
> 
> > Had this issue as well, but thought was too niche to open an issue. The standard way to change the allocator at runtime is with the  LD_PRELOAD envvar (see mimalloc or any allocator doc).
> 
> IIUC it is more of a user-facing feature, right? That is, you would run LD_PRELOAD=xxx program rather than change the environment programmatically from within the program? Could Emacs do this if tree-sitter doesn’t want to change?

I don't think we want to use LD_PRELOAD for this, for several good
reasons.  It's non-portable, for starters.

> The author suggested implementing runtime switching of malloc on top of the current macros, but I think he missed the point (we don’t want to maintain our own version of tree-sitter).

Yes.

I hope we get a better response from the developers of Tree-sitter.

Re: Tree-sitter api

Daniel Martín <mardani29@yahoo.es>
Yuan Fu <casouri@gmail.com> writes:

> It’s been a while and no one provided further comments on the indent
> and font-lock integration of tree-sitter, so I finished the manuals
> for indent and font-lock integration. They are under 24.6 Font Lock
> Mode and 24.7 Automatic Indentation of code. Once the author of
> tree-sitter allows tree-sitter to change malloc implementation at
> runtime, tree-sitter integration will be ready. (Though I suspect that
> won’t come soon. The author is still actively developing tree-sitter
> but he didn’t reply to my request.)
>
> As before, the code is at https://github.com/casouri/emacs on ts branch.
>
> Yuan

Thank you for your work.  I had some trouble getting the latest code to
compile, so I've sent you a pull request with potential fixes:
https://github.com/casouri/emacs/pull/4  I have signed the FSF papers.

I have a general question about the major modes.  I see there are a couple
of sample major modes, one for C and another for JSON, but they are in
tree-sitter.el.  How would those new major modes be included with Emacs?
Will there be a c-ts mode, separate from cc-mode, which will implement
font lock and indentation in terms of tree-sitter (when Emacs is
compiled with tree-sitter support)?  Or is the plan to extend the core
language modes to offer an option to support tree-sitter? (I'm not sure
how complicated and clean that would be.)

Thanks.

Re: Tree-sitter api


> Am 13.12.2021 um 13:56 schrieb Eli Zaretskii <eliz@gnu.org>:
> 
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Sun, 12 Dec 2021 22:54:59 -0800
>> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,
>> ubolonton@gmail.com,
>> theo@thornhill.no,
>> cpitclaudel@gmail.com,
>> emacs-devel@gnu.org,
>> stephen_leake@stephe-leake.org,
>> john@yates-sheets.org
>> 
>> It’s been a while and no one provided further comments on the indent and font-lock integration of tree-sitter, so I finished the manuals for indent and font-lock integration. They are under 24.6 Font Lock Mode and 24.7 Automatic Indentation of code. Once the author of tree-sitter allows tree-sitter to change malloc implementation at runtime, tree-sitter integration will be ready. (Though I suspect that won’t come soon. The author is still actively developing tree-sitter but he didn’t reply to my request.)
> 
> Would you please ping the authors and tell them that this single issue
> prevents us from integrating TS into Emacs?  Maybe that would change
> their priorities.  I cannot imagine that the feature we are asking for is
> hard to implement.

That feature in itself won't be enough.  Even with it, TreeSitter will have the same problem as GMP: allocation isn't allowed to fail, and longjmp'ing out of it isn't allowed and generally causes undefined behavior.  What's needed is a rewrite of the TreeSitter code so that it handles allocation failure properly and gracefully by returning an error to the caller.
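
As an illustration of the error-returning style described above, here is a minimal sketch. All names (demo_tree, demo_tree_new, demo_alloc, demo_force_failure) are hypothetical and do not come from tree-sitter or Emacs; the failure switch exists only to simulate an out-of-memory condition.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical status codes reported to the caller. */
typedef enum { DEMO_OK = 0, DEMO_ERR_NOMEM = 1 } demo_status;

typedef struct {
    int *items;
    size_t len;
} demo_tree;

/* Hypothetical allocator hook that can be told to fail, to simulate OOM. */
static bool demo_force_failure = false;

static void *demo_alloc(size_t size)
{
    return demo_force_failure ? NULL : malloc(size);
}

/* Build a "tree" of n items.  On allocation failure nothing is leaked,
   *out is left NULL, and an error code is returned, so the caller never
   has to longjmp out of library code. */
demo_status demo_tree_new(size_t n, demo_tree **out)
{
    *out = NULL;

    demo_tree *tree = demo_alloc(sizeof *tree);
    if (tree == NULL)
        return DEMO_ERR_NOMEM;

    /* Request at least one element so malloc(0) cannot look like OOM. */
    tree->items = demo_alloc((n ? n : 1) * sizeof *tree->items);
    if (tree->items == NULL) {
        free(tree);
        return DEMO_ERR_NOMEM;
    }

    tree->len = n;
    *out = tree;
    return DEMO_OK;
}

void demo_tree_free(demo_tree *tree)
{
    if (tree != NULL) {
        free(tree->items);
        free(tree);
    }
}
```

With this shape the host decides what to do with DEMO_ERR_NOMEM, for example signalling an ordinary error, because every library frame has already unwound normally.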

Re: Tree-sitter api

Eli Zaretskii <eliz@gnu.org>
> From: Philipp <p.stephani2@gmail.com>
> Date: Sat, 18 Dec 2021 15:45:18 +0100
> Cc: Yuan Fu <casouri@gmail.com>,
>  ubolonton@gmail.com,
>  theo@thornhill.no,
>  cpitclaudel@gmail.com,
>  emacs-devel@gnu.org,
>  monnier@iro.umontreal.ca,
>  stephen_leake@stephe-leake.org,
>  john@yates-sheets.org
> 
> > Would you please ping the authors and tell them that this single issue
> > prevents us from integrating TS into Emacs?  Maybe that would change
> > their priorities.  I cannot imagine that the feature we are asking for is
> > hard to implement.
> 
> That feature in itself won't be enough.  Even with it, TreeSitter will have the same problem as GMP: allocation isn't allowed to fail, and longjmp'ing out of it isn't allowed and generally causes undefined behavior.

It may not be enough to satisfy purists, but it's enough to allow the
user to save the session and shut down Emacs in an orderly fashion,
instead of abruptly exiting and losing all the edits.

Re: Tree-sitter api


> On Dec 18, 2021, at 5:39 AM, Daniel Martín <mardani29@yahoo.es> wrote:
> 
> Yuan Fu <casouri@gmail.com> writes:
> 
>> It’s been a while and no one provided further comments on the indent
>> and font-lock integration of tree-sitter, so I finished the manuals
>> for indent and font-lock integration. They are under 24.6 Font Lock
>> Mode and 24.7 Automatic Indentation of code. Once the author of
>> tree-sitter allows tree-sitter to change malloc implementation at
>> runtime, tree-sitter integration will be ready. (Though I suspect that
>> won’t come soon. The author is still actively developing tree-sitter
>> but he didn’t reply to my request.)
>> 
>> As before, the code is at https://github.com/casouri/emacs on ts branch.
>> 
>> Yuan
> 
> Thank you for your work.  I had some trouble getting the latest code to
> compile, so I've sent you a pull request with potential fixes:
> https://github.com/casouri/emacs/pull/4  I have signed the FSF papers.

Yes, sorry, I forgot to push fixes after merging. I included your fix for the switch case. Thanks.

> 
> I have a general question about the major modes.  I see there are a couple
> of sample major modes, one for C and another for JSON, but they are in
> tree-sitter.el.  How would those new major modes be included with Emacs?
> Will there be a c-ts mode, separate from cc-mode, which will implement
> font lock and indentation in terms of tree-sitter (when Emacs is
> compiled with tree-sitter support)?  Or is the plan to extend the core
> language modes to offer an option to support tree-sitter? (I'm not sure
> how complicated and clean that would be.)

They are just my experiments, and I included them in tree-sitter.el as examples for anyone who wants to try out tree-sitter. They will be removed when tree-sitter integration merges into master. Each major mode should optionally take advantage of tree-sitter features according to tree-sitter-enable-p. (At least that’s my plan; no one has objected to this approach so far.)

Yuan

Re: Tree-sitter api

>> 
>>> Would you please ping the authors and tell them that this single issue
>>> prevents us from integrating TS into Emacs?  Maybe that would change
>>> their priorities.  I cannot imagine that the feature we are asking for is
>>> hard to implement.
>> 
>> That feature in itself won't be enough.  Even with it, TreeSitter will have the same problem as GMP: allocation isn't allowed to fail, and longjmp'ing out of it isn't allowed and generally causes undefined behavior.
> 
> It may not be enough to satisfy purists, but it's enough to allow the
> user to save the session and shut down Emacs in an orderly fashion,
> instead of abruptly exiting and losing all the edits.

Users can set tree-sitter-maximum-size to limit the memory usage of tree-sitter. Buffers larger than that cannot enable tree-sitter. That doesn’t solve the problem directly but should let users avoid allocation failures most of the time.

Yuan

Re: Tree-sitter api

Eli Zaretskii <eliz@gnu.org>
> From: Yuan Fu <casouri@gmail.com>
> Date: Sat, 18 Dec 2021 18:51:25 -0800
> Cc: Philipp <p.stephani2@gmail.com>,
>  ubolonton@gmail.com,
>  theo@thornhill.no,
>  cpitclaudel@gmail.com,
>  emacs-devel@gnu.org,
>  monnier@iro.umontreal.ca,
>  stephen_leake@stephe-leake.org,
>  john@yates-sheets.org
> 
> >> That feature in itself won't be enough.  Even with it, TreeSitter will have the same problem as GMP: allocation isn't allowed to fail, and longjmp'ing out of it isn't allowed and generally causes undefined behavior.
> > 
> > It may not be enough to satisfy purists, but it's enough to allow the
> > user to save the session and shut down Emacs in an orderly fashion,
> > instead of abruptly exiting and losing all the edits.
> 
> Users can set tree-sitter-maximum-size to limit the memory usage of tree-sitter. Buffers larger than that cannot enable tree-sitter. That doesn’t solve the problem directly but should let users avoid allocation failures most of the time.

Btw, we should have a good idea how frequent this out-of-memory
problem could be with tree-sitter.  Did someone try to scroll through
all of xdisp.c, using tree-sitter for C Mode fontifications, and
measure the memory footprint that produces?  If not, I think it would
be a good idea to try.

If the OOM problem happens frequently with large source files, it may
indeed be the case that we will need to disable tree-sitter up front
based on some size criteria.

Re: Tree-sitter api


> On Dec 18, 2021, at 11:11 PM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Sat, 18 Dec 2021 18:51:25 -0800
>> Cc: Philipp <p.stephani2@gmail.com>,
>> ubolonton@gmail.com,
>> theo@thornhill.no,
>> cpitclaudel@gmail.com,
>> emacs-devel@gnu.org,
>> monnier@iro.umontreal.ca,
>> stephen_leake@stephe-leake.org,
>> john@yates-sheets.org
>> 
>>>> That feature in itself won't be enough.  Even with it, TreeSitter will have the same problem as GMP: allocation isn't allowed to fail, and longjmp'ing out of it isn't allowed and generally causes undefined behavior.
>>> 
>>> It may not be enough to satisfy purists, but it's enough to allow the
>>> user to save the session and shut down Emacs in an orderly fashion,
>>> instead of abruptly exiting and losing all the edits.
>> 
>> Users can set tree-sitter-maximum-size to limit the memory usage of tree-sitter. Buffers larger than that cannot enable tree-sitter. That doesn’t solve the problem directly but should let users avoid allocation failures most of the time.
> 
> Btw, we should have a good idea how frequent this out-of-memory
> problem could be with tree-sitter.  Did someone try to scroll through
> all of xdisp.c, using tree-sitter for C Mode fontifications, and
> measure the memory footprint that produces?  If not, I think it would
> be a good idea to try.
> 
> If the OOM problem happens frequently with large source files, it may
> indeed be the case that we will need to disable tree-sitter up front
> based on some size criteria.

From the author’s quote and my experiments, tree-sitter uses about 10–20x the buffer size in memory. So xdisp.c is fine. Also, you don’t need to scroll through the buffer; tree-sitter parses the whole buffer up-front.
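
As a rough worked example of that estimate, the sketch below multiplies a buffer size by the 10–20x range and applies a size gate in the spirit of tree-sitter-maximum-size. The function names and the struct are hypothetical, and the multipliers are only the approximate figures quoted above, not measured constants.

```c
#include <stdbool.h>
#include <stddef.h>

/* Multipliers taken from the rough 10-20x estimate; not measured values. */
enum { DEMO_MIN_FACTOR = 10, DEMO_MAX_FACTOR = 20 };

typedef struct {
    size_t low;   /* optimistic estimate, in bytes */
    size_t high;  /* pessimistic estimate, in bytes */
} demo_estimate;

/* Estimate parser memory for a buffer of the given size.  Overflow is
   ignored here; it only matters for buffers far beyond realistic sizes. */
demo_estimate demo_estimate_parser_memory(size_t buffer_bytes)
{
    demo_estimate e = {
        buffer_bytes * DEMO_MIN_FACTOR,
        buffer_bytes * DEMO_MAX_FACTOR
    };
    return e;
}

/* Hypothetical gate: buffers larger than maximum_size may not enable
   the parser at all. */
bool demo_buffer_allowed(size_t buffer_bytes, size_t maximum_size)
{
    return buffer_bytes <= maximum_size;
}
```

Under these assumptions a 1 MiB buffer would need roughly 10 to 20 MiB for the parser.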

Yuan