~protesilaos/denote

Change to how Org filetags are formatted

Message ID
<871qu0jw5l.fsf@protesilaos.com>
Hello folks (cc Alan and Peter who mentioned this before),

Just to inform you that in the development version we now format the
"#+filetags" as the Org manual describes.

The relevant commit:

    commit 37dca916435849e5307025197eee21c39d5d821b
    Author: Protesilaos Stavrou <info@protesilaos.com>
    Date:   Mon Aug 1 09:37:44 2022 +0300

        CHANGE how Org filetags are formatted by default

        Before, we would separate them by two spaces:

            filetags:  tag1  tag2

        Now we do:

            filetags:  :tag1:tag2:

        Technically, our space-separated format was correct though it could
        confuse users, as the Org manual only mentions the colon-separated
        format.  To avoid potential conflicts, we now do what Org and its users
        expect.

        Thanks to Jean-Philippe Gagné Guay for commenting on this change:
        <https://github.com/protesilaos/denote/commit/ea7d079c2d2f1d1ab932c5b1147a8492a703ddc1>.

        Also thanks to Alan Schmitt and Peter Prevos for mentioning this topic
        on different mailing list threads:

        - <https://lists.sr.ht/~protesilaos/denote/%3C875yk1x8en.fsf@m4x.org%3E#%3C87mtd8m82n.fsf@m4x.org%3E>
        - <https://lists.sr.ht/~protesilaos/denote/%3C87k081l6vw.fsf@silverstone.mail-host-address-is-not-set%3E#%3Cecd9f72563bbe41db77291a050b3de38@prevos.net%3E>

     denote.el | 16 +++++++++++++---
     1 file changed, 13 insertions(+), 3 deletions(-)
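
To make the two formats concrete, here is a minimal sketch (not the code from the commit) of how a list of keywords maps to each filetags value. The names `my-filetags-old-format` and `my-filetags-new-format` are hypothetical and are not part of Denote.

```elisp
;; Hypothetical helpers for illustration only; Denote's own code differs.
(defun my-filetags-old-format (keywords)
  "Join KEYWORDS with two spaces, as Denote used to do."
  (mapconcat #'identity keywords "  "))

(defun my-filetags-new-format (keywords)
  "Wrap KEYWORDS in colons, as the Org manual describes."
  (concat ":" (mapconcat #'identity keywords ":") ":"))

(my-filetags-old-format '("tag1" "tag2")) ; => "tag1  tag2"
(my-filetags-new-format '("tag1" "tag2")) ; => ":tag1:tag2:"
```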

Work will continue on the renaming commands and on how the front matter
is read/written.

All the best,
Prot

-- 
Protesilaos Stavrou
https://protesilaos.com
Message ID
<871qu0xa7i.fsf@m4x.org>
In-Reply-To
<871qu0jw5l.fsf@protesilaos.com>
Hello,

On 2022-08-01 09:49, Protesilaos Stavrou <info@protesilaos.com> writes:

> Hello folks (cc Alan and Peter who mentioned this before),
>
> Just to inform you that in the development version we now format the
> "#+filetags" as the Org manual describes.

Thank you for the heads up. Will there be a function to do the
conversion in the front matter? Also, do you plan to enforce the
restrictions for the character sets of tags?

> Tags are normal words containing letters, numbers, ‘_’, and ‘@’.

I think the main issue is that '_' is problematic in tags, and '-' is
not allowed. Maybe a correct tag 'org_roam' could be represented as
'org-roam' in the file name?

Best,

Alan
Message ID
<874jyv50n0.fsf@protesilaos.com>
In-Reply-To
<871qu0xa7i.fsf@m4x.org>
> From: Alan Schmitt <alan.schmitt@polytechnique.org>
> Date: Mon, 01 Aug 2022 17:19:45 +0200
>
> Hello,

Hello Alan,

> On 2022-08-01 09:49, Protesilaos Stavrou <info@protesilaos.com> writes:
>
>> Hello folks (cc Alan and Peter who mentioned this before),
>>
>> Just to inform you that in the development version we now format the
>> "#+filetags" as the Org manual describes.
>
> Thank you for the heads up. Will there be a function to do the
> conversion in the front matter? Also, do you plan to enforce the
> restrictions for the character sets of tags?

You are welcome!  I should have clarified that the old format will still
be read properly in the relevant Denote code, so there is no need to
update existing notes.

As for a way to perform the conversion, we are in the process of
refactoring the commands that rename files and rewrite/add the front
matter.  It still is a work-in-progress though.

Perhaps we could provide something like this (PROOF OF CONCEPT) that the
user could run on a bunch of buffers to auto-update the front matter.

    (defun not-complete-org-filetags-with-spaces-to-colons ()
      (format
       ":%s:"
       (replace-regexp-in-string
        "\s\s" ":"
        (save-excursion
          (save-restriction
            (widen)
            (goto-char (point-min))
            (when (re-search-forward denote--retrieve-keywords-front-matter-key-regexp nil t 1)
              (let ((trims "[ \t\n\r\"']+"))
                (string-trim
                 (buffer-substring-no-properties (point) (point-at-eol))
                 trims trims))))))))

It formats "tag1  tag2  tag3  tag4" as ":tag1:tag2:tag3:tag4:".  There
is a more efficient way to do it by reusing code from Denote, but you
get the idea.
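
The string transformation on its own can be sketched without touching any buffer. This is an assumption-laden illustration, not Denote code: the helper name `my-filetags-value-to-colons` is hypothetical, and unlike the proof of concept above it splits on any run of spaces, tabs, or colons, so a value that is already colon-separated comes back unchanged.

```elisp
;; Hypothetical helper, not part of Denote: normalise a filetags value.
(defun my-filetags-value-to-colons (value)
  "Return VALUE rewritten as a colon-separated filetags string.
VALUE may be space-separated (old format) or already use colons.
Return the empty string when VALUE contains no tags."
  (let ((tags (split-string value "[ \t:]+" t)))
    (if tags
        (concat ":" (mapconcat #'identity tags ":") ":")
      "")))

(my-filetags-value-to-colons "tag1  tag2  tag3  tag4") ; => ":tag1:tag2:tag3:tag4:"
(my-filetags-value-to-colons ":tag1:tag2:")            ; => ":tag1:tag2:"
```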

>> Tags are normal words containing letters, numbers, ‘_’, and ‘@’.
>
> I think the main issue is that '_' is problematic in tags, and '-' is
> not allowed. Maybe a correct tag 'org_roam' could be represented as
> 'org-roam' in the file name?

Do hyphens cause unexpected results?  I understand they are not
explicitly mentioned in the manual, but maybe this is another case where
we have undocumented behaviour?

If so, then we do need to handle it internally.  We don't do it right
now.

All the best,
Prot

-- 
Protesilaos Stavrou
https://protesilaos.com
Message ID
<87sfmfw0vu.fsf@m4x.org>
In-Reply-To
<874jyv50n0.fsf@protesilaos.com>
Hello Prot,

On 2022-08-01 20:33, Protesilaos Stavrou <info@protesilaos.com> writes:

> Perhaps we could provide something like this (PROOF OF CONCEPT) that the
> user could run on a bunch of buffers to auto-update the front matter.

I think such a function would be very convenient.

>>> Tags are normal words containing letters, numbers, ‘_’, and ‘@’.
>>
>> I think the main issue is that '_' is problematic in tags, and '-' is
>> not allowed. Maybe a correct tag 'org_roam' could be represented as
>> 'org-roam' in the file name?
>
> Do hyphens cause unexpected results?  I understand they are not
> explicitly mentioned in the manual, but maybe this is another case where
> we have undocumented behaviour?

This is probably a place where the documentation is more strict than the
code.

I took a quick look at Org’s source code, and there is one place where
this seems to be enforced. In org-element.el, the function
org-element-headline-parser uses this code to parse tags:

	   (tags (when (re-search-forward
			"[ \t]+\\(:[[:alnum:]_@#%:]+:\\)[ \t]*$"
			(line-end-position)
			'move)
		   (goto-char (match-beginning 0))
		   (org-split-string (match-string 1) ":")))

If I read the regexp correctly, a dash in a tag would make the
re-search-forward fail, hence tags would not be parsed. But once again,
it is for headlines, whereas we are using filetags…
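
That reading of the regexp can be checked in isolation. The sketch below copies the regexp from the snippet quoted above and applies it to plain strings; the variable and function names are hypothetical, and this only exercises the regexp, not the rest of org-element-headline-parser.

```elisp
;; The regexp is copied from the quoted snippet; the names are hypothetical.
(defvar my-headline-tags-regexp
  "[ \t]+\\(:[[:alnum:]_@#%:]+:\\)[ \t]*$"
  "Regexp for trailing headline tags, as quoted from org-element.el.")

(defun my-headline-tags (headline)
  "Return the list of tags the regexp finds in HEADLINE, or nil."
  (when (string-match my-headline-tags-regexp headline)
    (split-string (match-string 1 headline) ":" t)))

(my-headline-tags "* Heading :org_roam:") ; => ("org_roam")
(my-headline-tags "* Heading :org-roam:") ; => nil
```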

Best,

Alan
Message ID
<87pmhjm1fs.fsf@protesilaos.com>
In-Reply-To
<87sfmfw0vu.fsf@m4x.org>
> From: Alan Schmitt <alan.schmitt@polytechnique.org>
> Date: Tue, 02 Aug 2022 09:38:45 +0200
>
> Hello Prot,

Hello Alan,

> On 2022-08-01 20:33, Protesilaos Stavrou <info@protesilaos.com> writes:
>
>> Perhaps we could provide something like this (PROOF OF CONCEPT) that the
>> user could run on a bunch of buffers to auto-update the front matter.
>
> I think such a function would be very convenient.

Okay.  We will have one ready by the time of the new release.

>>>> Tags are normal words containing letters, numbers, ‘_’, and ‘@’.
>>>
>>> I think the main issue is that '_' is problematic in tags, and '-' is
>>> not allowed. Maybe a correct tag 'org_roam' could be represented as
>>> 'org-roam' in the file name?
>>
>> Do hyphens cause unexpected results?  I understand they are not
>> explicitly mentioned in the manual, but maybe this is another case where
>> we have undocumented behaviour?
>
> This is probably a place where the documentation is more strict than the
> code.
>
> I took a quick look at Org’s source code, and there is one place where
> this seems to be enforced. In org-element.el, the function
> org-element-headline-parser uses this code to parse tags:
>
> 	   (tags (when (re-search-forward
> 			"[ \t]+\\(:[[:alnum:]_@#%:]+:\\)[ \t]*$"
> 			(line-end-position)
> 			'move)
> 		   (goto-char (match-beginning 0))
> 		   (org-split-string (match-string 1) ":")))
>
> If I read the regexp correctly, a dash in a tag would make the
> re-search-forward fail, hence tags would not be parsed. But once again,
> it is for headlines, whereas we are using filetags…

I am reading it the same way you did.  This code would have also failed
with the space-separated tags.

I have thought again about what I wrote and I am thinking we should not
add any code to Denote to handle the enforcement of the correct
characters in Org's filetags.  Org should be enforcing it instead.  For
example, there could be a function that runs when the user saves the
file, which warns that the filetags have inappropriate characters.

By adding such code to Denote, we would be broadening its scope which
eventually adds to the maintenance burden.  Also, we would be doing a
disservice to other users who are writing their own filetags
incorrectly.  Whereas if Org was to handle this, every user would be
covered.  And if the Org maintainers do not think this is necessary, why
should we?
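
For illustration only, the kind of save-time warning described above could look like the following user-level sketch. It is neither Denote nor Org code. All `my-` names are hypothetical, and the allowed character set (letters, numbers, "_", "@") is taken from the manual excerpt quoted earlier in the thread, which is stricter than the headline regexp.

```elisp
(require 'seq)

;; Hypothetical user configuration; neither Denote nor Org provides this.
(defun my-org-filetags-invalid (value)
  "Return the tags in filetags VALUE that use undocumented characters.
The documented set is assumed to be letters, numbers, `_' and `@'."
  (seq-remove
   (lambda (tag) (string-match-p "\\`[[:alnum:]_@]+\\'" tag))
   (split-string value "[ \t:]+" t)))

(defun my-org-warn-invalid-filetags ()
  "Warn in the echo area about unexpected characters in #+filetags."
  (when (derived-mode-p 'org-mode)
    (save-excursion
      (save-restriction
        (widen)
        (goto-char (point-min))
        (let ((case-fold-search t))
          (when (re-search-forward "^#\\+filetags:[ \t]*\\(.*\\)$" nil t)
            (let ((bad (my-org-filetags-invalid
                        (match-string-no-properties 1))))
              (when bad
                (message "Filetags with unexpected characters: %s"
                         (mapconcat #'identity bad ", "))))))))))

;; Run the check every time a buffer is about to be saved.
(add-hook 'before-save-hook #'my-org-warn-invalid-filetags)
```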

What do you think?

All the best,
Prot

-- 
Protesilaos Stavrou
https://protesilaos.com
Message ID
<87bkt294ff.fsf@m4x.org>
In-Reply-To
<87pmhjm1fs.fsf@protesilaos.com>
Hello again,

On 2022-08-02 12:36, Protesilaos Stavrou <info@protesilaos.com> writes:

> I am reading it the same way you did.  This code would have also failed
> with the space-separated tags.
>
> I have thought again about what I wrote and I am thinking we should not
> add any code to Denote to handle the enforcement of the correct
> characters in Org's filetags.  Org should be enforcing it instead.  For
> example, there could be a function that runs when the user saves the
> file, which warns that the filetags have inappropriate characters.
>
> By adding such code to Denote, we would be broadening its scope which
> eventually adds to the maintenance burden.  Also, we would be doing a
> disservice to other users who are writing their own filetags
> incorrectly.  Whereas if Org was to handle this, every user would be
> covered.  And if the Org maintainers do not think this is necessary, why
> should we?
>
> What do you think?

Yes, I think this is the best approach. For headlines, there is some
enforcement by org (I tried adding the tag "bad-tag" to a headline using
"C-c C-c", and I got ":bad:tag:"), and syntax highlighting does not work
for such tags. But this is not about file tags (which do not seem to be
fontified), only headline tags.

I just searched for "filetags" in the org source, and there are not many
occurrences of it. One interesting buffer-local variable is
"org-file-tags", which gets populated with the local file tags when
opening a file. With this line:

#+filetags:   denote  org-roam

I get the expected value:

(#("denote" 0 6
   (inherited t))
 #("org-roam" 0 8
   (inherited t)))

So I agree letting org do its own enforcement is the best move.
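
The `org-file-tags` experiment above can be repeated without creating a file. This is a minimal sketch under the assumption that `org-file-tags` is populated when `org-mode` is enabled in a buffer, which may vary between Org versions; the helper name `my-org-file-tags-for` is hypothetical.

```elisp
(require 'org)

;; Hypothetical helper for experimenting; not part of Org or Denote.
(defun my-org-file-tags-for (value)
  "Return the tags Org records for a buffer with \"#+filetags: VALUE\".
Text properties such as `inherited' are stripped from the result."
  (with-temp-buffer
    (insert "#+filetags: " value "\n")
    ;; Skip user mode hooks; only Org's own setup is needed here.
    (delay-mode-hooks (org-mode))
    (mapcar #'substring-no-properties org-file-tags)))

;; Compare how the old and the new format are read.
(message "%S" (my-org-file-tags-for "  denote  org-roam"))
(message "%S" (my-org-file-tags-for ":denote:org_roam:"))
```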

Best,

Alan
Message ID
<875yj0tnx2.fsf@protesilaos.com>
In-Reply-To
<87pmhjm1fs.fsf@protesilaos.com>
Hello again Alan (and folks following the list),

> From: Protesilaos Stavrou <info@protesilaos.com>
> Date: Tue, 02 Aug 2022 12:36:55 +0300

>>> Perhaps we could provide something like this (PROOF OF CONCEPT) that the
>>> user could run on a bunch of buffers to auto-update the front matter.
>>
>> I think such a function would be very convenient.
>
> Okay.  We will have one ready by the time of the new release.

This is about the migration from the old space-separated filetags to the
new colon-separated format.  As promised, there is a supported way to do
this.  I just added the following to denote.el and will now prepare the
release of version 0.5.0.  Unless something extraordinary happens,
expect the release notes tonight (local time is 13h).


    ;;;; For the migration of old Org filetags

    (defun denote--migrate-org-files ()
      "Return list of Org files in variable `denote-directory'."
      (seq-remove
       (lambda (file)
         (not (string= (file-name-extension file) "org")))
       (denote--directory-files)))

    ;;;###autoload
    (defun denote-migrate-old-org-filetags ()
      "Rewrite Org filetags' value as colon-separated.

    Change the filetags from:

        #+filetags:   one  two

    To the standard format of:

        #+filetags:  :one:two:

    A single tag changes from TAG to :TAG:.

    Denote used to format filetags with two spaces between them, but
    this is not fully supported by Org.  The colon-separated entries
    are the rule.

    The rewrite DOES NOT SAVE BUFFERS.  The user is expected to
    review the changes, such as by using `diff-buffer-with-file'.
    Multiple buffers can be saved with `save-some-buffers' (check its
    doc string).

    This command is provided for the convenience of the user.  It
    shall be deprecated and eventually removed from future versions
    of Denote.  Written on 2022-08-10 for version 0.5.0."
      (interactive)
      (when-let (((yes-or-no-p "Rewrite filetags in Org files to use colons (buffers are NOT saved)?"))
                 (files (denote--migrate-org-files)))
        (dolist (file files)
          (when-let* ((kw (denote--front-matter-keywords-to-list file))
                      ((denote--edit-front-matter-p file)))
            (denote--rewrite-keywords file kw)))))


I tested this code with my actual notes.  It all worked fine.  PLEASE
MAKE BACKUPS before performing any batch rewrites.  Or at least track
your notes with Git (or equivalent).  The command does not save buffers
as a precaution to guard against data loss.
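
Since the command leaves every rewritten buffer unsaved, the review step could be narrowed to Org files with something like the sketch below. The two `my-` names are hypothetical and not part of Denote; the sketch relies only on `save-some-buffers` accepting a zero-argument predicate that is called with each candidate buffer current.

```elisp
;; Hypothetical helpers, not part of Denote.
(defun my-org-file-buffer-p ()
  "Return non-nil if the current buffer visits a file ending in \".org\"."
  (and buffer-file-name
       (equal (file-name-extension buffer-file-name) "org")))

(defun my-save-migrated-org-buffers ()
  "Ask, buffer by buffer, whether to save modified Org file buffers."
  (interactive)
  (save-some-buffers nil #'my-org-file-buffer-p))
```

In recent Emacs versions, typing `d` at each prompt shows the difference between the buffer and the file on disk before anything is written.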

That's all for the time being.  Now I shall write the change log.  It
will be very long...

All the best,
Prot

-- 
Protesilaos Stavrou
https://protesilaos.com
Message ID
<87a67wu7be.fsf@m4x.org>
In-Reply-To
<875yj0tnx2.fsf@protesilaos.com>
Hello Prot,

(Sorry for the delay, I was on vacation with very little internet
access.)

On 2022-08-10 13:04, Protesilaos Stavrou <info@protesilaos.com> writes:

> I tested this code with my actual notes.  It all worked fine.  PLEASE
> MAKE BACKUPS before performing any batch rewrites.  Or at least track
> your notes with Git (or equivalent).  The command does not save buffers
> as a precaution to guard against data loss.

I just tried it and it works great. Thanks!

Best,

Alan
Message ID
<87a67wbxo8.fsf@protesilaos.com>
In-Reply-To
<87a67wu7be.fsf@m4x.org>
> From: Alan Schmitt <alan.schmitt@polytechnique.org>
> Date: Mon, 22 Aug 2022 08:21:41 +0200
>
> Hello Prot,

Hello Alan,

> (Sorry for the delay, I was on vacation with very little internet
> access.)

No need to apologise: you did nothing wrong.

> On 2022-08-10 13:04, Protesilaos Stavrou <info@protesilaos.com> writes:
>
>> I tested this code with my actual notes.  It all worked fine.  PLEASE
>> MAKE BACKUPS before performing any batch rewrites.  Or at least track
>> your notes with Git (or equivalent).  The command does not save buffers
>> as a precaution to guard against data loss.
>
> I just tried it and it works great. Thanks!

Very well!

Thanks for your time,
Prot

-- 
Protesilaos Stavrou
https://protesilaos.com