~dmbaturin/soupault

Neighboring Elements HTML.functions returns error

Details
Message ID
<7bc1cada-3cbc-73e6-260b-0790e2323f9b@aoirthoir.com>
I cannot get these neighboring loops to work... from the documentation:

  This family of functions provides access to element's neighboring 
elements in the tree. They all return a (possibly empty) list.

     HTML.children
     HTML.ancestors
     HTML.descendants
     HTML.siblings

Example: add class="silly-class" to every element inside the page <body>.

body = HTML.select_one(page, "body")
children = HTML.children(body)

local i = 1
while children[i] do
   HTML.add_class(children[i], "silly-class")
   i = i + 1
end

Running exactly this code on a new soupault setup, with no plugins
running and no libraries activated, gives me this error:
[ERROR] Could not process page site/index.html: Expected an element, but found a document

Changing the code to this works:

body     = HTML.select_one(page, "body")
children = HTML.select(body,'*')
local i = 1
while children[i] do
   HTML.add_class(children[i], "silly-class")
   i = i + 1
end

That is fine, but getting the siblings becomes difficult... I will
experiment with my own loop through the elements of the page for what I
am trying to accomplish:

h6
p
p

becomes:

div
h6
p
p
/div

Just wanted to report this issue ...
Details
Message ID
<c725651c-e0ed-5bde-f38e-b14878be663c@baturin.org>
In-Reply-To
<7bc1cada-3cbc-73e6-260b-0790e2323f9b@aoirthoir.com>
>children = HTML.children(body)
>
>local i = 1
>while children[i] do
>  HTML.add_class(children[i], "silly-class")
>  i = i + 1
>end
>Running this code exactly on a new soupault setup, with no plugins
>running, no libraries activated gives me this error:
>[ERROR] Could not process page site/index.html: Expected an element,
>but found a document

This is a somewhat confusing error message indeed. What it really means:
"expected an element but found a thing that may not be one".
I'll fix that wording.

Now, to the real issue... A parse tree of an HTML document contains two
distinct types of nodes: elements and text nodes.

HTML.children and friends return all nodes that qualify as
children/siblings/etc. _including text nodes_.
At the same time, HTML.add_class requires a node that is guaranteed to be
an element.

I see two possible ways out. The first is to add a new family of
functions like HTML.child_elements, HTML.sibling_elements etc. that
return only nodes that are elements rather than text.

The second option is to add a function like HTML.to_element. Since Lua
(2.5) has no exception handling, its behaviour is a serious question.
One option is to make it return nil for values that aren't elements.
Another option is to also add HTML.is_element and make HTML.to_element
fail plugin execution when called on a non-element node. Not sure which
one is preferable.

Finally, a third option is to make HTML.add_class silently ignore values
that aren't element nodes (possibly emit a log message in that case to
aid with debugging).
Details
Message ID
<d3cf0ed7-0f43-d257-399e-5ba56f82ee0a@aoirthoir.com>
In-Reply-To
<c725651c-e0ed-5bde-f38e-b14878be663c@baturin.org>
I think the best option would be an is_element test, so we can just
continue through our loops. Knowing which kind of data we are dealing
with gives us options that altering add_class alone would not, since
that change would only help add_class. In my case I was doing something
else entirely and only used add_class because it was in the example
code.

On the bright side, I did finish my plugin, which is called
block-maker.lua. Despite the apparent restrictions of Lua, I have so far
found a way to accomplish everything I have tried with lua-ml and
soupault. The next step is going to be a create_gallery.lua plugin that
will search a directory and create img elements for all the images
there. That is for tomorrow...
Details
Message ID
<bb3b061c-9439-233b-d84c-0526eae3c7b2@baturin.org>
In-Reply-To
<d3cf0ed7-0f43-d257-399e-5ba56f82ee0a@aoirthoir.com>
On 4/27/20 6:02 AM, Aoirthoir An Broc wrote:
> I think the best option would be an is_element test, so we can just
> continue through our loops. Knowing which kind of data we are dealing
> with gives us options that altering add_class alone would not, since
> that change would only help add_class. In my case I was doing
> something else entirely and only used add_class because it was in the
> example code.
>
> On the bright side, I did finish my plugin, which is called
> block-maker.lua. Despite the apparent restrictions of Lua, I have so
> far found a way to accomplish everything I have tried with lua-ml and
> soupault. The next step is going to be a create_gallery.lua plugin
> that will search a directory and create img elements for all the
> images there. That is for tomorrow...
>
>
Myself I'm leaning towards making all functions that only make sense for
element nodes ignore any other values.
Either silently, or with a debug log message.

My reasoning is that receiving a text node instead of an element usually
isn't a _programming mistake_, and there is no harm in doing nothing
with such nodes, since operations like add_class or delete_attribute
make no sense for them anyway.
Thus, making those functions work as a no-op is the least frustrating thing.

That said, HTML.is_element can still be useful and there's no reason not
to do both. People may want to treat element and text nodes differently
in their code for multiple reasons.
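
For illustration, the no-op behaviour described above can also be approximated on the plugin side. This is only a minimal sketch: it assumes an HTML.is_element predicate like the one discussed in this thread, and add_class_if_element is a hypothetical helper name, not part of the soupault API.

```lua
-- Hypothetical helper (not a soupault function): adds a class only when
-- the node is an element, and does nothing for any other kind of node.
-- Assumes HTML.is_element is available, as discussed in this thread.
function add_class_if_element(node, class)
  if HTML.is_element(node) then
    HTML.add_class(node, class)
  end
end
```

A loop over HTML.children(body) could then call add_class_if_element(children[i], "silly-class") without failing on text nodes.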
Details
Message ID
<f8251b6f-481d-57f8-73a5-5b3a7b55ac84@aoirthoir.com>
In-Reply-To
<c725651c-e0ed-5bde-f38e-b14878be663c@baturin.org>
That all sounds reasonable. Especially since much of the time we are 
going to just work on elements anyhow...

Maybe in addition to HTML.is_element, there could be HTML.is_document, 
HTML.is_text and so on... also HTML.node_type...
Details
Message ID
<e9224de3-99a2-8413-0a08-951b5ae61a4d@baturin.org>
In-Reply-To
<d3cf0ed7-0f43-d257-399e-5ba56f82ee0a@aoirthoir.com>
On 4/27/20 6:02 AM, Aoirthoir An Broc wrote:
> I think the best option would be an is_element test, so we can just
> continue through our loops. Knowing which kind of data we are dealing
> with gives us options that altering add_class alone would not, since
> that change would only help add_class. In my case I was doing
> something else entirely and only used add_class because it was in the
> example code.
>
OK, making HTML.add_class etc. work as a no-op for non-element nodes
turned out to be a bit harder than I thought. It will still fail for
document and text nodes, though with a better error message than before.
However, it now works correctly with element nodes returned by
HTML.children and friends.

You just need to check that it's actually an element node with
HTML.is_element.

Sample plugin that adds a class to all children of a certain element:

----------------
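-- Read the plugin options from the widget config
selector = config["selector"]
class = config["class"]

-- select_one returns nil when nothing matches the selector
container = HTML.select_one(page, selector)

if container then
  -- children includes text nodes as well as elements
  elems = HTML.children(container)
  count = size(elems)

  local n = 1
  while (n <= count) do
    elem = elems[n]
    -- add_class only accepts element nodes, so skip everything else
    if HTML.is_element(elem) then
      HTML.add_class(elem, class)
    end

    n = n + 1
  end
end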
----------------
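
The same check can be wrapped into a reusable helper, roughly along the lines of the HTML.child_elements idea floated earlier in the thread. The sketch below is an assumption-laden illustration: child_elements is a hypothetical plugin-side function, not something soupault provides according to this thread, and it relies only on HTML.children and HTML.is_element.

```lua
-- Hypothetical helper (not a soupault function): returns a list holding
-- only those children of parent that are element nodes.
function child_elements(parent)
  local nodes = HTML.children(parent)
  local result = {}
  local i = 1   -- index into the full child list
  local n = 1   -- next free index in the filtered list
  while nodes[i] do
    if HTML.is_element(nodes[i]) then
      result[n] = nodes[i]
      n = n + 1
    end
    i = i + 1
  end
  return result
end
```

With such a helper, a plugin could iterate over child_elements(container) and call HTML.add_class on every item without a separate check inside the loop.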

HTML.is_document and HTML.is_text apparently can't be implemented right
now, but I may send a patch to the lambdasoup maintainer at some point,
if we find convincing use cases for those.

I'll rebuild the binaries and make a release this week.
Details
Message ID
<38d323da-5423-eb5d-49b9-275bccd8f479@aoirthoir.com>
In-Reply-To
<c725651c-e0ed-5bde-f38e-b14878be663c@baturin.org>
Excellent thank you.

I am not sure about the use cases for the others. I suspect that I will
be working almost entirely with element nodes myself. Browser-side, I
understand the need for such nodes in JavaScript, but soupault-side I
cannot think of one yet. If it comes to it, we can figure it out then.
Otherwise HTML.inner_html will be what we need most of the time anyhow.
I'll upgrade as soon as the new release is out and retest.