Let’s say I get some trash markup like
[source,html]
----
<div>
<span><span class="useless">Hello</span></span>
</div>
----
What is the best way to remove the `.useless` element?
[source,html]
----
<div>
<span>Hello</span>
</div>
----
I could `HTML.select(page, "span > span.useless")`, grab its inner_html, remove the child, and then put the contents into the parent, but something about that seems off, particularly with text nodes. There is a `wrap` widget but no `unwrap`. This raises the question: what if there are more contents in the parent? Should it remove all contents and replace them with the selected element, or skip the element if there are other elements and it's not purely a uselessly wrapped element?
Follow-up question: is this something that would see enough demand to warrant extending the HTML API (and widget?)?
--
toastal ไข่ดาว | https://toast.al
PGP: 7944 74b7 d236 dab9 c9ef e7f9 5cce 6f14 66d4 7c9e
Hi Toastal,
> but something about it seems off, particularly with text nodes
Could you clarify: does HTML.children not consider text nodes to be
children in that case?
From a quick test (directly with lambdasoup), it should work as expected.
utop # Soup.parse ("<p>foo <span>bar</span> baz</p>")
       |> Soup.select_one "p" |> Option.get
       |> Soup.children |> Soup.to_list
       |> List.map Soup.to_string ;;
- : string list = ["foo "; "<span>bar</span>"; " baz"]
Or is something else wrong?
> what if there are more contents in the parent?
> Remove all contents and replace with the selected element or skip if
> there are other elements and it's not purely a uselessly wrapped element.
I think the real issue here is that the answer will be different for
different people.
Perhaps an "unwrap" plugin should have an option to check the child
count, e.g. max_children, by default set to 1.
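To make the child-count idea concrete, here is a minimal sketch written directly against lambdasoup rather than as a soupault plugin. The names `unwrap_matching` and `max_children` are hypothetical, and it assumes lambdasoup's `Soup.unwrap` replaces an element with its children as documented. Text nodes count toward the limit, so whitespace around the wrapped element would make the check fail.

```ocaml
(* Sketch only: needs the lambdasoup library
   (for example, `#require "lambdasoup";;` in utop). *)

(* Unwrap every element matching [selector], but only when its parent holds
   at most [max_children] child nodes (text nodes count as children).
   [unwrap_matching] and [max_children] are hypothetical names. *)
let unwrap_matching ?(max_children = 1) selector soup =
  Soup.select selector soup
  |> Soup.to_list (* materialize the matches before mutating the tree *)
  |> List.iter (fun el ->
       match Soup.parent el with
       | Some p when Soup.count (Soup.children p) <= max_children ->
         Soup.unwrap el
       | _ -> () (* no parent, or the parent has other contents: skip *))

(* The markup from the original question, without the whitespace. *)
let demo () =
  let soup =
    Soup.parse "<div><span><span class=\"useless\">Hello</span></span></div>"
  in
  unwrap_matching "span > span.useless" soup;
  soup |> Soup.select_one "div" |> Option.get |> Soup.to_string
```

With the default limit of 1, `demo ()` should leave only `<span>Hello</span>` inside the `div`, while a parent that has any other contents is left untouched. Passing a larger `~max_children` relaxes the check for people who want the other behavior.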
> is this something that would see enough demand to warrant extending
> the HTML API (and widget?)
I've been quite reluctant to add new built-in widgets ever since the Lua
API was implemented.
Since you can't easily remove widgets or change their behavior, I think
they should always start their life as plugins,
and if the demand is high and the design seems right, be converted to
built-ins.
Adding new functions to the HTML API is less of a problem if they are
granular and their design is uncontroversial,
but I'm not sure yet if unwrap functionality is uncontroversial.
I think it should be a plugin at first; I'll happily add it to the
plugin list if you send it to me.
On 11/16/21 22:29, toastal wrote: