The introduction of frames means that data is no longer tied to a specific Page
or Context. 255b9a91cc introduced Origins for
v8 values shared across frames of the same origin. The commit highlighted the
lifetime mismatched that we now have with data that can outlive 1 frame. A
specific issue with that commit was the finalizers were still Context-owned.
But like any other piece of data, that isn't right; aside from modules, nothing
should be context-owned.
This commit continues where the last left off and moves finalizers from Context
to Origin. This is done in a separate commit because it introduces significant
changes. Currently, finalizers take a *Page, but that's no longer correct. A
value created in one Page, can outlive the Page. We need another container. I
original thought to use Origin, but that isn't know to CDP/MCP. Instead, I
decide to enhance the Session.
Session is now the owner of the page.arena, the page.factory and the
page.arena_pool. Finalizers are given a *Session which they can use to release
their arena.
After looking at a handful of websites, the # of Text and Commend nodes
that are small (<= 12 bytes) is _really_ high. Ranging from 85% to 98%. I
thought that was high, but a lot of it is indentation or a sentence that's
broken down into multiple nodes, eg:
<div><b>sale!</b> <span class=price>$1.99</span> buy now<div>
So what looks like 1 sentence to us, is actually 3 text nodes.
On a typical website, we should see thousands of fewer allocations in the
page arena for the text in text nodes.
Most importantly, this allows the Selector.List to be self-contained with
an arena from the ArenaPool. Selector.List can be both relatively large
and relatively common, so moving it off the page.arena is a nice win.
Also applied this to ChildNodes, which is much smaller but could also be
called often.
I was initially going to hook into the v8::Object's internal fields to store
the referencing v8::Object. So the v8::Object representing the Iterator
would store the v8::Object representing the NodeList inside of its internal
field - which the GC would trace/detect/respect. And that is probably the
fastest and most v8-ish solution, but I couldn't come up with an elegant
solution. The best I had was having a "afterCreate" callback which passed
the v8 object (this is similar to the old postAttach callback we had, but
used for a different purpose). However, since "acquireRef" was recently
added to events, re-using that was much simpler and worked well.
We're seeing an assertion in Page.appendNew fail because the node has a parent.
According to html5ever, this shouldn't be possible (appendNew is only called
from the Parser). BUT, it's possible we're mutating the node in a way that
we shouldn't...maybe there's JavaScript executing as we're parsing which is
mutating the node.
In release, this will be more defensive. In debug, this still crashes. It's
possible this is valid (like I said, maybe there's JS interleaved which is
mutating the node), but if so, I'd like to know the exact scenario that produces
this case.
Avoids having to allocate small strings when going from v8 -> Zig. Also
added a discriminatory type, string.Global which uses the arena, rather than
the call_arena, if an allocation _is_ necessary. (This is similar to a feature
we had before, but was lost in zigdom). Strings from v8 that need to be
persisted, can be allocated directly v8 -> arena, rather than v8 -> call_arena
-> arena.
I think there are a lot of places where we should use string.String - where
strings are expected to be short (e.g. attribute names). But started with just
document.querySelector and querySelectorAll.
chromedp expects the nodeId starts to 1.
A start to 0 make it enter in infinite loop b/c it expects the Go's
default int, ie 0, to be nil from a map to stop the loop.
If the 0 index is set, it will loop...
Following Zig recommendation not to inline except in specific cases, none of
which I think applies to use.
Also, mimalloc.create can't fail (it used to be possible, but that changed a
while ago), so removed its error return.
There is some risk to this change. The first is that I made a mistake. The
other is that one of the APIs that doesn't currently return an error changes
in the future.
The thin mimalloc API is currently defensive around incorrect setup/teardown by
guarding against using/destroying the arena when the heap is null, or creating
an arena when it already exists.
The only time these checks will fail is when the code is wrong, e.g. trying
to use libdom before or after freeing the arena. The current behavior can mask
these errors, plus add runtime overhead.
Outputs in logfmt in release and a "pretty" print in debug mode. The format
along with the log level will become arguments to the binary at some point in
the future.
Node registry now only tracks the node id (which we need to be consistent) and
the underlying parser.Node. All other data is loaded on-demand (i.e. when we
serialize the node). This allows us to serialize node values as they appear
when they are serialized, as opposed to when they are registered.
This expands on the existing CDP node work used in DOM.search. It introduces
a node registry to track all nodes returned to the client and give lookups to
get a node from a Id or a *parser.node.
Eventually, the goal is to have the Registry emit the DOM.setChildNodes event
whenever necessary, as well as support many of the missing DOM actions.
Added tests to existing search handlers. Reworked search a little bit to avoid
some unnecessary allocations and to hook it into the registry.
The generated Node is currently incomplete. The parentId is missing, the
children are missing. Also, we still need to associate the v8 ObjectId to the
node.
Finally, I moved all action handlers into a nested "domain" folder.