The reallocation_count counter was being incremented regardless of whether
the resize/remap operations succeeded. This led to inaccurate memory
allocation statistics.
- resize: Only increment when rawResize returns true (success)
- remap: Only increment when rawRemap returns non-null (success)
This fixes the TODO comments that were present in the code.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 - When Target.setAutoAttach is called, send the `Target.attachedToTarget`
event before sending the response. This matches Chrome's behavior and
it stops playwright from thinking there's no target and making extra calls,
e.g. to Target.attachedToTarget.
2 - Use the same dummy frameId for all startup messages. I'm not sure why we
have STARTUP-P and STARTUP-B. Using the same frame (a) makes more sense to
me (b) doesn't break any existing integration tests, and (c) improves this
scenario: https://github.com/lightpanda-io/browser/issues/1800
Dynamic scripts have script.async == true by default (we handled this correctly
in the ScriptManager, but we didn't return the right value when .async was
accessed).
Inline scripts only consider direct children, not the entire tree.
Empty inline scripts are executed at a later time if text is inserted into them
Loading `sub 1.html` has a side effect - it increments window.top..sub1_count).
So it should be used careful. It was being used in `about_blank_renavigate` as
a placeholder which _should_ not get navigated, but there's no strict guarantee
about when it gets canceled.
In https://github.com/lightpanda-io/browser/pull/1774 we started to track Ranges
in the page in order to correctly make them "live". But, without correct
lifetime, they would continue to be "live" even if out of scope in JS.
This commit adds finalizers to Range via reference counting similar to Events.
It _is_ possible for a Range to outlive its page, so we can't just remove the
range from the Page's _live_range list - the page might not be valid. This
commit gives every page an unique id and the ability to try and get the page
by id from the session. By capturing the page_id at creation-time, a Range
can defensively remove itself from the page's list. If the page is already
gone, then there's no need to do anything.
Next.js hydration references FileList as a global for feature detection.
Register a minimal stub (length=0, item()→null) so the type exists in
the global scope and the reference check doesn't throw.
Custom element callbacks aren't allowed to call document.open/close/write while
parsing.
Fixes WPT crash:
/custom-elements/throw-on-dynamic-markup-insertion-counter-reactions.html
On BSD, a listening socket isn't considered connected, so this error is
expected. Maybe we shouldn't call shutdown at all, but I guess it's safer this
way.
The bridge.finalizer resolves Page through its own module graph, which
can differ from Range.zig's import in release builds. Store a pointer
to the live_ranges list directly on AbstractRange so deinit can remove
without accessing Page fields.
Add a weak finalizer to Range that removes its linked list node from
Page._live_ranges when V8 garbage-collects the JS Range object. This
prevents the list from growing unboundedly and avoids iterating over
stale entries during mutation updates.
Move the per-range update logic from Page into AbstractRange methods
(updateForCharacterDataReplace, updateForSplitText, updateForNodeInsertion,
updateForNodeRemoval). Page now just iterates the list and delegates.
Remove redundant start_container == end_container check in insertNode —
collapsed already implies same container.
When calculating accessible names for elements without explicit labels, multiple descendant text nodes were previously concatenated directly together. This adds a space between distinct text node contents to prevent words from sticking together.
Previously, semantic_tree_text hardcoded prune = false, which bypassed the structural node filters and allowed empty none nodes to pollute the root of the text dump.
This prevents token bloat in JSON/text dumps and ensures that StaticText leaf nodes are not incorrectly pruned when structural containers (like none, table) hoist their text.
This fixes a bug where only the first text node was being returned, causing fragmented text nodes (e.g. <span>Sub</span><span>mit</span>) to be missing their trailing text.
The introduction of frames means that data is no longer tied to a specific Page
or Context. 255b9a91cc introduced Origins for
v8 values shared across frames of the same origin. The commit highlighted the
lifetime mismatched that we now have with data that can outlive 1 frame. A
specific issue with that commit was the finalizers were still Context-owned.
But like any other piece of data, that isn't right; aside from modules, nothing
should be context-owned.
This commit continues where the last left off and moves finalizers from Context
to Origin. This is done in a separate commit because it introduces significant
changes. Currently, finalizers take a *Page, but that's no longer correct. A
value created in one Page, can outlive the Page. We need another container. I
original thought to use Origin, but that isn't know to CDP/MCP. Instead, I
decide to enhance the Session.
Session is now the owner of the page.arena, the page.factory and the
page.arena_pool. Finalizers are given a *Session which they can use to release
their arena.
Depends on: https://github.com/lightpanda-io/zig-v8-fork/pull/153
In some ways this is an extension of
https://github.com/lightpanda-io/browser/pull/1635 but it has more implications
with respect to correctness.
A js.Context wraps a v8::Context. One of the important thing it adds is the
identity_map so that, given a Zig instance we always return the same v8::Object.
But imagine code running in a frame. This frame has its own Context, and thus
its own identity_map. What happens when that frame does:
```js
window.top.frame_loaded = true;
```
From Zig's point of view, `Window.getTop` will return the correct Zig instance.
It will return the *Window references by the "root" page. When that instance is
passed to the bridge, we'll look for the v8::Object in the Context's
`identity_map` but wont' find it. The mapping exists in the root context
`identity_map`, but not within this frame. So we create a new v8::Object and now
our 1 zig instance has N v8::Objects for every page/frame that tries to access
it.
This breaks cross-frame scripting which should work, at least to some degree,
even when frames are on the same origin.
This commit adds a `js.Origin` which contains the `identity_map`, along with our
other `v8::Global` storage. The `Env` now contains a `*js.Origin` lookup,
mapping an origin string (e.g. lightpanda.io:443) to an *Origin. When a Page's
URL is changed, we call `self.js.setOrigin(new_url)` which will then either get
or create an origin from the Env's origin lookup map.
js.Origin is reference counted so that it remains valid so long as at least 1
frame references them.
There's some special handling for null-origins (i.e. about:blank). At the root,
null origins get a distinct/isolated Origin. For a frame, the parent's origin
is used.
Above, we talked about `identity_map`, but a `js.Context` has 8 other fields
to track v8 values, e.g. `global_objects`, `global_functions`,
`global_values_temp`, etc. These all must be shared by frames on the same
origin. So all of these have also been moved to js.Origin. They've also been
merged so that we now have 3 fields: `identity_map`, `globals` and `temps`.
Finally, when the origin of a context is changed, we set the v8::Context's
SecurityToken (to that origin). This is a key part of how v8 allows cross-
context access.
Per DOM spec, appendData(data) is defined as replaceData(length, 0, data).
While the range update would be a no-op (offset=length, count=0), routing
through replaceData ensures consistent code path and spec compliance.
Per DOM spec, all live ranges must have their boundary offsets updated
when CharacterData content changes (insertData, deleteData, replaceData,
splitText) or when nodes are inserted/removed from the tree.
Track live ranges via an intrusive linked list on Page. After each
mutation, iterate and adjust start/end offsets per the spec algorithms.
Also fix Range.deleteContents loop that read _end_offset on each
iteration (now decremented by the range update), and Range.insertNode
that double-incremented _end_offset for non-collapsed ranges.
Route textContent, nodeValue, and data setters through replaceData
so range updates fire consistently.
Fixes 9 WPT test files (all now 100%): Range-mutations-insertData,
deleteData, replaceData, splitText, appendChild, insertBefore,
removeChild, appendData, dataChange (~1330 new passing subtests).
- Remove scale, contain-intrinsic-size, animation-range, text-box-edge
from isTwoValueShorthand: these have asymmetric or 3-value semantics
that make "X X" → "X" collapse incorrect.
- Remove line-height from isLengthProperty: bare 0 is the unitless
number multiplier, not a length (Chrome serializes as "0" not "0px").
- Fix test: background-size "cover cover" is invalid CSS, use "auto auto".
Add missing properties to isLengthProperty (0→0px) and
isTwoValueShorthand (duplicate value collapse) maps based
on WPT test failures in css/css-sizing, css/css-align,
css/css-scroll-snap, css/css-logical, and others.
New length properties: scroll-margin/padding-*, column-width,
column-rule-width, grid-column/row-gap, outline, shape-margin,
offset-distance, translate, animation-range-*, block-step-size,
text-decoration-inset, and *-rule-*-inset (CSS Gaps).
New two-value shorthands: scroll-padding-block/inline,
scroll-snap-align, background-size, border-image-repeat,
mask-repeat/size, contain-intrinsic-size, scale, text-box-edge,
animation-range, grid-gap.
Calling it here ensures that the inspector gets reset on internal page
navigation. We were seeing intermittent segfaults on a problematic WPT tests
(/encoding/legacy-mb-japanese/euc-jp/) which I believe this solves.
(The tests are still broken. Because we don't support form targets, they cause
the root page to reload in a tight cycle, causing a lot of context creation /
destruction, which I thin was the issue. This commit doesn't fix the broken test
but it hopefully fixes the crash).
Also, clear out the Inspector's default_context when the default context is
destroyed. (This was the first thing I did to try to fix the crash, it didn't
work, but I believe it's correct).
- TreeWalker.Full instead of FullExcludeSelf so querying a specific
nodeId evaluates the root element itself
- resolve href to absolute URL via URL.resolve
- isDisabled checks ancestor <fieldset disabled> with legend exemption
- parameter order: allocator before *Page per convention
Allows checking if a direct listener exists, if it doesn't, event creation can
be skipped.
I looked at a couple sites, the benefits of this is small.
Most sites don't seem to trigger that many direct dispatches and when they do,
they seem to have a listener 50-75% of the time.
- Use `checkVisibility` for more accurate element visibility detection.
- Add support for color, date, file, and month AX roles.
- Optimize XPath generation by tracking sibling indices during the walk.
- Refine interactivity detection for form elements.
Returns a structured list of all interactive elements on a page:
buttons, links, inputs, ARIA widgets, contenteditable regions, and
elements with event listeners. Includes accessible names, roles,
listener types, and key attributes.
Event listener introspection (both addEventListener and inline
handlers) is unique to LP — no other browser exposes this to
automation code.
- Use a reusable buffer for XPaths to reduce allocations.
- Improve `display: none` detection with proper CSS parsing.
- Pass parent name to children to avoid redundant AXNode lookups.
- Use `getElementById` for faster datalist lookups.
- Add more structural roles (banner, navigation, main, list, etc.).
- Implement fallback for accessible names (SVG titles, image alt text).
- Skip children for leaf-like semantic nodes to reduce redundancy.
- Disable pruning in the default semantic tree view.
Instead of always returning zeros, delegate getBoundingClientRect and
getClientRects to the common ancestor container element. Return zeros
only when the range is collapsed or has no element ancestor.
When a module's compilation fails after its imported_modules entry has
been consumed by waitForImport, sibling modules that import the same
dependency would get UnknownModule errors. Fix by re-preloading modules
whose cache entry exists but has no compiled module.
A inner-navigate event can override an existing pending queued navigation. When
it does, the previously queued navigation has to be cleaned up. We were doing
this, but it must have been stripped out when navigation was refactored to work
with frames.
headless stubs returning zero-valued DOMRect / empty list per CSSOM
View spec. fixes "getBoundingClientRect is not a function" errors on
sites where layout code calls this on Range objects (e.g. airbnb).
headless stub for the FontFace API — constructor stores family/source,
status is always "loaded", load() resolves immediately. enables sites
that use new FontFace() for programmatic font loading (e.g. boursorama).
Scripts created via createElement('script') with inline content (textContent,
text, or innerHTML) and inserted into the DOM now execute per the HTML spec.
Previously all dynamically inserted scripts without a src attribute were
skipped, breaking most JS framework hydration patterns.
atob, Performance.measure, and Navigation methods (back, forward,
navigate, traverseTo, updateCurrentEntry) return DOMException errors
but were missing the dom_exception flag, causing them to throw generic
Error objects instead of proper DOMException instances in JavaScript.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
It isn't safe/correct to call `navigate` on the same page multiple times. A page
is meant to have 1 navigate call. The caller should either remove the page
and create a new one, or call Session.replacePage.
This commit removes the *Page from the MCP Server and instead interacts with
the session to create or remove+create the page as needed, and lets the Session
own the *Page.
It also adds a bit of defensiveness around parameter parsing, e.g. calling
{"method": "tools/call"} (without an id) now errors instead of crashing.
This is a follow up to ca0f77bdee that applies
the same fix to all places where cloneNode is used and then the cloned element
is inserted. A helper was added more of a means of documentation than to DRY
up the code.
Revert "update ref counting for new ReadableStream usages"
This reverts commit c64500dd85.
Revert "add reference counting for ReadableStream"
This reverts commit 812ad3f49e.
Revert "use a pool arena with ReadableStream"
This reverts commit 8e8a1a7541.
Instead of going through the parser, just create / append the 3 elements.
iframe without a src automatically loads about:blank. This is important, because
the following is valid:
```js
const iframe = document.createElement('iframe');
document.documentElement.appendChild(iframe);
// documentElement should exist and should be the HTML of the blank page.
iframe.contentDocument.documentElement.appendChild(...);
```
Builds on top of https://github.com/lightpanda-io/browser/pull/1720
All inner navigations have an originator and a target. Consider this:
```js
aframe.contentDocument.querySelector('#link').click();
```
The originator is the context in which this JavaScript is called, the target is
`aframe. Importantly, relative URLs are resolved based on the originator. This
commit adds that.
This is only a first step, there are other aspect to this relationship that
isn't addressed yet, like differences in behavior if the originator and target
are on different origins, and specific target targetting via the things like
the "target" attribute. What this commit does though is address the normal /
common case.
It builds on top of https://github.com/lightpanda-io/browser/pull/1720
Same pattern as 3dea554e (mcp/Server.zig): allocator.create returns
undefined memory, and struct field defaults (shutdown: bool = false)
are not applied when fields are set individually. Use self.* = .{...}
to ensure all fields including defaults are initialized.
* Refactored tag ignoring logic to use the el.getTag() enum switch
instead of string comparisons, improving performance and safety.
* Replaced string comparisons for interactive roles with
std.StaticStringMap.
* Renamed internal dumpNode method to dump for brevity.
In https://github.com/lightpanda-io/browser/pull/1651 we started to run the
message loop a lot more. One specific case we added for `fetch` was when there
were no scheduled tasks or HTTP, but background tasks, we'd wait for them to
complete.
One case we missed though is if WE do have a schedule tasks, but it's too far
into the future. In that case, we would just exit. This now adds the same logic
for checking and waiting for any background tasks in that case.
Must of the complexity in the previous commit had to do with the fact that
about:blank is processed synchronously, meaning that we could process a
scheduled navigation -> page.navigate -> scheduled navigation:
```
let iframe = document.createElement('iframe');
iframe.addEventListner('load', () => {
iframe.src = "about:blank";
});
```
This is an infinite loop which is going to be a problem no mater what, but there
are different degrees of problems this can cause, e.g. looping forever vs use-
after-free or other undefined behavior.
The new approach does 2 passes through scheduled navigations, first processing
"asynchronous" navigation (anything not "about:blank"), then processing
synchronous navigation ("about:blank"). The main advantage is that if the
synchronous navigation causes more synchronous navigation, it won't be
processed until the next tick. PLUS, we can detect about:blank that loads
about:blank and stop it (which might not be to spec, but seems right to do
nonetheless). This 2-pass approach removes the need for a couple of checks and
makes everything else simpler.
It relies on default field values, e.g. for mutex: std.Thread.Mutex = .{}, but
doesn't initialize the structure, just the pointer on the heap resulting in a
crash.
This makes frame sub-navigation "work" for all page navigations (click, form
submit, location.top...) as well as setting the iframe.src.
Fixes at least 2 WPT crashes.
BUT, the implementation still isn't 100% correct, with two known issues:
1. Navigation currently happens in the context where it's called, not the
context of the frame. So if Page1 accesses Frame1 and causes it to navigate,
e.g. f1.contentDocument.querySelector('#link').click(), it's Page1 that will
be navigated, since the JS is being executed in the Page1 context.
This should be relatively easy to fix.
2. There are particularly complicated cases in WPT where a frame is navigated
inside of its own load, creating an endless loop. There's some partial
support for this as-is, but it doesn't work correctly and it currently is
defensive and likely will not continue to navigate. This is particularly true
when sub-navigation is done to about:blank within the frame's on load event.
(Which is probably not a real concern, but an issue for some WPT tests)
Although it shares a lot with the original navigation code, there are many more
edge cases here, possibly due to being developed along side WPT tests. The
source of most of the complexity is the synchronous handling of "about:blank"
in page.navigate, which can result in a scheduled navigation synchronously
causing more scheduled navigation. (Specifically because
`self.documentIsComplete();` is called from page.navigate in that case). It
might be worth seeing if something can be done about that, to simplify this new
code (removing the double queue, removing the flag, simplifying pre-existing
schedule checks ,...)
When we createElement, we assume the element is detached. This is usually true
except for Custom Elements where the constructor can do anything, including
connecting the element. This broken assumption results in cloneNode crashing.
https://github.com/lightpanda-io/browser/pull/1690 narrowed the lifetime of
HandleScopes to once per listener. I think that was just an accident of
refactoring, and not some intentional choice.
The narrower HandleScope lifetime makes it so that when we do run microtask
queue at the end of event dispatching, some locals in the queue may not longer
be valid.
HS1
HS2
queueMicrotask(func)
runMicrotask
In the above flow, `func` is only valid while HS2 is alive, so when we run
the microtask queue in HS1, it is no longer valid.
Playwright, when creating a new CDPSession, sends an
attachToBrowserTarget followed by another attachToTarget to re-attach
itself to the existing target.
see playwright/axtree.js from demo/ repository.
advance() asserts that each byte it steps over is either an ASCII byte
or a UTF-8 sequence leader, never a continuation byte (0x80–0xBF).
consumeName() was calling advance(1) for all non-ASCII bytes
('\x80'...'\xFF'), processing multi-byte sequences one byte at a time.
For a two-byte sequence like é (0xC3 0xA9), the second iteration landed
on the continuation byte 0xA9 and triggered the assertion, crashing the
browser in Debug mode.
Fix: replace advance(1) with consumeChar() for all non-ASCII bytes.
consumeChar() reads the lead byte, derives the sequence length via
utf8ByteSequenceLength, and advances the full code point in one step,
so the position never rests on a continuation byte.
Observed on saintcyrlecole.caliceo.com, whose root element carries an
inline style with custom property names containing French accented
characters (--color-store-bulles-été-fg, etc.). The crash aborted JS
execution before the Angular app could render any dynamic content.
Most significantly, if removing from the multi fails, the connection
is added to a "dirty" list for the removal to be retried later. Looking at
the curl source code, remove fails on a recursive call, and we've struggled with
recursive calls before, so I _think_ this might be happening (it fails in other
cases, but I suspect if it _is_ happening, it's for this reason). The retry
happens _after_ `perform`, so it cannot fail for due to recursiveness. If it
fails at this point, we @panic. This is harsh, but it isn't easily recoverable
and before putting effort into it, I'd like to know that it's actually happening.
Fix potential use of undefined when a 401-407 request is received, but no
'WWW-Authenticate' or 'Proxy-Authenticate' header is received.
Don't call `curl_multi_remove_handle` on an easy that hasn't been added yet do
to error. Specifically, if `makeRequest` fails during setup, transfer_conn is
nulled so that `transfer.deinit()` doesn't try to remove the connection. And the
conn is removed from the `in_use` queue and made `available` again.
On Abort, if getting the private fails (extremely unlikely), we now still try
to remove the connection from the multi.
Added a few more fields to the famous "ScriptManager.Header recall" assertion.
This PR introduces a custom CDP domain 'LP' (Lightpanda) to expose browser-specific tools. The first method, 'LP.getMarkdown', allows retrieving a Markdown representation of the DOM or a specific node by its 'nodeId'. This is optimized for AI agents and LLM-based scraping tasks.
This allows us to leverage the Caller.Function.call method, which does type
mapping, caching, etc... and allows the Zig function callback to be written like
any other Zig WebAPI function.
ReadableStream.zig was the only webapi file importing v8 directly.
Extract the repeated newFunctionWithData / callback boilerplate into
js/Local (newFunctionWithData) and js/Caller (initFromHandle,
FunctionCallbackInfo.getData), and update ReadableStream and Context
to use them.
Returns 1 when writable (default high water mark), 0 when closed,
and null when errored, matching the spec behavior for streams
without a custom queuing strategy.
After BOM stripping or when receiving an empty Uint8Array, the
decoded input can be zero-length. Per spec, empty chunks should
produce no output rather than enqueuing an empty string.
When JS called controller.enqueue(42), the value was coerced to the
string "42" because Chunk only had uint8array and string variants.
Add a js_value variant that persists the raw JS value handle, and
expose enqueueValue(js.Value) as the JS-facing enqueue method so
numbers, booleans, and objects round-trip with their original types.
- Use StaticStringMap and enums for method, tool, and resource lookups.
- Implement comptime JSON minification for tool schemas.
- Refactor router and harness to use more efficient buffered polling.
- Consolidate integration tests and add synchronous unit tests.
Replace synchronous queue-draining approach with async promise-based
piping using V8's thenAndCatch callbacks. PipeState struct manages the
async read loop: reader.read() returns a Promise, onReadFulfilled
extracts {done, value}, writes chunks to the writable side, and
recurses via pumpRead() until the stream closes.
Implement synchronous piping that drains queued chunks from a
ReadableStream into a WritableStream. pipeThrough accepts any
{readable, writable} pair (TransformStream, TextDecoderStream, etc.)
and returns the readable side. pipeTo writes all chunks to a
WritableStream and resolves when complete.
Mirrors TextEncoderStream: wraps a TransformStream with a Zig-level
transform that converts Uint8Array chunks to strings. Supports the
same constructor options as TextDecoder (label, fatal, ignoreBOM).
Add the missing Streams API types needed for TextEncoderStream support:
- WritableStream with locked/getWriter, supporting both JS sink callbacks
and internal TransformStream routing
- WritableStreamDefaultWriter with write/close/releaseLock/closed/ready
- WritableStreamDefaultController with error()
- TransformStream with readable/writable accessors, JS transformer
callbacks (start/transform/flush), and Zig-level transform support
- TransformStreamDefaultController with enqueue/error/terminate
- TextEncoderStream that encodes string chunks to UTF-8 Uint8Array
via a Zig-level transform function
In the previous commits, two separte crash resolution conspired to introduce
1 tick delay in request handling.
When we're in a libcurl perform, we can't re-enter libcurl. So we added a
check before processing new requests to make sure we weren't "performing" and,
if we were, we'd queue the request (hence the 1 tick delay).
But for another issue, we set the same "performing" check when manually
triggering callbacks. This extended the situations where the above check fired
thus causing the 1-tick delay to happen under more (and even common) situation.
This commit improves this - instead of relying on the global "performing" check
when processing 1 transfer explicitly, we now have a per-transfer performing
check. This prevents the transfer from being deinitialized during a callback
but does not block requests from being started immediately.
This field was recently added and is used to generate correct frameIds in CDP
messages. They remain the same during a navigation event, so calling them
page.id might cause surprises since navigation events create new pages, but
retain the original id. Hence, frame_id is more accurate and hopefully less
surprising.
(This is a small cleanup prior to doing some iframe navigation work).
- Refactor router and test harness for non-blocking I/O using buffered polling.
- Implement reliable test failure reporting from sub-threads to the main test runner.
- Encapsulate pipe management using idiomatic std.fs.File methods.
- Fix invalid JSON generation in resource streaming due to duplicate fields.
- Improve shutdown sequence for clean test exits.
There are two main ways to dispatch events, both via the EventManager: dispatch
and dispatchWithFunction. dispatchWithFunction came about from having to
dispatch to function callbacks in addition to event listeners. Specifically,
firing the window.onload callback.
Since that original design, much has changed. Most significantly, with
https://github.com/lightpanda-io/browser/pull/1524 callbacks defined via
attributes became properly (I hope) integrated with the event dispatching.
Furthermore, the number of non-tree event targets (e.g. AbortSignal) has grown
significantly. Finally, dispatching an event is DOM-based event is pretty
complex, involving multiple phases and capturing the path.
The current design is largely correct, but non-obvious. This commit attempts to
improve the ergonomics of event dispatching.
`dispatchWithFunction` has been renamed to `dispatchDirect`. This function is
meant to be used with non-DOM event targets. It is optimized for having an event
path with a single target andh no bubbling/capture phase. In addition to being
a little more streamlined, `dispatchDirect` will internally turn a
`js.Function.Global` or `js.Function.Temp` into a local. This makes the callsite
simpler, but also provides optimization opportunity - not having to create
a new scope for the common case of having no callback/listener. This lays the
groundwork for having a `hasDirect` guard clause at the callsite to avoid
unnecessary event creation (todo in a follow up commit).
`dispatch` remains unchanged. While `dispatch` is primarily meant to handle the
DOM-based EventTarget, it will forward non-DOM EventTargets to `dispatchDirect`.
This is necessary since JS code can call `signal.dispatchEvent(....)`.
Two notes:
1 - The flow of dispatchDirect is an optimization. The spec makes no distinction
between DOM and non-DOM based EventTargets.
2 - While the window (as an EventTarget) should probably be thought of as a
DOM-based EventTarget, we use `dispatchDirect with it. This is because it
sits at the root and thus can safely go through the faster `dispatchDirect`.
Fixes a separate but similar issue to
https://github.com/lightpanda-io/browser/pull/1689
Specifically, it prevents starting a request from within a libcurl handler, thus
avoiding an illegal recursive call.
(This commit also removes the failed function call debug logging for
DOMExceptions, as these aren't particularly abnormal / log-worthy)
This was already handled in most cases, but not for a body-less response. It's
safe to call transfer.abort() during a callback, so long as the performing flag
is set to true. This was set during the normal libcurl callbacks, but for a
body-less response, we manually invoke the header_done_callback and were not
setting the performing flag.
https://github.com/lightpanda-io/browser/pull/1614 improved our shutdown
behavior so that microtasks associated with a context wouldn't fire after the
context was disposed of. This involved having context-specific microtasks,
pumping the message loop, and prevent re-entry.
The shutdown code in CDP already had much of this behavior built-in, but it has
now become redundant. Most importantly the CDP shutdown logic did not prevent
re-entry.
Removing this code fixes a flaky WPT crash. I didn't seem to be tied to a
specific test, but rather a cross-context/page use-after-free that was saw
prior to 1614. I could reproduce it reliably by running `/wasm/core/`.
I'll be honest, it isn't clear to me why _removing_ the CDP cleanup helps.
Running the message loop and microtask _before_ our normal shutdown might be
unnecessary, but why would it crash? I don't know, but the CDP path is slightly
different in that it also involves Inspector shutdown. So there's still
something about this flow I don't quite understand. And, at least for this case
the current flow seems "correct".
- Use an allocating writer in `sendResponse` to handle large payloads.
- Update the main loop to tick the HTTP client and cap poll timeouts.
- Update protocol version and minify tool input schemas.
Previously the "load" event happened when all external scripts were done. In the
case that there was no external script, the "load" event would fire immediately
after parsing.
With iframes, it now waits for external script AND iframes to complete but the
no-external-script code was never updated to consider iframes and would thus
fire load events prematurely.
Adds a not-documented "wpt" mode to --dump which outputs a formatted
report.cases.
This is meant to make working on a single WPT test case easier, particularly
with some coding tool. Claude recommended this output for its own use.
Instead of telling claude to start the browser in serve mode, then run the
wptrunner, and merge the two outputs (and then stop the server), you can do:
zig build run -- fetch --dump wpt "http://localhost:8000/dom/nodes/CharacterData-appendChild.html"
(you still need the wpt server up)
I'm not sure what the correct behavior is, but this fixes a WPT crash:
/html/browsers/sandboxing/sandbox-inherited-from-required-csp.html
The issue is iframe-specific as, with an iframe, you document.write can be
called during parsing when there's no document._current_script (because it's
being executed from the parent).
Follow up to https://github.com/lightpanda-io/browser/pull/1646 applies the
same change to XHR URLs.
Following specs, ignores unknown/invalid parameters of the Content-Type when
parsing the MIME (rather than rejecting the entire header).
I think this code comes from some serialization tweak from when everything was
an std.Uri and by switch to [:0]const u8 everywhere not only was the tweak
unecessary, it was also wrong - possibly resulting in the generation of
invalid JSON.
Was looking at, what I thought was a related issue, and started to extract this
code to re-use it (in DataURIs). Realized it could be written without the
intermediate allocation. Then I realized the dataURI issue is something else,
but wanted to keep this improvement.
After looking at a handful of websites, the # of Text and Commend nodes
that are small (<= 12 bytes) is _really_ high. Ranging from 85% to 98%. I
thought that was high, but a lot of it is indentation or a sentence that's
broken down into multiple nodes, eg:
<div><b>sale!</b> <span class=price>$1.99</span> buy now<div>
So what looks like 1 sentence to us, is actually 3 text nodes.
On a typical website, we should see thousands of fewer allocations in the
page arena for the text in text nodes.
Most importantly, this allows the Selector.List to be self-contained with
an arena from the ArenaPool. Selector.List can be both relatively large
and relatively common, so moving it off the page.arena is a nice win.
Also applied this to ChildNodes, which is much smaller but could also be
called often.
I was initially going to hook into the v8::Object's internal fields to store
the referencing v8::Object. So the v8::Object representing the Iterator
would store the v8::Object representing the NodeList inside of its internal
field - which the GC would trace/detect/respect. And that is probably the
fastest and most v8-ish solution, but I couldn't come up with an elegant
solution. The best I had was having a "afterCreate" callback which passed
the v8 object (this is similar to the old postAttach callback we had, but
used for a different purpose). However, since "acquireRef" was recently
added to events, re-using that was much simpler and worked well.
Shutdown on MacOS doesn't work properly. The process appears to shutdown, but
will continue to run in the background. It can run infinitely if it's stuck in
a JS loop. To help it along, Client.stop now force-terminates the isolate.
I also don't think shutdown is working as intended on Linux, but the problem
seems less serious there. On Linux, it appears to properly kill the process
(which is the important thing), but I don't think it necessarily does a clean
shutdown.
Since we already rely on github for builds, this removes a point of failure.
Also curl.se consistently fails from a VPS machine for me - not sure if
they're blocking IP ranges, but it works fine on github.
This bug continues elusive. The latest crash logs show us that, somehow, 1
script is being resolved from 2 transfer objects. This doesn't seem possible,
so I'm adding more properties to log the state of both transfers to try and
figure out how this is happening.
Follow up to https://github.com/lightpanda-io/browser/pull/1646
The encodeURL (renamed to ensureEncoded and exposed in this commit) already
handled already-encoded URLs, so this was largely a matter of exposing the
functionality.
The reason this isn't baked directly into Page.navigate is that, in some places
e.g. internal navigation, the URL is already know to be encoded. So it's up
to every caller to make sure they are passing a valid URL to navigate.
This is normally not called in "normal" dump-usage, but with
XMLSerializer.serializeToString an Attr node _can_ be provided. The spec says,
and FF agrees, this should return an empty string.
When set (defaults to not set/false), --dump will include iframe contents.
I was hoping I could add a mode to strip_mode to this, but since dump is used
extensively (e.g. innerHTML), this is something that has to be off by default
(for correctness).
Depends on: https://github.com/lightpanda-io/zig-v8-fork/pull/152
We previously ran the message loop every 250ms. This commit changes it to run on
every tick (much more frequently). It also runs microtasks after draining the
message loop (since it can generate microtasks).
Also, we use to run microtasks after each script execution. Now we drain the
message Loop + microtasks.
We still only drain the microtasks when executing v8 callbacks.
As part of this change, we also adjust our wait time based on whether or not
there are pending background tasks in v8 in order to try to execute them (in
general) and in a timely manner.
The goal is to ensure that tasks v8 enqueued on the foreground thread are
executed promptly.
This change is particularly useful for calls to webassembly as compilation
happens in the background and eventually requires the message loop to be drained
to continue.
Previously, if a script did `await WebAssembly.instantiate(....)`, there was
a good chance we'd never finish the code - we'd wait too long to run the
message loop AND, after running it, we wouldn't necessarily resolve the promise.
Some CDP clients (e.g. playwright-go) request /json/version/ with a
trailing
slash. Added handling for this variant to match the exact same behavior
as /json/version
Given:
a.href = "over 9000!"
Then:
a.href === BASE_URL + '/over%209000!';
This commits adds an escape: bool option to URL.resolve which will escape the
path, query and fragment when true.
Also changes the Anchor, Image, Link and IFrame getSrc to escape. Escaping is
also used when navigating a frame.
We currently store all of the env's contexts in an ArrayList. When performing
micro/macro tasks, we iterate through this list and perform the micro/macro
tasks. This can result in the ArrayList being invalidated (e.g. a microtask can
result in a context being created, a promise resolving and creating an iframe).
Invalidating the arrylist while we iterate through it is a use-after-free.
This commit stores contexts in a fixed array (64) so that it doesn't move.
Iteration is slower, unfortunately, as the new `env.context_count` has to be
checked.
Fixes WPT crash on
/html/cross-origin-embedder-policy/cross-origin-isolated-permission-iframe.https.window.html url=http://web-platform.test:8000/html/cross-origin-embedder-policy/cross-origin-isolated-permission-iframe.https.window.html
This commit also prevents microtasks execution from causing microtask execution.
On the above test, I saw runMicrotasks which I don't think we're supposed to do.
Wasn't really needed before frames and multithreading,but it's pretty essential
now to figure out what's going on. Probably needs to be applied more
broadly.
This seems to only happen in error cases, most notably someone changes the
object to return an invalid ownKeys, as we see in WPT
/fetch/api/headers/headers-record.any.html
This is a bit sad, but at least for now, all frames must share the same
page.arena and page.factory (they still get their own v8::Context and
call_arena).
Consider this case (from WPT /css/cssom-view/scrollingElement.html):
```
let nonQuirksFrame = document.createElement("iframe");
nonQuirksFrame.onload = this.step_func_done(function() {
var nonQuirksDoc = nonQuirksFrame.contentDocument;
nonQuirksDoc.removeChild(nonQuirksDoc.documentElement);
});
nonQuirksFrame.src = URL.createObjectURL(new Blob([`<!doctype html>`], { type:
"text/html" }));
document.body.append(nonQuirksFrame);
```
We have the root page (p0) and the frame page (p1). When the frame (p1) is
created, it's [currently] given its own arena/factory and parses the page. Those
nodes are created with p1's factory.
The onload callback executes on p0, so when we call removeChild that's executing
with p0's arena/factory and tries to release memory using that factory - which
is NOT the factory that created them.
A better approach might be that _memory_ operations aren't tied to the current
calling context, but rather the owning document's page. But:
1 - That would mean we have 2 execution contexts: the v8 context where the code
is running, and the memory context that owns the code
2 - Nodes can be disconnected, what page should we use?
3 - Some operations can behave across frames, p0 could adoptNode on p1, so we'd
have to carefully use p1's factory when cleaning up and re-create the node
in p0's factory.
So much hassle.
Using a shared factory/arena solves these problems at the cost of bloat - when
a frame goes away or navigates, we can't free its old memory. At some point, we
should fix that. But I don't have a quick and easy solution, and sharing the
arena/factory is _really_ quick and easy.
Previously, we used a boolean, `_v8_handoff` to detect whether or not an event
was handed off to v8. When it _was_ handed off, then we relied on the Global
finalizer (or context shutdown) to cleanup the instance. When it wasn't handed
off, we could immediately free the instance.
The issue is that, under pressure, v8 might finalize the event _before_ we've
checked the handoff flag. This was the old code:
```zig
const event = try Event.initTrusted(.wrap("DOMContentLoaded"), .{ .bubbles = true }, self);
defer if (!event._v8_handoff) event.deinit(false);
try self._event_manager.dispatch(
self.document.asEventTarget(),
event,
);
```
But what happens if, during the call to dispatch, v8 finalizes the event? The
defer statement will access event after its been freed.
Rather than a boolean, we now track a basic reference count. deinit decreases
the reference count, and only frees the object when it reaches 0. Any handoff
to v8 automatically increases the reference count by 1. The above code becomes
a simpler:
```zig
const event = try Event.initTrusted(.wrap("DOMContentLoaded"), .{ .bubbles = true }, self);
defer event.deinit(false);
try self._event_manager.dispatch(
self.document.asEventTarget(),
event,
);
```
The deinit is un-conditional. The dispatch function itself increases the RC by 1,
and then the v8 handoff increases it to 2. On v8 finalization the RC is
decreased to 1. The defer deinit decreases it to 0, at which point it is freed.
Fixes WPT /css/css-transitions/properties-value-003.html
Should be updated and merged after:
https://github.com/lightpanda-io/browser/pull/1623 else we'll have a double-free.
The ScriptManager used to directly call the "onload" and "onerror" attributes.
The implementation predates EventManager.dispatch support attribute-based
callbacks. But now the EventManager is attribute-aware and correctly times
the attribute dispatch AND details such as cancellation. So this commit moves
the old attribute-only ScriptManager-specific callback to the EventManager.
With one little wrinkle: 'load' listeners added during a script's execution
should NOT receive a 'load' event when the script finishes. This makes no
sense to me. The EventManager now maintains an ignore_list for "load" events
which is reset after each script execution. A comptime flag is passed to
dispatch to indicate whether the ignore list should be checked. This is only
ever set when the ScriptManager dispatches the 'load' event, so there's no
overhead to dispatch for most events.
An amazon product page has 345 resources and not a single DOM "load" listener.
This, I believe, is pretty common (reddit also has no "load" listener). So this
is a simple optimization to skip dispatching the resource "load" event when
there's no listener for it.
The check could be more granular, i.e. checking the specific parents of the
element. But I believe the global no "load" listener is common enough that this
simpler approach is the best.
The worst case is that we dispatch unnecessary "load" events, which is exactly
what the code was doing before.
We often verify the correctness of tests by loading them in an external browser,
but some tests just don't run the same/correctly. For example, we used to hard-
code the http://127.0.0.1:9582/ origin, but that would cause tests to fail if
running from a different origin.
This commit _begins_ the work of improving this. It introduces a
testing.ORIGIN, testing.BASE_URL and testing.HOST which will work correctly in
both our runner and an external browser.
It also introduces `testing.IS_TEST_RUNNER` boolean flag so that tests which
have no chance of working in an external browser (e.g. screen.width) can be
skipped.
The goal is to reduce/remove tests which fail in external browsers so that such
failures aren't quickly written off as "just how it is".
As per spec:
For legacy reasons, load events for resources inside the document (e.g., images)
do not include the Window in the propagation path in HTML implementations.
Define properties directly on the PrototypeTemplate, removing the need for 1
FunctionTemplate per instance-property. Also, properties were previously all
readonly. There is now a readonly flag...thus, properties which we don't care
about, but have a default value, can be defined this way.
Added a 'noop' flag to functions which skips the zig callback and does nothing.
I was initially hoping these would noticeably reduce the size of our snapshot
due to requiring fewer FunctionTemplates. While it shaves a few ~3KB, it's far
less than I was hoping. The issue is that noop function templates still need
to be unique so that `type_a.noopFunc !== type_b.noopFunc` remains true.
Still, as we add more dummy implementation, these little tweaks might add up.
Adds a new `mcp` run mode to start an MCP server over stdio.
Implements tools for navigation and JS evaluation, along with
resources for HTML and Markdown page content.
1 - Finalizer callbacks are now give a *Page parameter. Various types no longer
need to maintain a reference to *Page just to finalize
2 - EventManager now handles v8_handoff == false cleanup. This is largely
because of the above change, which would require every:
```
defer if (!event._v8_handoff) event.deinit(false);
```
to be turned into:
```
defer if (!event._v8_handoff) event.deinit(false, page);
```
But the caller might not have a page. Besides this, it makes most uses of Event
simpler. But, in some cases, it could leave a window where the event doesn't
reach the EventManager to be properly managed (though, we have no such cases
as of now).
getImplementation() now returns a cached *DOMImplementation pointer
per Document, matching the getStyleSheets() pattern. This ensures
document.implementation === document.implementation holds true.
Flips dom/nodes/Document-implementation.html (1/2 → 2/2).
- uppercase entire qualified name in tagName (including prefix)
- validate PI data for "?>" and use proper XML Name production with Unicode
- implement willValidate on HTMLInputElement
- throw IndexSizeError DOMException for negative maxLength assignment
flips: Node-nodeName, Document-createProcessingInstruction, button,
maxlength, input-willvalidate (+6 subtests)
Under some conditions, a microtask would be executed for a context that was
already deinit'd, resulting in various use-after-free.
The culprit appears to be WASM compilation being placed in the microtask queue
(by a user-script) and then resolved at some point in the future. We guard the
microtask queue by a context.shutting_down boolean, but v8 doesn't know anything
about this flag. The fact is that, microtasks are tied to an isolate, not a
context.
This commit introduces a number of changes:
1 - It follows 309f254c2c and stores the zig Context inside of an embedder field. This
ensures v8 doesn't consider this when GC'ing, which _could_ extend the
lifetime of the v8::Context beyond what we expect
2 - Most significantly, it introduces per-context microtasks queues. Each
context gets its own queue. This makes cleanup much simpler and reduces the
chance of microtasks outliving the context
3 - pumpMessageLoop is called on context.deinit, this helps to ensure that any
tasks v8 has for our context are processed (e.g. wasm compilation) before
shtudown
4 - The order of context shutdown is important, we notify the isolate of the
context destruction first, then pump the message loop and finally destroy
the context's message loop.
Depends on https://github.com/lightpanda-io/zig-v8-fork/pull/151
- add ChildNode.remove() to DocumentType (flips DocumentType-remove.html)
- return null for MutationRecord.attributeNamespace on non-namespaced
attribute mutations (flips MutationObserver-takeRecords.html)
- stop lowercasing in createElementNS per spec — only createElement
should ASCII-lowercase for HTML namespace (flips
Element/Document-getElementsByTagNameNS.html)
- fix getElementsByTagName to use case-insensitive matching for HTML
namespace elements
Implements the DOM spec algorithms for namespace lookup on all node
types. Stores custom namespace URIs in a page lookup for elements
created via createElementNS with unknown namespaces. Fixes
setAttributeNS to preserve qualified names for xmlns namespace
declarations.
Flips dom/nodes/Node-lookupNamespaceURI.html: 0/75 → 75/75.
after() captured node.nextSibling() once, which went stale when that
sibling was one of the nodes being inserted. Use viableNextSibling() to
find the first following sibling not in the nodes list per the DOM spec.
replaceWith() in CData had the same stale-reference problem and also
removed self before inserting, unlike Element.replaceWith() which keeps
self as the insertion anchor. Adopt the same anchor pattern: insert
before self, then remove self at the end.
Flips ChildNode-after.html from 33/45 to 45/45 and
ChildNode-replaceWith.html from 27/33 to 33/33.
Add all 25 legacy constants (INDEX_SIZE_ERR through DATA_CLONE_ERR)
to DOMException on both constructor and prototype, enabling WPT
assert_throws_dom checks that reference e.code.
1 - Fix an issue where build would persist a value in the call_arena
2 - Remove double allocation (call_arena -> page_arena)
3 - Improve ergonomics of sanitizeValue with a comptime value indicating whether
or not to always dupe the value.
When a WebAPI takes `[]const u8`, we coerce values to strings. But when it
takes a `?[]const u8` how should we handle `null`? Some APIs might want to know
that it was null, others might just want `"null``.
Currently when `null` is passed to `?[]const u8`, we'll get null.
This adds a discriminator type, js.NullableString. When `null` is passed to it
it'll be converted to `"null"`.
This no longer works with frames. Multiple frames could have a scheduled
navigation, so a single arena no longer has a clear lifecycle. Instead an arena
from the pool is used per navigation event, thus the queued_navigation is self-
contained.
This required having libcurl copy the body. Unfortunate. Currently we free the
arena as soon as the navigation begins. This is clean. But it means the body is
immediately freed (thus we need libcurl to copy it). As an alternative, each
page could maintain an optional transfer_arena, which it could free on
httpDone/Error.
page.wait is the only significant difference between the "root" page and a page
for an iframe. I think it's more explicit to move this out of the page and
into the session, which was already the sole entry-point for page.wait.
Missing:
- [ ] Navigation support within frames (in fact, as-is, any navigation done
inside a frame, will almost certainly break things
- [ ] Correct CDP support. I don't know how frames are supposed to be exposed
to CDP. Normal navigate events? Distinct CDP frame_ids?
- [ ] Cross-origin restrictions. The interaction between frames is supposed to
change depending on whether or not they're on the same origin
- [ ] Potentially handling src-less frames incorrectly. Might not really matter
Adds basic frame support. Initially explored adding a BrowsingContext and
embedding it in Page, with the goal of also having it embedded in a to-be
created Frame. But it turns out that 98% of Page _was_ BrowsingContext and
introducing a BrowsingContext as the primary interaction unit broke pretty much
_every_ single WebAPI. So Page was expanded:
- Added `_parent: ?*Page`, which is `null` for "root" page.
- Added `frame: ?*IFrame`, which is `null` for the "root" page. This is the
HTMLIFrameElement for frame-pages.
- Added a _type: enum{root, frame}, which is currently only used to improve
the logs
- Added a frames: std.ArrayList(*Page). This is a list of frames for the page.
Note that a "frame-page" can itself haven nested frames.
Besides the above, there were 3 "big" changes.
1 - Adding frames (dynamically, parsed) has to create a new page, start
navigation, track it (in the frames list). Part of this was just
piggybacking off of code that handles <script>
2 - The page "load" event blocks on the frame "load" event. This cascades.
when a page triggers it's load, it can do:
```zig
if (self._parent) |p| {
p.iframeLoaded(self);
}
```
Pages need to keep track of how many iframes they're waiting to load. When
all iframes (and all scripts) are loaded, it can then triggers its own load
event.
3 - Our JS execution expects 1 primary entered context (the pages). But we now
have multiple page contexts, and we need to be in the correct one based
on where javascript is being executed. There is no more an default entered
context. Creating a Local.Scope enters the context, and ls.deinit() exits
the context.
Adds logic to detect if an anchor contains block descendants or is a
standalone element within a layout block. These are now rendered with
appropriate spacing and link formatting. Also adds `.main` to the list
of block elements.
To reflect the current style attribute, CSSStyleDeclaration now parses
it on init.
Moreover, this PR synchronizes the element's style attribute with the
dynamic changes.
- number: validate against spec grammar (reject "+1", "1.", "Infinity",
"NaN", overflow values like "2e308")
- date: validate YYYY-MM-DD with day-of-month and leap year checks
- time: validate HH:MM[:SS[.sss]]
- month: validate YYYY-MM
- week: validate YYYY-Www with ISO 8601 week count
- datetime-local: validate and normalize (T separator, shortest time)
- color: normalize hex to lowercase per spec
- checkbox/radio: return "on" as default value
- sanitize initial value from HTML attributes in created()
- checkbox/radio getValue() returns "on" when no value attribute set
- color input sanitization normalizes hex to lowercase per spec
- initial input value is sanitized per input type during element creation
- initEvent resets both stop_propagation and stop_immediate_propagation
- dispatchNode resets propagation flags after dispatch per DOM spec step 12
Our BrowsingContext currently supports 1 target. So we have a per-BC target_id.
Previously, our target had 1 "frame" - our page. So we often treated the
targetId as the frameId. But to work with frames, we need page-specific
frameIds and loaderIds.
This tries to clean up our ids (a little). frameIds are now ids derived from
a new incrementing page.id. This page.id has to be passed around (via http
Requests and through notifications) in order to properly generate messages with
a frameId.
Added a get-only `getStyle` which doesn't lazily create a new style if none
exists. This can be used in the (frequently used) `checkVisibility` to avoid
an allocation. Added a specialized getBoundingClientRectForVisible which
skips the checkVisibility check, since a few callers have already done their
own visibility check.
DOMRect is now off the heap. This avoids _a lot_ of allocation when a DOMRect
is only needed for internal calculation, e.g. in Document.elementFromPoint.
implement proper tree-walking in writeTextContent to handle all cases:
same-element containers, cross-container ranges, and comment exclusion.
uncomment ~800 lines of Range tests and add 5 new toString tests.
handleClick() was setting _active_element directly, bypassing
Element.focus() — so no blur/focus events fired on click.
Now calls focus() for input, button, select, textarea, anchor.
focus() and blur() now dispatch all four spec-required FocusEvents:
blur (no bubble) → focusout (bubbles) → focus (no bubble) → focusin (bubbles)
Each event carries the correct relatedTarget: the element gaining focus
for blur/focusout, and the element losing focus for focus/focusin.
All four events are composed per W3C spec.
Relates to #1161
Both insertText() and handleKeydown() were manually concatenating text
to the textarea value, ignoring the current selection range. This uses
the existing innerInsert() method which correctly handles full, partial,
and no-selection states — matching the Input element behavior.
interactive elements (input, button, a, select, textarea, iframe)
default to 0 when tabindex attribute is absent; others default to -1.
also add TODO for hidden "until-found" tristate.
Tested in Chrome (headless, --autoplay-policy=no-user-gesture-required):
playing fires on every pause-to-play transition, not just the first
time. The _playing flag was incorrectly suppressing this. Removed it
and updated tests to match verified Chrome behavior.
Per spec, valid type values are absent, empty string, or a
case-insensitive match for "text/css". Also clear cached sheet
when the element is disconnected or type becomes invalid.
Per spec, complete returns true when the image has no src, is fully
fetched, or is broken. Since this is a headless browser that does
not fetch images, complete always returns true.
Per spec, connected style elements with type text/css should
return a CSSStyleSheet object. Previously always returned null.
The sheet is lazily created and cached on first access.
html5ever generally makes guarantees about nodes being parentless when
appending, but we've already seen 1 case where appendCallback receives a
connected node.
We're now seeing something in appendBeforeSiblingCallback, but we have a clearer
picture of how this is happening. In this case, it's via custom element
upgrading and the custom element constructor has already placed the node in
the document.
It's worth pointing, html5ever just has an opaque reference to our node. While
it guarantees that it will give us parent-less nodes, it doesn't actually know
anything about our nodes, or our node._parent. The guarantee is only from its
own point of view. There's nothing stopping us from giving a node a default
parent as soon as html5ever asks us to create a new node, in which case, the
node _will_ have a parent.
Per spec, buffered entries should be delivered via a queued task,
not synchronously. Extract scheduling logic into
schedulePerformanceObserverDelivery() and use it from both
notifyPerformanceObservers and the buffered observe path.
Per MDN, playing fires "after playback is first started, and whenever
it is restarted." A second play() while already playing should be a
no-op. Both play and playing now only fire on the paused -> playing
transition.
- play event only fires when transitioning from paused to playing
- pause event only fires when transitioning from playing to paused
- playing event always fires on play() per spec
- explicitly set bubbles: false, cancelable: false on events
- updated tests to verify no duplicate events on repeated calls
Fetch responses from the network now have their statusText set using
the HTTP status phrase (e.g. "OK", "Not Found"), matching the Fetch
spec and the existing XMLHttpRequest behavior.
Compute and include the cookie size field (name.len + value.len)
in Storage.getCookies and Network.getCookies CDP responses,
matching Chrome's behavior.
Add the `buttons` read-only property to MouseEvent as specified by
the W3C UI Events spec (unsigned short bitmask of currently pressed
buttons). Propagate the field through PointerEvent and WheelEvent
constructors which inherit from MouseEvent.
Looks like curl will accept these as valid headers, and won't normalize the
header, so we have to deal with either a 2-byte or 1-byte terminated header
When a CDP client navigates to a page and the page generates an error,
it blocks waiting for the .page_navigated event.
It currently happens w/ robots.txt denied page.
Example: https://httpbin.io/deny
This expands the caching capabilities which were first added in
https://github.com/lightpanda-io/browser/pull/1552
Internal field caching requires up-front memory, but is faster. It is currently
enabled for window.document and window.console - two very frequently accessed
values.
Implementations must correctly provide an internal field index, with
consideration for index 0 which may or may not be reserved for the type (it
depends on the type). comptime checks run to make sure this is correct, but it
would probably be nice to at least let them be declared in any order.
This commit also removes the special handling for loading the window. This used
to rely on the window not having any internal fields, but it now has them for
caching so it can't be detected that way. Instead, the window is loaded like any
other object. (But now we have to special case the initial window TAO creation
to make it behave like any other Zig instance).
Very simply, this PR ensures that:
div.childNodes === div.childNodes
Previously, each invocation of childNodes would return a distinct object. Not
just inefficient, but incorrect.
Where this gets more complicated is the how.
The simple way to do this would be to have an optional `_child_nodes` field in
Node. When it's called the first time, we load and return it and, on subsequent
calls we can return it from the field directly.
But we generally avoid this pattern for data that we don't expect to be called
often relative to the number of instances. A page with 20K nodes _might_ see
.childNodes called on 1% of those, so storing a pointer in Nodes which isn't
going to be used isn't particularly memory efficient.
Instead, we have (historically) opted to store this in a page-level map/lookup.
This is used extensively for various element properties, e.g. the page
_element_class_lists lookup.
I recently abandoned work on v8 property caching
(https://github.com/lightpanda-io/browser/pull/1511). But then I looked into the
performance on a specific website with _a lot_ of DOMRect creation and I started
to think about both caching and pure-v8 DOM objects. So this PR became a
two-birds with one stone kind of deal. It re-introduces caching as a means to
solve the childNodes correctness. This uses 1 specific type of caching mechanism,
hooking into a v8::object's Private data map, but the code should be easily
extendable to support a faster (but less memory efficient, depending on the use
case) option: internal fields.
Needed this to implement `ImageData#data` getter. This works differently than other typed arrays since returned object can be mutated from both Zig and JS ends.
Trying to see how the "ScriptManager.Header buffer" assertion is failing. Either
`headerCallback` is being called multiple times, or the script is corrupt. By
adding a similar assertion in various places, we can hopefully narrow (a) what's
going on and (b) what code is involved.
Also, switched the BufferPool from DoublyLinkedList to SinglyLinkedList. Was
just reviewing this code (to see if the buffer could possibly become corrupt)
and realized this could be switched.
An index out of range request to a nodelist , e.g. childNodes[1000] now properly
returns a error.NotHandled error, which is given to v8 as a non-intercepted
property.
When a ChildNode node list is created from Node.childNode, we store the *Node
rather than its children. ChildNode is meant to be live, so if the node's
children changes, we should capture that.
This is preparatory work for re-introducing property caching and pure v8 WebAPIs
It does 3 things:
1 - It removes the duplication of method calling we had in Accessors and
Functions type briges.
2 - It flattens the method-call chain. It used to be some code in bridge, then
method, then _method. Most of the code is now in the same place. This will
be important since caching requires the raw js_value, which we previously
didn't expose from _method. Now, it's just there.
3 - Caller used to do everything. Then we introduced Local and a lot of Caller
methods didn't need caller itself, they just needed &self.local. While those
methods remain in Caller.zig, they now take a *const Local directly and thus
can be called without Caller, making them usable without a Caller.
We're currently using Get/SetInternalField to store our toa instance in v8. This
appears to be meant for v8 data itself, as it participates in the GC's
referencing counting. This is a bit obvious by the fact that it expects a
v8::Data, we we're able to do by wrapping our toa into a v8::External.
The Get/SetAlignedPointerFromInternalField seem specifically designed for toa,
as it takes a (void *) (thus, not requiring the external wrapper) and, from what
I understand is more efficient (presumably because the GC ignores it).
Depends on: https://github.com/lightpanda-io/zig-v8-fork/pull/149
This removes the browser-specific arenas (session, transfer, page, call) in
favor of the arena pool.
This is a bit of a win-lose commit. It exists as (the last?) step before I can
really start working on frames. Frames will require their own "page" and "call"
arenas, so there isn't just 1 per browser now, but rather N, where N is the
number of frames + 1 page. This change was already done for Contexts when
ExecutionWorld was removed, and the idea is the same: making these units more
self contained so to support cases where we break out of the "1" model we
currently have (1 browser, 1 session, 1 page, 1 context, ...).
But it's a bit of a step backwards because the ArenaPool is dumb and just resets
everything to a single hard-coded (for now) value: 16KB. But in my mind, an
arena that's used for 1 thing (e.g. the page or call arenas) is more likely to
be well-sized for that specific role in the future, even on a different
page/navigate.
I think ultimately, we'll move to an ArenaPool that has different levels, e.g.
acquire() and acquireLarge() which can reset to different sizes, so that a page
arena can use acquireLarge() and retain a larger amount of memory between use.
This wasn't 100% intuitive to me. At the start of the event, the input is
immediately toggled. But at any point during dispatching, the default behavior
can be suppressed. So the state of the input's check during dispatching captures
the "intent" of the click. But it's possible for one listener to see that
input.checked == true even though, by the end of dispatching, input.checked ==
false because some other listener called preventDefault().
To support this, we need to capture the "current" state so that, if we need to
rollback, we can. For radio buttons, this "current" state includes capturing
the currently checked radio (if any).
clientTop, clientLeft, scrollTop, scrollLeft, scrollHeight, scrollWidth,
offsetTop, offsetLeft, offsetWidth, offsetHeight.
These are all dummy implementation that hook, as much as possible, into what
layout information we have.
Explicitly set scroll information is stored on the page.
1 - Make element.classList settable
2 - On replace, validate in expected order
3 - On replace, fire mutation observer even if new == old
4 - On replace, handle duplicate values
Do what we can based on the ns and names that we currently have.
Also also, allow element names with non-ascii characters.
both changes are driven by WPT dom/nodes/case.html
Curl doesn't like recursive calls. For example, you can't call
curl_multi_remove_handle from within a dataCallback.
This specifically means that, as-is, transfer.abort() calls aren't safe to be
called during a libcurl callback. Consider this code:
```
req.open('GET', 'http://127.0.0.1:9582/xhr');
req.onreadystatechange = (e) => {
req.abort();
}
req.send();
```
onreadystatechange is triggered by network events, i.e. it executes in libcurl
callback. Thus, the above code fails to truly "abort" the request with
`curl_multi_remove_handle` error, saying it's a recursive call.
To solve this, transfer.abort() now sets an `aborted = true` flag. Callbacks can
now use this flag to signal to libcurl to stop the transfer.
A test was added which reproduced this issue, but this comes from:
https://github.com/lightpanda-io/browser/issues/1527 which I wasn't able to
reliably reproduce. I did see it happen regularly, just not always. It seems
like this commit fixes that issue.
This is a follow up / fix to https://github.com/lightpanda-io/browser/pull/1487
In that PR we triggered a "load" event for special elements, and as part of that
we triggered both the "onload" attribute via dispatchWithFunction and normal
bubbling with dispatch.
This PR applies this change generically and holistically. For example, if an
"abort" event is raised, the "onabort" attribute will be generated for that
element. Importantly, this gets executed in the correct dispatch order and
respect event cancellation (stopPropagation and stopImmediatePropagation).
Documents created via DOMImplementation should not inherit the page.URL.
Add textContent for DocumentFragment
Fix creteElement for XML namespace when not explicitly specified
Some types, like AbortController, shouldn't be iterable on the window. This
commit (a) adds the ability to control this in the snapshot, and sets the
iterability based on the dom/interface-objects.html wpt test list.
Page.reset exists for 1 use case: multiple calls to the Page.navigate CDP
method. At an extreme, something like this in puppeteer:
```
await page.goto(baseURL + '/campfire-commerce/');
await page.goto(baseURL + '/campfire-commerce/');
```
Rather than handling this generically in Page, we now handle this case
specifically at the CDP layer. If the page isn't in its initial load state,
i.e. page._load_state != .waiting, then we reload the page from the session.
For reloading, my initial inclination was to do session.removePage then
session.createPage(). This behavior still seems potentially correct to me, but
compared to our `reset`, this would trigger extra notifications, namely:
self.notification.dispatch(.page_remove, .{});
and
self.notification.dispatch(.page_created, page);
Bacause of https://github.com/lightpanda-io/browser/pull/1265/ I guess that
could have side effects. So, to keep the behavior as close to the current
"reset", a new `session.replacePage()` has been added which behaves a lot like
removePage + createPage, but without the notifications being sent.
While I generally think this is just cleaner, this was largely driven by some
planning for frame support. The entity for a Frame will share a lot with the
Page (we'll extract that logic), so simplifying the Page, especially around
initialization, helps simplify frame support.
Fires the window.onunhandledrejection. This API is a bit different than
everything else, because it's entered from the Isolate/Env. So there's a bit
more js -> webapi awareness baked into Env now to handle it.
Also touched up existing Events that have .Global data and changed it to .Temp
and cleaned it up in their deinit.
It's possible (in fact normal) for client.abort to be called twice on a schedule
navigation. We immediately abort any pending requests once a secondary
navigation is called (is that right?), and then again when the page shuts down.
The first abort will kill the transfer, so the XHR object has to null this value
so that, on context shutdown, when the finalizer is called, we don't try to
free it again.
Previously, the Session depended on the page to return a .navigate state when
a secondary navigation should take place. There were a lot of places where the
page returned .done rather than .navigate.
To simplify the code, .navigate is no longer a WaitResult. The page now just has
to return .done, and the Session will check if there's a queued sub-navigation.
This will fix cases where sub-navigation is triggered by a script being executed
as the last step of page loading.
Pre-zigdom, we could use the postAttach behavior to store commonly accessed and
never-changed accessors directly on the v8::object. I've tried to re-enable this
feature a couple times, and have failed.
Between the new Snapshot and the always-on unhandled property handler for Window
it never seems to work very well. Plus, Gemini continues to insist that changing
the shape of the Object is more like to hurt than help performance.
There's possibly an optimization here worth revisiting in the future, but for
the time being, this code does nothing except for possibly deceive a reader that
some optimization is going on when it isn't.
Used to use std.ascii.whitespace, but per spec, it's only a subset of that which
are valid separators. Other characters, line line tabulation, are valid class
names.
At a high level, this does for Events what was recently done for XHR, Fetch and
Observers. Events are self-contained in their own arena from the ArenaPool and
are registered with v8 to be finalized.
But events are more complicated than those other types. For one, events have
a prototype chain. (XHR also does, but it's always the top-level object that's
created, whereas it's valid to create a base Event or something that inherits
from Event). But the _real_ complication is that Events, unlike previous types,
can be created from Zig or from V8.
This is something that Fetch had to deal with too, because the Response is only
given to V8 on success. So in Fetch, there's a period of time where Zig is
solely responsible for the Response, until it's passed to v8. But with events
it's a lot more subtle.
There are 3 possibilities:
1 - An Event is created from v8. This is the simplest, and it simply becomes a
a weak reference for us. When v8 is done with it, the finalizer is called.
2 - An Event is created in Zig (e.g. window.load) and dispatched to v8. Again
we can rely on the v8 finalizer.
3 - An event is created in Zig, but not dispatched to v8 (e.g. there are no
listeners), Zig has to release the event.
(It's worth pointing out that one thing that still keeps this relatively
straightforward is that we never hold on to Events past some pretty clear point)
Now, it would seem that #3 is the only issue we have to deal with, and maybe
we can do something like:
```
if (event_manager.hasListener("load", capture)) {
try event_manager.dispatch(event);
} else {
event.deinit();
}
```
In fact, in many cases, we could use this to optimize not even creating the
event:
```
if (event_manager.hasListener("load, capture)) {
const event = try createEvent("load", capture);
try event_manager.dispatch(event);
}
```
And that's an optimization worth considering, but it isn't good enough to
properly manage memory. Do you see the issue? There could be a listener (so we
think v8 owns it), but we might never give the value to v8. Any failure between
hasListener and actually handing the value to v8 would result in a leak.
To solve this, the bridge will now set a _v8_handover flag (if present) once it
has created the finalizer_callback entry. So dispatching code now becomes:
```
const event = try createEvent("load", capture);
defer if (!event._v8_handover) event.deinit(false);
try event_manager.dispatch(event);
```
The v8 finalizer callback was also improved. Previously, we just embedded the
pointer to the zig object. In the v8 callback, we could cast that back to T
and call deinit. But, because of possible timing issues between when (if) v8
calls the finalizer, and our own cleanup, the code would check in the context to
see if the ptr was still valid. Wait, what? We're using the ptr to get the
context to see if the ptr is valid?
We now store a pointer to the FinalizerCallback which contains the context.
So instead of something stupid like:
```
// note, if the identity_map doesn't contain the value, then value is likely
// invalid, and value.page will segfault
value.page.js.identity_map.contains(@intFromPtr(value))
```
We do:
```
if (fc.ctx.finalizer_callbacks.contains(@intFromPtr(fc.value)) {
// fc.value is safe to use
}
```
This adds a --http_max_response_size argument to the serve and fetch command
which is enforced by the HTTP client. This defaults to null, no limit.
As-is, the ScriptManager allocates a buffer based on Content-Length. Without
setting this flag, a server could simply reply with Content-Length: 99999999999
9999999999 to cause an OOM. This new flag is checked both once we have the
header if there's a content-length, and when reading the body.
Also requested in https://github.com/lightpanda-io/browser/issues/415
Follow up to https://github.com/lightpanda-io/browser/pull/1495 which introduced
the concept of a fast getter. This commit expands the new behavior to all
comptime-known scalar getters.
It also leverages the new `v8__FunctionTemplate__New__Config` to
1 - flag fast getters as having no side effect
2 - set the length (arity) on all functions
Expands bridge.property to work as a getter. This previously only worked by
setting a value directly on the TemplatePrototype. This is what you want for
something like Node.TEXT_NODE which is accessible on both Node and an instance
(e.g. document.createElement('div').TEXT_NODE).
Now the property can be configured with .{.template = false}. It essentially
becomes an optimized: bridge.accessor(comptime scalar, null, .{});
There are other accessor that can be converted to this type, but I'll do that
after this is merged to keep this PR manageable.
1 - Remove the double logging of "load" dispatch
2 - On even registration, just log the event target type, not the full node
3 - Most significantly, rather than logging unknown properties (both global and
per object) as they happen, collect them and log the on context ends. This
log includes a count of how often it happened. On some pages, this reduces
the number of unknown property messages by thousands.
This seems pretty complicated to implement...mostly because we'd have to keep
a window around in certain cases and because exactly what's accessible on that
window depends on a few factors.
This could more cleanly be applied to page.createElementNS, but that would add
overhead to the parser. Alternative would be to pass a comptime flag to
page.createElementNS (e.g. like the from_parser we pass elsewhere).
Currently the sighandler is setup regardless of the running mode, but it only
does something in "serve" mode. In fetch mode, since there are no registered
listeners, it intercepts the signal and does nothing. On MacOS at least, this
isn't a great experience as it can leave the process running in the background.
We currently log any unknown property access on the global (window). This has
proven pretty useful when debugging, often letting us know a type we're
completely missing.
However, it isn't just about what types we have, it's also about the properties
those types expose. Inspired by the wasted time spent on
https://github.com/lightpanda-io/browser/pull/1473, this commit adds the same
type of logging for unknown properties on objects. In debug mode only. This is
only done for objects which don't have their own NamedIndexer.
Right now, to do anything with listeners, we need to (a) lookup the listeners
registered with the element and (b) walk the linked list to find listeners of
the correct type.
There's no operations that behave on all of an element's listeners, e.g. there's
no element.removeAllListeners(). So, we can optimize the code, and generally
make it simpler to read, by changing our key to include the type (along with
the element).
This is an optimization on its own (and makes the code simpler), but it also
makes it more palatable to do a pre-listener check to avoid creating events when
no listener even exists; if we decide to implement that.
Both of these become self-contained with their own arena, which can be cleaned
up when (a) we don't need it anymore and (b) v8 doens't need it. The "we don't
need it anymore" is true when there's nothing being observed.
The underlying records also get their own arena and are handed off to v8 to
finalize whenever they fall out of scope.
Not sure if this should be in `EventTarget` or `EventManager`, here goes nothing.
`Image`: dispatch `load` event when `src` set
add load event test
remove `hasListener`
let `Scheduler` dispatch `load` event
Simulates async nature.
update test
free `args` when done
implement `load` event dispatch for `<img>` tags
This dispatches `load` events tied to `EventManager` but not the `onload` for some reason...
`"load"` event must be dispatched even if `onload` not set
Resolves the bug that cause event listeners added through `EventTarget` not executing if `onload` not set.
add `onload` getter/setter for `Image`
prefer `attributeChange` to run side-effects
This should give more consistent results than using `setSrc`.
add inline `<img src="..." />` test
`Image`: prefer `inline_lookup` for `onload`
remove incorrect URL check + prefer 0ms in `Scheduler`
change after rebase
On a blocking request that requires authentication, we now handle the two cases
correctly:
1 - if the request is aborted, we don't continue processing (if we did, that
would result in (a) transfer.deinit being called twice and (b) the callbacks
being called twice
2 - if the request is "continue", we queue the transfer to be re-issued, as
opposed to just processing it as-is. We have to queue it because we're
currently inside a process loop and it [probaby] isn't safe to re-enter it.
By using the queue, we wait until the next call to `tick` to re-issue the
request.
First, for macrotasks, it ensures that the correct context is entered.
More important, for microtasks, it protects against use-after-free. This is
something we already did, to some degree, in page.deinit: we would run
microtasks before erasing the page, in case any microtasks referenced the page.
The new approach moves the guard to the context (since we have multiple
contexts) and protects against a microtasks enqueue a new task during shutdown
(something the original never did).
This seems to solve some potential use-after-free issues. By informing the
Inspector that the context is gone, it seems to effectively ensure that no more
messages are sent from the inspector for things related to the context.
1 - Embed Page into Session, avoids having to allocate/deallocate the page
2 - Fix false-positive arena pool leak detection
3 - On form submit, pre-check if navigation will allowed before building the
contents
4 - Make sure QueuedNavigation structure isn't use after page is removed
The WPT tests navigation-api/navigation-methods/navigate-history-push-same-url.html
crashed with a @memcpy arguments alias error.
It seems to be due to the reuse of the previous page.url string.
Forcing to duplicate it fixes the crash.
The BrowserContext currently uses 3 arenas:
1 - Command-specific, which is like the call_arena, but for the processing of a
single CDP command
2 - Notification-specific, which is similar, but for the processing of a single
internal notification event
3 - Arena, which is just the session arena and lives for the duration of the
BrowseContext/Session
This is pretty coarse and can results in significant memory accumulation if a
browser context is re-used for multiple navigations.
This commit introduces 3 changes:
1 - Rather than referencing Session.arena, the BrowerContext.arena is now its
own arena. This doesn't really change anything, but it does help keep things
a bit better separated.
2 - Introduces a page_arena (not to be confused with Page.arena). This arena
exists for the duration of a 1 page, i.e. it's cleared when the
BrowserContext receives the page_created internal notification. The
`captured_responses` now uses this arena, which means captures only exist
for the duration of the current page. This appears to be consistent with
how chrome behaves (In fact, Chrome seems even more aggressive and doesn't
appear to make any guarantees around captured responses). CDP refers to this
lifetime as a "renderer" and has an experimental message, which we don't
support, `Network.configureDurableMessages` to control this.
3 - Isolated Worlds are now more self contained with an arena from the ArenaPool.
There are currently 2 places where the BrowserContext.arena is still used:
1 - the isolated_world list
2 - the custom headers
Although this could be long lived, I believe the above is ok. We should just
really think twice whenever we want to use it for anything else.
page.js currently always references the page context. But through the inspector
JavaScript can be executed in different contexts. When we go from V8->Zig we
correctly capture the current context within the caller's Local. And, because of
this, mapping or anything else that happens against local.ctx, happens in the
right context.
EXCEPT...our code still accesses page.js. So you can have a v8->zig call
happening in Context-2, and our Zig call then tries to do something on Context-1
via page.js.
I'm introducing a change that updates page.js based on the current Caller and
restores it at the end of the Caller. This change is super small, but
potentially has major impact. It's hard to imagine that we haven't run into
problems with this before, and it's hard to imagine what problems this change
might introduce. Certainly, if anyone copies page.js, they'll be in for a rude
surprise, but i don't think we do that anywhere.
==Fix 1==
The problem flow:
1 - The module is dynamically imported, this creates a cache entry with no
module and no module_promise, and starts an async fetch.
2 - Before dynamicModuleSourceCallback fires (from step 1 above), the same
module is imported as a child of a call graph, i.e. via resolveModuleCallback.
Here the module is compiled, but never evaluated (we only evaluate the root
module). This is where things start to go sour. Our cache entry now has a
module, but no module_promise.
3 - The async fetch completes and calls dynamicModuleSourceCallback which call
Context.module. This returns early (the module is already cached thanks to
step 2). But it then calls resolveDynamicModule which (a) has a module and (b)
no module_promise.
Our fix works because, if Context.module finds the cached module (from step 2),
it now also checks for the module_promise. If it doesn't find it, it evaluates
the module (which sets it).
I've since expanded the code to handle more intermediary states.
The original PR had:
if (gop.value_ptr.module_promise == null) {
const mod = local.toLocal(cache_mod);
if (mod.getStatus() == .kInstantiated) {
return self.evaluateModule(want_result, mod, url, true);
}
}
But now the code is:
if (gop.value_ptr.module_promise == null) {
const mod = local.toLocal(cache_mod);
if (mod.getStatus() == .kUninstantiated and try mod.instantiate(resolveModuleCallback) == false) {
return error.ModuleInstantiationError;
}
return self.evaluateModule(want_result, mod, url, true);
}
It seems that v8 handles double-instantiation and double-evaluations safely.
Handle more partial-load states.
Handle more partial-load states + fix possible dangling pointer.
==Fix 2==
We were using `gop` after potentially writing to the map (via a nested call to
mod.evaluate()). We now re-fetch the map entry to be able to safely write to it
Currently, when loading a module, if the module is found in the cache, it's
immediately returned. However, that can result in a timing issue where the
module is cached, but not evaluated, and doesn't have an associated promise.
This commit tries to ensure a module is always evaluated and that the cache
entry has a module promise.
This might fix an crash handler issue. I couldn't reproduce the issue though.
I believe it requires specific timing which is hard to reproduce in a test.
We previously figured that we could release the XHR object as soon as the JS
reference was out of scope. But the callbacks could still exist and thus the
XHR request should proceed.
This commit ensures the XHR instance remains valid so long as we have an active
request.
Might help with https://github.com/lightpanda-io/browser/issues/1448 but I can't
reliably reproduce this, so I'm not 100% sure it resolve the issue. That bug
appears to be caused by some timing interaction between the underlying HTTP
request and the v8 GC.
There's no guarantee that a task will ever be run. A page can be shutdown by
the user or timeout or an error. Scheduler cleanup relies on the underlying
page.arena. This forces all tasks to rely on the page.arena as they have no way
to clean themselves.
This commit allows tasks to register a finalizer which is guaranteed to be
called when the scheduler is shutdown.
The window ScheduleCallback, PostMessageCallback now use an arena from the
ArenaPool rather than the page.arena and use the task finalizer to ensure the
arena is released on shutdown.
V8's inspector world is made up of 4 components: Inspector, Client, Channel and
Session. Currently, we treat all 4 components as a single unit which is tied to
the lifetime of CDP BrowserContext - or, loosely speaking, 1 "Inspector Unit"
per page / v8::Context.
According to https://web.archive.org/web/20210622022956/https://hyperandroid.com/2020/02/12/v8-inspector-from-an-embedder-standpoint/
and conversation with Gemini, it's more typical to have 1 inspector per isolate.
The general breakdown is the Inspector is the top-level manager, the Client is
our implementation which control how the Inspector works (its function we expose
that v8 calls into). These should be tied to the Isolate. Channels and Sessions
are more closely tied to Context, where the Channel is v8->zig and the Session
us zig->v8.
This PR does a few things
1 - It creates 1 Inspector and Client per Isolate (Env.js)
2 - It creates 1 Session/Channel per BrowserContext
3 - It merges v8::Session and v8::Channel into Inspector.Session
4 - It moves the Inspector instance directly into the Env
5 - BrowserContext interacts with the Inspector.Session, not the Inspector
4 is arguably unnecessary with respect to the main goal of this commit, but
the end-goal is to tighten the integration. Specifically, rather than CDP having
to inform the inspector that a context was created/destroyed, the Env which
manages Contexts directly (https://github.com/lightpanda-io/browser/pull/1432)
and which now has direct access to the Inspector, is now equipped to keep this
in sync.
Converting a JS value to a string is a bit messy right now. There's duplication
between string helpers in js.Local, and what js.String and js.Value provide.
Now, all stringifying functions are in js.String, with some helpers in js.Value.
Also tried to streamline the APIs around most common-cases (e.g. js.String ->
[]u8 using call_arena). js.String now also implements format, so it can be
used as-is in some cases.
The ExecutionWorld doesn't do anything meaningful. It doesn't map to, or
abstract any, v8 concepts. It creates a js.Context, destroys the context and
points to the context. Those all all things the Env can do (and it isn't like
the Env is over-burdened as-is).
Plus the benefit of going through the Env is that we can track/collect all
known Contexts for an isolate in 1 place (the Env), which can facilitate things
like context creation/deletion notifications.
Double-freeing should eventually cause a segfault (on ArenaPool.deinit, if not
sooner), but having an explicit check allows us to log the responsible owner.
We're seeing an assertion in Page.appendNew fail because the node has a parent.
According to html5ever, this shouldn't be possible (appendNew is only called
from the Parser). BUT, it's possible we're mutating the node in a way that
we shouldn't...maybe there's JavaScript executing as we're parsing which is
mutating the node.
In release, this will be more defensive. In debug, this still crashes. It's
possible this is valid (like I said, maybe there's JS interleaved which is
mutating the node), but if so, I'd like to know the exact scenario that produces
this case.
Idea with this is to have a key-to-function for known event listeners. We pack pointer to event target with listener type to generate key and set function as value. By doing this, we save bytes for optionally and rarely set functions in elements.
We're seeing cases where known dynamic modules are being requested when the
module isn't compiled yet. It's not clear how this is happening. I believe an
empty cache entry is being created in postCompileModule and then the request
for the dynamic module is happening before the sycnhronous module is loaded.
This seems like the only way to get into this state ,but I can't reproduce it.
Still, we now try to handle this case by simply having the dynamic module
request overwrite the placeholder cache-entry created in postCompileModule.
Unlike XHR, Response is a bit more complicated as it can exist in Zig code
without ever being given to v8. So we need to track this handoff to know who is
responsible for freeing it (zig code, on error/shutdown) or v8 code after
promise resolution.
This also cleansup a bad merge for the XHR finalizer and adds cleaning up the
`XMLHttpRequestEventTarget` callbacks.
Avoids having to allocate small strings when going from v8 -> Zig. Also
added a discriminatory type, string.Global which uses the arena, rather than
the call_arena, if an allocation _is_ necessary. (This is similar to a feature
we had before, but was lost in zigdom). Strings from v8 that need to be
persisted, can be allocated directly v8 -> arena, rather than v8 -> call_arena
-> arena.
I think there are a lot of places where we should use string.String - where
strings are expected to be short (e.g. attribute names). But started with just
document.querySelector and querySelectorAll.
Currently, when you create a Global (Value, Object, Function, ...) it exists
until the context is destroyed.
This PR adds the ability to eagerly free them when they fall out of scope, which
is only possible because of the new finalizer hooks.
Previously, we had js.Value, js.Value.Global; js.Function, js.Function.Global,
etc. This PR introduces a .Temp variant: js.Value.Temp and js.Function.Temp.
This is purely a discriminatory type and it behaves (and IS) a Global. The
difference is that it can be released:
page.js.release(self._on_ready_state_change.?)
Why a new type? There's no guarantee that a global (the existing .Global or the
new .Temp) will get released before the context ends. For this reason, we always
track them in order to free the on context deninit:
```zig
for (self.global_functions.items) |*global| {
v8.v8__Global__Reset(global);
}
```
If a .Temp is eagerly released, we need to remove it from this list. The simple
solution would be to switch `global_functions` from an ArrayList to a HashMap.
But that adds overhead for values that we know we'll never be able to eagerly
release. For this reason, .Temp are stored in a hashmap (and can be released)
and .Globla are stored in an ArrayList (and cannot be released). It's a micro-
optimization...eagerly releasing doesn't have to O(N) scan the list, and we only
pay the memory overhead of the hashmap for values that have a change to be
eagerly freed.
Eager-freeing is now applied to both the callbacn and the values for window
timers (setTimeout, setInterval, RAF). And to the XHR ready_state_change
callback. (we'll do more as we go).
Any object we return from Zig to V8 becomes a v8::Global that we track in our
`ctx.identity_map`. V8 will not free such objects. On the flip side, on its own,
our Zig code never knows if the underlying v8::Object of a global can still be
used from JS. Imagine an XHR request where we fire the last readyStateChange
event..we might think we no longer need that XHR instance, but nothing stops
the JavaScript code from holding a reference to it and calling a property on it,
e.g. `xhr.status`.
What we can do is tell v8 that we're done with the global and register a callback.
We make our reference to the global weak. When v8 determines that this object
cannot be reached from JavaScript, it _may_ call our registered callback. We can
then clean things up on our side and free the global (we actually _have_ to
free the global).
v8 makes no guarantee that our callback will ever be called, so we need to track
these finalizable objects and free them ourselves on context shutdown. Furthermore
there appears to be some possible timing issues, especially during context shutdown,
so we need to be defensive and make sure we don't double-free (we can use the
existing identity_map for this).
An type like XMLHttpRequest can be re-used. After a request succeeds or fails,
it can be re-opened and a new request sent. So we also need a way to revert a
"weak" reference back into a "strong" reference. These are simple v8 calls on
the v8::Global, but it highlights how sensitive all this is. We need to mark
it as weak when we're 100% sure we're done with it, and we need to switch it to
strong under any circumstances where we might need it again on our side.
Finally, none of this makes sense if there isn't something to free. Of course,
the finalizer lets us release the v8::Global, and we can free the memory for the
object itself (i.e. the `*XMLHttpRequest`). This PR also adds an ArenaPool. This
allows the XMLHTTPRequest to be self-contained and not need the `page.arena`.
On init, the `XMLHTTPRequest` acquires an arena from the pool. On finalization
it releases it back to the pool. So we now have:
- page.call_arena: short, guaranteed for 1 v8 -> zig -> v8 flow
- page.arena long: lives for the duration of the entire page
- page.arena_pool: ideally lives for as long as needed by its instance (but no
guarantees from v8 about this, or the script might leak a lot of global, so worst
case, same as page.arena)
When a type is finalized by V8, it's because it's fallen out of scope. When a
type is finalized by Zig, it's because the Context is being shutdown.
Those are two different environments and might require distinct cleanup logic.
Specifically, a zig-initiated finalization needs to consider that the page and
context are being shutdown. It isn't necessarily safe to execute JavaScript at
this point, and thus, not safe to execute a callback (on_error, on_abort,
ready_state_change, ...).
Any object we return from Zig to V8 becomes a v8::Global that we track in our
`ctx.identity_map`. V8 will not free such objects. On the flip side, on its own,
our Zig code never knows if the underlying v8::Object of a global can still be
used from JS. Imagine an XHR request where we fire the last readyStateChange
event..we might think we no longer need that XHR instance, but nothing stops
the JavaScript code from holding a reference to it and calling a property on it,
e.g. `xhr.status`.
What we can do is tell v8 that we're done with the global and register a callback.
We make our reference to the global weak. When v8 determines that this object
cannot be reached from JavaScript, it _may_ call our registered callback. We can
then clean things up on our side and free the global (we actually _have_ to
free the global).
v8 makes no guarantee that our callback will ever be called, so we need to track
these finalizable objects and free them ourselves on context shutdown. Furthermore
there appears to be some possible timing issues, especially during context shutdown,
so we need to be defensive and make sure we don't double-free (we can use the
existing identity_map for this).
An type like XMLHttpRequest can be re-used. After a request succeeds or fails,
it can be re-opened and a new request sent. So we also need a way to revert a
"weak" reference back into a "strong" reference. These are simple v8 calls on
the v8::Global, but it highlights how sensitive all this is. We need to mark
it as weak when we're 100% sure we're done with it, and we need to switch it to
strong under any circumstances where we might need it again on our side.
Finally, none of this makes sense if there isn't something to free. Of course,
the finalizer lets us release the v8::Global, and we can free the memory for the
object itself (i.e. the `*XMLHttpRequest`). This PR also adds an ArenaPool. This
allows the XMLHTTPRequest to be self-contained and not need the `page.arena`.
On init, the `XMLHTTPRequest` acquires an arena from the pool. On finalization
it releases it back to the pool. So we now have:
- page.call_arena: short, guaranteed for 1 v8 -> zig -> v8 flow
- page.arena long: lives for the duration of the entire page
- page.arena_pool: ideally lives for as long as needed by its instance (but no
guarantees from v8 about this, or the script might leak a lot of global, so worst
case, same as page.arena)
This commit fix the use after free crash into inspector contextCollected
run in the pumpMessageLoop.
Removing a context linked to an inspector triggers a contextCollected
task in the message queue.
But if the contextCollected task run after the GC it try to use free
memory. Forcing the message loop to run before the GC fix the issue.
This puppeteer script [likely will] crash the brower:
page.goto(baseURL + '/campfire-commerce/');
await page.goto(baseURL + '/campfire-commerce/');
The quick double-navigation means parse_state remains .pre and thus the page
isn't reset. We introduce a new load_state, .waiting, which can be used to
detect this state.
In some cases, it's straightforward to know whether or not there's an implicit
local scope available. But both Navigation and ReadableStream* have more complex
flows. Both could, in theory, be initiated from non-V8 calls, so relying on
the implicit scope isn't safe.
This adds an explicit scope to most callbacks in Navigation and ReadbleStream*.
However, I still don't quite understand how / if these are being initiated from
Zig currently (I could see how they would be, in the future). Therefore, in
debug mode, this will still panic if there's no implicit scope, because I want
to understand what's going on.
This was already being handled for async scripts, but for sync scripts, we'd
log the error then proceed to try and execute the body (which would be some
error message).
This allows the header_callback to return a boolean to indicate whether or not
the http client should continue to process the request or abort it.
We currently force-write a simple HTML doctype for HTMLDocument. But if the
document already has a doctype, that results in us writing the forced one then
writing the correct one. This adds a check and only force-writes a doctype if
the first child of the document isn't a document_type node.
Was added here 43805ad698
But causes segfaults. The issue is hard to understand. At first, it seemed like
the value cached in a v8::Object was persisting through v8::contexts of the
same isolate. Set window.document to the current &document, and in a different
context, it retrieves that cached value (which is now an invalid pointers).
However, upon further investigation, this appears to be limited to a mix of
navigation (which causes a new context to be created, and old values to be
invalidated) + Inspector which continues to send commands to the old context.
Since contextDestroyed is something we're aware of and planning to do shortly,
I think we can disable the cache until that's fixed.
Mem 28MB -> 26MB (currently a 24.1MB)
Time 23 -> 17 (currently at 14)
We've made some memory and performance optimization gains lately. Lowering
these will let us spot incremental changes better.
The idea is that frequently accessed properties, e.g. window.document, can be
cached directly as data properties on the underlying v8::Object, removing the
need for the access call into Zig. This is only used on a handful of properties,
almost all of which are on the Window. It is important that the property be
read-only. For example, window.location cannot be cached this way because
window.location is writable (e.g. window.location.hash = '#blah').
This existed briefly before Zigdom, but was removed as part of the migration.
The implementation has changed. This previously relied on a "postAttach" feature
which no longer exists. It is not integrated in the bridge/callback directly and
lazily applied after the first access.
Previously, we were doing some global setup in the Snapshot, but this was
always being overwritten when creating a context. This meaningless setup in
the snapshot was removed.
The global setup is now done once per isolate, rather than once per context.
Deciding what should be an lp.assert, vs an std.debug.assert, vs a debug-only
assert is a little arbitrary.
debug-only asserts, guarded with an `if (comptime IS_DEBUG)` obviously avoid the
check in release and thus have a performance advantage. We also use them at
library boundaries. If libcurl says it will always emit a header line with a
trailing \r\n, is that really a check we need to do in production? I don't think
so. First, that code path is checked _a lot_ in debug. Second, it feels a bit
like we're testing libcurl (in production!)..why? A debug-only assertion should
be good enough to catch any changes in libcurl.
This adds a crash handler which reports a crash (if telemetry is enabled). On a
crash, this looks for `curl` (using the PATH env), and forks the process to then
call execve. This relies on a new endpoint to be setup to accept the "report".
Also, we include very little data..I figured just knowing about crashes would
be a good place to start.
A panic handler is provided, which override's Zig default handler and hooks
into the crash handler.
An `assert` function is added and hooks into the crash handler. This is
currently only used in one place (Session.zig) to demonstrate its use. In
addition to reporting a failed assert, the assert aborts execution in
ReleaseFast (as opposed to an undefined behavior with std.debug.assert).
I want to hook this into the v8 global error handler, but only after direct_v8
is merged.
Much of this is inspired by bun's code. They have their own assert (1) and
a [more sophisticated] crashHandler (2).
:
(1) beccd01647/src/bun.zig (L2987)
(2) beccd01647/src/crash_handler.zig (L198)
When we create a js.Context, we create the underlying v8.Context and store it
for the duration of the page lifetime. This works because we have a global
HandleScope - the v8.Context (which is really a v8::Local<v8::Context>) is that
to the global HandleScope, effectively making it a global.
If we want to remove our global HandleScope, then we can no longer pin the
v8.Context in our js.Context. Our js.Context now only holds a v8.Global of the
v8.Context (v8::Global<v8::Context).
This PR introduces a new type, js.Local, which takes over a lot of the
functionality previously found in either js.Caller or js.Context. The simplest
way to think about it is:
1 - For v8 -> zig calls, we create a js.Caller (as always)
2 - For zig -> v8 calls, we go through the js.Context (as always)
3 - The shared functionality, which works on a v8.Context, now belongs to js.Local
For #1 (v8 -> zig), creating a js.Local for a js.Caller is really simple and
centralized. v8 largely gives us everything we need from the
FunctionCallbackInfo or PropertyCallbackInfo. For #2, it's messier, because we
can only create a local v8::Context if we have a HandleScope, which we may or
may not.
Unfortunately, in many cases, what to do becomes the responsibility of the caller
and much of the code has to become aware of this local-ness. What does it means
for our code? The impact is on WebAPIs that store .Global. Because the global
can't do anything. You always need to convert that .Global to a local
(e.g. js.Function.Global -> js.Function).
If you're 100% sure the WebAPI is only being invoked by a v8 callback, you can
use `page.js.local.?.toLocal(some_global).call(...)` to get the local value.
If you're 100% sure the WebAPI is only being invoked by Zig, you need to create
`js.Local.Scope` to get access to a local:
```zig
var ls: js.Local.Scope = undefined;
page.js.localScope(&ls);
defer ls.deinit();
ls.toLocal(some_global).call(...)
// can also access `&ls.local` for APIs that require a *const js.Local
```
For functions that can be invoked by either V8 or Zig, you should generally push
the responsibility to the caller by accepting a `local: *const js.Local`. If the
caller is a v8 callback, it can pass `page.js.local.?`. If the caller is a Zig
callback, it can create a `Local.Scope`.
As an alternative, it is possible to simply pass the *Page, and check
`if page.js.local == null` and, if so, create a Local.Scope. But this should only
be done for performance reasons. We currently only do this in 1 place, and it's
because the Zig caller doesn't know whether a Local will actually be needed and
it's potentially called on every element creating from the parser.
This is the first in a series of changes to make globals explicit. The ultimate
goal of having explicit Globals is to move away from the global HandleScope and
to explicit HandleScopes.
Currently, we treat globals and locals interchangeably. In fact, for Global ->
Local, we just ptrCast. This works because we have 1 global HandleScope, which
effectively disables V8's GC and thus nothing ever gets moved.
If we're going to introduce explicit HandleScopes, then we need to first have
correct Globals. Specifically, when we want to act on the global, we need to
get the local value, and that will eventually mean making sure there's a
HandleScope.
While adding explicit globals, we're keeping the global HandleScope so that we
can minimize the change. So, given that we still have the global HandleScope
the change is largely two things:
1 - js.Function.persit() returns a js.Function.Global. Types that persist global
functions must be updated to js.Function.Global.
2 - To turn js.Function.Global -> js.Function, we need to call .local() on it.
The bridge has been updated to support js.Function.Global for both input and
output parameters. Thus, window.setOnLoad can now directly take a
js.Function.Global, and window.getOnLoad can directly return that
js.Function.Global.
This is the first in a series of changes to make globals explicit. The ultimate
goal of having explicit Globals is to move away from the global HandleScope and
to explicit HandleScopes.
Currently, we treat globals and locals interchangeably. In fact, for Global ->
Local, we just ptrCast. This works because we have 1 global HandleScope, which
effectively disables V8's GC and thus nothing ever gets moved.
If we're going to introduce explicit HandleScopes, then we need to first have
correct Globals. Specifically, when we want to act on the global, we need to
get the local value, and that will eventually mean making sure there's a
HandleScope.
While adding explicit globals, we're keeping the global HandleScope so that we
can minimize the change. So, given that we still have the global HandleScope
the change is largely two things:
1 - js.Function.persit() returns a js.Function.Global. Types that persist global
functions must be updated to js.Function.Global.
2 - To turn js.Function.Global -> js.Function, we need to call .local() on it.
The bridge has been updated to support js.Function.Global for both input and
output parameters. Thus, window.setOnLoad can now directly take a
js.Function.Global, and window.getOnLoad can directly return that
js.Function.Global.
TL;DR - use page.arena instead of page.call_arena
This probably comes from copying the implementation of MutationObserver and/or
IntersectionObserver. But those dispatches are different in that they directly
dispatch a slice (e.g. of MutationRecords) which gets mapped to a v8::Array when
doing the callback. The MutationRecords exist on the heap, not in
_pending_records, so the call_arena is fine.
PerformanceObserver returns an Zig object, not a slice. Therefore it gets mapped
to a v8::Object which references the Zig object. The state of that object, the
_entries list, has to exist for the lifetime of that object, not the call_arena.
In order to uses less space and improve the readability.
zig fmt allows only 1 switch case per line or all in one line.
When having a lot of conditions, splitting the line is useful.
Virtually all string are page-owned (in the page.arena), but because of the
Small String Optimization we use (string.zig), a string could be stack-allocated
The correct solution is probably to change the key to be a string.String. But
I want to give more thought to memory in general, and strings specifically need
to be thought about. So this is a quick fix for crashing.
It now returns a Caught struct which contains all information. The Caught struct
can be logged directly, providing more consistent logs for caught errors.
We previously treated v8::Object and v8::Values interchangeably, and would just
ptrCast one to the other. So, if an API was defined with a js.Object but was
given a non-object value, e.g. 9001, it would still work.
This has since been tightened. If an API takes a js.Object, than the v8 value
must be an object. Passing a non-object will result in a InvalidArgument error.
CustomEvent.detail and PerformanceMark.detail can both be any value, so the
apis/fields have been updated from js.Object -> js.Value.
On CDP.BrowserContext.deinit, clear the isolated world ExecutionContext before
terminating the session. This is important as the isolated_world list is
allocated from the session.arena.
Also, semi-revert 63f1c85964. Before all this
we were running microtasks on ExecutionWorld.removeContext. That didn't seem
right (and I thought it was the original source of the bug). But, for the "real"
Page context, this is critical, since Microtasks can reference the Page object.
Since microTasks are isolation-level, it's possible for a microtasks for Page1
to execute after Page1 goes away (if we create a new page, Page2). This re-adds
the microtask "draining", but only for the Page (i.e. in Page.deinit).
This crashes linux in releasesafe without an embedded snapshot. Not sure why,
but it shouldn't be necessary. This was added back when we were executing
microtasks on a schedule, rather than manually at explicit points.
This lets us load the isolate without having to create a temp/dummy context
just to get the templates.
Call ContextDisposedNotification when a context is removed. Supposedly this can
help/hint to the isolate about memory management.
Range mutation will trigger MutationObserver
MutationObserver with characterDataOldValue=true implicitly means
characterData=true
For MutationObserver-characterData test.
Improves dom/nodes/ParentNode-querySelector-escapes.html from 20/68 -> 64/68.
Previous main had 66/68..but the last 4 are really edge cases that add a lot
of complexity.
When registering an object as an event listener, the handleEvent function
doesn't have to be defined then and there. The handleEvent function can be added
at any point in the future.
Instead of being mapped to HTMLUnknownElement, these will all be mapped to the
correct type. This is important for many WPT tests. But it's not impossible that
some script checks `if (x instanceof HTMLBaseElement)` and, without this, that
would error since HTMLBaseElement wouldn't be defined.
While it sorta works if done on the prototype, it's incorrect as these are no
longer "own" properties (which some WPT tests care about). NamedIndexes were
already correctly defined on the instance.
This does a better job of tracking the implicit namespace based on the context.
For example, when using DOMParser.parseFromString with an XML namespace, all
subsequent elements will be in the XML namespace.
Adds support for null namespace.
Rather than defaulting to HTML, unknown namespaces now map to a special unknown
type. We don't currently preserve the original namespace, but we're at least
able to properly handle the casing in this case.
Track this in a lookup on the page, to avoid having to store a pointer for
_every_ node, given that most nodes _are_ owned by the document.
This helps us ensure nodes can be properly adopted.
If 2 elements have the same id then,
1 - The first in document-order has to be retrieved. We were ordering by
insertion order.
2 - When the element is removed, then document.getElementById should return the
next element with that id in document-order. We were returning null
It's possible for Script.errorCallback to be called as part of the call to
`client.request`. This happens because we eagerly pump the libcurl message loop
to get the request going ASAP. For very obvious failures (e.g. an invalid URL)
this means that the error callback can be called from `client.request`.
Previously, we were only adding the script to its list _after_ the call to
`client.request`, but the error handler tries to remove the script from the list
.
This commit changes the order so that the script is first added to the list
and then the request is made.
Previously, `MessageEvent('')` would have been allowed, but invalid. This caused
problems as the receiver was the window. All such calls are now rejected.
Depends on https://github.com/lightpanda-io/zig-v8-fork/pull/131
I thought we eliminated this from being possible, but I guess not. It seems to
be related to multiple async scripts direct or indirectly (or a specific
combination?) waiting for the same synchronous script. The first async script
to block, grabs it and deletes it, making it invalid for the next.
This puts back a counter on blocking scripts which I took out when I thought
this could no longer happen.
The pure zig JSON parser didn't generate the same type of values than JS
JSON.parse command.
Using directly V8's JSON parser gives the assurance to have the right
JS types.
Moreover, it avoid data transformations between Zig and V8.
Zigdom broke request interception. It isn't zigdom specifically, but in zigdom
we properly block the parser when executing a normal (not async, not defer)
script. This does not work well with request interception, because an
intercepted request isn't blocked on HTTP data, it's blocked on a message from
CDP. Generally, neither our Page nor ScriptManager are CDP-aware. And, even if
they were, it would be hard to break out of our parsing and return control to
the CDP server.
To fix this, we expand on the HTTP Client's basic awareness of CDP (via its
extra_socket field). The HTTP client is now able to block until an intercepted
request is continued/aborted/fulfilled. it does this by being able to ask the
CDP client to read/process data.
This does not yet work for intercepted authentication requests.
Add support for both modes - parsing and post-parsing. In post-parsing mode,
document.write implicitly calls document open, and document.open wipes the
document. This mode is probably rarely, if ever, used.
However, while parsing, document.write does not call document.open and does not
remove all existing nodes. It just writes the html into the document where the
parser is. That isn't something we can properly do..but we can hack it. We
create a new DocumentFragment, parse the html into the document fragment, then
transfer the children into the document where we currently are.
Our hack probably doesn't work for some advance usage of document.write (e.g
nested calls), but it should work for more common cases, e.g. injecting a script
tag.
There are two layers here. The first is that, on startup, a v8 SnapshotCreator
is created, and a snapshot-specific isolate/context is setup with our browser
environment. This contains most of what was in Env.init and a good chunk of
what was in ExecutionWorld.createContext. From this, we create a v8.StartupData
which is used for the creation of all subsequent contexts. The snapshot sits
at the application level, above the Env - it's re-used for all envs/isolates, so
this gives a nice performance boost for both 1 connection opening multiple pages
or multiple connections opening 1 page.
The second layer is that the Snapshot data can be embedded into the binary, so
that it doesn't have to be created on startup, but rather created at build-time.
This improves the startup time (though, I'm not really sure how to measure that
accurately...).
The first layer is the big win (and just works as-is without any build / usage
changes).
with snapshot
total runs 1000
total duration (ms) 7527
avg run duration (ms) 7
min run duration (ms) 5
max run duration (ms) 41
without snapshot
total runs 1000
total duration (ms) 9350
avg run duration (ms) 9
min run duration (ms) 8
max run duration (ms) 42
To embed a snapshot into the binary, we first need to create the snapshot file:
zig build -Doptimize=ReleaseFast snapshot_creator -- src/snapshot.bin
And then build using the new snapshot_path argument:
zig build -Dsnapshot_path=../../snapshot.bin -Doptimize=ReleaseFast
The paths are weird, I know...since it's embedded, it needs to be inside the
project path, hence we put it in src/snapshot.bin. And since it's embedded
relative to the embedder (src/browser/js/Snapshot.zig) the path has to be
relative to that, hence ../../snapshot.bin. I'm open to suggestions on
improving this.
Rework node.isConnected(), this now [correctly] returns true as long as a node
is part of _a_ document (it doesn't have to be the 'main' document). This
requires changes around id lookup optimization.
document.embeds, document.plugins, document.anchor, document.getElementsByName
getElementsByClassName support for multiple class names
various document getters
Much better v8 object debugging/printing in debug mode
Window.requestIdleCallback and cancelIdleCallback
Don't prematurely close stream on empty read - queue promises.
Using the `call_arena` here is unsafe in the case of a failure. It's possible
for the call_arena to be reset during module processing, making the log crash.
The issue is that the lifetime of a URL is often conditional. If the stitched
URL has already been seen (i.e. is in the module_cache), then it can be short-
lived. EXCEPT, URL.stitch might require an allocation..and then you start to
think, well, if URL.stitch is going to allocate anyways...If we stitch with
the `page.arena`, and end up not needing a long lifetime, we've wasted memory.
If we stitch with `page.call_arena` and end up needing a long lifetime, we need
to dupe.
It's a bit messy, and I'd like to take a stab at improving it after:
https://github.com/lightpanda-io/browser/pull/1127.
I'm thinking that we need a URL intern pool. HashMap with a composite key of
base + path -> resolved. Then all URLs are resolved using the page.arena, but
we don't have any duplicates, so it isn't wasteful.
In their current implementation, both the NodeIterator and TreeWalker would
skip over ignored nodes. However, while the node itself should have been ignored
its children should still be iterated.
For example, when going over:
```
<div id="container">
<!-- comment1 -->
<span>
<!-- comment2 -->
</span>
</div>
```
With `SHOW_COMMENT`, the previous version would completely skip over `container`
and its children. Now the code still won't emit the `container` div itself,
it will still iterate through its children (and thus emit the two comments).
This change relates to ongoing react compatibility.
This is how, for example, scripts on lightpanda.io are. Though fixing this
doesn't really change anything, it's good to make sure these events are firing
correctly.
First, this exposes the v8 Profiler. Right now it's just a commented-out block
in `fetch` and meant for internal debugging.
Depends on: https://github.com/lightpanda-io/zig-v8-fork/pull/105
Use postAttach on Window to attach "static" properties. This comes from
profiling (lightpanda.io) and seeing window.get_self called tens of thousands
of times.
This prefers `suggestVectorLength` in order to pick a vector size; for cookie strings shorter than, say 64, this might cause it to fallback to slow path on architectures that support larger vector sizes (like AVX-512). We may also add checks for smaller vector sizes if desired in the future.
1 - Always fire the callback on the next tick. This is probably the most
important change, as frameworks like React don't react well if the callback is
fired immediately (they expect to continue processing the page in its current
state, not in the mutated state from the callback)
2 - Always fire the callback for observed elements with a parent, whether or
not those intersect or are connected. From MDN, the callback is fired
"The first time the observer is initially asked to watch a target element."
3 - Add a mutation observer so that if a node is added to the root (or removed)
the callback is fired. This, I think, is the best we can currently do for
"intersection".
When using CDP, we poll the HTTP clients along with the CDP socket. Because this
polling can be long, we first process any pending message. This can end up
processing _all_ messages, in which case the poll will block for a long time.
This change makes it so that when the initial processing processes 1+ message,
we do not poll, but rather return. This allows the page lifecycle to be
processed normally (and not just blocking on poll, waiting for the CDP client
to send data).
1 - On `unkown global property`, include the stack trace (this might be WAY too
verbose)
2 - On script get, include stack trace (when available)
3 - On script get, include referrer
4 - Stack traces will now include the script name/src when available
The call_arena was previously owned by the js.Context, but it has to exist on
the page, and the page is created before the context, so it's set to undefined
on the page. While this has never caused an issue, there's no reason for the
page not to own this, and the context to simply reference it.
Also, renamed the js.Context.context_arena to simply `arena`, which is more
consistent with other arena names (e.g. page.arena).
Adds bortli as a submodules, and compiles the decoder, making it available to
libcurl.
Some websites appear to sent brotli encoded responses even though we don't
advertise support for it (e.g. https://movie.douban.com).
Remove the is_blocking variable (and checks) from the ScriptManager. This is a
holdover to when blocking imports had a dedicated connections, and thus couldn't
be loaded concurrently.
This changes two obvious things (and probably a few subtle ones). The first is
that plain async scripts are now free to be executed as soon as they are
completed. As far as I can tell, this is a safe behavior, is simpler and should
be a bit faster.
More importantly, it allows for chains of imports. normal import (A) -> dynamic
import (B) -> normal import (C). This would previously fail an assertion. The
superficial issue was that dynamic import handling didn't respect the
`is_blocking` flag. But if we did respect it, than module B would never be able
to load module C, which would block module A from ever completing. By removing
the flag, module C will now be properly evaluated, which unblocks module B which
allows module A to unblock.
(1) I believe this is the issue seen here:
https://github.com/lightpanda-io/browser/issues/1121
Currently, this hooks a single log.Interceptor into the logging framework, but
changing it to take a list shouldn't be too hard. Biggest issue is who will own
it, as we'd need an allocator to maintain a list / lookup (which log doesn't
currently have).
Uses logFmt format, and, for now, always filters out debug messages and a few
particularly verbose scopes.
This Pr largely tightens up a lot of the code. 'v8' is no longer imported
outside of js. A number of helper functions have been moved to the js.Context.
For example, js.Function.getName used to call:
```zig
return js.valueToString(allocator, name, self.context.isolate, self.context.v8_context);
```
It now calls:
```zig
return self.context.valueToString(name, .{ .allocator = allocator });
```
Page.main_context has been renamed to `Page.js`. This, in combination with new
promise helpers, turns:
```zig
const resolver = page.main_context.createPromiseResolver();
try resolver.resolve({});
return resolver.promise();
```
into:
```zig
return page.js.resolvePromise({});
```
Renames JsContext -> js.Context, JsObject -> js.Object and JsThis -> js.This
which is more consistent with the other types. The JsObject -> js.Object is
the reason so many files were touched.
This is still a [messy] transition, with more refactoring planned to clean it
up.
Back in the zig-js-runtime days, globals were used for the state and webapi
declarations. This caused problems largely because it was done across
compilation units (using @import("root")...).
The generic Env(S, WebApi) was used to solve these problems, while still making
it work for different States and WebApis.
This change removes the generics and hard-codes the *Page as the state and
only supports our WebApis for the class declarations.
To accommodate this change, the runtime/*tests* have been removed. I don't
consider this a huge loss - whatever behavior these were testing, already
exists in the browser/**/*.zig web api.
As we write more complex/complete WebApis, we're seeing more and more cases
that need to rely on js objects directly (JsObject, Function, Promises, etc...).
The goal is to make these easier to use. Rather than using Env.JsObject, you
now import "js.zig" and use js.JsObject (TODO: rename JsObject to Object).
Everything is just a plain Zig struct, rather than being nested in a generic.
After this change, I plan on:
1 - Renaming the js objects, JsObject -> Object. These should be referenced in
the webapi as js.Object, js.This, ...
2 - Splitting the code across multiple files (Env.zig, Context.zig,
Caller.zig, ...)
scroll alias for scrollTo
add get_scrollX and get_scrollY, along with their aliases: pageXOffset and
pageYOffset. These always return 0, unless scroll or scrollTo are called.
There's always going to be ambiguity between a string and a Uint8Array. We
already had TypedArray(u8) as a discriminator when _returning_ values. But now
the type is also used by mapping JS values to Zig. To support this efficiently
when probing the union, the typed array mapping logic was extracted into its
own function (so that it can be used by the probe).
There's always going to be ambiguity between a string and a Uint8Array. We
already had TypedArray(u8) as a discriminator when _returning_ values. But now
the type is also used by mapping JS values to Zig. To support this efficiently
when probing the union, the typed array mapping logic was extracted into its
own function (so that it can be used by the probe).
This solves two issues. First, it's more correct, the indexers should be live.
Second, it makes sure that anything with an HTMLCollection prototype, like
HTMLOptionsCollection, also gets access to the index getters.
We could solve the 2nd issue by making `postAttach` work up the prototype
chain, but since postAttach is wrong (not live), I prefer this solution.
This changes how non-async module loading works. In general, module loading
is triggered by a v8 callback. We ask it to process a module (a <script type=
module>) and then for every module that it depends on, we get a callback. This
callback expects the nested v8.Module instance, so we need to load it then and
there (as opposed to dynamic imports, where we only have to return a promise).
Previously, we solved this by issuing a blocking HTTP get in each callback. The
HTTP loop was able to continuing downloading already-queued resources, but if
a module depended on 20 nested modules, we'd issue 20 blocking gets one after
the other.
Once a module is compiled, we can ask v8 for a list of its dependent module. We
can them immediately start to download all of those modules. We then evaluate
the original module, which will trigger our callback. At this point, we still
need to block and wait for the response, but we've already started the download
and it's much faster. Sure, for the first module, we might need to wait the same
amount of time, but for the other 19, chances are by the time the callback
executes, we already have it downloaded and ready.
Allows dynamic imports to be loading asynchronously. I know reddit isnt the
best example, since it doesn't fully load, but this reduced the load time from
~7.2s to ~4.8s.
The first time a `slotchange` event is registered, we setup a SlotChangeMonitor
on the page. This uses a global (ugh) MutationEvent to detect slot changes.
We could improve the perfomance of this by installing a MutationEvent per
custom element, but a global is obviously a lot easier.
Our MutationEvent currently fired _during_ the changes. This is problematic
(in general, but specifically for slotchange). You can image something like:
```
slot.addEventListener('slotchange', () => {
// do something with slot.assignedNodes()
});
```
But, if we dispatch the `slotchange` during the MutationEvent, assignedNodes
will return old nodes. So, our SlotChangeMonitor uses the page scheduler to
schedule dispatches on the next tick.
Refactors some of the module loading logic. Both normal modules import and
dynamic module import now share more of the same code - they both go through
the slightly modified `module` function.
Dynamic modules now check the cache first, before loading, and when cached,
resolve the correct promise. This can now happen regardless of the module
loading state.
Also tried to replace some page arenas with call arenas and added some basic
tests for both normal and dynamic module loading.
chromedp expects the nodeId starts to 1.
A start to 0 make it enter in infinite loop b/c it expects the Go's
default int, ie 0, to be nil from a map to stop the loop.
If the 0 index is set, it will loop...
exit_when_done is pretty much a sneaky way to get CDP knowledge into the page.
exit_when_done == true means "this isn't a CDP session".
extra_socket is another sneaky weay to get CDP knowledge into the page. When
we get an `extra_socket` message it means "Return control to the CDP server".
Therefore it should be impossible to get an `extra_socket` message (return to
CDP) when `exit_when_done == true` (this isn't a CDP session).
--noscript is deprecated (warning) and automatically maps to --strip_mode js
--strip_mode takes a comma separated list of values. From the help:
- "js" script and link[as=script, rel=preload]
- "ui" includes img, picture, video, css and svg
- "css" includes style and link[rel=stylesheet]
- "full" includes js, ui and css
Maybe this is overkill, but i sometimes find myself looking --dump outputs over
and over again, and removing noise (like HUGE svgs) seems like a small
improvement.
Following Zig recommendation not to inline except in specific cases, none of
which I think applies to use.
Also, mimalloc.create can't fail (it used to be possible, but that changed a
while ago), so removed its error return.
There is some risk to this change. The first is that I made a mistake. The
other is that one of the APIs that doesn't currently return an error changes
in the future.
If a webapi has a []const u8 parameter, then the page.call_arena is used. This
is our favorite arena to use, but if the string value has a lifetime beyond the
call, it then needs to be duped again (using page.arena).
When a webapi has a Env.String parameter, the page.arena will be used directly
to get the value from V8, removing the need for an intermediary dupe in the
call_arena.
This allows HTMLCollections to be streamlined. They no longer need to dupe the
filter (tag name, class name, ...), which means they can no longer fail. It also
means that we no longer need to needlessly dupe the string literals.
This defaults to Auto, which means it runs when the call stack reaches 0.
It appears that both Node and Deno set this to explicit.
I don't really understand why Auto doesn't work. It says the call stack is the
C++/C callstack, and I don't see what would block the current code from reaching
a depth of 0. Still, we already have explicit calls to performMicrotasksCheckpoint
which ties it holistically with our scheduler, so having it be explicit like
this should...well make it more explicit
This broke a test, but since the tests are being redone in the [fetch PR](https://github.com/lightpanda-io/browser/pull/972) I simply removed the offending one.
Depends on https://github.com/lightpanda-io/libdom/pull/36
The spec says this should be a High Definition timestamp. But browsers avoid
that in order to avoid fingerprinting. By default, FireFox rounds to 2ms (which
is what this PR does).
Previously, the timestamp was seconds, so you'd think: isn't that better? Well,
it's pretty far off the spec and what browsers do, but more importantly, it
crashes our WPT test. If you look at `Event-timestamp-safe-resolution.html`
you'll see that it's trying to find the delta between two timestamps, in an
endless loop (without a loop of many iterations). With second-resolution, it
just takes too long (and crashes..memory).
- Continue to reuse the Browser/Env/Isolate, but no start a new session per test.
- Test http server now properly closes the sendFile fd
- Run WPT in ReleaseMode
- Add --quiet option to WPT and some commented out debug code for dumping v8
memory stats
Like Chrome, the NetworkIdle and NetworkAlmostIdle will only be sent if the
condition (no network requests / <= 2 network requests) holds for at least 500ms
Also merged runHighPriority and runLowPriority as they are now always run
together (but we still only block/wait for high priority tasks).
We previously ignored tasks scheduled more than 5 seconds away. These tasks are
now scheduled on the low priority queue. This means that they won't stop a
page.wait for returning, but they'll still [eventually] be run if page.wait is
called multiple times.
Practically, this means that they'll never be run in `fetch` mode, but they
might be run from CDP if the driver waits.
Make queue names consistent, primary => high_priority, secondary => low_priority
(the same names used by the page)
According to MDN, inline script tags should not have defer/async attributes. But
some do. This ignores those attributes for inline script tags.
(Previously, we were only half ignoring them. We were treating them as inline,
but flagging them as deferred or async, which was causing issues with cleanup)
Fixes: https://github.com/lightpanda-io/browser/issues/1014
ScriptManager should only ever has one in-flight blockingGet. The is_blocking
flag is used to assert this, as well as enforce it in evaluate(). If is_blocking
is true, evaluate() exits.
This doesn't work for async scripts however, as they aren't executed via
evaluate(), but rather execute directly once complete.
This PR changes the execution behavior of async scripts. They are now only
executed in evaluate() (and thus won't execute when is_blocking == true).
However, unlike normal/deferred scripts, async scripts continue to execute in
their completion order (not their declared order).
Fixes https://github.com/lightpanda-io/browser/issues/1016
CDP currently assumes that if we get a page-related notification (like a
request interception, or page lifecycle event), then we must have a session
and page.
But, Target.detachFromTarget can remove the session from the BrowserContext
while still having the page run (I wonder if we should stop the page at this
point??). So, remove these assumptions and make sure we have a page/session
in the handling of page events.
Most CDP drivers have a mechanism to wait for idle network, or an almost idle
network (sometimes called networkIdle2). These are events the browser must emit.
The page will now emit `networkIdle` when we are reasonably sure there's no more
network activity (this requires some slight changes to request interception,
since, I believe, intercepted requests should be considered).
`networkAlmostIdle` is currently _always_ emitted prior to emitting
`networkIdle`. We should tweak this but I can't, at a glance, think of a great
heuristic for when this should be emitted.
Further reducing bouncing between page and server for loop polling. If there is
a page, the page polls. If there isn't a page, the server polls. Simpler.
Fix events.get_timeStamp (was events.get_timestamp, wrong casing).
Rename `newRunner` to `htmlRunner`.
move tests to src/tests (from src/browser/tests). src/runtime and possibly other
parts might want to have html tests too.
Follows up on https://github.com/lightpanda-io/browser/pull/994 and replaces
the jsRunner with a new page.navigation-based test runner.
Currently only implemented for the Window tests, looking for feedback and
converting every existing test will take time - so for a while, newRunner (to be
renamed) will sit side-by-side with jsRunner.
In addition to the benefits outlined in 994, largely around code simplicity and
putting more of the actual code under tests, I think our WebAPI tests
particularly benefit from:
1 - No need to recompile when modifying the html tests
2 - Much better assertions, e.g. you can assert that something is actually an
array, not just a string representation of an array
3 - Ability to test some edge cases (e.g. dynamic script loading)
I've put some effort into testing.js to make sure that, if the encapsulating
zig test passes, it's because it actually passed, not because it didn't run.
For the time being, console tests are removed. I think it's more useful to have
access to the console within tests, than it is to test the console (which is
just a wrapper around log, which is both tested and heavily used).
Depends on https://github.com/lightpanda-io/browser/pull/993
There's currently 3 ways to execute a page:
1 - page.navigate (as used in both the 'fetch' and 'serve' commands)
2 - jsRunner as used in unit tests
3 - main_wpt as used in the WPT runner
Both jsRunner and main_wpt replicate the page.navigate code, but in their own
hack-ish way. main_wpt re-implements the DOM walking in order to extract and
execute <script> tags, as well as the needed page lifecycle events.
This PR replaces the existing main_wpt loader with a call to page.navigate. To
support this, a test HTTP server was added. (The test HTTP server is extracted
from the existing unit test test server, and re-used between the two).
There are benefits to this approach:
1 - The code is simpler
2 - More of the actual code and flow is tested
3 - There's 1 way to do things (page.navigate)
4 - Having an HTTP server might unlock some WPT tests
Technically, we're replacing file IO with network IO i.e. http requests). This
has potential downsides:
1 - The tests might be more brittle
2 - The tests might be slower
I think we need to run it for a while to see if we get flaky behavior.
The goal for following PRs is to bring this unification to the jsRunner.
Code like this:
```
var body: ?*ElementHTML = undefined;
const err = documentHTMLVtable(doc_html).get_body.?(doc_html, &body);
try DOMErr(err);
if (body == null) return null;
return @as(*Body, @ptrCast(body.?));
```
Isn't safe. It assumes that libdom will either return an error, or set body
to null or a value. However, there are cases (specifically for this API) where
libdom returns DOM_NO_ERR without ever setting body. In such cases, we return
a `body` initialized to a random (invalid) value.
This PR replaces the initial value of optional types from undefined to null
(within the libdom wrapper).
The thin mimalloc API is currently defensive around incorrect setup/teardown by
guarding against using/destroying the arena when the heap is null, or creating
an arena when it already exists.
The only time these checks will fail is when the code is wrong, e.g. trying
to use libdom before or after freeing the arena. The current behavior can mask
these errors, plus add runtime overhead.
There's a flaky performance test that I wanted to fix (1). This led to a couple
changes.
1 - Add timestamp() and milliTimestamp() to datetime.zig. Reduce some code
duplication and use better clock_ids where available
2 - Change Performance API to use milliTimestamp and store a u64 instead of a
f64. While the spec says a float, Firefox deals with u64 and implicit
conversion is always available. Makes our APIs simpler.
(1) - https://github.com/lightpanda-io/browser/actions/runs/17313296490/job/49151366798#step:4:131
Rather than stack-allocating MAX_MESSAGE_SIZE upfront, we now allocate 32KB
and grow the buffer as needed for larger messages, up to MAX_MESSAGE_SIZE.
This will reduce memory usage for drivers that don't send huge payloads (like
playwright does).
While not implemented, this would also enable us to set the MAX_MESSAGE_SIZE
at runtime (e.g. via a command line option).
Removes optional platform, which only existed for tests.
There is now a global `@import("testing.zig").test_app` available. This is setup
when the test runner starts, and cleaned up at the end of tests. Individual
tests don't have to worry about creating app, which I assume was the reason I
Platform optional, since that woul dhave been something else that needed to be
setup.
This should load the "src.js":
```
const s = document.createElement('script');
document.getElementsByTagName('body')[0].appendChild(s);
s.src = "src.js"
```
Notice that src is set AFTER the element is added to the DOM. This PR enables
the above, by
1 - skipping dynamically added scripts which don't have a src
2 - trying to load a script whenever `set_src` is called.
(2) is safe because the ScriptManager already prevents scripts from being
processed multiple times.
Additionally, not only can the src be set after the script is added to the DOM,
but onload and onerror can be set after the src:
```
s.src = "src.js"
s.onload = ...;
s.onerror = ...;
```
This PR also delays reading the onload/onerror callbacks until the script is
done loading.
This behavior is seen on reddit.
Previously, the IO loop was doing three things:
1 - Managing timeouts (either from scripts or for our own needs)
2 - Handling browser IO events (page/script/xhr)
3 - Handling CDP events (accept, read, write, timeout)
With the libcurl merge, 1 was moved to an in-process scheduler and 2 was moved
to libcurl's own event loop. That means the entire loop code, including
the dependency on tigerbeetle-io existed for handling a single TCP client.
Not only is that a lot of code, there was also friction between the two loops
(the libcurl one and our IO loop), which would result in latency - while one
loop is waiting for the events, any events on the other loop go un-processed.
This PR removes our IO loop. To accomplish this:
1 - The main accept loop is blocking. This is simpler and works perfectly well,
given we only allow 1 active connection.
2 - The client socket is passed to libcurl - yes, libcurl's loop can take
arbitrary FDs and poll them along with its own.
In addition to having one less dependency, the CDP code is quite a bit simpler,
especially around shutdowns and writes. This also removes _some_ of the latency
caused by the friction between page process and CDP processing. Specifically,
when CDP now blocks for input, http page events (script loading, xhr, ...) will
still be processed.
There's still friction. For one, the reverse isn't true: when the page is
waiting for events, CDP events aren't going to be processed. But the page.wait
already have some sensitivity to this (e.g. the page.request_intercepted flag).
Also, when CDP waits, while we will process network events, page timeouts are
still not processed. Because of both these remaining issues, we still need to
jump between the two loops - but being able to block on CDP (even for a short
time) WITHOUT stopping the page's network I/O, should reduce some latency.
chromedb doesn't support duplicate header names. Although servers _will_ send
this (e.g. Cache-Control: public\r\nCache-Control: max-age=60\r\n), Chrome
seems to join them with a "\n". So we do the same.
A note on curl_easy_nextheader, which this code ultimately uses to iterate
and collect the headers. The documentation says:
Applications must copy the data if they want it to survive subsequent API
calls or the life-time of the easy handle.
As-is, I'd understand this to mean that a given header name/value is only
valid until any API call, including another call to curl_easy_nextheader. So,
from this comment, we _should_ be duping the name/value. But we don't. Why?
Because, despite the note in the documentation, this doesn't appear to be how
it actually works, nor does it really make sense. If it's just a linked list,
there's no reason curl_easy_nextheader should invalidate previous results. I'm
guessing this is just a general lack of guarantee libcurl is willing to make re
lifetimes.
https://github.com/lightpanda-io/browser/issues/966
Minor improvement to correctness of TreeWalker.
Fun fact, this is the first time, that I've run into, where we have to default
null and undefined to different values.
Also, tweaked the WPT test runner. WPT test results use | as a field delimiter.
But a WPT test (and, I assume a message) can contain a |. So we had at least
a few tests that were being reported as failed, only because the result line
was weird / unexpected. No great robust way to parse this, but I changed it
to look explicitly for |Pass or |Fail and use those positions as anchor points.
Add response_data event, CDP now captures the full body so that it can respond
to the Network.getResponseBody. This isn't memory efficient, but I don't see
another way to do it. At least this way, it's only capturing/storing every
response body when (a) CDP is used and (b) Network.enabled is called. That is,
as opposed to baking this into Http/Client.zig, which would force the memory
consumption for all use-cases.
There's arguably some optimizations we could make for XHR requests, which also
dupe/own the response. As of now, the response is dupe'd separately for CDP
and XHR.
For text content type (and application/json) we create a pseudo HTML
tree with the text value in a <pre> tag.
It allows CDP clients to interact with text content easily.
With networking enabled, CDP listens to this event and emits a
`Network.loadingFinished` event. This is event is used by puppeteer to know that
details about the response (i.e. the body) can be queries.
Added dummy handling for the Network.getResponseBody message. Returns an
empty body. Needed because we emit the loadingFinished event which signals
to drivers that they can ask for the body.
The callback which was called on a per-header basis is removed. Only XHR was
using this, and it was created before the HeaderIterator existed (because I
didn't know we could iterate through the response headers in curl after the fact).
The header_done_callback remains, but is now called header_callback (a bit
confusing in the short term).
The only difficulty was with fulfilled requests, which do not have an easy
handle for our HeaderIterator. The existing code would segfault if
transfer.responseHeaderIterator() was called on a fulfilled requests.
The HeaderIterator is now a tagged union that abstracts whether the source of
the response header is a curl easy, or just an injected list from the fulfilled
requests.
Our build.zig is using a number of deprecated features, which are removed in
0.15.
This updates build.zig so that it still works in 0.14 and [hopefully] will work
in 0.15.
Related: https://github.com/lightpanda-io/zig-v8-fork/pull/87
Stream (to json) the Transfer as a request and response object in the various
network interception-related events (e.g. Network.responseReceived).
Add a page.request_intercepted boolean flag for CDP to signal the page that
requests have been intercepted, allowing Page.wait to prioritize intercept
handling (or, at least, not block it).
Enabling Curl cookie engine brings advantage:
* handle cookies during a redirection: when a srv redirects including
cookies, curl sends back the cookies correctly during the next request
On client.request(req) we now immediately wrap the request into a Transfer. This
results in less copying of the Request object. It also makes the transfer.uri
available, so CDP no longer needs to std.Uri(request.url) anymore.
The main advantage is that it's easier to manage resources. There was a use-
after free before due to the sensitive nature of the tranfer's lifetime. There
were also corner cases where some resources might not be freed. This is
hopefully fixed with the lifetime of Transfer being extended.
The previous implementation just checked if a node had a parent. But it should
check the node has a document ancestor:
```
const d1 = document.createElement('div');
document.createElement('div').appendChild(d1);
d1.isConnected
```
should be `false`.
In addition to this fix, also added support for DocumentFragments which are
part of the ShadowRoot. This, like events, is one of those apis that DOES break
the ShadowRoot isolation.
This ensures that page.wait won't unblock too early. As-is, this isn't an issue
since active can only be 0 if there are no active OR pending requests. However,
with request interception (https://github.com/lightpanda-io/browser/pull/930)
it's possible to have no active requests and no pending requests - from the
http client's point of view - but still have pending-on-intercept requests.
An alternative to this would be to undo these changes, and instead change
Page.wait to be intercept-aware. That is, Page.wait would continue to block on
http activity and scheduled tasks, as well as intercepted requests. However,
since the Page doesn't know anything about CDP right now, and it does know
about the http client, maybe doing this in the client is fine.
This is necessary because of CloudFront which will send gzip content even if
we don't ask for it.
Properly handle scripts that are both async and defer.
Add a helper to print state of page wait. This can be helpful in identifying
what's causing the page to hang on page.wait.
Allow page.wait to transition page mode.
Optimize initial page load. No point running scheduler until the initial
page is loaded.
Support ISO-8859-1 charset
Aims to improve compatibility for the lit framework (e.g. what Reddit is using).
1 - Adds support for adoptedStyleSheets to the Document and ShadowRoot
2 - Adds mock support for replace and replaceSync to the CSSStyleSheet
3 - Optionally include shadowroot in dump
4 - Special-case setting innerHTML on a TemplateElement
libdom's parsing is now less strict with respect to attribute names. See:
https://github.com/lightpanda-io/libdom/pull/33
However, the attribute name in setAttribute has stricter validation applied to
it, which we now handle directly.
The only functionality change is adding a `named_set` to the CSSStyleDeclaration
so that styles can be set (`named_get` was already defined)
Combine the StringHashMapUnmanaged + ArrayListUnmanaged into a single
StringArrayHashMapUnmanaged.
Use file structs, because @import("css_style_declaration.zig").CSSStyleDeclaration
is a bit tedious.
Various micro-optimization around parsing CSS, e.g. ascii.eqlIgnoreCase in loops
replaced by 1 lowercase + N*mem.eql.
This is hacky, but it's inspired by how NetSurf does it. While the Window isn't
the parent of the Document, many events should bubble from the Document to the
Window. libdom simply doesn't handle this (it has no concept of a Window, and
the Document has no parent).
We potentially need to do this for multiple event types (NetSurf only does it
for the 'load' event as far as I can tell). It would be nice to find a generic
way to do this...maybe intercept any addEventListener on the body and
registering special events on the Window? For now, `DOMContentLoaded` is the
blocking (for finance.yahoo.com) and we can see if this is really an issue for
other event types.
These are dummy implementations, but they do expose the ready and finished
promise, and do resolve the finished promise, so it should unblock basic cases.
Currently, fetch --dump includes <script> tag (either inline or with src). I
don't know what use-case this is the desired behavior. Excluding them, via the
new --noscript option has benefit that if you --dump --noscript and open the
resulting page in the browser, you don't re-execute JavaScript, which is
likely to break the page.
For example, opening a --dump of github makes it look like the page is broken
because it re-executes JavaScript that isn't meant to be re-executed.
Similarly, opening a --dump in a browser might execute JavaScript that
lightpanda browser failed to execute, making it looks like it worked better
than it did.
If it wasn't for the fact that the HTTP client is likely going to see a major
refactor, it would definitely be time to create a specific state instance for
synchronous requests.
go-rod appears to stop processing when it receives an error, such as
UnknownMethod. Added placeholder handlers for Network.setUserAgentOverride and
Page.stopLoading.
Setting a custom user agent is something still being discussed, so no-oping it
seems reasonable. And, due to the currently synchronous nature of the initial
page load, no-oping stopLoading also seems reasonable.
https://github.com/lightpanda-io/browser/issues/867
This is always run, but only the full webcomponents polyfill, it's very
small and isn't intrusive. This introduces a layer of indirection so that,
if the full polyfill is loaded, its monkeypatched constructor will be called
I couldn't find where the behavior is described. AND, browsers seem to behave
differently depending on the state of the page (blank document vs actual page).
Still, some sites use innerHTML to load <script> tags, and, in libdom at least,
these are created in the implicit head. We cannot just copy the body nodes. To
keep it simple, I now copy all head and body elements.
libdom's hasAttributes is based on the type. Elements, according to libdom,
always have attributes, thus hasAttributes always return true, even when the
element in question has no attribute. Change our _hasAttributes to only return
true if the attribute count > 0.
Currently seems to always return null. Doesn't seem to be a way in libdom to
change this. The property is deprecated, and MDN recommends using location.host
instead, so change document.get_domain to wrap location.host.
Prototype resolution of Zig types previously had 2 limitations (bug?). The first
was that the Zig prototype chain could only be 1 deep. You couldn't do A->B->C
where each of those was a Zig type (but you could do A->B->C->D->E ... so long
as every other type was a C opaque value).
The other limitation was that Zig prototypes only worked when the nested field
was directly embedded in the struct (i.e. not a pointer). So you could do:
```zig
const X = struct {
proto: XParent,
};
```
But not:
```zig
const X = struct {
proto: *XParent,
};
```
This addresses both limitations. The first issue is solved by keeping track
of the cumulative offset
Only when setAttribute is called directly on the element, does libdom raise
a `DOMAttrModified` event (which MutationObserver uses).
From what I can tell, libdom's element set attribute _does_ rely on the
underlying attribute set value, so the behavior should be pretty close, it just
does extra things on top of that.
Previously, MutationObserver callbacks where called using the `jsCallScopeEnd`
mechanism. This was slow and resulted in records split in a way that callers
might not expect. `jsCallScopeEnd` has been removed.
The new approach uses the loop.timeout mechanism, much like a window.setTimeout
and only registers a timeout when events have been handled. It should perform
much better.
Exactly how MutationRecords are supposed to be grouped is still a mystery to me.
This new grouping is still wrong in many cases (according to WPT), but appears
slightly less wrong; I'm pretty hopeful clients don't really have hard-coded
expectations for this though.
Also implement the attributeFilter option of MutationObserver. (Github)
document.createElement('a').host or .href or .. should return an empty string.
However, URL.constructor(document.createElement('a')) should fail.
Because HTMLAnchorElement uses URL.constructor, we have the wrong behavior.
This adds a guard for an empty anchor. This might not cover all of the cases
which are valid for an anchor but invalid for a URL.constructor, but it's
the most common.
I think we initially thought we might need different clients for different
parts of the system, each with a unique loop (e.g. we thought telemetry might
need some isolation). But that never happened, so it's just needless now,
especially since the async connect uses the non-generic *Loop type directly.
Deal with non-node current target crashing. Builds ontop of abort_signal, but
with the new event-target specific union, I think this will work in for all
future cases.
Don't error on JSON.stringify failure (likely caused by circular reference).
In debug mode, try to print [slightly] more meaningful value representation
when default serialization results in [object Object].
Some apis want a value or undefined. For these, we can't use an Optional
return type, null maps to JS null. Adds an Env.UndefinedOr(T) generic
union for such return types.
Was seeing pretty frequent TLS errors on reddit. I think I had the wrong max
TLS record size, but figured this was an opportunity to upgrade tls.zig, which
has seen quite a few changes since our last upgrade.
Specifically, the nonblocking TLS logic has been split into two structs: one
for handshaking, and then another to be used to encrypt/decrypt after the h
andshake is complete. The biggest impact here is with respect to keepalive,
since what we want to keepalive is the connection post-handshake, but we don't
have this object until much later.
There was also some general API changes, with respect to state and partially
encrypted/decrypted data which we must now maintain.
Root modules (non-cacheable) should register their module_id -> URL so that,
if they load a nested module, we can get the full URL of the nested module.
Crypto.getRandomValues should mutate the given parameter as well as return
the value. This return value must be the same (JsObject) as the input parameter.
There might be more magical ways to solve this, but I opted for both the
simplest and most flexible: adding a `toZig` function to JsObject which does
what js.zig does internally when mapping js values to Zig (and, of course, it
uses the same code).
This allows a caller to receive a JsObject (not too common, but we already do
that in a few places) and return that same JsObject (again, not too common, but
we do have support for returning JsObject directly already). With the main
addition that the JsObjet can now be turned into a Zig type by the caller.
An empty struct will share the same address as its sibling (1) which will cause
an collision in the identity map.
(1) - This depends on Zig's non-guaranteed layout, so the collision might not
be with its sibling, but rather some other [seemingly random] field.
In https://github.com/lightpanda-io/browser/pull/798 module caching was added.
This was necessary as the same module loaded multiple time should result in the
same v8 module instance.
To make this work, modules became cached by their full URL. The full URL of one
module was also used to determine the full URL of nested modules (full url +
specifier).
With inline scripts, the page URL was used as the full URL. While this is
correct when resolving nested modules, it's incorrect for caching the module
itself. Two inline modules on a page share the same URL, but they aren't the
same and should be cached.
To fix this, inline modules still inherit the page URL, in order to resolve the
correct URL for nested modules, but are themselves never cached.
Currently, a timeout that sets a timeout can cause loop.run to block forever.
Also, cleanup URL stitch with leading './'. The resulting URL was valid, but
since we use the URL as the module cache key, it's important to resolve various
representations of the same URL in the same way.
When V8 calls the ResolveModuleCallback that we give it, it passes the specifier
which is essentially the string given to `from`:
```
import {x} from './blah.js';
```
We were taking that specifier and giving it to the page. The page knew the
currently executing script, an thus could resolve the full URL. Given the full
URL, it could either return the JS content from its module cache or fetch
the source.
At best though, this isn't efficient. If two files import the same module, yes
we cache the src, but we still ask v8 to re-compile it. At worse, it crashes
due to resource exhaustion in the case of cyclical dependencies.
ResolveModuleCallback should instead detect that it has already loaded the
module and return the previously loaded module. Essentially, we shouldn't be
caching the JavaScript source, we should be caching the v8 module.
However, in order to do this, we need more than the specifier, which might only
be a relative path (and thus isn't unique). So, in addition to a module cache,
we now also maintain an module identifier lookup. Given a module, we can get
its full path. Thankfully ResolveModuleCallback gives us the referring module,
so we can look up that modules URL, stitch it to the specifier, and get the
full url (the unique identifier) within the JS runtime.
Need more real world testing, and a fully working example before I celebrate,
but for sites with many import, this appears to improve performance by many
orders of magnitude.
Adds a dummy PerformanceObserver. Only the supportedEntryTypes static attribute
is supported, and it currently returns an empty array. This hopefully prevents
code from trying to use it. For example, before using it, reddit checks if
specific types are supported and, if not, doesn't use it.
This introduced complexity in the js runtime. Our current approach to
attributes only works with primitive types. Non-primitive types can't be
attached to a FunctionTemplate (v8 will crash saying only primitive types can
be set). Plus, all non primitive types require a context to create anyways.
We now detect "primitive" attributes and "complex" attributes. Primitive
attributes are setup as before. Complex attributes are setup per-context,
requiring another loop through our types to detect & setup on each context
creation.
When EventTargetTBase is used, we pass the container as the target to libdom.
This is not safe, as libdom is expecting an event_target. We see, for example
that when _dom_event_get_current_target is called, the refcnt is increased.
This works if the current_target is a valid event_target, but if it's a
Zig instance (like the Window) ... we're just altering some bits of the
window instance.
This attempts to add a dummy target to EventTargetTBase which can acts as a
real event_targt in place of the Zig instance.
We currently set request._keepalive prematurely. There are [error cases] where
the request could be abandoned before being fully drained. While we do try to
drain in some cases, it isn't always possible. For this reason,
request.keepalive is only set at the end of the request lifecycle, at which
point we know the connection is ready to be re-used.
Probing a union match for an possible value should rarely hard-fail. Instead,
it should return an .{.invalid = {}} response to let the prober decide how to
proceed. This fixes a hard-fail when a JS value fails to probe as an array.
Also, add :modal pseudo-class.
(both issues came up looking at github integration)
Test speed has been improved only slightly by tweaking a 2-second running tests.
Build has been improved by:
1 - moving logFunctionCallError out of js.Caller and to a standalone function
2 - removing some non-generic code from the generic portions of the logger
Caller.getter and Caller.setter have been removed in favor or calling
Caller.method. This wasn't previously possible - prior to our v8 upgrade, they
had different signatures.
Also removed a largely unused parser/str.zig file.
Libdom's formGetCollection doesn't work (like I would expect) for dynamically
added elements.
For example, given:
```
let el = document.createElement('input');
document.getElementsByTagName('form')[0].append(el);
```
(and assume the page has a form), I'd expect `el.form` to be equal to the form
the input was added to. Instead, it's null. This is a problem given that
`dom_html_form_element_get_elements` uses the element's `form` attribute to
"collect" the elements.
This uses our existing querySelector to find the form elements.
Test speed has been improved only slightly by tweaking a 2-second running tests.
Build has been improved by:
1 - moving logFunctionCallError out of js.Caller and to a standalone function
2 - removing some non-generic code from the generic portions of the logger
Caller.getter and Caller.setter have been removed in favor or calling
Caller.method. This wasn't previously possible - prior to our v8 upgrade, they
had different signatures.
Also removed a largely unused parser/str.zig file.
Probing a union match for an possible value should rarely hard-fail. Instead,
it should return an .{.invalid = {}} response to let the prober decide how to
proceed. This fixes a hard-fail when a JS value fails to probe as an array.
Also, add :modal pseudo-class.
(both issues came up looking at github integration)
Libdom's formGetCollection doesn't work (like I would expect) for dynamically
added elements.
For example, given:
```
let el = document.createElement('input');
document.getElementsByTagName('form')[0].append(el);
```
(and assume the page has a form), I'd expect `el.form` to be equal to the form
the input was added to. Instead, it's null. This is a problem given that
`dom_html_form_element_get_elements` uses the element's `form` attribute to
"collect" the elements.
This uses our existing querySelector to find the form elements.
In https://github.com/lightpanda-io/browser/pull/767 I tried to call loop.run
from within a loop.run (spoiler, it didn't work), in order to make sure aborted
connections were properly cleaned up before starting a new navigation. That
resulted in having loop.run no longer wait for timeouts for fear of having to
wait on a long timeout. The ended up breaking page.wait (used in the fetch
command).
This commit brings back the original behavior where loop.run() waits for all
completions. Which is now safe to do since the nested loop.run() call has
been removed.
This is one of the ways that puppeteer knows that navigation happened
and is needed to support `waitForNavigation` which compares the
existing loader-id with the new one, so it has to change.
Also, fix a crash that could happen if CDP disconnects while
connections are being aborted.
Currently, if there's an internal navigation event, we continue to process the
page normally. This has negative performance implication, and can result in
user-scripts producing unexpected results.
For example, imagine a page with a script that does.
```js
if (x) {
form.submit();
}
reloadProduct();
```
The call to `form.submit()` should stop the script from executing. And, if the
page has 10 other <script> tags after this, they shouldn't be loaded nor
executed.
This code terminates the execution of the current script on an internal
navigation event, and stops the rest of the page from load.
While I believe this creates a more "correct" behavior, it also introduces new
edge cases. There's no a period of time, between the termination being stopped
and then being resumed, where executing code is not safe.
The mix of sync and async HTTP requests requires care to avoid deadlocks.
Previously, it was possible for async requests to use up all available HTTP
state objects duration a navigation flow (either directly, or via an internal
redirect (e.g. click, submit, ...)). This would block the navigation, which,
because everything is single thread, would block the I/O loop, resulting in a
deadlock.
The correct solution seems to be to remove all synchronous I/O. And I tried to
do that, but I ran into a wall with module-loading, which is initiated from V8.
V8 says "give me the source for this module", and I don't see a great way to
tell it: wait a bit.
So I went back to trying to make this work with the hybrid model, despite last
weeks failures to get it to work. I changed two things:
1 - The http client will only directly initiate an async request if there's
at least 2 free state objects available (1 for the request, and leaving 1
free for any synchronous requests)
2 - Delayed navigation retries until there's at least 1 free http state object
available.
Commits from last week did help with this. First, we're now guaranteed to have
a single sync-request at a time (previously, we could have had 2). Secondly,
the async connection is now async end-to-end (previously, it could have blocked
on an empty state pool).
We could probably make this a bit more obviously by reserving 1 state object
for synchronous requests. But, since the long term solution is probably having
no synchronous requests, I'm happy with anything that lets me move past this
issue.
Support CDP's Input.dispatchKeyEvent and DOM key events. Currently only
keydown is supported and expects every key to be a displayable character.
It turns out that manipulating the DOM via key events isn't great because the
behavior really depends on the cursor. So, to do this more accurately, we'd
have to introduce some concept of a cursor.
Personally, I don't think we'll run into many pages that are purposefully
using keyboard events. But driver (puppeteer/playwright) scripts might be
another issue.
CDP translate this into a Network.loadingFailed. This is necessary to make sure
every Network.requestWillBeSent is paired with either a Network.loadingFailed
or a Network.responseReceived.
2025-06-06 19:15:47 +08:00
782 changed files with 131196 additions and 47042 deletions
First, get the tools necessary for building V8, as well as the V8 source code:
```
make get-v8
```
Next, build v8. This build task is very long and cpu consuming, as you will build v8 from sources.
```
make build-v8
```
For dev env, use `make build-v8-dev`.
See [#1279](https://github.com/lightpanda-io/browser/pull/1279) for more details.
## Test
@@ -270,35 +269,75 @@ make end2end
Lightpanda is tested against the standardized [Web Platform
Tests](https://web-platform-tests.org/).
The relevant tests cases are committed in a [dedicated repository](https://github.com/lightpanda-io/wpt) which is fetched by the `make install-submodule` command.
All the tests cases executed are located in the `tests/wpt` sub-directory.
We use [a fork](https://github.com/lightpanda-io/wpt/tree/fork) including a custom
// The closest() method of the Element interface traverses the element and its parents (heading toward the document root) until it finds a node that matches the specified CSS selector.
// Returns the closest ancestor Element or itself, which matches the selectors. If there are no such element, null.
// Since we are lazy rendering we need to do this check. We could store the renderer in a viewport such that it could cache these, but it would require tracking changes.
// Returns the bounds rectangle of the target element as a DOMRectReadOnly. The bounds are computed as described in the documentation for Element.getBoundingClientRect().
// Returns the topmost Element at the specified coordinates (relative to the viewport).
// Since LightPanda requires the client to know what they are clicking on we do not return the underlying element at this moment
// This can currenty only happen if the first pixel is clicked without having rendered any element. This will change when css properties are supported.
// This returns an ElementUnion instead of a *Parser.Element in case the element somehow hasn't passed through the js runtime yet.
// TODO if pointer-events set to none the underlying element should be returned (parser.documentGetDocumentElement(self.document);?)
returntryElement.toInterface(element);
}
// Returns an array of all elements at the specified coordinates (relative to the viewport). The elements are ordered from the topmost to the bottommost box of the viewport.
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.