Commit Graph

161 Commits

Author SHA1 Message Date
Nikolay Govorov
45fca49329 Implement multi CDP connections 2026-01-21 20:32:06 +00:00
Karl Seguin
a6e7ecd9e5 Move more asserts to custom asserter.
Deciding what should be an lp.assert, vs an std.debug.assert, vs a debug-only
assert is a little arbitrary.

Debug-only asserts, guarded with an `if (comptime IS_DEBUG)`, obviously avoid the
check in release and thus have a performance advantage. We also use them at
library boundaries. If libcurl says it will always emit a header line with a
trailing \r\n, is that really a check we need to do in production? I don't think
so. First, that code path is checked _a lot_ in debug. Second, it feels a bit
like we're testing libcurl (in production!)... why? A debug-only assertion should
be good enough to catch any changes in libcurl.
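
A minimal sketch of the debug-only pattern described above, not the project's actual code. It assumes `IS_DEBUG` is derived from `builtin.mode`, and `stripCrlf` is a hypothetical helper standing in for a libcurl header callback:

```zig
const std = @import("std");
const builtin = @import("builtin");

// Assumption: IS_DEBUG mirrors the build mode; the real definition may differ.
const IS_DEBUG = builtin.mode == .Debug;

// Hypothetical helper: strips the trailing "\r\n" that the library is
// documented to append to every header line.
fn stripCrlf(line: []const u8) []const u8 {
    if (comptime IS_DEBUG) {
        // Checked in debug builds only; the branch is compiled out of
        // release builds, so a violated contract is not detected there.
        std.debug.assert(std.mem.endsWith(u8, line, "\r\n"));
    }
    return line[0 .. line.len - 2];
}
```

Because the condition is comptime-known, release builds pay nothing for the check, which is the trade-off argued for above.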
2026-01-19 09:12:16 +08:00
Pierre Tachoire
fbe07836f9 cdp: return a valid response for Page.getFrameTree on STARTUP
Stagehand expects a valid response for this specific command.
Also add `Target.activateTarget`.
2026-01-16 16:27:55 +01:00
Pierre Tachoire
d6d74c5024 first version of AXTree 2026-01-15 15:37:42 +01:00
Karl Seguin
223a6170d5 Fix use-after-free
On CDP.BrowserContext.deinit, clear the isolated world ExecutionContext before
terminating the session. This is important as the isolated_world list is
allocated from the session.arena.

Also, semi-revert 63f1c85964. Before all this
we were running microtasks on ExecutionWorld.removeContext. That didn't seem
right (and I thought it was the original source of the bug). But, for the "real"
Page context, this is critical, since microtasks can reference the Page object.
Since microtasks are isolate-level, it's possible for a microtask for Page1
to execute after Page1 goes away (if we create a new page, Page2). This re-adds
the microtask "draining", but only for the Page (i.e. in Page.deinit).
2026-01-14 09:37:10 +08:00
Karl Seguin
6ecf52cc03 port Platform and Inspector to use v8's C handles/functions directly 2026-01-13 12:56:07 +08:00
Karl Seguin
df4e5d859f Enable blocking auth request interception 2025-12-24 12:19:11 +08:00
Karl Seguin
8215f2fd8f Merge branch 'snapshots_v2' into zigdom 2025-12-22 17:03:38 +08:00
Karl Seguin
af7f51a647 start handling page clicks and key presses 2025-12-22 17:02:20 +08:00
Karl Seguin
d9c53a3def Page.scheduleNavigation for location changes 2025-12-22 12:19:08 +08:00
Karl Seguin
f475aa09e8 backport https://github.com/lightpanda-io/browser/pull/1265 2025-12-19 16:06:25 +08:00
Pierre Tachoire
1278dc28cd cdp: add accessibility domain 2025-12-19 10:34:41 +08:00
Karl Seguin
bb1ea39c54 backport a variety of smaller CDP changes 2025-12-19 10:31:07 +08:00
Karl Seguin
b3a0aaaeea Enable v8 snapshots
There are two layers here. The first is that, on startup, a v8 SnapshotCreator
is created, and a snapshot-specific isolate/context is set up with our browser
environment. This contains most of what was in Env.init and a good chunk of
what was in ExecutionWorld.createContext. From this, we create a v8.StartupData
which is used for the creation of all subsequent contexts. The snapshot sits
at the application level, above the Env - it's re-used for all envs/isolates, so
this gives a nice performance boost for both 1 connection opening multiple pages
or multiple connections opening 1 page.

The second layer is that the Snapshot data can be embedded into the binary, so
that it doesn't have to be created on startup, but rather created at build-time.
This improves the startup time (though, I'm not really sure how to measure that
accurately...).

The first layer is the big win (and just works as-is without any build / usage
changes).

with snapshot
total runs 1000
total duration (ms) 7527
avg run duration (ms) 7
min run duration (ms) 5
max run duration (ms) 41

without snapshot
total runs 1000
total duration (ms) 9350
avg run duration (ms) 9
min run duration (ms) 8
max run duration (ms) 42

To embed a snapshot into the binary, we first need to create the snapshot file:

zig build -Doptimize=ReleaseFast snapshot_creator -- src/snapshot.bin

And then build using the new snapshot_path argument:

zig build -Dsnapshot_path=../../snapshot.bin -Doptimize=ReleaseFast

The paths are weird, I know...since it's embedded, it needs to be inside the
project path, hence we put it in src/snapshot.bin. And since it's embedded
relative to the embedder (src/browser/js/Snapshot.zig) the path has to be
relative to that, hence ../../snapshot.bin. I'm open to suggestions on
improving this.
2025-12-18 20:10:38 +08:00
Karl Seguin
9132bc2375 re-enable CDP node registry 2025-12-09 11:50:33 +08:00
Karl Seguin
61a1a2564e Fix typos
Encode unicode nonbreaking space
2025-12-05 17:48:49 +08:00
Karl Seguin
bd3da38fc8 add native custom elements 2025-11-19 22:50:52 +08:00
Karl Seguin
54a2e7650a MutationObserver and IntersectionObserver 2025-11-18 11:54:14 +08:00
Karl Seguin
d3973172e8 re-enable minimum viable CDP server 2025-10-28 18:56:03 +08:00
Pierre Tachoire
594d754022 cdp: drain microtasks before inspector deinit 2025-10-10 17:43:08 +02:00
Karl Seguin
2ba6737c41 Merge pull request #1119 from lightpanda-io/cdp_log_entry
Emit Log.addEntry
2025-10-06 16:45:48 +08:00
Karl Seguin
fe9a10c617 Emit Log.addEntry
Currently, this hooks a single log.Interceptor into the logging framework, but
changing it to take a list shouldn't be too hard. Biggest issue is who will own
it, as we'd need an allocator to maintain a list / lookup (which log doesn't
currently have).

Uses logFmt format, and, for now, always filters out debug messages and a few
particularly verbose scopes.
2025-10-03 17:29:01 +08:00
Karl Seguin
2e734fae57 This is the last of the big changes to the js code
This PR largely tightens up a lot of the code. 'v8' is no longer imported
outside of js. A number of helper functions have been moved to the js.Context.
For example, js.Function.getName used to call:

```zig
return js.valueToString(allocator, name, self.context.isolate, self.context.v8_context);
```

It now calls:

```zig
return self.context.valueToString(name, .{ .allocator = allocator });
```

Page.main_context has been renamed to `Page.js`. This, in combination with new
promise helpers, turns:

```zig
const resolver = page.main_context.createPromiseResolver();
try resolver.resolve({});
return resolver.promise();
```

into:

```zig
return page.js.resolvePromise({});
```
2025-10-03 15:06:16 +08:00
Karl Seguin
dab8012b6a Start extract JS structs into their own files
Renames JsContext -> js.Context, JsObject -> js.Object and JsThis -> js.This
which is more consistent with the other types. The JsObject -> js.Object is
the reason so many files were touched.

This is still a [messy] transition, with more refactoring planned to clean it
up.
2025-10-02 12:48:50 +08:00
Karl Seguin
32226297ab Remove the generic nature of Env and most of the JS classes
Back in the zig-js-runtime days, globals were used for the state and webapi
declarations. This caused problems largely because it was done across
compilation units (using @import("root")...).

The generic Env(S, WebApi) was used to solve these problems, while still making
it work for different States and WebApis.

This change removes the generics and hard-codes the *Page as the state and
only supports our WebApis for the class declarations.

To accommodate this change, the runtime/*tests* have been removed. I don't
consider this a huge loss - whatever behavior these were testing already
exists in the browser/**/*.zig web api.

As we write more complex/complete WebApis, we're seeing more and more cases
that need to rely on js objects directly (JsObject, Function, Promises, etc...).
The goal is to make these easier to use. Rather than using Env.JsObject, you
now import "js.zig" and use js.JsObject (TODO: rename JsObject to Object).
Everything is just a plain Zig struct, rather than being nested in a generic.

After this change, I plan on:

1 - Renaming the js objects, JsObject -> Object. These should be referenced in
    the webapi as js.Object, js.This, ...

2 - Splitting the code across multiple files (Env.zig, Context.zig,
    Caller.zig, ...)
2025-10-02 10:16:58 +08:00
Karl Seguin
418dc6fdc2 Start downloading all synchronous imports ASAP
This changes how non-async module loading works. In general, module loading
is triggered by a v8 callback. We ask it to process a module (a <script type=
module>) and then for every module that it depends on, we get a callback. This
callback expects the nested v8.Module instance, so we need to load it then and
there (as opposed to dynamic imports, where we only have to return a promise).

Previously, we solved this by issuing a blocking HTTP get in each callback. The
HTTP loop was able to continue downloading already-queued resources, but if
a module depended on 20 nested modules, we'd issue 20 blocking gets one after
the other.

Once a module is compiled, we can ask v8 for a list of its dependent modules. We
can then immediately start to download all of those modules. We then evaluate
the original module, which will trigger our callback. At this point, we still
need to block and wait for the response, but we've already started the download
and it's much faster. Sure, for the first module, we might need to wait the same
amount of time, but for the other 19, chances are by the time the callback
executes, we already have it downloaded and ready.
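
The idea above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual implementation: `Prefetcher` is a hypothetical stand-in for the HTTP client that only records which URLs have had a download started, and the dependency list is assumed to have already been obtained from v8.

```zig
const std = @import("std");

// Hypothetical stand-in for the HTTP client. It only tracks which module
// URLs have had a download started; no network I/O happens here.
const Prefetcher = struct {
    started: std.StringHashMap(void),

    fn init(allocator: std.mem.Allocator) Prefetcher {
        return .{ .started = std.StringHashMap(void).init(allocator) };
    }

    fn deinit(self: *Prefetcher) void {
        self.started.deinit();
    }

    // Called right after a module is compiled, with the dependencies
    // reported for it. Returns how many new downloads were queued.
    fn prefetch(self: *Prefetcher, deps: []const []const u8) !usize {
        var queued: usize = 0;
        for (deps) |url| {
            const gop = try self.started.getOrPut(url);
            if (!gop.found_existing) queued += 1;
        }
        return queued;
    }

    // Called from the (still blocking) resolve callback: if the download
    // was already started, the wait is likely to be short.
    fn alreadyStarted(self: *const Prefetcher, url: []const u8) bool {
        return self.started.contains(url);
    }
};
```

The map stores the URL slices by reference, so in this sketch the caller is assumed to keep them alive for as long as the `Prefetcher` exists.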
2025-09-26 15:38:50 +08:00
Karl Seguin
024f7ad9ef Merge pull request #1056 from lightpanda-io/DOM_NO_ERR
Convert more DOM_NO_ERR cases to assertions
2025-09-18 19:06:32 +08:00
Pierre Tachoire
94fe34bd10 cdp: multiple isolated worlds 2025-09-17 14:42:08 +02:00
Pierre Tachoire
04487b6b91 cdp: allow double isolated world with same world name
In this case we reuse the existing isolated world and isolated context
and we log a warning
2025-09-17 14:42:07 +02:00
Pierre Tachoire
5ea97c4910 cdp: add send error options with session id by default 2025-09-17 14:42:05 +02:00
Karl Seguin
58acb2b821 Convert more DOM_NO_ERR cases to assertions
There is some risk to this change. The first is that I made a mistake. The
other is that one of the APIs that doesn't currently return an error changes
in the future.
2025-09-17 13:37:48 +08:00
Karl Seguin
dd22c55d23 migrate to htmlRunne (plus zig fmt) 2025-09-05 13:52:08 +08:00
Karl Seguin
5dda86bf4a Emit networkIdle and networkAlmostIdle Page.lifecycleEvent
Most CDP drivers have a mechanism to wait for idle network, or an almost idle
network (sometimes called networkIdle2). These are events the browser must emit.

The page will now emit `networkIdle` when we are reasonably sure there's no more
network activity (this requires some slight changes to request interception,
since, I believe, intercepted requests should be considered).

`networkAlmostIdle` is currently _always_ emitted prior to emitting
`networkIdle`. We should tweak this but I can't, at a glance, think of a great
heuristic for when this should be emitted.
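
A rough sketch of the ordering described above, assuming idleness is judged by a simple in-flight request counter (the real heuristic may differ). `IdleTracker` and its methods are hypothetical names used only for illustration:

```zig
const std = @import("std");

const LifecycleEvent = enum { networkAlmostIdle, networkIdle };

// networkAlmostIdle is always reported immediately before networkIdle.
const idle_events = [_]LifecycleEvent{ .networkAlmostIdle, .networkIdle };

// Hypothetical tracker: counts in-flight requests, including intercepted
// ones, and reports which lifecycle events to emit.
const IdleTracker = struct {
    in_flight: usize = 0,

    fn requestStarted(self: *IdleTracker) void {
        self.in_flight += 1;
    }

    // Returns the events to emit, in order, once a request has finished.
    fn requestDone(self: *IdleTracker) []const LifecycleEvent {
        std.debug.assert(self.in_flight > 0);
        self.in_flight -= 1;
        if (self.in_flight > 0) return &[_]LifecycleEvent{};
        return &idle_events;
    }
};
```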
2025-09-04 16:36:29 +08:00
Karl Seguin
b6137b03cd Rework page wait again
Further reducing bouncing between page and server for loop polling. If there is
a page, the page polls. If there isn't a page, the server polls. Simpler.
2025-09-03 19:38:01 +08:00
Karl Seguin
e237e709b6 Change loader id on navigation
This appears to be what chrome is doing. I don't know why we weren't before.
2025-09-03 08:17:14 +08:00
Karl Seguin
2ac9b2088a Always monitor the CDP client socket, even on page.wait 2025-09-03 08:17:13 +08:00
Karl Seguin
1443f38e5f Zig 0.15.1
Depends on https://github.com/lightpanda-io/zig-v8-fork/pull/89
2025-08-29 10:42:06 +08:00
Pierre Tachoire
7647ce9e6d Merge pull request #960 from lightpanda-io/auth-challenge
auth required interception
2025-08-27 15:34:51 +02:00
Pierre Tachoire
041e014d68 Merge pull request #970 from lightpanda-io/remove_loop
Remove the loop
2025-08-26 18:17:32 +02:00
Pierre Tachoire
6b47aa2446 cdp: add auth required interception process 2025-08-26 18:05:44 +02:00
sjorsdonkers
0ad09cca9d Fix sendError message's format 2025-08-25 12:51:47 +02:00
Karl Seguin
0959eea677 Remove the loop
Previously, the IO loop was doing three things:
1 - Managing timeouts (either from scripts or for our own needs)
2 - Handling browser IO events (page/script/xhr)
3 - Handling CDP events (accept, read, write, timeout)

With the libcurl merge, 1 was moved to an in-process scheduler and 2 was moved
to libcurl's own event loop. That means the entire loop code, including
the dependency on tigerbeetle-io, existed for handling a single TCP client.
Not only is that a lot of code, there was also friction between the two loops
(the libcurl one and our IO loop), which would result in latency - while one
loop is waiting for the events, any events on the other loop go un-processed.

This PR removes our IO loop. To accomplish this:

1 - The main accept loop is blocking. This is simpler and works perfectly well,
given we only allow 1 active connection.
2 - The client socket is passed to libcurl - yes, libcurl's loop can take
arbitrary FDs and poll them along with its own.

In addition to having one less dependency, the CDP code is quite a bit simpler,
especially around shutdowns and writes. This also removes _some_ of the latency
caused by the friction between page process and CDP processing. Specifically,
when CDP now blocks for input, http page events (script loading, xhr, ...) will
still be processed.

There's still friction. For one, the reverse isn't true: when the page is
waiting for events, CDP events aren't going to be processed. But the page.wait
already has some sensitivity to this (e.g. the page.request_intercepted flag).
Also, when CDP waits, while we will process network events, page timeouts are
still not processed. Because of both these remaining issues, we still need to
jump between the two loops - but being able to block on CDP (even for a short
time) WITHOUT stopping the page's network I/O, should reduce some latency.
2025-08-25 17:27:28 +08:00
Karl Seguin
cd33e9ad0e Implement Network.getResponseBody
Add a response_data event. CDP now captures the full body so that it can respond
to Network.getResponseBody. This isn't memory efficient, but I don't see
another way to do it. At least this way, it's only capturing/storing every
response body when (a) CDP is used and (b) Network.enable is called. That is,
as opposed to baking this into Http/Client.zig, which would force the memory
consumption for all use-cases.

There's arguably some optimizations we could make for XHR requests, which also
dupe/own the response. As of now, the response is dupe'd separately for CDP
and XHR.
2025-08-21 10:33:53 +08:00
Karl Seguin
6b001c50a4 Emits a http_request_done internal notification.
With networking enabled, CDP listens to this event and emits a
`Network.loadingFinished` event. This event is used by Puppeteer to know that
details about the response (i.e. the body) can be queried.

Added dummy handling for the Network.getResponseBody message. Returns an
empty body. Needed because we emit the loadingFinished event which signals
to drivers that they can ask for the body.
2025-08-20 19:32:19 +08:00
Karl Seguin
f5ec74252d Add fulfillRequest and more complete continueRequest 2025-08-18 18:29:10 +08:00
Karl Seguin
211012d367 move intercept_state and extra_headers from CDP instance to BrowserContext 2025-08-18 13:23:17 +08:00
Karl Seguin
01223601f2 Reduce allocations made during request interception
Stream (to json) the Transfer as a request and response object in the various
network interception-related events (e.g. Network.responseReceived).

Add a page.request_intercepted boolean flag for CDP to signal the page that
requests have been intercepted, allowing Page.wait to prioritize intercept
handling (or, at least, not block it).
2025-08-15 14:01:57 +08:00
Karl Seguin
96b10f4b85 Optimize Network.responseReceived
Add a header iterator to the transfer. This removes the need for NetworkState,
duping header name/values, and the http_header_received event.
2025-08-14 15:50:56 +08:00
sjorsdonkers
7d05712f40 setExtraHTTPHeaders 2025-08-13 14:54:59 +02:00
sjorsdonkers
c0106a238b http_headers_done_receiving 2025-08-13 14:29:23 +02:00