Playwright, when creating a new CDPSession, sends an
attachToBrowserTarget followed by another attachToTarget to re-attach
itself to the existing target.
see playwright/axtree.js from demo/ repository.
This field was recently added and is used to generate correct frameIds in CDP
messages. They remain the same during a navigation event, so calling them
page.id might cause surprises since navigation events create new pages, but
retain the original id. Hence, frame_id is more accurate and hopefully less
surprising.
(This is a small cleanup prior to doing some iframe navigation work).
Follow up to https://github.com/lightpanda-io/browser/pull/1646
The encodeURL (renamed to ensureEncoded and exposed in this commit) already
handled already-encoded URLs, so this was largely a matter of exposing the
functionality.
The reason this isn't baked directly into Page.navigate is that, in some places
e.g. internal navigation, the URL is already know to be encoded. So it's up
to every caller to make sure they are passing a valid URL to navigate.
Our BrowsingContext currently supports 1 target. So we have a per-BC target_id.
Previously, our target had 1 "frame" - our page. So we often treated the
targetId as the frameId. But to work with frames, we need page-specific
frameIds and loaderIds.
This tries to clean up our ids (a little). frameIds are now ids derived from
a new incrementing page.id. This page.id has to be passed around (via http
Requests and through notifications) in order to properly generate messages with
a frameId.
The BrowserContext currently uses 3 arenas:
1 - Command-specific, which is like the call_arena, but for the processing of a
single CDP command
2 - Notification-specific, which is similar, but for the processing of a single
internal notification event
3 - Arena, which is just the session arena and lives for the duration of the
BrowseContext/Session
This is pretty coarse and can results in significant memory accumulation if a
browser context is re-used for multiple navigations.
This commit introduces 3 changes:
1 - Rather than referencing Session.arena, the BrowerContext.arena is now its
own arena. This doesn't really change anything, but it does help keep things
a bit better separated.
2 - Introduces a page_arena (not to be confused with Page.arena). This arena
exists for the duration of a 1 page, i.e. it's cleared when the
BrowserContext receives the page_created internal notification. The
`captured_responses` now uses this arena, which means captures only exist
for the duration of the current page. This appears to be consistent with
how chrome behaves (In fact, Chrome seems even more aggressive and doesn't
appear to make any guarantees around captured responses). CDP refers to this
lifetime as a "renderer" and has an experimental message, which we don't
support, `Network.configureDurableMessages` to control this.
3 - Isolated Worlds are now more self contained with an arena from the ArenaPool.
There are currently 2 places where the BrowserContext.arena is still used:
1 - the isolated_world list
2 - the custom headers
Although this could be long lived, I believe the above is ok. We should just
really think twice whenever we want to use it for anything else.
V8's inspector world is made up of 4 components: Inspector, Client, Channel and
Session. Currently, we treat all 4 components as a single unit which is tied to
the lifetime of CDP BrowserContext - or, loosely speaking, 1 "Inspector Unit"
per page / v8::Context.
According to https://web.archive.org/web/20210622022956/https://hyperandroid.com/2020/02/12/v8-inspector-from-an-embedder-standpoint/
and conversation with Gemini, it's more typical to have 1 inspector per isolate.
The general breakdown is the Inspector is the top-level manager, the Client is
our implementation which control how the Inspector works (its function we expose
that v8 calls into). These should be tied to the Isolate. Channels and Sessions
are more closely tied to Context, where the Channel is v8->zig and the Session
us zig->v8.
This PR does a few things
1 - It creates 1 Inspector and Client per Isolate (Env.js)
2 - It creates 1 Session/Channel per BrowserContext
3 - It merges v8::Session and v8::Channel into Inspector.Session
4 - It moves the Inspector instance directly into the Env
5 - BrowserContext interacts with the Inspector.Session, not the Inspector
4 is arguably unnecessary with respect to the main goal of this commit, but
the end-goal is to tighten the integration. Specifically, rather than CDP having
to inform the inspector that a context was created/destroyed, the Env which
manages Contexts directly (https://github.com/lightpanda-io/browser/pull/1432)
and which now has direct access to the Inspector, is now equipped to keep this
in sync.
Deciding what should be an lp.assert, vs an std.debug.assert, vs a debug-only
assert is a little arbitrary.
debug-only asserts, guarded with an `if (comptime IS_DEBUG)` obviously avoid the
check in release and thus have a performance advantage. We also use them at
library boundaries. If libcurl says it will always emit a header line with a
trailing \r\n, is that really a check we need to do in production? I don't think
so. First, that code path is checked _a lot_ in debug. Second, it feels a bit
like we're testing libcurl (in production!)..why? A debug-only assertion should
be good enough to catch any changes in libcurl.
When we create a js.Context, we create the underlying v8.Context and store it
for the duration of the page lifetime. This works because we have a global
HandleScope - the v8.Context (which is really a v8::Local<v8::Context>) is that
to the global HandleScope, effectively making it a global.
If we want to remove our global HandleScope, then we can no longer pin the
v8.Context in our js.Context. Our js.Context now only holds a v8.Global of the
v8.Context (v8::Global<v8::Context).
This PR introduces a new type, js.Local, which takes over a lot of the
functionality previously found in either js.Caller or js.Context. The simplest
way to think about it is:
1 - For v8 -> zig calls, we create a js.Caller (as always)
2 - For zig -> v8 calls, we go through the js.Context (as always)
3 - The shared functionality, which works on a v8.Context, now belongs to js.Local
For #1 (v8 -> zig), creating a js.Local for a js.Caller is really simple and
centralized. v8 largely gives us everything we need from the
FunctionCallbackInfo or PropertyCallbackInfo. For #2, it's messier, because we
can only create a local v8::Context if we have a HandleScope, which we may or
may not.
Unfortunately, in many cases, what to do becomes the responsibility of the caller
and much of the code has to become aware of this local-ness. What does it means
for our code? The impact is on WebAPIs that store .Global. Because the global
can't do anything. You always need to convert that .Global to a local
(e.g. js.Function.Global -> js.Function).
If you're 100% sure the WebAPI is only being invoked by a v8 callback, you can
use `page.js.local.?.toLocal(some_global).call(...)` to get the local value.
If you're 100% sure the WebAPI is only being invoked by Zig, you need to create
`js.Local.Scope` to get access to a local:
```zig
var ls: js.Local.Scope = undefined;
page.js.localScope(&ls);
defer ls.deinit();
ls.toLocal(some_global).call(...)
// can also access `&ls.local` for APIs that require a *const js.Local
```
For functions that can be invoked by either V8 or Zig, you should generally push
the responsibility to the caller by accepting a `local: *const js.Local`. If the
caller is a v8 callback, it can pass `page.js.local.?`. If the caller is a Zig
callback, it can create a `Local.Scope`.
As an alternative, it is possible to simply pass the *Page, and check
`if page.js.local == null` and, if so, create a Local.Scope. But this should only
be done for performance reasons. We currently only do this in 1 place, and it's
because the Zig caller doesn't know whether a Local will actually be needed and
it's potentially called on every element creating from the parser.
This Pr largely tightens up a lot of the code. 'v8' is no longer imported
outside of js. A number of helper functions have been moved to the js.Context.
For example, js.Function.getName used to call:
```zig
return js.valueToString(allocator, name, self.context.isolate, self.context.v8_context);
```
It now calls:
```zig
return self.context.valueToString(name, .{ .allocator = allocator });
```
Page.main_context has been renamed to `Page.js`. This, in combination with new
promise helpers, turns:
```zig
const resolver = page.main_context.createPromiseResolver();
try resolver.resolve({});
return resolver.promise();
```
into:
```zig
return page.js.resolvePromise({});
```
Stream (to json) the Transfer as a request and response object in the various
network interception-related events (e.g. Network.responseReceived).
Add a page.request_intercepted boolean flag for CDP to signal the page that
requests have been intercepted, allowing Page.wait to prioritize intercept
handling (or, at least, not block it).
The mix of sync and async HTTP requests requires care to avoid deadlocks.
Previously, it was possible for async requests to use up all available HTTP
state objects duration a navigation flow (either directly, or via an internal
redirect (e.g. click, submit, ...)). This would block the navigation, which,
because everything is single thread, would block the I/O loop, resulting in a
deadlock.
The correct solution seems to be to remove all synchronous I/O. And I tried to
do that, but I ran into a wall with module-loading, which is initiated from V8.
V8 says "give me the source for this module", and I don't see a great way to
tell it: wait a bit.
So I went back to trying to make this work with the hybrid model, despite last
weeks failures to get it to work. I changed two things:
1 - The http client will only directly initiate an async request if there's
at least 2 free state objects available (1 for the request, and leaving 1
free for any synchronous requests)
2 - Delayed navigation retries until there's at least 1 free http state object
available.
Commits from last week did help with this. First, we're now guaranteed to have
a single sync-request at a time (previously, we could have had 2). Secondly,
the async connection is now async end-to-end (previously, it could have blocked
on an empty state pool).
We could probably make this a bit more obviously by reserving 1 state object
for synchronous requests. But, since the long term solution is probably having
no synchronous requests, I'm happy with anything that lets me move past this
issue.
Outputs in logfmt in release and a "pretty" print in debug mode. The format
along with the log level will become arguments to the binary at some point in
the future.
emit contextCreated when it's needed, not when it actually happens.
I thought we could make this sync-up, but we'd need to create 3 contexts to
satisfy both puppeteer and chromedp. So rather than having it partially
driven by notifications from Browser, I rather just fake it all for now.
- Pages within the same session have proper isolation
- they have their own window
- they have their own SessionState
- they have their own v8.Context
- Move inspector to CDP browser context
- Browser now knows nothing about the inspector
- Use notification to emit a context-created message
- This is still a bit hacky, but again, it decouples browser from CDP
It's still generic over the client - we need to assert messages written to and
be able to send specific commands, but it's no longer generic over Browser/
Session/Page/etc..
This expands on the existing CDP node work used in DOM.search. It introduces
a node registry to track all nodes returned to the client and give lookups to
get a node from a Id or a *parser.node.
Eventually, the goal is to have the Registry emit the DOM.setChildNodes event
whenever necessary, as well as support many of the missing DOM actions.
Added tests to existing search handlers. Reworked search a little bit to avoid
some unnecessary allocations and to hook it into the registry.
The generated Node is currently incomplete. The parentId is missing, the
children are missing. Also, we still need to associate the v8 ObjectId to the
node.
Finally, I moved all action handlers into a nested "domain" folder.