1 - Embed Page into Session, avoids having to allocate/deallocate the page
2 - Fix false-positive arena pool leak detection
3 - On form submit, pre-check if navigation will be allowed before building the
contents
4 - Make sure the QueuedNavigation structure isn't used after the page is removed
The BrowserContext currently uses 3 arenas:
1 - Command-specific, which is like the call_arena, but for the processing of a
single CDP command
2 - Notification-specific, which is similar, but for the processing of a single
internal notification event
3 - Arena, which is just the session arena and lives for the duration of the
BrowserContext/Session
This is pretty coarse and can result in significant memory accumulation if a
browser context is re-used for multiple navigations.
This commit introduces 3 changes:
1 - Rather than referencing Session.arena, the BrowserContext.arena is now its
own arena. This doesn't really change anything, but it does help keep things
a bit better separated.
2 - Introduces a page_arena (not to be confused with Page.arena). This arena
exists for the duration of 1 page, i.e. it's cleared when the
BrowserContext receives the page_created internal notification. The
`captured_responses` now uses this arena, which means captures only exist
for the duration of the current page. This appears to be consistent with
how Chrome behaves (in fact, Chrome seems even more aggressive and doesn't
appear to make any guarantees around captured responses). CDP refers to this
lifetime as a "renderer" and has an experimental message, which we don't
support, `Network.configureDurableMessages` to control this.
3 - Isolated Worlds are now more self-contained with an arena from the ArenaPool.
There are currently 2 places where the BrowserContext.arena is still used:
1 - the isolated_world list
2 - the custom headers
Although this could be long lived, I believe the above is ok. We should just
really think twice whenever we want to use it for anything else.
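The page_arena lifetime described above can be sketched roughly as follows. This is a minimal illustration, not the actual code: the `BrowserContext` here is a stripped-down stand-in, and `captureResponse`, `onPageCreated` and the `captured` counter are hypothetical names I made up for the example.

```zig
const std = @import("std");

// Hypothetical, simplified stand-in for the real BrowserContext.
const BrowserContext = struct {
    // Lives as long as the BrowserContext itself; use sparingly.
    arena: std.heap.ArenaAllocator,
    // Cleared every time a new page is created.
    page_arena: std.heap.ArenaAllocator,
    // Stand-in for captured_responses: how many captures the current page holds.
    captured: usize = 0,

    fn init(backing: std.mem.Allocator) BrowserContext {
        return .{
            .arena = std.heap.ArenaAllocator.init(backing),
            .page_arena = std.heap.ArenaAllocator.init(backing),
        };
    }

    fn deinit(self: *BrowserContext) void {
        self.page_arena.deinit();
        self.arena.deinit();
    }

    // The copy is owned by page_arena, so it only lives for the current page.
    fn captureResponse(self: *BrowserContext, body: []const u8) ![]const u8 {
        const copy = try self.page_arena.allocator().dupe(u8, body);
        self.captured += 1;
        return copy;
    }

    // Called on the page_created notification (handler name is an assumption).
    fn onPageCreated(self: *BrowserContext) void {
        _ = self.page_arena.reset(.retain_capacity);
        self.captured = 0;
    }
};
```

Anything returned by `captureResponse` must not be used after `onPageCreated` runs, which is exactly the "captures only exist for the current page" behavior.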
page.js currently always references the page context. But through the inspector
JavaScript can be executed in different contexts. When we go from V8->Zig we
correctly capture the current context within the caller's Local. Because of
this, mapping or anything else that happens against local.ctx happens in the
right context.
EXCEPT...our code still accesses page.js. So you can have a v8->zig call
happening in Context-2, and our Zig call then tries to do something on Context-1
via page.js.
I'm introducing a change that updates page.js based on the current Caller and
restores it at the end of the Caller. This change is super small, but
potentially has major impact. It's hard to imagine that we haven't run into
problems with this before, and it's hard to imagine what problems this change
might introduce. Certainly, if anyone copies page.js, they'll be in for a rude
surprise, but I don't think we do that anywhere.
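The swap-and-restore idea can be sketched like this. It's only an illustration under assumptions: `Context`, `Page` and `Caller` are reduced to the bare minimum, and the `prev` field and the `init`/`deinit` shape are hypothetical rather than the real Caller API.

```zig
const std = @import("std");

// Minimal stand-ins; the real types carry far more state.
const Context = struct { id: u32 };

const Page = struct {
    js: *Context,
};

const Caller = struct {
    page: *Page,
    prev: *Context, // hypothetical field: the context page.js pointed at before

    // Entering a v8->Zig call: point page.js at the context we were called from.
    fn init(page: *Page, current: *Context) Caller {
        const prev = page.js;
        page.js = current;
        return .{ .page = page, .prev = prev };
    }

    // Leaving the call: restore whatever page.js was before.
    fn deinit(self: *const Caller) void {
        self.page.js = self.prev;
    }
};
```

Because page.js is a pointer that changes per call, a copy taken before the Caller starts keeps pointing at the old context, which is the "rude surprise" mentioned above.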
==Fix 1==
The problem flow:
1 - The module is dynamically imported, this creates a cache entry with no
module and no module_promise, and starts an async fetch.
2 - Before dynamicModuleSourceCallback fires (from step 1 above), the same
module is imported as a child of a call graph, i.e. via resolveModuleCallback.
Here the module is compiled, but never evaluated (we only evaluate the root
module). This is where things start to go sour. Our cache entry now has a
module, but no module_promise.
3 - The async fetch completes and calls dynamicModuleSourceCallback which calls
Context.module. This returns early (the module is already cached thanks to
step 2). But it then calls resolveDynamicModule which (a) has a module and (b)
no module_promise.
Our fix works because, if Context.module finds the cached module (from step 2),
it now also checks for the module_promise. If it doesn't find it, it evaluates
the module (which sets it).
I've since expanded the code to handle more intermediary states.
The original PR had:
    if (gop.value_ptr.module_promise == null) {
        const mod = local.toLocal(cache_mod);
        if (mod.getStatus() == .kInstantiated) {
            return self.evaluateModule(want_result, mod, url, true);
        }
    }
But now the code is:
    if (gop.value_ptr.module_promise == null) {
        const mod = local.toLocal(cache_mod);
        if (mod.getStatus() == .kUninstantiated and try mod.instantiate(resolveModuleCallback) == false) {
            return error.ModuleInstantiationError;
        }
        return self.evaluateModule(want_result, mod, url, true);
    }
It seems that v8 handles double-instantiation and double-evaluations safely.
Handle more partial-load states.
Handle more partial-load states + fix possible dangling pointer.
==Fix 2==
We were using `gop` after potentially writing to the map (via a nested call to
mod.evaluate()). We now re-fetch the map entry to be able to safely write to it.
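The dangling `gop` problem is the usual hash map pointer-invalidation issue. Here is a minimal sketch of the pattern, assuming Zig 0.14 or later (for `.empty`). `Entry`, `evaluate` and `load` are hypothetical stand-ins, not the real module cache code; `evaluate` just inserts enough entries to force the map to grow.

```zig
const std = @import("std");

// Hypothetical cache entry; integers stand in for v8 handles.
const Entry = struct {
    module: ?u32 = null,
    module_promise: ?u32 = null,
};

const Cache = std.AutoHashMapUnmanaged(u32, Entry);

// Stand-in for mod.evaluate(): evaluating a module can import other modules,
// which inserts into the same cache and may reallocate its storage.
fn evaluate(allocator: std.mem.Allocator, cache: *Cache, id: u32) !u32 {
    var dep: u32 = 0;
    while (dep < 64) : (dep += 1) {
        try cache.put(allocator, 1000 + id * 100 + dep, .{});
    }
    return id + 1; // pretend this is the module promise
}

fn load(allocator: std.mem.Allocator, cache: *Cache, id: u32) !void {
    const gop = try cache.getOrPut(allocator, id);
    if (!gop.found_existing) gop.value_ptr.* = .{ .module = id };

    const promise = try evaluate(allocator, cache, id);

    // gop.value_ptr may point at freed memory by now, so look the entry
    // up again instead of writing through the old pointer.
    const entry = cache.getPtr(id) orelse return error.MissingCacheEntry;
    entry.module_promise = promise;
}
```

Writing `gop.value_ptr.module_promise = promise` after the nested call is the bug; the second lookup costs one extra hash but is always safe.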
Currently, when loading a module, if the module is found in the cache, it's
immediately returned. However, that can result in a timing issue where the
module is cached, but not evaluated, and doesn't have an associated promise.
This commit tries to ensure a module is always evaluated and that the cache
entry has a module promise.
This might fix a crash handler issue. I couldn't reproduce the issue though.
I believe it requires specific timing which is hard to reproduce in a test.
We previously figured that we could release the XHR object as soon as the JS
reference was out of scope. But the callbacks could still exist and thus the
XHR request should proceed.
This commit ensures the XHR instance remains valid so long as we have an active
request.
Might help with https://github.com/lightpanda-io/browser/issues/1448 but I can't
reliably reproduce this, so I'm not 100% sure it resolves the issue. That bug
appears to be caused by some timing interaction between the underlying HTTP
request and the v8 GC.
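The lifetime rule can be stated as a small sketch. This is an assumption about the shape of the rule only, not how the real XHR object tracks it; every field and method name here is hypothetical.

```zig
const std = @import("std");

// Hypothetical stand-in for the XHR object.
const Xhr = struct {
    js_reachable: bool = true, // does JS still hold a reference?
    request_active: bool = false, // is an HTTP request still in flight?

    fn send(self: *Xhr) void {
        self.request_active = true;
    }

    // JS reference went out of scope (e.g. a GC finalizer fired).
    fn onJsFinalized(self: *Xhr) void {
        self.js_reachable = false;
    }

    // The underlying HTTP request completed, failed or was aborted.
    fn onRequestDone(self: *Xhr) void {
        self.request_active = false;
    }

    // Releasing is only safe once BOTH conditions hold.
    fn canRelease(self: *const Xhr) bool {
        return !self.js_reachable and !self.request_active;
    }
};
```

Previously only the first condition was considered, so an in-flight request could call back into an instance that had already been released.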
There's no guarantee that a task will ever be run. A page can be shut down by
the user, a timeout, or an error. Scheduler cleanup relies on the underlying
page.arena. This forces all tasks to rely on the page.arena as they have no way
to clean up after themselves.
This commit allows tasks to register a finalizer which is guaranteed to be
called when the scheduler is shut down.
The window ScheduleCallback, PostMessageCallback now use an arena from the
ArenaPool rather than the page.arena and use the task finalizer to ensure the
arena is released on shutdown.
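A rough sketch of the finalizer idea, under assumptions: the real Scheduler is more involved (timers, queues), and `Task`, `Scheduler.add`, the fixed-size task array and `Callback` are all hypothetical names used only to show the shape.

```zig
const std = @import("std");

const Task = struct {
    ctx: *anyopaque,
    run: *const fn (ctx: *anyopaque) void,
    // Optional cleanup, called on shutdown for tasks still registered.
    finalizer: ?*const fn (ctx: *anyopaque) void = null,
};

// Hypothetical scheduler with a small fixed-size task list.
const Scheduler = struct {
    tasks: [8]Task = undefined,
    len: usize = 0,

    fn add(self: *Scheduler, task: Task) !void {
        if (self.len == self.tasks.len) return error.TooManyTasks;
        self.tasks[self.len] = task;
        self.len += 1;
    }

    // Shutdown: pending tasks never run, but their finalizers always do.
    fn deinit(self: *Scheduler) void {
        for (self.tasks[0..self.len]) |task| {
            if (task.finalizer) |f| f(task.ctx);
        }
        self.len = 0;
    }
};

// A task that owns its own arena instead of borrowing page.arena.
const Callback = struct {
    arena: std.heap.ArenaAllocator,
    released: *bool, // only here so the example can observe the release

    fn run(ctx: *anyopaque) void {
        _ = ctx;
    }

    fn finalize(ctx: *anyopaque) void {
        const self: *Callback = @ptrCast(@alignCast(ctx));
        self.released.* = true;
        self.arena.deinit();
    }
};
```

In the real code the arena would go back to the ArenaPool rather than being deinitialized directly; the point is only that the task releases it even if it never ran.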
V8's inspector world is made up of 4 components: Inspector, Client, Channel and
Session. Currently, we treat all 4 components as a single unit which is tied to
the lifetime of CDP BrowserContext - or, loosely speaking, 1 "Inspector Unit"
per page / v8::Context.
According to https://web.archive.org/web/20210622022956/https://hyperandroid.com/2020/02/12/v8-inspector-from-an-embedder-standpoint/
and a conversation with Gemini, it's more typical to have 1 inspector per isolate.
The general breakdown is that the Inspector is the top-level manager and the
Client is our implementation, which controls how the Inspector works (it holds
the functions we expose for v8 to call into). These should be tied to the
Isolate. Channels and Sessions are more closely tied to the Context, where the
Channel is v8->zig and the Session is zig->v8.
This PR does a few things:
1 - It creates 1 Inspector and Client per Isolate (Env.js)
2 - It creates 1 Session/Channel per BrowserContext
3 - It merges v8::Session and v8::Channel into Inspector.Session
4 - It moves the Inspector instance directly into the Env
5 - BrowserContext interacts with the Inspector.Session, not the Inspector
4 is arguably unnecessary with respect to the main goal of this commit, but
the end-goal is to tighten the integration. Specifically, rather than CDP having
to inform the inspector that a context was created/destroyed, the Env, which
manages Contexts directly (https://github.com/lightpanda-io/browser/pull/1432)
and now has direct access to the Inspector, is equipped to keep this in sync
itself.
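The resulting ownership can be sketched like this. It's an outline under assumptions, not the real types: the `contexts` counter, `createContext`/`destroyContext` and `newSession` are hypothetical names that only show who owns what and who notifies whom.

```zig
const std = @import("std");

// One per Isolate. The counter stands in for v8's context bookkeeping.
const Inspector = struct {
    contexts: usize = 0,

    fn contextCreated(self: *Inspector) void {
        self.contexts += 1;
    }

    fn contextDestroyed(self: *Inspector) void {
        self.contexts -= 1;
    }
};

// Merged Session/Channel pair, one per BrowserContext.
const Session = struct {
    inspector: *Inspector,
};

// The Env owns the Inspector directly and manages Contexts, so it can
// keep the Inspector in sync without CDP being involved.
const Env = struct {
    inspector: Inspector = .{},

    fn createContext(self: *Env) void {
        self.inspector.contextCreated();
    }

    fn destroyContext(self: *Env) void {
        self.inspector.contextDestroyed();
    }

    fn newSession(self: *Env) Session {
        return .{ .inspector = &self.inspector };
    }
};

// The BrowserContext only ever talks to its Session.
const BrowserContext = struct {
    session: Session,
};
```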