In https://github.com/lightpanda-io/browser/pull/767 I tried to call loop.run
from within a loop.run (spoiler: it didn't work), in order to make sure aborted
connections were properly cleaned up before starting a new navigation. That
resulted in loop.run no longer waiting for timeouts, for fear of having to
wait on a long timeout. That ended up breaking page.wait (used in the fetch
command).
This commit brings back the original behavior where loop.run() waits for all
completions, which is now safe to do since the nested loop.run() call has
been removed.
This is one of the ways that puppeteer knows that a navigation happened, and
it is needed to support `waitForNavigation`, which compares the existing
loader-id with the new one, so the loader-id has to change.
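Roughly, a driver-side check looks like this (a minimal sketch; `session` is a hypothetical CDP session object, but the event shape, Page.frameNavigated carrying frame.loaderId, comes from the CDP spec):
```js
// Detect a navigation by watching for the loader-id to change.
function waitForNavigation(session, frameId, currentLoaderId) {
  return new Promise((resolve) => {
    session.on('Page.frameNavigated', ({ frame }) => {
      // A navigation only counts when the loader-id actually changes.
      if (frame.id === frameId && frame.loaderId !== currentLoaderId) {
        resolve(frame.loaderId);
      }
    });
  });
}
```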
Also, fix a crash that could happen if CDP disconnects while
connections are being aborted.
Currently, if there's an internal navigation event, we continue to process the
page normally. This has negative performance implications, and can result in
user-scripts producing unexpected results.
For example, imagine a page with a script that does:
```js
if (x) {
  form.submit();
}
reloadProduct();
```
The call to `form.submit()` should stop the script from executing. And, if the
page has 10 other `<script>` tags after this, they shouldn't be loaded or
executed.
This code terminates the execution of the current script on an internal
navigation event, and stops the rest of the page from loading.
While I believe this creates more "correct" behavior, it also introduces new
edge cases. There's now a period of time, between execution being stopped and
then being resumed, where executing code is not safe.
The mix of sync and async HTTP requests requires care to avoid deadlocks.
Previously, it was possible for async requests to use up all available HTTP
state objects during a navigation flow (either directly, or via an internal
redirect (e.g. click, submit, ...)). This would block the navigation, which,
because everything is single-threaded, would block the I/O loop, resulting in a
deadlock.
The correct solution seems to be to remove all synchronous I/O. And I tried to
do that, but I ran into a wall with module-loading, which is initiated from V8.
V8 says "give me the source for this module", and I don't see a great way to
tell it: wait a bit.
So I went back to trying to make this work with the hybrid model, despite last
week's failures to get it to work. I changed two things (sketched below):
1 - The http client will only directly initiate an async request if there are
    at least 2 free state objects available (1 for the request, leaving 1
    free for any synchronous requests).
2 - Navigation is delayed and retried until there's at least 1 free http state
    object available.
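A minimal sketch of those two rules, with hypothetical names (`freeStates`, `startAsync`, `queueAsync`, `scheduleRetry`); the actual client is organized differently:
```js
// Rule 1: only start an async request directly when 2+ states are free,
// so at least 1 state always remains available for a synchronous request.
const MIN_FREE_FOR_ASYNC = 2;

function tryStartAsyncRequest(pool, req) {
  if (pool.freeStates() >= MIN_FREE_FOR_ASYNC) {
    pool.startAsync(req);
  } else {
    pool.queueAsync(req); // picked up later, once states are returned
  }
}

// Rule 2: a navigation needs at least 1 free state; otherwise retry later
// instead of blocking the single-threaded I/O loop.
function tryNavigate(pool, navigation, scheduleRetry) {
  if (pool.freeStates() >= 1) {
    navigation.start();
  } else {
    scheduleRetry(() => tryNavigate(pool, navigation, scheduleRetry));
  }
}
```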
Commits from last week did help with this. First, we're now guaranteed to have
a single sync-request at a time (previously, we could have had 2). Secondly,
the async connection is now async end-to-end (previously, it could have blocked
on an empty state pool).
We could probably make this a bit more obvious by reserving 1 state object
for synchronous requests. But, since the long-term solution is probably having
no synchronous requests, I'm happy with anything that lets me move past this
issue.
Support CDP's Input.dispatchKeyEvent and DOM key events. Currently only
keydown is supported, and every key is expected to be a displayable character.
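For reference, a keydown for a displayable character arrives as a message like the following (parameter names come from the CDP Input domain; `session` is a hypothetical CDP session object):
```js
// Dispatch a keydown for the displayable character "a".
session.send('Input.dispatchKeyEvent', {
  type: 'keyDown',
  key: 'a',      // DOM key value
  code: 'KeyA',  // physical key code
  text: 'a',     // displayable character
});
```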
It turns out that manipulating the DOM via key events isn't great because the
behavior really depends on the cursor. So, to do this more accurately, we'd
have to introduce some concept of a cursor.
Personally, I don't think we'll run into many pages that are purposefully
using keyboard events. But driver (puppeteer/playwright) scripts might be
another issue.
CDP translates this into a Network.loadingFailed. This is necessary to make sure
every Network.requestWillBeSent is paired with either a Network.loadingFailed
or a Network.responseReceived.
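The pairing matters because CDP clients typically track in-flight requests by requestId, roughly like this (a sketch; `session` and `inflight` are illustrative, the event and field names come from the Network domain):
```js
// Every request needs exactly one terminating event; otherwise this map
// leaks entries and "wait for network idle" style logic never settles.
const inflight = new Map();

session.on('Network.requestWillBeSent', ({ requestId, request }) => {
  inflight.set(requestId, request.url);
});
session.on('Network.responseReceived', ({ requestId }) => inflight.delete(requestId));
session.on('Network.loadingFailed', ({ requestId }) => inflight.delete(requestId));
```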
We currently keep the main request open during loadHTMLDoc and processHTMLDoc.
It _has_ to be open during loadHTMLDoc, since that streams the body. But it
does not have to be open during processHTMLDoc, which can be long and itself
could make use of that same connection if it was released. Reorganized the
navigate flow to limit the scope of the request.
Also, just like we track pending_write and pending_read, we now also track
pending_connect and only shut down when none are pending.
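The shutdown gate is then just (a sketch with illustrative field names):
```js
// Only tear the connection down once nothing is still pending.
function maybeShutdown(conn) {
  if (!conn.pending_connect && !conn.pending_read && !conn.pending_write) {
    conn.shutdown();
  }
}
```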
The HTTP Client has a state pool. It blocks when we've exceeded max_concurrency.
This can block processing forever. A simple way to reproduce this is to go into
the demo cdp.js and execute the XHR request 5 times (loading json/product.json).
To some degree, I think this is a result of weird / non-intuitive execution
flow. If you execute a script with 100 XHR requests, it'll call our XHR _send
function, but none of these will actually run until the loop is run (after the
script is done being executed). This can result in poor utilization of our
connection and state pool.
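Concretely, a page script like this registers every send immediately, but nothing actually runs until the script finishes and the loop takes over:
```js
// All 100 sends call _send while the script runs; the requests themselves
// only start once control returns to the I/O loop, so they all compete for
// connection/state objects at the same time.
for (let i = 0; i < 100; i++) {
  const xhr = new XMLHttpRequest();
  xhr.open('GET', '/json/product.json');
  xhr.onload = () => console.log('request', i, 'done');
  xhr.send();
}
```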
For an async request, getting the *Request object is itself now asynchronous.
If no state is available, we use the Loop's timeout (at 20ms) to keep checking
for an available state.
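In spirit, the acquisition looks like this (a sketch; `pool`, `loopTimeout`, and the callback shape are illustrative):
```js
// Acquire a state object asynchronously: if the pool is empty, re-check on a
// short timer instead of blocking the single-threaded I/O loop.
function acquireState(pool, loopTimeout, cb) {
  const state = pool.tryAcquire();
  if (state !== null) {
    cb(state);
  } else {
    loopTimeout(20, () => acquireState(pool, loopTimeout, cb)); // retry in ~20ms
  }
}
```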
Async HTTP requests work by emitting a "Progress" object to a callback. This
object has a "done" flag which, when `true`, indicates that all data has been
emitted and no future "Progress" objects will be sent.
Callers like XHR buffer the response and wait for "done = true" to then process
the request.
The HTTP client relies on two important object pools: the connection and the
state (with all the buffers for reading/writing).
In its current implementation, the async flow does not release these pooled
objects until the final callback has returned. At best, this is inefficient:
we're keeping the connection and state objects checked out for longer than they
have to be. At worst, it can lead to a deadlock. If the calling code issues a
new request when done == true, we'll eventually run out of state objects in the
pool.
This commit now releases the state objects before emitting the final "done" Progress
message. For this to work, this final message will always have null data and
an empty header object.
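From the caller's perspective, the contract looks like this (a sketch of the consuming side; the field names follow the Progress description above, and `handleResponse` is a placeholder):
```js
const chunks = [];
let headers = null;

function handleResponse(hdrs, parts) {
  console.log('response complete:', parts.length, 'chunks');
}

function onProgress(progress) {
  if (!progress.done) {
    headers = progress.header;               // real headers arrive before done
    if (progress.data) chunks.push(progress.data);
    return;
  }
  // Final message: data is null and the header object is empty, because the
  // pooled connection/state were released before this callback fired. That
  // makes it safe to issue a new request from right here.
  handleResponse(headers, chunks);
}
```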
This is often called in a tight loop (the callback to requestAnimationFrame
typically calls requestAnimationFrame again).
Instead, we can treat it like a setTimeout with a short delay (5ms?). This has
the added benefit of making it cancelable, FWIW.
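A minimal version of that mapping (the 5ms value is the one floated above; returning the timer handle is what makes cancellation possible):
```js
// Back requestAnimationFrame with a short setTimeout so a tight rAF loop
// doesn't spin, and return the handle so cancelAnimationFrame can clear it.
function requestAnimationFrame(callback) {
  return setTimeout(() => callback(performance.now()), 5);
}

function cancelAnimationFrame(handle) {
  clearTimeout(handle);
}
```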
This adds support for:
```js
new URLSearchParams({over: 9000});
```
The spec says that anything that produces/iterates a sequence of string pairs
is valid. By using the lower-level JsObject, this hopefully takes care of the
most common cases. But I don't think it's complete, and I don't think we
currently capture enough data to make this work. There's no way for the JS
runtime to know if a value (say, a netsurf instance, or even a Zig instance)
provides a string=>string iterator.
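For reference, these are the pair-sequence shapes the spec allows, alongside the record and string forms:
```js
// Any iterable of [name, value] string pairs works, not just a plain object.
new URLSearchParams([['over', '9000'], ['a', 'b']]);
new URLSearchParams(new Map([['over', '9000']]));
new URLSearchParams('over=9000&a=b');
new URLSearchParams({ over: 9000 }); // record form, as added here
```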
The NodeWrapper pattern attaches a Zig instance to a libdom Node. That works in
isolation, but for 1 given node, we might want to attach different instances.
For example, for an HTMLScriptElement we want to attach an `onError`, but for
that same node viewed as an HTMLElement we want to attach a `CSSStyleDeclaration`. We
can only have one. Currently, this code will crash if, for example, we create
the embedded data as an HTMLScriptElement, then try to read the embedded data
as an HTMLElement.
This PR introduces a dedicated state class. So if you want the onError property,
you no longer ask the NodeWrapper for an HTMLScriptElement. Instead, you ask
for a storage/HTMLElement.
Nothing fancy here, just memory-inefficient optional fields. If it gets out of
hand, we'll think of something more clever.
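In JS terms, the idea is a single per-node state object with optional fields, rather than a separate wrapper per interface (a conceptual sketch only; the real implementation is Zig):
```js
// One state object per node; any view of the node (HTMLScriptElement,
// HTMLElement, ...) reads and writes the same optional fields.
const nodeState = new WeakMap();

function stateFor(node) {
  let state = nodeState.get(node);
  if (!state) {
    state = { onError: null, style: null }; // memory-inefficient optional fields
    nodeState.set(node, state);
  }
  return state;
}

const node = {}; // stand-in for a libdom node
stateFor(node).onError = () => console.log('script failed');
console.log(stateFor(node).onError !== null); // true: same state, any view
```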