When the inspector emits a message to be sent to the client, we copy those bytes a
number of times. First, V8 serializes the message to CBOR. Next, it converts it
to JSON. We then copy this into a C++ string, then into a Zig slice. We create
one final copy (with websocket framing) to add to the write queue.
Something similar, but a little less extreme, happens with incoming messages.
By supporting CBOR messages directly, we not only reduce the amount of copying,
but also leverage our [more tightly scoped and re-used] arenas.
CBOR is essentially a standardized MessagePack. Two functions, jsonToCbor and
cborToJson, have been introduced to convert our incoming JSON messages to CBOR
and vice versa. V8 automatically detects that an incoming message is CBOR and,
when it is, the outgoing message is CBOR as well.
While V8 is spec-compliant, it has specific expectations and behavior. For
example, it never emits a fixed-length array / map - it's always an
indefinite-length array / map (with a special "break" code at the end). For
this reason, our implementation is not complete, but rather designed to work
with what V8 does and expects.
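As a rough illustration of the indefinite-length form described above, here is
a minimal sketch in JavaScript. It is not our jsonToCbor; the `encode` and
`encodeHead` names are made up for the example, and it only handles null,
booleans, small non-negative integers, strings, arrays and objects. The byte
values (0x9f, 0xbf, 0xff) come from the CBOR spec (RFC 8949).

```javascript
// Illustrative sketch only: encodes a JSON-like value to CBOR, always using
// indefinite-length arrays (0x9f) and maps (0xbf) terminated by "break" (0xff).
const BREAK = 0xff;

// Writes a CBOR head: major type in the top 3 bits, argument after it.
function encodeHead(major, n, out) {
  if (n < 24) out.push((major << 5) | n);
  else if (n < 0x100) out.push((major << 5) | 24, n);
  else if (n < 0x10000) out.push((major << 5) | 25, n >> 8, n & 0xff);
  else out.push((major << 5) | 26, (n >>> 24) & 0xff, (n >>> 16) & 0xff, (n >>> 8) & 0xff, n & 0xff);
}

function encode(value, out = []) {
  if (value === null) out.push(0xf6);
  else if (value === true) out.push(0xf5);
  else if (value === false) out.push(0xf4);
  else if (typeof value === "string") {
    const bytes = new TextEncoder().encode(value);
    encodeHead(3, bytes.length, out); // major type 3: UTF-8 text string
    out.push(...bytes);
  } else if (Number.isInteger(value) && value >= 0 && value <= 0xffffffff) {
    encodeHead(0, value, out); // major type 0: unsigned integer
  } else if (Array.isArray(value)) {
    out.push(0x9f); // indefinite-length array
    for (const item of value) encode(item, out);
    out.push(BREAK);
  } else if (typeof value === "object") {
    out.push(0xbf); // indefinite-length map
    for (const [k, v] of Object.entries(value)) {
      encode(k, out);
      encode(v, out);
    }
    out.push(BREAK);
  } else {
    throw new Error("unsupported value in this sketch");
  }
  return out;
}

const hex = encode({ id: 1, params: [] })
  .map((b) => b.toString(16).padStart(2, "0"))
  .join(" ");
// hex is "bf 62 69 64 01 66 70 61 72 61 6d 73 9f ff ff"
```

Note that both the outer map and the empty array end with their own 0xff, which
is why the output finishes with two break bytes.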
Another example of this, and I don't understand why, is that some of the
incoming messages have a "params" field, and V8 requires this to be a CBOR
embedded data field (that is, CBOR embedded into CBOR). If we pass an array
directly, while semantically the same, it'll fail. I guess this is how Chrome
serializes the data, and rather than just reading the data as-is, V8 asserts
that it's encoded in a particular flavor. Weird. But we have to accommodate that.
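To make "CBOR embedded into CBOR" concrete, here is a minimal sketch of the
generic RFC 8949 form: tag 24 ("encoded CBOR data item") followed by a byte
string holding the already-encoded inner value. The `embedCbor` name is
hypothetical, and the exact envelope bytes V8 expects may differ from this
generic form, so treat it as an illustration of the idea only.

```javascript
// Illustrative sketch only: wraps already-encoded CBOR bytes in tag 24 plus a
// byte string (major type 2), per RFC 8949. Not necessarily V8's exact layout.
function embedCbor(inner) {
  const n = inner.length;
  const out = [0xd8, 0x18]; // tag 24: "encoded CBOR data item"
  if (n < 24) out.push(0x40 | n);
  else if (n < 0x100) out.push(0x58, n);
  else if (n < 0x10000) out.push(0x59, n >> 8, n & 0xff);
  else out.push(0x5a, (n >>> 24) & 0xff, (n >>> 16) & 0xff, (n >>> 8) & 0xff, n & 0xff);
  for (const b of inner) out.push(b);
  return Uint8Array.from(out);
}

// An empty indefinite-length array (0x9f 0xff) embedded as a data item.
const embedded = embedCbor([0x9f, 0xff]);
```

The inner array is semantically unchanged; only the wrapping differs, which is
the distinction the paragraph above is complaining about.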
We currently keep the main request open during loadHTMLDoc and processHTMLDoc.
It _has_ to be open during loadHTMLDoc, since that streams the body. But it
does not have to be open during processHTMLDoc, which can be long and itself
could make use of that same connection if it were released. This reorganizes the
navigate flow to limit the scope of the request.
Also, just like we track pending_write and pending_read, we now also track
pending_connect and only shut down when none of them are pending.
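The shutdown rule can be sketched as follows. This is a simplified JavaScript
model, not the actual Zig code: the `PendingTracker` class and its method names
are hypothetical, and only the three pending flags come from the description
above.

```javascript
// Illustrative sketch only: defer shutdown until no operation is pending.
class PendingTracker {
  constructor() {
    this.pending = { read: false, write: false, connect: false };
    this.shutdownRequested = false;
    this.closed = false;
  }
  start(kind) {
    this.pending[kind] = true;
  }
  finish(kind) {
    this.pending[kind] = false;
    this.maybeClose();
  }
  requestShutdown() {
    this.shutdownRequested = true;
    this.maybeClose();
  }
  // Close only once shutdown was requested and nothing is still in flight.
  maybeClose() {
    const anyPending = Object.values(this.pending).some(Boolean);
    if (this.shutdownRequested && !anyPending) this.closed = true;
  }
}

const tracker = new PendingTracker();
tracker.start("connect");
tracker.requestShutdown(); // connect is still pending, so nothing closes yet
const closedWhilePending = tracker.closed;
tracker.finish("connect"); // last pending operation done, now we close
```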
The HTTP Client has a state pool. It blocks when we've exceeded max_concurrency.
This can block processing forever. A simple way to reproduce this is to go into
the demo cdp.js, and execute the XHR request 5 times (loading json/product.json).
To some degree, I think this is a result of weird / non-intuitive execution
flow. If you execute a script with 100 XHR requests, it'll call our XHR _send
function for each one, but none of these will execute until the loop is run
(after the script is done being executed). This can result in poor utilization
of our connection and state pool.
For an async request, getting the *Request object is itself now asynchronous.
If no state is available, we use the Loop's timeout (at 20ms) to keep checking
for an available state.
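A minimal sketch of that retry behavior, in JavaScript rather than Zig. The
`StatePool` class, `requestAsync` function and the injectable `schedule`
parameter are all hypothetical names for illustration; only the "retry on a
20ms timer instead of blocking" idea comes from the description above.

```javascript
// Illustrative sketch only: acquire a pooled state without blocking. If none
// is free, re-check on a timer instead of stalling the caller.
class StatePool {
  constructor(size) {
    this.free = Array.from({ length: size }, (_, i) => ({ id: i }));
  }
  tryAcquire() {
    return this.free.pop() ?? null;
  }
  release(state) {
    this.free.push(state);
  }
}

// `schedule` defaults to setTimeout; it is a parameter so the retry loop can
// be driven by whatever event loop is available.
function requestAsync(pool, onReady, schedule = setTimeout, retryMs = 20) {
  const state = pool.tryAcquire();
  if (state !== null) {
    onReady(state);
    return;
  }
  schedule(() => requestAsync(pool, onReady, schedule, retryMs), retryMs);
}
```

The caller gets its callback invoked as soon as a state is released and the
next timer tick fires, so a script that queues many requests no longer blocks
the loop that would free those states.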
Async HTTP requests work by emitting a "Progress" object to a callback. This
object has a "done" flag which, when `true`, indicates that all data has been
emitted and no future "Progress" objects will be sent.
Callers like XHR buffer the response and wait for "done = true" before
processing it.
The HTTP client relies on two important object pools: the connection and the
state (with all the buffers for reading/writing).
In its current implementation, the async flow does not release these pooled
objects until the final callback has returned. At best, this is inefficient:
we're keeping the connection and state objects checked out for longer than they
have to be. At worst, it can lead to a deadlock. If the calling code issues a
new request when done == true, we'll eventually run out of state objects in the
pool.
This commit now releases the state objects before emitting the final "done"
Progress message. For this to work, this final message will always have null
data and an empty header object.
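Why the ordering matters can be seen in a small synchronous model. This is a
sketch with hypothetical names (`makeClient`, `fetchChain`), not the real
client: the pool is just a counter, and each request emits one data Progress
followed by the final "done" Progress.

```javascript
// Illustrative sketch only: the state is returned to the pool BEFORE the final
// "done" Progress is emitted, so a callback may start a new request right away.
function makeClient(poolSize) {
  let available = poolSize;
  function request(onProgress) {
    if (available === 0) throw new Error("state pool exhausted");
    available -= 1;
    onProgress({ data: "chunk", header: { status: 200 }, done: false });
    available += 1; // release first...
    onProgress({ data: null, header: {}, done: true }); // ...then notify
  }
  return { request, available: () => available };
}

// Each "done" callback immediately issues the next request.
function fetchChain(client, remaining, onFinished) {
  client.request((progress) => {
    if (!progress.done) return;
    if (remaining > 1) fetchChain(client, remaining - 1, onFinished);
    else onFinished();
  });
}

let chainFinished = false;
const client = makeClient(1); // a pool with a single state object
fetchChain(client, 5, () => { chainFinished = true; });
```

With a pool of one, the chain of five requests only completes because the state
is released before the "done" callback runs. Moving the release after the final
`onProgress` call makes the second request throw.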
This is often called in a tight loop (the callback to requestAnimation typically
calls requestAnimation).
Instead, we can treat it like a setTimeout with a short delay (5ms ?). This has
the added benefit of making it cancelable, FWIW.
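Roughly, the idea looks like this in JavaScript. The `makeAnimationFrameApi`
helper and its parameters are hypothetical, `requestAnimationFrame` and
`cancelAnimationFrame` are the standard web API names, and the 5ms delay is the
tentative value mentioned above.

```javascript
// Illustrative sketch only: model animation-frame callbacks as short timeouts.
// Timer functions are injectable so any event loop can back them.
function makeAnimationFrameApi(
  setTimer = setTimeout,
  clearTimer = clearTimeout,
  now = () => Date.now(),
  delayMs = 5
) {
  return {
    // Returns the timer handle, which doubles as the cancellation id.
    requestAnimationFrame: (callback) => setTimer(() => callback(now()), delayMs),
    cancelAnimationFrame: (id) => clearTimer(id),
  };
}
```

Because each call goes through the timer queue, a callback that re-registers
itself yields to the loop between frames instead of spinning.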
This adds support for:
```
new URLSearchParams({over: 9000});
```
The spec says that anything that produces/iterates a sequence of string pairs
is valid. By using the lower-level JsObject, this hopefully takes care of the
most common cases. But I don't think it's complete, and I don't think we
currently capture enough data to make this work. There's no way for the JS
runtime to know if a value (say, a netsurf instance, or even a Zig instance)
provides a string=>string iterator.
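For reference, these are the kinds of inputs the standard constructor accepts
in a spec-compliant runtime (checked here against the standard behavior, not
against our implementation). The variable names are just for the example.

```javascript
// Standard URLSearchParams behavior: a record, a sequence of pairs, or any
// iterable of pairs (such as a Map) are all valid initializers.
const fromRecord = new URLSearchParams({ over: 9000 }).toString();
const fromPairs = new URLSearchParams([["a", "1"], ["b", "2"]]).toString();
const fromMap = new URLSearchParams(new Map([["k", "v"]])).toString();
// fromRecord is "over=9000", fromPairs is "a=1&b=2", fromMap is "k=v"
```

The record case is the one this change adds. The iterable cases are where the
"string=>string iterator" detection mentioned above comes into play.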
The NodeWrapper pattern attaches a Zig instance to a libdom Node. That works in
isolation, but for one given node, we might want to attach different instances.
For example, for an HTMLScriptElement we want to attach an `onError`, but for
that same node viewed as an HTMLElement we want to attach a
`CSSStyleDeclaration`. We can only have one. Currently, this code will crash if,
for example, we create the embedded data as an HTMLScriptElement, then try to
read the embedded data as an HTMLElement.
This PR introduces a dedicated state class. So if you want the onError property,
you no longer ask the NodeWrapper for an HTMLScriptElement. Instead, you ask
for a storage/HTMLElement.
Nothing fancy here, just memory-inefficient optional fields. If it gets out of
hand, we'll think of something more clever.
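The shape of the idea, sketched in JavaScript rather than Zig: one state object
per node, with an optional field for each thing we might want to attach. The
`stateFor` helper and the field names are hypothetical.

```javascript
// Illustrative sketch only: a single per-node state record with optional
// fields, shared by every "view" of that node.
const nodeStates = new WeakMap();

function stateFor(node) {
  let state = nodeStates.get(node);
  if (state === undefined) {
    state = { onError: null, onLoad: null, style: null };
    nodeStates.set(node, state);
  }
  return state;
}

const scriptNode = {}; // stands in for a libdom node
stateFor(scriptNode).onError = () => "script failed"; // set via the script view
stateFor(scriptNode).style = { color: "red" }; // set via the element view
```

Both fields live on the same record, so reading the node "as an HTMLElement"
after writing it "as an HTMLScriptElement" no longer conflicts. The cost is the
unused optional fields on every node that has state.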
For the event emitted by the page on page load, the target is the window, but
it's improperly cast since the pointer is actually `window.base`. This is going
to be a problem
in general for any Zig type dispatched as a target, but the Window one is the
most obvious and the easiest to fix. If this issue comes up with other types,
we'll need to come up with a more robust solution.
Automatically include the stack trace in a `console.error` output. This is
useful because code frequently does:
```
try { blah(); }
catch (e) { console.error(e); }
```
We log the error but, without this change, we don't get the stack.
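The formatting decision amounts to something like the following. This is a
sketch only: `formatForConsoleError` is a hypothetical helper, and it relies on
the `stack` property that V8 attaches to Error objects.

```javascript
// Illustrative sketch only: prefer the stack (which V8 prefixes with the
// error name and message) when the logged value is an Error.
function formatForConsoleError(value) {
  if (value instanceof Error && typeof value.stack === "string") {
    return value.stack;
  }
  return String(value);
}

let formatted;
try {
  throw new Error("boom");
} catch (e) {
  formatted = formatForConsoleError(e);
}
```

Non-Error values fall back to their string form, so `console.error("plain")`
is unaffected.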
Extracts the FormData logic, which is both more complete and more correct, and
reuses it between FormData and URLSearchParams.
This includes the additional iterator behavior, `set`, and the URLSearchParams
constructor from FormData.
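In terms of standard web API behavior (again, the spec's behavior rather than a
test of our implementation), the features listed above look like this:

```javascript
// Standard behavior: FormData iterates as [name, value] pairs, so it can seed
// a URLSearchParams. `set` replaces every existing entry with that name.
const form = new FormData();
form.append("name", "widget");
form.append("tag", "a");
form.append("tag", "b");

const params = new URLSearchParams(form);
const beforeSet = params.toString(); // "name=widget&tag=a&tag=b"
params.set("tag", "c");
const afterSet = params.toString(); // "name=widget&tag=c"
const entries = [...params]; // iterator yields [name, value] pairs
```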
If you look at the specification for `console` [1], you'll note that it's a
namespace, not an interface (like most things). Furthermore, MDN lists its
methods as "static".
But it's a pretty weird namespace IMO, because some of its "functions", like
`count`, can have state associated with them.
This causes some problems with our current implementation. Something like:
```
[1].forEach(console.log)
```
Fails, since `this` isn't our window-attached Console instance.
This commit introduces a new `static_XYZ` naming convention which does not
have the class/Self as a receiver:
```
pub fn static_log(values: []JsObject, page: *Page) !void {
```
This turns Console into a namespace for these specific functions, while still
being used normally for those functions that require state.
We could infer this behavior from the first parameter, but that seems more
error prone. For now, I prefer having the explicit `static_` prefix.
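The receiver problem is easy to reproduce in plain JavaScript. In this sketch,
`Counter` and `ns` are hypothetical stand-ins: the class method depends on
`this`, while the namespace-style functions do not, so only the latter survive
being passed around detached.

```javascript
// Illustrative sketch only: a method that needs `this` breaks when detached,
// while a receiver-free function works as a bare callback.
class Counter {
  constructor() {
    this.counts = new Map();
  }
  count(label) {
    const n = (this.counts.get(label) ?? 0) + 1;
    this.counts.set(label, n);
    return n;
  }
}

const counter = new Counter();
const ns = {
  // No receiver: safe to pass as `[1].forEach(ns.log)`.
  log: (...values) => values.join(" "),
  // Stateful, but the state is captured rather than read from `this`.
  count: (label) => counter.count(label),
};

const logged = ["a"].map(ns.log); // forEach/map pass (value, index, array)

let detachedError = null;
try {
  const detached = counter.count; // loses its receiver
  detached("x");
} catch (e) {
  detachedError = e;
}
```

`ns.log` works when handed directly to `map`, while the detached `count` throws
a TypeError because `this` is undefined inside class methods.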
[1] https://console.spec.whatwg.org/#console-namespace