Currently, this hooks a single log.Interceptor into the logging framework, but
changing it to take a list shouldn't be too hard. Biggest issue is who will own
it, as we'd need an allocator to maintain a list / lookup (which log doesn't
currently have).
Uses logFmt format, and, for now, always filters out debug messages and a few
particularly verbose scopes.
This Pr largely tightens up a lot of the code. 'v8' is no longer imported
outside of js. A number of helper functions have been moved to the js.Context.
For example, js.Function.getName used to call:
```zig
return js.valueToString(allocator, name, self.context.isolate, self.context.v8_context);
```
It now calls:
```zig
return self.context.valueToString(name, .{ .allocator = allocator });
```
Page.main_context has been renamed to `Page.js`. This, in combination with new
promise helpers, turns:
```zig
const resolver = page.main_context.createPromiseResolver();
try resolver.resolve({});
return resolver.promise();
```
into:
```zig
return page.js.resolvePromise({});
```
Renames JsContext -> js.Context, JsObject -> js.Object and JsThis -> js.This
which is more consistent with the other types. The JsObject -> js.Object is
the reason so many files were touched.
This is still a [messy] transition, with more refactoring planned to clean it
up.
Back in the zig-js-runtime days, globals were used for the state and webapi
declarations. This caused problems largely because it was done across
compilation units (using @import("root")...).
The generic Env(S, WebApi) was used to solve these problems, while still making
it work for different States and WebApis.
This change removes the generics and hard-codes the *Page as the state and
only supports our WebApis for the class declarations.
To accommodate this change, the runtime/*tests* have been removed. I don't
consider this a huge loss - whatever behavior these were testing, already
exists in the browser/**/*.zig web api.
As we write more complex/complete WebApis, we're seeing more and more cases
that need to rely on js objects directly (JsObject, Function, Promises, etc...).
The goal is to make these easier to use. Rather than using Env.JsObject, you
now import "js.zig" and use js.JsObject (TODO: rename JsObject to Object).
Everything is just a plain Zig struct, rather than being nested in a generic.
After this change, I plan on:
1 - Renaming the js objects, JsObject -> Object. These should be referenced in
the webapi as js.Object, js.This, ...
2 - Splitting the code across multiple files (Env.zig, Context.zig,
Caller.zig, ...)
This changes how non-async module loading works. In general, module loading
is triggered by a v8 callback. We ask it to process a module (a <script type=
module>) and then for every module that it depends on, we get a callback. This
callback expects the nested v8.Module instance, so we need to load it then and
there (as opposed to dynamic imports, where we only have to return a promise).
Previously, we solved this by issuing a blocking HTTP get in each callback. The
HTTP loop was able to continuing downloading already-queued resources, but if
a module depended on 20 nested modules, we'd issue 20 blocking gets one after
the other.
Once a module is compiled, we can ask v8 for a list of its dependent module. We
can them immediately start to download all of those modules. We then evaluate
the original module, which will trigger our callback. At this point, we still
need to block and wait for the response, but we've already started the download
and it's much faster. Sure, for the first module, we might need to wait the same
amount of time, but for the other 19, chances are by the time the callback
executes, we already have it downloaded and ready.
There is some risk to this change. The first is that I made a mistake. The
other is that one of the APIs that doesn't currently return an error changes
in the future.
Most CDP drivers have a mechanism to wait for idle network, or an almost idle
network (sometimes called networkIdle2). These are events the browser must emit.
The page will now emit `networkIdle` when we are reasonably sure there's no more
network activity (this requires some slight changes to request interception,
since, I believe, intercepted requests should be considered).
`networkAlmostIdle` is currently _always_ emitted prior to emitting
`networkIdle`. We should tweak this but I can't, at a glance, think of a great
heuristic for when this should be emitted.
Further reducing bouncing between page and server for loop polling. If there is
a page, the page polls. If there isn't a page, the server polls. Simpler.
Previously, the IO loop was doing three things:
1 - Managing timeouts (either from scripts or for our own needs)
2 - Handling browser IO events (page/script/xhr)
3 - Handling CDP events (accept, read, write, timeout)
With the libcurl merge, 1 was moved to an in-process scheduler and 2 was moved
to libcurl's own event loop. That means the entire loop code, including
the dependency on tigerbeetle-io existed for handling a single TCP client.
Not only is that a lot of code, there was also friction between the two loops
(the libcurl one and our IO loop), which would result in latency - while one
loop is waiting for the events, any events on the other loop go un-processed.
This PR removes our IO loop. To accomplish this:
1 - The main accept loop is blocking. This is simpler and works perfectly well,
given we only allow 1 active connection.
2 - The client socket is passed to libcurl - yes, libcurl's loop can take
arbitrary FDs and poll them along with its own.
In addition to having one less dependency, the CDP code is quite a bit simpler,
especially around shutdowns and writes. This also removes _some_ of the latency
caused by the friction between page process and CDP processing. Specifically,
when CDP now blocks for input, http page events (script loading, xhr, ...) will
still be processed.
There's still friction. For one, the reverse isn't true: when the page is
waiting for events, CDP events aren't going to be processed. But the page.wait
already have some sensitivity to this (e.g. the page.request_intercepted flag).
Also, when CDP waits, while we will process network events, page timeouts are
still not processed. Because of both these remaining issues, we still need to
jump between the two loops - but being able to block on CDP (even for a short
time) WITHOUT stopping the page's network I/O, should reduce some latency.
Add response_data event, CDP now captures the full body so that it can respond
to the Network.getResponseBody. This isn't memory efficient, but I don't see
another way to do it. At least this way, it's only capturing/storing every
response body when (a) CDP is used and (b) Network.enabled is called. That is,
as opposed to baking this into Http/Client.zig, which would force the memory
consumption for all use-cases.
There's arguably some optimizations we could make for XHR requests, which also
dupe/own the response. As of now, the response is dupe'd separately for CDP
and XHR.
With networking enabled, CDP listens to this event and emits a
`Network.loadingFinished` event. This is event is used by puppeteer to know that
details about the response (i.e. the body) can be queries.
Added dummy handling for the Network.getResponseBody message. Returns an
empty body. Needed because we emit the loadingFinished event which signals
to drivers that they can ask for the body.
Stream (to json) the Transfer as a request and response object in the various
network interception-related events (e.g. Network.responseReceived).
Add a page.request_intercepted boolean flag for CDP to signal the page that
requests have been intercepted, allowing Page.wait to prioritize intercept
handling (or, at least, not block it).
Test speed has been improved only slightly by tweaking a 2-second running tests.
Build has been improved by:
1 - moving logFunctionCallError out of js.Caller and to a standalone function
2 - removing some non-generic code from the generic portions of the logger
Caller.getter and Caller.setter have been removed in favor or calling
Caller.method. This wasn't previously possible - prior to our v8 upgrade, they
had different signatures.
Also removed a largely unused parser/str.zig file.
CDP translate this into a Network.loadingFailed. This is necessary to make sure
every Network.requestWillBeSent is paired with either a Network.loadingFailed
or a Network.responseReceived.
Outputs in logfmt in release and a "pretty" print in debug mode. The format
along with the log level will become arguments to the binary at some point in
the future.
- Add 2 internal notifications
1 - http_request_start
2 - http_request_complete
- When Network.enable CDP message is received, browser context registers for
these 2 events (when Network.disable is called, it unregisters)
- On http_request_start, CDP will emit a Network.requestWillBeSent message.
This _does not_ include all the fields, but what we have appears to be enough
for puppeteer.waitForNetworkIdle.
- On http_request_complete, CDP will emit a Network.responseReceived message.
This _does not_ include all the fields, bu what we have appears to be enough
for puppeteer.waitForNetworkIdle.
We currently don't emit any other new events, including any network-specific
lifecycleEvent (i.e. Chrome will emit an networkIdle and networkAlmostIdle).
To support this, the following other things were done:
- CDP now has a `notification_arena` which is re-used between browser contexts.
Normally, CDP code runs based on a "cmd" which has its own message_arena, but
these notifications happen out-of-band, so we needed a new arena which is
valid for handling 1 notification.
- HTTP Client is notification-aware. The SessionState no longer includes the
*http.Client directly. It instead includes an http.RequestFactory which is
the combination fo the client + a specific configuration (i.e. *Notification).
This ensures that all requests made from that factory have the same settings.
- However, despite the above, _some_ requests do not appear to emit CDP events,
such as loading a <script src="X">. So the page still deals directly with the
*http.Client.
- Playwright and Puppeteer (but Playwright in particular) are very sensitive to
event ordering. These new events have introduced additional sensitivity.
The result sent to Page.navigate had to be moved to inside the navigate event
handler, which meant passing some cdp-specific data (the input.id) into the
NavigateOpts. This is the only way I found to keep both happy - the sequence
of events is closer (but still pretty far) from what Chrome does.