CDP currently assumes that if we get a page-related notification (like a
request interception, or page lifecycle event), then we must have a session
and page.
But, Target.detachFromTarget can remove the session from the BrowserContext
while still having the page run (I wonder if we should stop the page at this
point??). So, remove these assumptions and make sure we have a page/session
in the handling of page events.
Most CDP drivers have a mechanism to wait for idle network, or an almost idle
network (sometimes called networkIdle2). These are events the browser must emit.
The page will now emit `networkIdle` when we are reasonably sure there's no more
network activity (this requires some slight changes to request interception,
since, I believe, intercepted requests should be considered).
`networkAlmostIdle` is currently _always_ emitted prior to emitting
`networkIdle`. We should tweak this but I can't, at a glance, think of a great
heuristic for when this should be emitted.
Further reducing bouncing between page and server for loop polling. If there is
a page, the page polls. If there isn't a page, the server polls. Simpler.
The thin mimalloc API is currently defensive around incorrect setup/teardown by
guarding against using/destroying the arena when the heap is null, or creating
an arena when it already exists.
The only time these checks will fail is when the code is wrong, e.g. trying
to use libdom before or after freeing the arena. The current behavior can mask
these errors, plus add runtime overhead.
Removes optional platform, which only existed for tests.
There is now a global `@import("testing.zig").test_app` available. This is setup
when the test runner starts, and cleaned up at the end of tests. Individual
tests don't have to worry about creating app, which I assume was the reason I
Platform optional, since that woul dhave been something else that needed to be
setup.
Previously, the IO loop was doing three things:
1 - Managing timeouts (either from scripts or for our own needs)
2 - Handling browser IO events (page/script/xhr)
3 - Handling CDP events (accept, read, write, timeout)
With the libcurl merge, 1 was moved to an in-process scheduler and 2 was moved
to libcurl's own event loop. That means the entire loop code, including
the dependency on tigerbeetle-io existed for handling a single TCP client.
Not only is that a lot of code, there was also friction between the two loops
(the libcurl one and our IO loop), which would result in latency - while one
loop is waiting for the events, any events on the other loop go un-processed.
This PR removes our IO loop. To accomplish this:
1 - The main accept loop is blocking. This is simpler and works perfectly well,
given we only allow 1 active connection.
2 - The client socket is passed to libcurl - yes, libcurl's loop can take
arbitrary FDs and poll them along with its own.
In addition to having one less dependency, the CDP code is quite a bit simpler,
especially around shutdowns and writes. This also removes _some_ of the latency
caused by the friction between page process and CDP processing. Specifically,
when CDP now blocks for input, http page events (script loading, xhr, ...) will
still be processed.
There's still friction. For one, the reverse isn't true: when the page is
waiting for events, CDP events aren't going to be processed. But the page.wait
already have some sensitivity to this (e.g. the page.request_intercepted flag).
Also, when CDP waits, while we will process network events, page timeouts are
still not processed. Because of both these remaining issues, we still need to
jump between the two loops - but being able to block on CDP (even for a short
time) WITHOUT stopping the page's network I/O, should reduce some latency.
chromedb doesn't support duplicate header names. Although servers _will_ send
this (e.g. Cache-Control: public\r\nCache-Control: max-age=60\r\n), Chrome
seems to join them with a "\n". So we do the same.
A note on curl_easy_nextheader, which this code ultimately uses to iterate
and collect the headers. The documentation says:
Applications must copy the data if they want it to survive subsequent API
calls or the life-time of the easy handle.
As-is, I'd understand this to mean that a given header name/value is only
valid until any API call, including another call to curl_easy_nextheader. So,
from this comment, we _should_ be duping the name/value. But we don't. Why?
Because, despite the note in the documentation, this doesn't appear to be how
it actually works, nor does it really make sense. If it's just a linked list,
there's no reason curl_easy_nextheader should invalidate previous results. I'm
guessing this is just a general lack of guarantee libcurl is willing to make re
lifetimes.
https://github.com/lightpanda-io/browser/issues/966
Add response_data event, CDP now captures the full body so that it can respond
to the Network.getResponseBody. This isn't memory efficient, but I don't see
another way to do it. At least this way, it's only capturing/storing every
response body when (a) CDP is used and (b) Network.enabled is called. That is,
as opposed to baking this into Http/Client.zig, which would force the memory
consumption for all use-cases.
There's arguably some optimizations we could make for XHR requests, which also
dupe/own the response. As of now, the response is dupe'd separately for CDP
and XHR.
With networking enabled, CDP listens to this event and emits a
`Network.loadingFinished` event. This is event is used by puppeteer to know that
details about the response (i.e. the body) can be queries.
Added dummy handling for the Network.getResponseBody message. Returns an
empty body. Needed because we emit the loadingFinished event which signals
to drivers that they can ask for the body.
Stream (to json) the Transfer as a request and response object in the various
network interception-related events (e.g. Network.responseReceived).
Add a page.request_intercepted boolean flag for CDP to signal the page that
requests have been intercepted, allowing Page.wait to prioritize intercept
handling (or, at least, not block it).
On client.request(req) we now immediately wrap the request into a Transfer. This
results in less copying of the Request object. It also makes the transfer.uri
available, so CDP no longer needs to std.Uri(request.url) anymore.
The main advantage is that it's easier to manage resources. There was a use-
after free before due to the sensitive nature of the tranfer's lifetime. There
were also corner cases where some resources might not be freed. This is
hopefully fixed with the lifetime of Transfer being extended.