When set, this disables the host verification of all HTTP requests. Available
for both the fetch and serve mode.
Also introduced an App.Config, for future command line options which need to
be passed more deeply into the code.
The two bigger changes here are:
1- The http_client has been moved from the Session to the Browser, allowing
its connection pool to be re-used across multiple sessions
2- The browser now has a page_arena which is used for all page-level allocation
and which can be re-used between pages (currently retains 1MB of memory).
Previously, pages uses an arena that was tied to the lifetime of the page,
thus it could not be re-used.
Using the Bench allocator for zig-js-runtime, allocated bytes went from
1347037879 to 834932438 (in a RUNS=1000 of puppeteer demo).
Various other changes to try to simplify the API and remove the possibility
of invalid states. For example, session.newPage() now includes the logic for
page.start() so that there should now never be a page that wasn't started.
I don't know if FrameId is related to an <iframe>, and whether each Page has
1 implicit "frame". But, playwright seems to treat frameId and targetId as
interchangeable, and chrome seems to agree (at leas to some degree); chrome will
return a targetId and reuse that value for the frameId.
So the simplest solution is just to remove our concept of a frameId and use
targetId exclusively. This doesn't seem to cause any issues with puppeteer.
The getTargetInfo result must return a `targetInfo` key.
Here is an example returned by Chrome:
```json
{
"id": 16,
"result": {
"targetInfo": {
"targetId": "d93a1bbc-f906-4bbb-bb4d-a2285234b091",
"type": "browser",
"title": "",
"url": "",
"attached": true,
"canAccessOpener": false
}
}
}
```
Rely on inspector to send the result, otherwise we'll send 2 responses to the
same message (one ourselves and one from the inspector), which Playwright does
not like.
The TL;DR is that this commit enforces the use of correct IDs, introduces a
BrowserContext, and adds some CDP tests.
These are the ids we need to be aware of when talking about CDP:
- id
- browserContextId
- targetId
- sessionId
- loaderId
- frameId
The `id` is the only one that _should_ originate from the driver. It's attached
to most messages and it's how we maintain a request -> response flow: when
the server responds to a specific message, it echo's back the id from the
requested message. (As opposed to out-of-band events sent from the server which
won't have an `id`). When I say "id" from this point forward, I mean every id
except for this req->res id.
Every other id is created by the browser.
Prior to this commit, we didn't really check incoming ids from the driver. If
the driver said "attachToTarget" and included a targetId, we just assumed that
this was the current targetId. This was aided by the fact that we only used
hard-coded IDS. If _we_ only "create" a frameId of "FRAME-1", then it's tempting
to think the driver will only ever send a frameId of "FRAME-1".
The issue with this approach is that _if_ the browser and driver fall out of sync
and there's only ever 1 browserContextId, 1 sessionId and 1 frameId, it's not
impossible to imagine cases where we behave on the thing.
Imagine this flow:
- Driver asks for a new BrowserContext
- Browser says OK, your browserContextId is 1
- Driver, for whatever reason, says close browserContextId 2
- Browser says, OK, but it doesn't check the id and just closes the only
BrowserContext it knows about (which is 1)
By both re-using the same hard-coded ids, and not verifying that the ids sent
from the client correspond to the correct ids, any issues are going to be hard
to debug.
Currently LOADER_ID and FRAEM_ID are still hard-coded. Baby steps.
ADD CDP testing helpers (mock Browser, Session, Page and Client). These are
placeholders until tests are added which use them.
Added a couple CDP tests.
CDP is now an struct which contains its own state a browser and a session.
When a client connection is made and successfully upgrades, the client creates
the CDP instance. There is now a cleaner separation betwen Server, Client and
CDP.
Removed a number of allocations, especially when writing results/events from
CDP to the client. Improved input message parsing. Tried to remove some usage
of undefined.
Adding HTTP & websocket awareness to the TCP server.
HTTP server handles `GET /json/version` and websocket upgrade requests.
Conceptually, websocket handling is the same code as before, but receiving
data will parse the websocket frames and writing data will wrap it in
a websocket frame.
The previous `Ctx` was split into a `Server` and a `Client`. This was
largely done to make it easy to write unit tests, since the `Client` is
a generic, all its dependencies (i.e. the server) can be mocked out. This
also makes it a bit nicer to know if there is or isn't a client (via the
server's client optional).
Added a MemoryPool for the Send object (I thought that was a nice touch!)
Removed MacOS hack on accept/conn completion usage.
Known issues:
- When framing an outgoing message, the entire message has to be duped. This
is no worse than how it was before, but it should be possible to eliminate
this in the future. Probably not part of this PR.
- Websocket parsing will reject continuation frames. I don't know of a single
client that will send a fragmented message (websocket has its own
message fragmentation), but we should probably still support this just in
case.
- I don't think the receive, timeout and close completions can safely be
re-used like we're doing. I believe they need to be associated with a specific
client socket.
- A new connection creates a new browser session. I think this is right (??),
but for the very first, we're throwing out a perfectly usable session. I'm
thinking this might be a change to how Browser/Sessions work.
- zig build test won't compile. This branch reproduces the issue with none
of these changes:
https://github.com/karlseguin/browser/tree/broken_test_build
(or, as a diff to main):
https://github.com/lightpanda-io/browser/compare/main...karlseguin:broken_test_build