Some data has to exist specifically for the navigation of one page to another.
For example, if a hyperlink is clicked, the URL begins its life with the
original page, but is transferred to the new page. The page_arena cannot be used
for such data.
It's possible to use the session_arena, but its lifetime is much longer and,
given enough navigation, could accumulate a lot of memory.
The new transfer_arena exists within the session, but only exists until the
next navigation.
While currently only used for the navigation URL, the main goal here is to have
a place to put the request body on form submission, which has a lifetime similar
to a click URL.
While I'm at it, I promoted the existing session arena and the new transfer
arena to the browser, allowing better memory re-use between sessions.
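The three lifetimes can be illustrated with a toy model. This is only a sketch in Python, not the project's code: `Arena`, `ToyBrowser` and `navigate` are hypothetical names, and the exact point at which the transfer arena is reset is an assumption.

```python
class Arena:
    """Toy stand-in for an arena allocator: it owns values until reset."""

    def __init__(self):
        self.items = []

    def alloc(self, value):
        self.items.append(value)
        return value

    def reset(self):
        self.items.clear()


class ToyBrowser:
    def __init__(self):
        # All three arenas live on the browser so they can be re-used.
        self.session_arena = Arena()
        self.transfer_arena = Arena()
        self.page_arena = Arena()
        self.current_url = None

    def navigate(self, url):
        # Assumption: data kept for the previous navigation is dropped
        # here, before the data for this navigation is stored.
        self.transfer_arena.reset()
        target = self.transfer_arena.alloc(url)
        # The old page goes away, and with it everything in the page arena.
        self.page_arena.reset()
        self.current_url = target
        self.page_arena.alloc(f"DOM for {target}")


browser = ToyBrowser()
for i in range(100):
    browser.navigate(f"https://example.com/{i}")
print(len(browser.transfer_arena.items), len(browser.session_arena.items))
```

In this model the transfer arena never holds more than one navigation's worth of data, whereas storing each URL in the session arena would have left 100 entries behind.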
Replaces the existing, very specialized Notification with something more
general.
Currently, the existing page_navigate and page_navigated have been migrated.
Telemetry's page navigation event now also hooks into these events to generate
the telemetry record.
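As a rough illustration of what a more general notification mechanism might look like, here is a minimal Python sketch. The `Notification` class and its `register` and `dispatch` methods are hypothetical; only the event names page_navigate and page_navigated come from the text above.

```python
class Notification:
    """Minimal event hub: listeners register per event name."""

    def __init__(self):
        self._listeners = {}

    def register(self, event, callback):
        self._listeners.setdefault(event, []).append(callback)

    def dispatch(self, event, payload):
        # Events without listeners are silently ignored.
        for callback in self._listeners.get(event, []):
            callback(payload)


notification = Notification()
cdp_log = []
telemetry_log = []

# Two independent consumers hook into the same events.
notification.register("page_navigate", lambda p: cdp_log.append(("navigate", p["url"])))
notification.register("page_navigated", lambda p: cdp_log.append(("navigated", p["url"])))
notification.register("page_navigate", lambda p: telemetry_log.append(p["url"]))

notification.dispatch("page_navigate", {"url": "https://example.com/"})
notification.dispatch("page_navigated", {"url": "https://example.com/"})
```

The point of the design is that the emitter does not need to know who is listening, so telemetry can be added without touching the navigation code.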
emit contextCreated when it's needed, not when it actually happens.
I thought we could make this sync-up, but we'd need to create 3 contexts to
satisfy both puppeteer and chromedp. So rather than having it partially
driven by notifications from Browser, I'd rather just fake it all for now.
- Pages within the same session have proper isolation
- they have their own window
- they have their own SessionState
- they have their own v8.Context
- Move inspector to CDP browser context
- Browser now knows nothing about the inspector
- Use notification to emit a context-created message
- This is still a bit hacky, but again, it decouples browser from CDP
It's still generic over the client - we need to assert messages written to it
and be able to send specific commands, but it's no longer generic over Browser/
Session/Page/etc.
In order to support click handling on anchors from JavaScript, we need some hook
from the page/session to the CDP instance. This first phase adds notifications
in page.navigate, as well as a primitive notification hook to the session.
CDP's existing Page.navigate uses this new notification system.
Combine uri + rawuri into single struct.
Try to improve ownership around URIs and URI-like things.
- cookie & request can take *const std.Uri
(TODO: make them aware of the new URL struct?)
- Location (web api) should own its URL (web api URL)
- Window should own its Location
Most of these changes result in (a) a cleaner Page and (b) not having to carry
around 2 nullable objects (URI and rawuri).
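The ownership chain described above can be sketched as follows. This is an illustrative Python model only: the class names mirror the text, but the fields and the `parse` helper are assumptions, and `urllib.parse` stands in for std.Uri.

```python
from dataclasses import dataclass
from urllib.parse import SplitResult, urlsplit


@dataclass(frozen=True)
class URL:
    """Keeps the raw string and its parsed form together in one value."""
    raw: str
    uri: SplitResult

    @classmethod
    def parse(cls, raw):
        return cls(raw=raw, uri=urlsplit(raw))


@dataclass
class Location:
    url: URL  # Location owns its URL


@dataclass
class Window:
    location: Location  # Window owns its Location


window = Window(Location(URL.parse("https://example.com/a?b=1")))
print(window.location.url.raw, window.location.url.uri.hostname)
```

Because the raw and parsed forms travel together, there is a single value to pass around instead of two that can be null independently.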
FlatRenderer positions items on a single row, giving each a height and width of
1.
Added getBoundingClientRect to the DOMelement which, when requested for the
first time, will place the item in the renderer.
The goal here is to give elements a fixed position and to make it easy to map
x,y coordinates onto an element. This should work, at least with puppeteer,
since it first requests the boundingClientRect before issuing a click.
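A minimal sketch of the idea, in Python for illustration: each element gets the next free slot on a single row the first time its rect is requested, and a coordinate can then be mapped back to an element. The method names `get_rect` and `element_at` are hypothetical, and elements are represented by plain strings.

```python
import math


class FlatRenderer:
    def __init__(self):
        self._positions = {}  # element -> x position
        self._elements = []   # x position -> element

    def get_rect(self, element):
        # Place the element lazily, on first request.
        x = self._positions.get(element)
        if x is None:
            x = len(self._elements)
            self._elements.append(element)
            self._positions[element] = x
        return {"x": x, "y": 0, "width": 1, "height": 1}

    def element_at(self, x, y):
        # Everything sits on one row of height 1.
        if not (0 <= y < 1) or x < 0:
            return None
        index = math.floor(x)
        if index >= len(self._elements):
            return None
        return self._elements[index]


renderer = FlatRenderer()
rect_a = renderer.get_rect("a")
rect_b = renderer.get_rect("b")
# A click aimed at the center of "b" resolves back to "b".
hit = renderer.element_at(rect_b["x"] + 0.5, rect_b["y"] + 0.5)
```

Requesting the rect of an already placed element returns the same position, which is what keeps click coordinates stable.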
Node registry now only tracks the node id (which we need to be consistent) and
the underlying parser.Node. All other data is loaded on-demand (i.e. when we
serialize the node). This allows us to serialize node values as they appear
when they are serialized, as opposed to when they are registered.
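The difference between capturing values at registration and loading them on demand can be shown with a small Python sketch. `DomNode`, `Registry`, `register` and `serialize` are hypothetical stand-ins for illustration, not the project's types.

```python
class DomNode:
    """Stand-in for the underlying parser node."""

    def __init__(self, name, value):
        self.name = name
        self.value = value


class Registry:
    def __init__(self):
        self._next_id = 1
        self._by_id = {}    # node id -> node
        self._by_node = {}  # node -> node id

    def register(self, node):
        # The same node always maps to the same id.
        existing = self._by_node.get(node)
        if existing is not None:
            return existing
        node_id = self._next_id
        self._next_id += 1
        self._by_id[node_id] = node
        self._by_node[node] = node_id
        return node_id

    def serialize(self, node_id):
        # Values are read now, not when the node was registered.
        node = self._by_id[node_id]
        return {"nodeId": node_id, "nodeName": node.name, "nodeValue": node.value}


registry = Registry()
text = DomNode("#text", "before")
text_id = registry.register(text)
text.value = "after"
serialized = registry.serialize(text_id)
```

Here `serialized` carries the value the node has at serialization time, since the registry stored nothing but the id and the node itself.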
This expands on the existing CDP node work used in DOM.search. It introduces
a node registry to track all nodes returned to the client and give lookups to
get a node from an Id or a *parser.Node.
Eventually, the goal is to have the Registry emit the DOM.setChildNodes event
whenever necessary, as well as support many of the missing DOM actions.
Added tests to existing search handlers. Reworked search a little bit to avoid
some unnecessary allocations and to hook it into the registry.
The generated Node is currently incomplete. The parentId is missing, the
children are missing. Also, we still need to associate the v8 ObjectId to the
node.
Finally, I moved all action handlers into a nested "domain" folder.
The two bigger changes here are:
1- The http_client has been moved from the Session to the Browser, allowing
its connection pool to be re-used across multiple sessions
2- The browser now has a page_arena which is used for all page-level allocation
and which can be re-used between pages (currently retains 1MB of memory).
Previously, pages used an arena that was tied to the lifetime of the page,
thus it could not be re-used.
Using the Bench allocator for zig-js-runtime, allocated bytes went from
1347037879 to 834932438 (in a RUNS=1000 of puppeteer demo).
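For scale, the relative saving can be worked out directly from the two figures above. The snippet below is just that arithmetic in Python:

```python
before = 1_347_037_879  # allocated bytes before the change
after = 834_932_438     # allocated bytes after the change

saved = before - after
reduction = saved / before
print(f"{saved} fewer bytes allocated ({reduction:.1%} reduction)")
```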
Various other changes to try to simplify the API and remove the possibility
of invalid states. For example, session.newPage() now includes the logic for
page.start() so that there should now never be a page that wasn't started.
I don't know if FrameId is related to an <iframe>, and whether each Page has
1 implicit "frame". But, playwright seems to treat frameId and targetId as
interchangeable, and chrome seems to agree (at least to some degree); chrome will
return a targetId and reuse that value for the frameId.
So the simplest solution is just to remove our concept of a frameId and use
targetId exclusively. This doesn't seem to cause any issues with puppeteer.
The TL;DR is that this commit enforces the use of correct IDs, introduces a
BrowserContext, and adds some CDP tests.
These are the ids we need to be aware of when talking about CDP:
- id
- browserContextId
- targetId
- sessionId
- loaderId
- frameId
The `id` is the only one that _should_ originate from the driver. It's attached
to most messages and it's how we maintain a request -> response flow: when
the server responds to a specific message, it echoes back the id from the
requested message. (As opposed to out-of-band events sent from the server which
won't have an `id`). When I say "id" from this point forward, I mean every id
except for this req->res id.
Every other id is created by the browser.
Prior to this commit, we didn't really check incoming ids from the driver. If
the driver said "attachToTarget" and included a targetId, we just assumed that
this was the current targetId. This was aided by the fact that we only used
hard-coded IDs. If _we_ only "create" a frameId of "FRAME-1", then it's tempting
to think the driver will only ever send a frameId of "FRAME-1".
The issue with this approach is that _if_ the browser and driver fall out of sync
and there's only ever 1 browserContextId, 1 sessionId and 1 frameId, it's not
impossible to imagine cases where we act on the wrong thing.
Imagine this flow:
- Driver asks for a new BrowserContext
- Browser says OK, your browserContextId is 1
- Driver, for whatever reason, says close browserContextId 2
- Browser says, OK, but it doesn't check the id and just closes the only
BrowserContext it knows about (which is 1)
By both re-using the same hard-coded ids, and not verifying that the ids sent
from the client correspond to the correct ids, any issues are going to be hard
to debug.
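The flow above, with verification added, might look like the following Python sketch. The names `ToyCdpBrowser`, `create_browser_context`, `dispose_browser_context` and the "BID-n" id format are all hypothetical; the only point is that an id the browser did not hand out is rejected rather than silently mapped to the one context that exists.

```python
import itertools


class UnknownBrowserContext(Exception):
    pass


class ToyCdpBrowser:
    def __init__(self):
        self._counter = itertools.count(1)
        self._contexts = {}

    def create_browser_context(self):
        # Ids are generated by the browser, never hard-coded.
        context_id = f"BID-{next(self._counter)}"
        self._contexts[context_id] = object()
        return context_id

    def dispose_browser_context(self, context_id):
        # Verify the id sent by the driver before acting on it.
        if context_id not in self._contexts:
            raise UnknownBrowserContext(context_id)
        del self._contexts[context_id]


cdp_browser = ToyCdpBrowser()
context_id = cdp_browser.create_browser_context()

try:
    cdp_browser.dispose_browser_context("BID-2")
    rejected = False
except UnknownBrowserContext:
    rejected = True
```

With the check in place, the out-of-sync driver gets an error immediately, and the context it never asked to close stays open.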
Currently LOADER_ID and FRAME_ID are still hard-coded. Baby steps.
Add CDP testing helpers (mock Browser, Session, Page and Client). These are
placeholders until tests are added which use them.
Added a couple CDP tests.
CDP is now a struct which contains its own state: a browser and a session.
When a client connection is made and successfully upgrades, the client creates
the CDP instance. There is now a cleaner separation between Server, Client and
CDP.
Removed a number of allocations, especially when writing results/events from
CDP to the client. Improved input message parsing. Tried to remove some usage
of undefined.