browser

mirror of https://github.com/lightpanda-io/browser.git synced 2026-03-22 20:54:43 +00:00

Author	SHA1	Message	Date
Adrià Arrufat	60699229ca	Merge branch 'main' into semantic-tree	2026-03-11 20:52:39 +09:00
Adrià Arrufat	af803da5c8	cdp.lp: use enum for getSemanticTree format param Leverages std.json.parse to automatically validate the format param into a type-safe enum.	2026-03-11 16:21:43 +09:00
Karl Seguin	2a103fc94a	Use Session as a container for cross-frame resources The introduction of frames means that data is no longer tied to a specific Page or Context. `255b9a91cc` introduced Origins for v8 values shared across frames of the same origin. The commit highlighted the lifetime mismatch that we now have with data that can outlive 1 frame. A specific issue with that commit was the finalizers were still Context-owned. But like any other piece of data, that isn't right; aside from modules, nothing should be context-owned. This commit continues where the last left off and moves finalizers from Context to Origin. This is done in a separate commit because it introduces significant changes. Currently, finalizers take a Page, but that's no longer correct. A value created in one Page can outlive the Page. We need another container. I originally thought to use Origin, but that isn't known to CDP/MCP. Instead, I decided to enhance the Session. Session is now the owner of the page.arena, the page.factory and the page.arena_pool. Finalizers are given a Session which they can use to release their arena.	2026-03-11 08:44:49 +08:00
Karl Seguin	94ce5edd20	Frames on the same origin share v8 data Depends on: https://github.com/lightpanda-io/zig-v8-fork/pull/153 In some ways this is an extension of https://github.com/lightpanda-io/browser/pull/1635 but it has more implications with respect to correctness. A js.Context wraps a v8::Context. One of the important things it adds is the identity_map so that, given a Zig instance, we always return the same v8::Object. But imagine code running in a frame. This frame has its own Context, and thus its own identity_map. What happens when that frame does: ```js window.top.frame_loaded = true; ``` From Zig's point of view, `Window.getTop` will return the correct Zig instance. It will return the Window referenced by the "root" page. When that instance is passed to the bridge, we'll look for the v8::Object in the Context's `identity_map` but won't find it. The mapping exists in the root context `identity_map`, but not within this frame. So we create a new v8::Object and now our 1 zig instance has N v8::Objects for every page/frame that tries to access it. This breaks cross-frame scripting which should work, at least to some degree, even when frames are on the same origin. This commit adds a `js.Origin` which contains the `identity_map`, along with our other `v8::Global` storage. The `Env` now contains a `js.Origin` lookup, mapping an origin string (e.g. lightpanda.io:443) to an *Origin. When a Page's URL is changed, we call `self.js.setOrigin(new_url)` which will then either get or create an origin from the Env's origin lookup map. js.Origin is reference counted so that it remains valid so long as at least 1 frame references it. There's some special handling for null-origins (i.e. about:blank). At the root, null origins get a distinct/isolated Origin. For a frame, the parent's origin is used. Above, we talked about `identity_map`, but a `js.Context` has 8 other fields to track v8 values, e.g. `global_objects`, `global_functions`, `global_values_temp`, etc. These all must be shared by frames on the same origin. So all of these have also been moved to js.Origin. They've also been merged so that we now have 3 fields: `identity_map`, `globals` and `temps`. Finally, when the origin of a context is changed, we set the v8::Context's SecurityToken (to that origin). This is a key part of how v8 allows cross-context access.	2026-03-11 08:43:40 +08:00
Nikolay Govorov	3626f70d3e	Merge pull request #1759 from lightpanda-io/wp/mrdimidum/net-poll-runtime Network poll runtime	2026-03-10 23:38:07 +00:00
Pierre Tachoire	1ebf7460fe	Merge pull request #1768 from lightpanda-io/inspector_cleanup Call `resetContextGroup` on page removal	2026-03-10 15:32:47 +01:00
Karl Seguin	11fb5f990e	Call `resetContextGroup` on page removal Calling it here ensures that the inspector gets reset on internal page navigation. We were seeing intermittent segfaults on a problematic WPT test (/encoding/legacy-mb-japanese/euc-jp/) which I believe this solves. (The tests are still broken. Because we don't support form targets, they cause the root page to reload in a tight cycle, causing a lot of context creation / destruction, which I think was the issue. This commit doesn't fix the broken test but it hopefully fixes the crash). Also, clear out the Inspector's default_context when the default context is destroyed. (This was the first thing I did to try to fix the crash, it didn't work, but I believe it's correct).	2026-03-10 20:50:58 +08:00
Adrià Arrufat	d1ee0442ea	Merge branch 'main' into semantic-tree	2026-03-10 21:48:49 +09:00
Adrià Arrufat	62f31ea24a	Merge pull request #1765 from egrs/lp-get-structured-data add LP.getStructuredData CDP command	2026-03-10 21:48:18 +09:00
Pierre Tachoire	12c5bcd24f	cdp: resize the screenshot to 1920x1080 To be consistent w/ layout size returned	2026-03-10 10:09:53 +01:00
Adrià Arrufat	56f47ee574	Merge branch 'main' into semantic-tree	2026-03-10 17:26:34 +09:00
egrs	74f0436ac7	merge main, resolve conflicts with getInteractiveElements	2026-03-10 09:25:12 +01:00
egrs	22d31b1527	add LP.getStructuredData CDP command	2026-03-10 09:19:51 +01:00
Karl Seguin	9f3bca771a	Merge pull request #1755 from lightpanda-io/cdp-page-layout-metrics cdp: add a dummy Page.getLayoutMetrics	2026-03-10 16:16:17 +08:00
Adrià Arrufat	4e16d90a81	Merge pull request #1757 from egrs/lp-get-interactive-elements add LP.getInteractiveElements CDP command	2026-03-10 17:15:18 +09:00
Pierre Tachoire	d669d5c153	cdp: add a dummy Page.getLayoutMetrics	2026-03-10 08:54:48 +01:00
egrs	dc3958356d	address review feedback - TreeWalker.Full instead of FullExcludeSelf so querying a specific nodeId evaluates the root element itself - resolve href to absolute URL via URL.resolve - isDisabled checks ancestor <fieldset disabled> with legend exemption - parameter order: allocator before *Page per convention	2026-03-10 08:13:01 +01:00
Nikolay Govorov	8e59ce9e9f	Prepare global NetworkRuntime module	2026-03-10 03:00:47 +00:00
egrs	a417c73bf7	add LP.getInteractiveElements CDP command Returns a structured list of all interactive elements on a page: buttons, links, inputs, ARIA widgets, contenteditable regions, and elements with event listeners. Includes accessible names, roles, listener types, and key attributes. Event listener introspection (both addEventListener and inline handlers) is unique to LP — no other browser exposes this to automation code.	2026-03-09 19:46:12 +01:00
Pierre Tachoire	8672232ee2	cdp: add dummy page.captureScreenshot	2026-03-09 17:38:57 +01:00
Adrià Arrufat	b8a3135835	SemanticTree: add pruning support and move logic to walk	2026-03-09 13:02:03 +09:00
Adrià Arrufat	b674c2e448	CDP/MCP: add highly compressed text format for semantic tree	2026-03-08 22:42:00 +09:00
Adrià Arrufat	248851701f	Refactor: move SemanticTree to core and expose via MCP tools	2026-03-06 15:44:03 +09:00
Adrià Arrufat	0f46277b1f	CDP: implement LP.getSemanticTree for native semantic DOM extraction	2026-03-06 15:29:32 +09:00
Pierre Tachoire	6a8174a15c	cdp: don't dispatch executionContextsCleared on frame navigation	2026-03-04 14:45:21 +01:00
Karl Seguin	01fab5c92a	Merge pull request #1706 from lightpanda-io/cdp-attach-to-browser cdp: fix send CDP raw command with Playwright	2026-03-04 07:40:05 +08:00
Karl Seguin	6f0cd87d1c	Merge pull request #1703 from lightpanda-io/client_and_script_manager Fix a few issues in Client	2026-03-04 07:32:14 +08:00
Pierre Tachoire	9ca5188e12	cdp: set consistent target's default with about:blank for url and empty title.	2026-03-03 17:24:08 +01:00
Pierre Tachoire	56cc881ac0	cdp: fix attachToTarget and attachToBrowserTarget resp	2026-03-03 15:01:53 +01:00
Pierre Tachoire	06ef6d3e6a	cdp: attachToTarget must add the session id	2026-03-03 12:58:00 +01:00
Pierre Tachoire	14b58e8062	add target.attachToBrowserTarget	2026-03-03 12:58:00 +01:00
Pierre Tachoire	eee232c12c	cdp: allow multiple calls to attachToTarget Playwright, when creating a new CDPSession, sends an attachToBrowserTarget followed by another attachToTarget to re-attach itself to the existing target. see playwright/axtree.js from demo/ repository.	2026-03-03 12:58:00 +01:00
Karl Seguin	523efbd85a	Fix a few issues in Client Most significantly, if removing from the multi fails, the connection is added to a "dirty" list for the removal to be retried later. Looking at the curl source code, remove fails on a recursive call, and we've struggled with recursive calls before, so I _think_ this might be happening (it fails in other cases, but I suspect if it _is_ happening, it's for this reason). The retry happens _after_ `perform`, so it cannot fail due to recursion. If it fails at this point, we @panic. This is harsh, but it isn't easily recoverable and before putting effort into it, I'd like to know that it's actually happening. Fix potential use of undefined when a 401-407 request is received, but no 'WWW-Authenticate' or 'Proxy-Authenticate' header is received. Don't call `curl_multi_remove_handle` on an easy that hasn't been added yet due to an error. Specifically, if `makeRequest` fails during setup, transfer_conn is nulled so that `transfer.deinit()` doesn't try to remove the connection. And the conn is removed from the `in_use` queue and made `available` again. On Abort, if getting the private fails (extremely unlikely), we now still try to remove the connection from the multi. Added a few more fields to the famous "ScriptManager.Header recall" assertion.	2026-03-03 18:02:06 +08:00
Adrià Arrufat	b2e301418f	cdp.lp: use page.document instead of window._document	2026-03-03 17:11:16 +09:00
Adrià Arrufat	334a2e44a1	lp: simplify dom_node resolution in getMarkdown	2026-03-03 17:08:43 +09:00
Adrià Arrufat	c9121a03d2	cdp: move LP.getMarkdown test to lp domain	2026-03-03 16:39:31 +09:00
Adrià Arrufat	cc93180d57	cdp: add LP domain and getMarkdown method This PR introduces a custom CDP domain 'LP' (Lightpanda) to expose browser-specific tools. The first method, 'LP.getMarkdown', allows retrieving a Markdown representation of the DOM or a specific node by its 'nodeId'. This is optimized for AI agents and LLM-based scraping tasks.	2026-03-03 16:35:48 +09:00
Karl Seguin	10ad5d763e	Rename page.id to page._frame_id This field was recently added and is used to generate correct frameIds in CDP messages. They remain the same during a navigation event, so calling them page.id might cause surprises since navigation events create new pages, but retain the original id. Hence, frame_id is more accurate and hopefully less surprising. (This is a small cleanup prior to doing some iframe navigation work).	2026-03-02 16:21:29 +08:00
Karl Seguin	e65667963f	Correctly JSON encode URL I think this code comes from some serialization tweak from when everything was an std.Uri, and by switching to [:0]const u8 everywhere not only was the tweak unnecessary, it was also wrong - possibly resulting in the generation of invalid JSON.	2026-02-28 12:48:45 +08:00
Karl Seguin	315c9a2d92	Add RC support to NodeList Most importantly, this allows the Selector.List to be self-contained with an arena from the ArenaPool. Selector.List can be both relatively large and relatively common, so moving it off the page.arena is a nice win. Also applied this to ChildNodes, which is much smaller but could also be called often. I was initially going to hook into the v8::Object's internal fields to store the referencing v8::Object. So the v8::Object representing the Iterator would store the v8::Object representing the NodeList inside of its internal field - which the GC would trace/detect/respect. And that is probably the fastest and most v8-ish solution, but I couldn't come up with an elegant solution. The best I had was having an "afterCreate" callback which passed the v8 object (this is similar to the old postAttach callback we had, but used for a different purpose). However, since "acquireRef" was recently added to events, re-using that was much simpler and worked well.	2026-02-27 10:29:46 +08:00
Karl Seguin	21be3db51f	Callers to page.navigate ensure URL is properly encoded. Follow up to https://github.com/lightpanda-io/browser/pull/1646 The encodeURL (renamed to ensureEncoded and exposed in this commit) already handled already-encoded URLs, so this was largely a matter of exposing the functionality. The reason this isn't baked directly into Page.navigate is that, in some places e.g. internal navigation, the URL is already known to be encoded. So it's up to every caller to make sure they are passing a valid URL to navigate.	2026-02-26 12:22:06 +08:00
Karl Seguin	71d34592d9	add frame created cdp messages	2026-02-19 23:47:33 +08:00
Karl Seguin	db2927eea7	cleanup a not-so-great rebase	2026-02-19 23:47:33 +08:00
Karl Seguin	bb01a5cb31	Make CDP frame-aware	2026-02-19 23:47:33 +08:00
Karl Seguin	938cd5e136	Merge pull request #1582 from lightpanda-io/cdp_per_page_frame_id Rework CDP frameIds (and loaderIds and requestIds and interceptorIds)	2026-02-19 22:16:52 +08:00
Karl Seguin	e2a1ce623c	Rework CDP frameIds (and loaderIds and requestIds and interceptorIds) Our BrowsingContext currently supports 1 target. So we have a per-BC target_id. Previously, our target had 1 "frame" - our page. So we often treated the targetId as the frameId. But to work with frames, we need page-specific frameIds and loaderIds. This tries to clean up our ids (a little). frameIds are now ids derived from a new incrementing page.id. This page.id has to be passed around (via http Requests and through notifications) in order to properly generate messages with a frameId.	2026-02-19 13:01:41 +08:00
Karl Seguin	645da2e307	Reduce cost of various Element render-related properties. Added a get-only `getStyle` which doesn't lazily create a new style if none exists. This can be used in the (frequently used) `checkVisibility` to avoid an allocation. Added a specialized getBoundingClientRectForVisible which skips the checkVisibility check, since a few callers have already done their own visibility check. DOMRect is now off the heap. This avoids _a lot_ of allocation when a DOMRect is only needed for internal calculation, e.g. in Document.elementFromPoint.	2026-02-19 09:45:56 +08:00
egrs	628049cfd7	add cookie size to CDP response Compute and include the cookie size field (name.len + value.len) in Storage.getCookies and Network.getCookies CDP responses, matching Chrome's behavior.	2026-02-17 13:08:02 +01:00
Nikolay Govorov	6553bb8147	Remove los interceptors feature	2026-02-16 15:48:18 +00:00
Karl Seguin	14112ed294	Remove Page.reset Page.reset exists for 1 use case: multiple calls to the Page.navigate CDP method. At an extreme, something like this in puppeteer: ``` await page.goto(baseURL + '/campfire-commerce/'); await page.goto(baseURL + '/campfire-commerce/'); ``` Rather than handling this generically in Page, we now handle this case specifically at the CDP layer. If the page isn't in its initial load state, i.e. page._load_state != .waiting, then we reload the page from the session. For reloading, my initial inclination was to do session.removePage then session.createPage(). This behavior still seems potentially correct to me, but compared to our `reset`, this would trigger extra notifications, namely: self.notification.dispatch(.page_remove, .{}); and self.notification.dispatch(.page_created, page); Because of https://github.com/lightpanda-io/browser/pull/1265/ I guess that could have side effects. So, to keep the behavior as close to the current "reset", a new `session.replacePage()` has been added which behaves a lot like removePage + createPage, but without the notifications being sent. While I generally think this is just cleaner, this was largely driven by some planning for frame support. The entity for a Frame will share a lot with the Page (we'll extract that logic), so simplifying the Page, especially around initialization, helps simplify frame support.	2026-02-11 13:53:49 +08:00


256 Commits