browser

mirror of https://github.com/lightpanda-io/browser.git synced 2026-03-29 16:10:04 +00:00

Author	SHA1	Message	Date
Karl Seguin	753391b7e2	Add origins safety cleanup when destroying the context for the root page	2026-03-11 08:43:41 +08:00
Karl Seguin	94ce5edd20	Frames on the same origin share v8 data Depends on: https://github.com/lightpanda-io/zig-v8-fork/pull/153 In some ways this is an extension of https://github.com/lightpanda-io/browser/pull/1635 but it has more implications with respect to correctness. A js.Context wraps a v8::Context. One of the important things it adds is the identity_map so that, given a Zig instance, we always return the same v8::Object. But imagine code running in a frame. This frame has its own Context, and thus its own identity_map. What happens when that frame does: ```js window.top.frame_loaded = true; ``` From Zig's point of view, `Window.getTop` will return the correct Zig instance. It will return the Window referenced by the "root" page. When that instance is passed to the bridge, we'll look for the v8::Object in the Context's `identity_map` but won't find it. The mapping exists in the root context `identity_map`, but not within this frame. So we create a new v8::Object and now our 1 Zig instance has N v8::Objects for every page/frame that tries to access it. This breaks cross-frame scripting which should work, at least to some degree, even when frames are on the same origin. This commit adds a `js.Origin` which contains the `identity_map`, along with our other `v8::Global` storage. The `Env` now contains a `js.Origin` lookup, mapping an origin string (e.g. lightpanda.io:443) to an *Origin. When a Page's URL is changed, we call `self.js.setOrigin(new_url)` which will then either get or create an origin from the Env's origin lookup map. js.Origin is reference counted so that it remains valid so long as at least 1 frame references it. There's some special handling for null-origins (i.e. about:blank). At the root, null origins get a distinct/isolated Origin. For a frame, the parent's origin is used. Above, we talked about `identity_map`, but a `js.Context` has 8 other fields to track v8 values, e.g. `global_objects`, `global_functions`, `global_values_temp`, etc. These all must be shared by frames on the same origin. So all of these have also been moved to js.Origin. They've also been merged so that we now have 3 fields: `identity_map`, `globals` and `temps`. Finally, when the origin of a context is changed, we set the v8::Context's SecurityToken (to that origin). This is a key part of how v8 allows cross-context access.	2026-03-11 08:43:40 +08:00
Nikolay Govorov	3626f70d3e	Merge pull request #1759 from lightpanda-io/wp/mrdimidum/net-poll-runtime Network poll runtime	2026-03-10 23:38:07 +00:00
Pierre Tachoire	1ebf7460fe	Merge pull request #1768 from lightpanda-io/inspector_cleanup Call `resetContextGroup` on page removal	2026-03-10 15:32:47 +01:00
Karl Seguin	11fb5f990e	Call `resetContextGroup` on page removal Calling it here ensures that the inspector gets reset on internal page navigation. We were seeing intermittent segfaults on a problematic WPT test (/encoding/legacy-mb-japanese/euc-jp/) which I believe this solves. (The tests are still broken. Because we don't support form targets, they cause the root page to reload in a tight cycle, causing a lot of context creation / destruction, which I think was the issue. This commit doesn't fix the broken test but it hopefully fixes the crash). Also, clear out the Inspector's default_context when the default context is destroyed. (This was the first thing I did to try to fix the crash, it didn't work, but I believe it's correct).	2026-03-10 20:50:58 +08:00
Adrià Arrufat	d1ee0442ea	Merge branch 'main' into semantic-tree	2026-03-10 21:48:49 +09:00
Adrià Arrufat	62f31ea24a	Merge pull request #1765 from egrs/lp-get-structured-data add LP.getStructuredData CDP command	2026-03-10 21:48:18 +09:00
Adrià Arrufat	064e7b404b	SemanticTree: unify interactivity detection logic	2026-03-10 19:02:55 +09:00
Pierre Tachoire	12c5bcd24f	cdp: resize the screenshot to 1920x1080 To be consistent w/ layout size returned	2026-03-10 10:09:53 +01:00
Adrià Arrufat	56f47ee574	Merge branch 'main' into semantic-tree	2026-03-10 17:26:34 +09:00
egrs	74f0436ac7	merge main, resolve conflicts with getInteractiveElements	2026-03-10 09:25:12 +01:00
egrs	22d31b1527	add LP.getStructuredData CDP command	2026-03-10 09:19:51 +01:00
Karl Seguin	9f3bca771a	Merge pull request #1755 from lightpanda-io/cdp-page-layout-metrics cdp: add a dummy Page.getLayoutMetrics	2026-03-10 16:16:17 +08:00
Adrià Arrufat	4e16d90a81	Merge pull request #1757 from egrs/lp-get-interactive-elements add LP.getInteractiveElements CDP command	2026-03-10 17:15:18 +09:00
Pierre Tachoire	d669d5c153	cdp: add a dummy Page.getLayoutMetrics	2026-03-10 08:54:48 +01:00
egrs	dc3958356d	address review feedback - TreeWalker.Full instead of FullExcludeSelf so querying a specific nodeId evaluates the root element itself - resolve href to absolute URL via URL.resolve - isDisabled checks ancestor <fieldset disabled> with legend exemption - parameter order: allocator before *Page per convention	2026-03-10 08:13:01 +01:00
Nikolay Govorov	8e59ce9e9f	Prepare global NetworkRuntime module	2026-03-10 03:00:47 +00:00
Adrià Arrufat	a318c6263d	SemanticTree: improve visibility, AX roles and xpath generation - Use `checkVisibility` for more accurate element visibility detection. - Add support for color, date, file, and month AX roles. - Optimize XPath generation by tracking sibling indices during the walk. - Refine interactivity detection for form elements.	2026-03-10 09:23:06 +09:00
egrs	a417c73bf7	add LP.getInteractiveElements CDP command Returns a structured list of all interactive elements on a page: buttons, links, inputs, ARIA widgets, contenteditable regions, and elements with event listeners. Includes accessible names, roles, listener types, and key attributes. Event listener introspection (both addEventListener and inline handlers) is unique to LP — no other browser exposes this to automation code.	2026-03-09 19:46:12 +01:00
Pierre Tachoire	8672232ee2	cdp: add dummy page.captureScreenshot	2026-03-09 17:38:57 +01:00
Adrià Arrufat	85ebbe8759	SemanticTree: improve accessibility tree and name calculation - Add more structural roles (banner, navigation, main, list, etc.). - Implement fallback for accessible names (SVG titles, image alt text). - Skip children for leaf-like semantic nodes to reduce redundancy. - Disable pruning in the default semantic tree view.	2026-03-09 21:04:47 +09:00
Adrià Arrufat	c3a53752e7	CDP: simplify AXNode name extraction logic	2026-03-09 15:34:59 +09:00
Adrià Arrufat	b8a3135835	SemanticTree: add pruning support and move logic to walk	2026-03-09 13:02:03 +09:00
Adrià Arrufat	b674c2e448	CDP/MCP: add highly compressed text format for semantic tree	2026-03-08 22:42:00 +09:00
Adrià Arrufat	b8139a6e83	CDP/MCP: improve Stagehand compatibility for semantic tree	2026-03-08 15:48:44 +09:00
Adrià Arrufat	e0f0b9f210	SemanticTree: use AXRole enum for interactive role check	2026-03-06 16:26:08 +09:00
Adrià Arrufat	248851701f	Refactor: move SemanticTree to core and expose via MCP tools	2026-03-06 15:44:03 +09:00
Adrià Arrufat	0f46277b1f	CDP: implement LP.getSemanticTree for native semantic DOM extraction	2026-03-06 15:29:32 +09:00
Karl Seguin	6c5efe6ce0	Merge pull request #1715 from lightpanda-io/cdp-frame-navigate cdp: don't dispatch executionContextsCleared on frame navigation	2026-03-04 22:02:30 +08:00
Pierre Tachoire	6a8174a15c	cdp: don't dispatch executionContextsCleared on frame navigation	2026-03-04 14:45:21 +01:00
Pierre Tachoire	40c3f1b618	cdp: fix req id resolver, they are REQ- not RID-	2026-03-04 13:00:16 +01:00
Karl Seguin	01fab5c92a	Merge pull request #1706 from lightpanda-io/cdp-attach-to-browser cdp: fix send CDP raw command with Playwright	2026-03-04 07:40:05 +08:00
Karl Seguin	6f0cd87d1c	Merge pull request #1703 from lightpanda-io/client_and_script_manager Fix a few issues in Client	2026-03-04 07:32:14 +08:00
Pierre Tachoire	9ca5188e12	cdp: set consistent target's default with about:blank for url and empty title.	2026-03-03 17:24:08 +01:00
Pierre Tachoire	56cc881ac0	cdp: fix attachToTarget and attachToBrowserTarget resp	2026-03-03 15:01:53 +01:00
Pierre Tachoire	06ef6d3e6a	cdp: attachToTarget must add the session id	2026-03-03 12:58:00 +01:00
Pierre Tachoire	14b58e8062	add target.attachToBrowserTarget	2026-03-03 12:58:00 +01:00
Pierre Tachoire	eee232c12c	cdp: allow multiple calls to attachToTarget Playwright, when creating a new CDPSession, sends an attachToBrowserTarget followed by another attachToTarget to re-attach itself to the existing target. see playwright/axtree.js from demo/ repository.	2026-03-03 12:58:00 +01:00
Karl Seguin	523efbd85a	Fix a few issues in Client Most significantly, if removing from the multi fails, the connection is added to a "dirty" list for the removal to be retried later. Looking at the curl source code, remove fails on a recursive call, and we've struggled with recursive calls before, so I _think_ this might be happening (it fails in other cases, but I suspect if it _is_ happening, it's for this reason). The retry happens _after_ `perform`, so it cannot fail due to recursiveness. If it fails at this point, we @panic. This is harsh, but it isn't easily recoverable and before putting effort into it, I'd like to know that it's actually happening. Fix potential use of undefined when a 401-407 request is received, but no 'WWW-Authenticate' or 'Proxy-Authenticate' header is received. Don't call `curl_multi_remove_handle` on an easy that hasn't been added yet due to an error. Specifically, if `makeRequest` fails during setup, transfer_conn is nulled so that `transfer.deinit()` doesn't try to remove the connection. And the conn is removed from the `in_use` queue and made `available` again. On Abort, if getting the private fails (extremely unlikely), we now still try to remove the connection from the multi. Added a few more fields to the famous "ScriptManager.Header recall" assertion.	2026-03-03 18:02:06 +08:00
Adrià Arrufat	b2e301418f	cdp.lp: use page.document instead of window._document	2026-03-03 17:11:16 +09:00
Adrià Arrufat	334a2e44a1	lp: simplify dom_node resolution in getMarkdown	2026-03-03 17:08:43 +09:00
Adrià Arrufat	c9121a03d2	cdp: move LP.getMarkdown test to lp domain	2026-03-03 16:39:31 +09:00
Adrià Arrufat	cc93180d57	cdp: add LP domain and getMarkdown method This PR introduces a custom CDP domain 'LP' (Lightpanda) to expose browser-specific tools. The first method, 'LP.getMarkdown', allows retrieving a Markdown representation of the DOM or a specific node by its 'nodeId'. This is optimized for AI agents and LLM-based scraping tasks.	2026-03-03 16:35:48 +09:00
Karl Seguin	7695c8403f	Merge pull request #1692 from lightpanda-io/rename_page_id_to_frame_id Rename page.id to page._frame_id	2026-03-02 17:40:43 +08:00
Karl Seguin	10ad5d763e	Rename page.id to page._frame_id This field was recently added and is used to generate correct frameIds in CDP messages. They remain the same during a navigation event, so calling them page.id might cause surprises since navigation events create new pages, but retain the original id. Hence, frame_id is more accurate and hopefully less surprising. (This is a small cleanup prior to doing some iframe navigation work).	2026-03-02 16:21:29 +08:00
Karl Seguin	03b999c592	Remove redundant CDP v8 shutdown https://github.com/lightpanda-io/browser/pull/1614 improved our shutdown behavior so that microtasks associated with a context wouldn't fire after the context was disposed of. This involved having context-specific microtasks, pumping the message loop, and preventing re-entry. The shutdown code in CDP already had much of this behavior built-in, but it has now become redundant. Most importantly the CDP shutdown logic did not prevent re-entry. Removing this code fixes a flaky WPT crash. It didn't seem to be tied to a specific test, but rather a cross-context/page use-after-free that we saw prior to 1614. I could reproduce it reliably by running `/wasm/core/`. I'll be honest, it isn't clear to me why _removing_ the CDP cleanup helps. Running the message loop and microtask _before_ our normal shutdown might be unnecessary, but why would it crash? I don't know, but the CDP path is slightly different in that it also involves Inspector shutdown. So there's still something about this flow I don't quite understand. And, at least for this case the current flow seems "correct".	2026-03-02 10:24:07 +08:00
Karl Seguin	e65667963f	Correctly JSON encode URL I think this code comes from some serialization tweak from when everything was an std.Uri and by switching to [:0]const u8 everywhere not only was the tweak unnecessary, it was also wrong - possibly resulting in the generation of invalid JSON.	2026-02-28 12:48:45 +08:00
Karl Seguin	e4cb78abee	Merge pull request #1670 from lightpanda-io/cdata_sso Change CData._data from []const to String (SSO)	2026-02-27 17:30:03 +08:00
Karl Seguin	870fd1654d	Change CData._data from []const to String (SSO) After looking at a handful of websites, the # of Text and Comment nodes that are small (<= 12 bytes) is _really_ high. Ranging from 85% to 98%. I thought that was high, but a lot of it is indentation or a sentence that's broken down into multiple nodes, eg: <div><b>sale!</b> <span class=price>$1.99</span> buy now</div> So what looks like 1 sentence to us, is actually 3 text nodes. On a typical website, we should see thousands fewer allocations in the page arena for the text in text nodes.	2026-02-27 12:53:54 +08:00
Karl Seguin	315c9a2d92	Add RC support to NodeList Most importantly, this allows the Selector.List to be self-contained with an arena from the ArenaPool. Selector.List can be both relatively large and relatively common, so moving it off the page.arena is a nice win. Also applied this to ChildNodes, which is much smaller but could also be called often. I was initially going to hook into the v8::Object's internal fields to store the referencing v8::Object. So the v8::Object representing the Iterator would store the v8::Object representing the NodeList inside of its internal field - which the GC would trace/detect/respect. And that is probably the fastest and most v8-ish solution, but I couldn't come up with an elegant solution. The best I had was having an "afterCreate" callback which passed the v8 object (this is similar to the old postAttach callback we had, but used for a different purpose). However, since "acquireRef" was recently added to events, re-using that was much simpler and worked well.	2026-02-27 10:29:46 +08:00

558 Commits