Commit Graph

87 Commits

Author SHA1 Message Date
Karl Seguin
ddb549cb45 cookie support 2025-08-11 21:37:02 +08:00
Karl Seguin
c7484c69c0 Increase max concurrent requests to 10
Improve wait analysis dump.

De-prioritize secondary schedules.

Don't log warning for application/json scripts

Change pretty log timer to display time from start.
2025-08-11 21:37:02 +08:00
Karl Seguin
9876d79680 Add Accept-Encoding
This is necessary because of CloudFront which will send gzip content even if
we don't ask for it.

Properly handle scripts that are both async and defer.

Add a helper to print state of page wait. This can be helpful in identifying
what's causing the page to hang on page.wait.
2025-08-11 21:37:02 +08:00
Karl Seguin
32566ccc80 Set window location on load
Set SUPPRESS_CONNECT_HEADERS option.
2025-08-11 21:37:02 +08:00
Karl Seguin
7f9e309ae8 Shutdown clean async scripts
Set parent current script
2025-08-11 21:37:02 +08:00
Karl Seguin
7831aabe5a connect proxy 2025-08-11 21:37:02 +08:00
Karl Seguin
74b40b97ec fix ScriptManager wrong order execution 2025-08-11 21:37:02 +08:00
Karl Seguin
f45726d61f ScriptManager & HttpClient support for JS modules
Improve cleanup/shutdown (*cough* memory leaks *cough*)
2025-08-11 21:37:01 +08:00
Karl Seguin
3c0d027306 dynamic script support 2025-08-11 21:37:01 +08:00
Karl Seguin
4244b572d1 Improve page.wait
Allow page.wait to transition page mode.

Optimize initial page load. No point running scheduler until the initial
page is loaded.

Support ISO-8859-1 charset
2025-08-11 21:37:01 +08:00
Karl Seguin
77475ca5e4 Re-enable --insecure_disable_tls_host_verification
Better error logs on http callback error

Fix wait timing
2025-08-11 21:37:01 +08:00
Karl Seguin
f65a39a3e3 Re-enable telemetry
Start work on supporting navigation events (clicks, form submission).
2025-08-11 21:37:00 +08:00
Karl Seguin
94e8964f69 add custom scheduler 2025-08-11 21:37:00 +08:00
Karl Seguin
254d22e2cc don't poll libcurl if we have no running transfers 2025-08-11 21:37:00 +08:00
Karl Seguin
54ab1326e5 Switch XHR to new http client
get puppeteer/cdp.js working again

make test now passes
2025-08-11 21:37:00 +08:00
Karl Seguin
b0fe5d60ab Initial work on integrating libcurl and making all http nonblocking 2025-08-11 21:36:56 +08:00
sjorsdonkers
6ebd4fcf5b fix unencrypted keepalive 2025-07-21 14:28:53 +02:00
sjorsdonkers
4ab611de0c minor refactor prep for tls 2025-07-21 09:30:22 +02:00
Karl Seguin
4f8a3fe5b9 Always make sure we have 1 free http state available for synchronous requests
If it wasn't for the fact that the HTTP client is likely going to see a major
refactor, it would definitely be time to create a specific state instance for
synchronous requests.
2025-07-14 16:41:26 +08:00
Karl Seguin
b387fd2bd4 Update src/http/client.zig
Co-authored-by: Sjors <72333389+sjorsdonkers@users.noreply.github.com>
2025-07-11 17:38:31 +08:00
Karl Seguin
795c925ba1 Revert "Update src/http/client.zig"
This reverts commit 4a12d045e4.
2025-07-11 09:49:40 +08:00
Karl Seguin
4a12d045e4 Update src/http/client.zig
Co-authored-by: Sjors <72333389+sjorsdonkers@users.noreply.github.com>
2025-07-10 17:10:58 +08:00
Karl Seguin
3049bb0b9f Fix async https requests over a http forward proxy
XHR requests to https (which is most XHR requests) currently don't work with
the proxy implementation because of this.
2025-07-10 16:27:09 +08:00
Karl Seguin
38bbad6e88 Revert "fix secure connection logic"
This reverts commit b6132f2497.
2025-07-08 09:33:53 +08:00
Karl Seguin
b6132f2497 fix secure connection logic 2025-07-07 19:56:21 +08:00
Karl Seguin
74a299eef7 Fix non-tls forward-proxy 2025-07-07 11:03:04 +08:00
sjorsdonkers
22a644ba01 rename tls_in_tls to tlsproxy 2025-07-04 10:00:22 +02:00
sjorsdonkers
bab120a75d secure changes 2025-07-04 10:00:22 +02:00
sjorsdonkers
e881d2f6cf tls proxy tweaks 2025-07-04 10:00:22 +02:00
Francis Bouvier
e2cc404571 Handle TLS proxy, both for HTTP and HTTPS (tls in tls) endpoints 2025-07-04 10:00:22 +02:00
sjorsdonkers
be71eaae47 TLS connect proxy WIP 2025-07-04 10:00:22 +02:00
Karl Seguin
455ed79872 Remove HTTP client generic Loop parameter
I think we initially thought we might need different clients for different
parts of the system, each with a unique loop  (e.g. we thought telemetry might
need some isolation). But that never happened, so it's just needless now,
especially since the async connect uses the non-generic *Loop type directly.
2025-07-03 15:10:47 +02:00
Karl Seguin
41b7ed6938 Upgrade tls.zig to latest version
Was seeing pretty frequent TLS errors on reddit. I think I had the wrong max
TLS record size, but figured this was an opportunity to upgrade tls.zig, which
has seen quite a few changes since our last upgrade.

Specifically, the nonblocking TLS logic has been split into two structs: one
for handshaking, and then another to be used to encrypt/decrypt after the
handshake is complete. The biggest impact here is with respect to keepalive,
since what we want to keepalive is the connection post-handshake, but we don't
have this object until much later.

There were also some general API changes, with respect to state and partially
encrypted/decrypted data which we must now maintain.
2025-06-27 13:14:12 +08:00
Pierre Tachoire
03e3f95d2e Merge pull request #810 from lightpanda-io/proxy-authentication
basic/bearer proxy authentication
2025-06-25 17:31:47 -07:00
sjorsdonkers
aea34264a9 basic/bearer testing 2025-06-25 12:04:38 +02:00
Karl Seguin
1e7ee4e0a1 proxy_type 'simple' renamed to 'forward' 2025-06-25 12:21:44 +08:00
sjorsdonkers
4560f31010 basic/bearer proxy authentication 2025-06-24 16:38:58 +02:00
Karl Seguin
c97a32e24b Initial work on CONNECT proxy.
Cannot currently connect to the proxy over TLS (though, once connected, it can
connect to the actual site over TLS). No support for authentication.
2025-06-24 15:10:20 +08:00
Pierre Tachoire
20aabee72e http: send an Accept: */* header 2025-06-23 18:18:04 -07:00
Karl Seguin
a6ac7d9c4e Delay setting the request's keepalive flag until the request is fully processed
We currently set request._keepalive prematurely. There are [error cases] where
the request could be abandoned before being fully drained. While we do try to
drain in some cases, it isn't always possible. For this reason,
request.keepalive is only set at the end of the request lifecycle, at which
point we know the connection is ready to be re-used.
2025-06-17 19:55:36 +08:00
Karl Seguin
c28d87d59c Improve build and test speed
Test speed has been improved only slightly by tweaking a 2-second running test.

Build has been improved by:
1 - moving logFunctionCallError out of js.Caller and to a standalone function
2 - removing some non-generic code from the generic portions of the logger

Caller.getter and Caller.setter have been removed in favor of calling
Caller.method. This wasn't previously possible - prior to our v8 upgrade, they
had different signatures.

Also removed a largely unused parser/str.zig file.
2025-06-16 19:50:13 +08:00
Karl Seguin
97c769e805 Rework internal navigation to prevent deadlocking
The mix of sync and async HTTP requests requires care to avoid deadlocks.
Previously, it was possible for async requests to use up all available HTTP
state objects during a navigation flow (either directly, or via an internal
redirect (e.g. click, submit, ...)). This would block the navigation, which,
because everything is single-threaded, would block the I/O loop, resulting in a
deadlock.

The correct solution seems to be to remove all synchronous I/O. And I tried to
do that, but I ran into a wall with module-loading, which is initiated from V8.
V8 says "give me the source for this module", and I don't see a great way to
tell it: wait a bit.

So I went back to trying to make this work with the hybrid model, despite last
week's failures to get it to work. I changed two things:

1 - The http client will only directly initiate an async request if there are
    at least 2 free state objects available (1 for the request, and leaving 1
    free for any synchronous requests)

2 - Delayed navigation retries until there's at least 1 free http state object
    available.

Commits from last week did help with this. First, we're now guaranteed to have
a single sync-request at a time (previously, we could have had 2). Secondly,
the async connection is now async end-to-end (previously, it could have blocked
on an empty state pool).

We could probably make this a bit more obvious by reserving 1 state object
for synchronous requests. But, since the long term solution is probably having
no synchronous requests, I'm happy with anything that lets me move past this
issue.
2025-06-12 12:34:51 +08:00
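
The two rules in the commit message above can be illustrated with a toy model. This is only a sketch in Python, not the project's code: `StatePool`, `try_acquire_async`, and `try_acquire_sync` are hypothetical names, and the pool size of 3 is an arbitrary assumption.

```python
class StatePool:
    """Toy model of a fixed-size pool of HTTP state objects (hypothetical)."""

    def __init__(self, size: int) -> None:
        self.free = size

    def try_acquire_async(self) -> bool:
        # Rule 1: only start an async request if at least 2 states are free,
        # so that 1 always remains for a synchronous request.
        if self.free >= 2:
            self.free -= 1
            return True
        return False

    def try_acquire_sync(self) -> bool:
        # Rule 2: a synchronous request (e.g. a navigation) needs 1 free
        # state; if none is free, the caller is expected to retry later.
        if self.free >= 1:
            self.free -= 1
            return True
        return False

    def release(self) -> None:
        self.free += 1


def demo(size: int = 3, async_attempts: int = 5):
    """Try to start several async requests, then one synchronous request."""
    pool = StatePool(size)
    started = [pool.try_acquire_async() for _ in range(async_attempts)]
    sync_ok = pool.try_acquire_sync()
    return started, sync_ok, pool.free
```

In this model, async requests can never take the last free state, so a synchronous request issued during navigation always finds one available unless another synchronous request already holds it.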
Karl Seguin
2b48902f1b Emit http_request_fail notification
CDP translates this into a Network.loadingFailed event. This is necessary to make sure
every Network.requestWillBeSent is paired with either a Network.loadingFailed
or a Network.responseReceived.
2025-06-06 19:15:47 +08:00
Karl Seguin
305460dedb Merge pull request #768 from lightpanda-io/setExtraHTTPHeaders
setExtraHTTPHeaders
2025-06-06 16:45:07 +08:00
Karl Seguin
a5d87ab948 Reduce duration of the main request
We currently keep the main request open during loadHTMLDoc and processHTMLDoc.
It _has_ to be open during loadHTMLDoc, since that streams the body. But it
does not have to be open during processHTMLDoc, which can be long and itself
could make use of that same connection if it was released. Reorganized the
navigate flow to limit the scope of the request.

Also, just like we track pending_write and pending_read, we now also track
pending_connect and only shut down when none are pending.
2025-06-05 23:41:21 +08:00
sjorsdonkers
f1672dd6d2 setExtraHTTPHeaders 2025-06-05 16:42:29 +02:00
Karl Seguin
48c25c380d Remove blocking code from async HTTP requests
The HTTP Client has a state pool. It blocks when we've exceeded max_concurrency.
This can block processing forever. A simple way to reproduce this is to go into
the demo cdp.js, and execute the XHR request 5 times (loading json/product.json).

To some degree, I think this is a result of weird / non-intuitive execution
flow. If you execute a script with 100 XHR requests, it'll call our XHR _send function
but none of these will execute until the loop is run (after the script is done
being executed). This can result in poor utilization of our connection and
state pool.

For an async request, getting the *Request object is itself now asynchronous.
If no state is available, we use the Loop's timeout (at 20ms) to keep checking
for an available state.
2025-06-05 20:52:37 +08:00
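
The non-blocking acquisition described in the commit message above can be sketched roughly as follows. This is an illustration in Python using asyncio, not the project's implementation: `StatePool`, `acquire`, and `request` are hypothetical names, the pool size is arbitrary, and only the 20ms retry interval comes from the message.

```python
import asyncio

RETRY_INTERVAL = 0.02  # 20ms, the timeout mentioned in the commit message


class StatePool:
    """Toy fixed-size pool of state objects (hypothetical)."""

    def __init__(self, size: int) -> None:
        self.free = size

    def try_acquire(self) -> bool:
        if self.free > 0:
            self.free -= 1
            return True
        return False

    def release(self) -> None:
        self.free += 1


async def acquire(pool: StatePool) -> None:
    # Instead of blocking the whole loop when the pool is empty, yield to
    # the event loop and check again after a short timeout.
    while not pool.try_acquire():
        await asyncio.sleep(RETRY_INTERVAL)


async def request(pool: StatePool, name: int, log: list) -> None:
    await acquire(pool)
    log.append(("start", name))
    await asyncio.sleep(0.01)  # stand-in for network I/O
    log.append(("done", name))
    pool.release()


async def main(pool_size: int = 2, requests: int = 5):
    pool = StatePool(pool_size)
    log: list = []
    await asyncio.gather(*(request(pool, i, log) for i in range(requests)))
    return pool, log
```

Running `asyncio.run(main())` completes all five requests with a pool of two states, because waiting requests keep yielding to the loop rather than blocking it.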
Karl Seguin
3a5aa87853 Optimize the lifecycle of async requests
Async HTTP requests work by emitting a "Progress" object to a callback. This
object has a "done" flag which, when `true`, indicates that all data has been
emitted and no future "Progress" objects will be sent.

Callers like XHR buffer the response and wait for "done = true" to then process
the request.

The HTTP client relies on two important object pools: the connection and the
state (with all the buffers for reading/writing).

In its current implementation, the async flow does not release these pooled
objects until the final callback has returned. At best, this is inefficient:
we're keeping the connection and state objects checked out for longer than they
have to be. At worst, it can lead to a deadlock. If the calling code issues a
new request when done == true, we'll eventually run out of state objects in the
pool.

This commit now releases the state objects before emitting the final "done" Progress
message. For this to work, this final message will always have null data and
an empty header object.
2025-06-05 20:52:37 +08:00
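
The ordering described in the commit message above (release pooled objects first, then emit the final "done" message) can be sketched as follows. This is a simplified synchronous model in Python, not the project's code: `Client`, `Progress`, `run_chain`, and the header contents are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Progress:
    data: Optional[bytes]
    done: bool
    header: dict = field(default_factory=dict)


class Client:
    """Toy client with a fixed number of state objects (hypothetical)."""

    def __init__(self, pool_size: int) -> None:
        self.free_states = pool_size

    def request(self, chunks: List[bytes],
                callback: Callable[[Progress], None]) -> None:
        if self.free_states == 0:
            raise RuntimeError("no free state object")
        self.free_states -= 1
        header = {"content-type": "application/json"}  # made-up header
        for chunk in chunks:
            callback(Progress(data=chunk, done=False, header=header))
        # Release the state object first...
        self.free_states += 1
        # ...then emit the final message, with null data and an empty header,
        # since the buffers that backed them are no longer ours.
        callback(Progress(data=None, done=True))


def run_chain(client: Client, length: int) -> List[bytes]:
    """Issue `length` requests, each started from the previous 'done' callback."""
    received: List[bytes] = []
    remaining = length

    def on_progress(progress: Progress) -> None:
        nonlocal remaining
        if progress.done:
            remaining -= 1
            if remaining > 0:
                client.request([b"next"], on_progress)
        else:
            received.append(progress.data)

    client.request([b"first"], on_progress)
    return received
```

With a pool of a single state object, `run_chain(Client(pool_size=1), 3)` completes in this model. If the release happened after the final callback instead, the second request would find the pool empty.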
Karl Seguin
c3f3eea7fb Improve logging
1 - Make log_level a runtime option (not a build-time one)
2 - Make log_format a runtime option
3 - In Debug mode, allow for log scope filtering

Improve the general usability of scopes. Previously, the scope was more or less
based on the file that the log was in. Now they are more logically grouped.
Consider the case where you want to silence HTTP request information. Previously,
you'd have to filter out the `page`, `xhr` and `http_client` scopes, but that
would also eliminate other page, xhr and http_client logs. Now, you can just
filter out the `http` scope.
2025-06-02 21:38:56 +08:00
Karl Seguin
e339ee3f0c Clean Http Request Shutdown
The Request object now exists on the heap, allowing it to outlive whatever is
making the request (e.g. the XHR object). We can now wait until all inflight IO
events are completed before clearing the memory.

This change fixes the crash observed in:
https://github.com/lightpanda-io/browser/issues/667
2025-05-31 07:22:01 +08:00