Rather than stack-allocating MAX_MESSAGE_SIZE upfront, we now allocate 32KB
and grow the buffer as needed for larger messages, up to MAX_MESSAGE_SIZE.
This will reduce memory usage for drivers that don't send huge payloads (like
playwright does).
While not implemented, this would also enable us to set the MAX_MESSAGE_SIZE
at runtime (e.g. via a command line option).
Previously, the IO loop was doing three things:
1 - Managing timeouts (either from scripts or for our own needs)
2 - Handling browser IO events (page/script/xhr)
3 - Handling CDP events (accept, read, write, timeout)
With the libcurl merge, 1 was moved to an in-process scheduler and 2 was moved
to libcurl's own event loop. That means the entire loop code, including
the dependency on tigerbeetle-io existed for handling a single TCP client.
Not only is that a lot of code, there was also friction between the two loops
(the libcurl one and our IO loop), which would result in latency - while one
loop is waiting for the events, any events on the other loop go un-processed.
This PR removes our IO loop. To accomplish this:
1 - The main accept loop is blocking. This is simpler and works perfectly well,
given we only allow 1 active connection.
2 - The client socket is passed to libcurl - yes, libcurl's loop can take
arbitrary FDs and poll them along with its own.
In addition to having one less dependency, the CDP code is quite a bit simpler,
especially around shutdowns and writes. This also removes _some_ of the latency
caused by the friction between page process and CDP processing. Specifically,
when CDP now blocks for input, http page events (script loading, xhr, ...) will
still be processed.
There's still friction. For one, the reverse isn't true: when the page is
waiting for events, CDP events aren't going to be processed. But the page.wait
already have some sensitivity to this (e.g. the page.request_intercepted flag).
Also, when CDP waits, while we will process network events, page timeouts are
still not processed. Because of both these remaining issues, we still need to
jump between the two loops - but being able to block on CDP (even for a short
time) WITHOUT stopping the page's network I/O, should reduce some latency.
1 - Make log_level a runtime option (not a build-time)
2 - Make log_format a runtime option
3 - In Debug mode, allow for log scope filtering
Improve the general usability of scopes. Previously, the scope was more or less
based on the file that the log was in. Now they are more logically grouped.
Consider the case where you want to silence HTTP request information, previously
you'd have to filter out the `page`, `xhr` and `http_client` scopes, but that
would also elimiate other page, xhr and http_client logs. Now, you can just
filter out the `http` scope.
Outputs in logfmt in release and a "pretty" print in debug mode. The format
along with the log level will become arguments to the binary at some point in
the future.
When the browser microtask was added, zig-specific timeout functions were
added to the loop. This was necessary for two reasons:
1 - The existing functions were JS specific
2 - We wanted a different reset counter for JS and Zig
Like we did in https://github.com/lightpanda-io/browser/pull/577, the loop is
now JS-agnostic. It gets a Zig callback, and the Zig callback can execute JS
(or do whatever). An intrusive node, like with events, is used to minimize
allocations.
Also, because the microtask was recently moved to the page, there is no longer
a need for separate event counters. All timeouts are scoped to the page.
The new timeout callback can now be used to efficiently reschedule a task. This
reuses the IO.completion and Context, avoiding 2 allocations. More importantly
it makes the internal timer_id static for the lifetime of an "interval". This
is important for window.setInterval, where the callback can itself clear the
interval, which we would need to detect in the callback handler to avoid
re-scheduling. With the stable timer_id, the existing cancel mechanism works
as expected.
The loop no longer has a cbk_error. Callback code is expected to try/catch
callbacks (or use callback.tryCall) and handle errors accordingly.
Previously, we could have multiple in-flight messages from the server to a
single client. This isn't safe and can lead to message interleaving. While
write / send are atomic, they are only atomic for the N bytes which they write,
which may not be the entire buffer. Consider this writeAll function:
```
pub fn writeAll(socket: socket_t, bytes: []const u8) !void {
var index: usize = 0;
while (index < bytes.len) {
index += try posix.write(socket, bytes[index..]);
}
}
```
If we're trying to send "abc123", this could take anywhere from 1 to 6 calls
to posix.write (it would take 6 calls, for example, if every call to
posix.write only wrote a single byte). Now if you're trying to write other data
to this same socket at the same time, messages _will_ get interleaved.
In order for this to work, the client now has a send_queue (doubly linked list).
When one message is sent, it sends the next.
In addition to the above change, the Client is now self-contained with respect
to its lifetime. This is necessary so that completions which come in AFTER our
concept of its lifetime ends, can still be processed. I think all types that
receive completions need to follow this model. This relies on the fact that
kqueue (which I know for a fact) and io_uring (which people seem to imply) handle
socket shutdown properly. It's still a bit messy because of timeout and not
wanting to wait until timeout to accept new connections, but needing to wait
until timeout to cleanup the client.
The self-contained nature of Client makes it difficult to test as a generic. I
removed Client(T). Tests now use real sockets. Some tests had to be removed
because they're too difficult to test over a real connection :(
ADD CDP testing helpers (mock Browser, Session, Page and Client). These are
placeholders until tests are added which use them.
Added a couple CDP tests.
For the time being, given that we only allow 1 client at a time, I took a
shortcut to implement this. The server has an incrementing "current_client_id"
which is part of every completion. On completion callback, we just check if
its client_id is still equal to the server's current_client_id.
Move more logic into the reader. Avoid copying partial messages in
cases where we know that the buffer is large enough.
This is mostly groundwork for trying to add support for continuation
frames.
Adding HTTP & websocket awareness to the TCP server.
HTTP server handles `GET /json/version` and websocket upgrade requests.
Conceptually, websocket handling is the same code as before, but receiving
data will parse the websocket frames and writing data will wrap it in
a websocket frame.
The previous `Ctx` was split into a `Server` and a `Client`. This was
largely done to make it easy to write unit tests, since the `Client` is
a generic, all its dependencies (i.e. the server) can be mocked out. This
also makes it a bit nicer to know if there is or isn't a client (via the
server's client optional).
Added a MemoryPool for the Send object (I thought that was a nice touch!)
Removed MacOS hack on accept/conn completion usage.
Known issues:
- When framing an outgoing message, the entire message has to be duped. This
is no worse than how it was before, but it should be possible to eliminate
this in the future. Probably not part of this PR.
- Websocket parsing will reject continuation frames. I don't know of a single
client that will send a fragmented message (websocket has its own
message fragmentation), but we should probably still support this just in
case.
- I don't think the receive, timeout and close completions can safely be
re-used like we're doing. I believe they need to be associated with a specific
client socket.
- A new connection creates a new browser session. I think this is right (??),
but for the very first, we're throwing out a perfectly usable session. I'm
thinking this might be a change to how Browser/Sessions work.
- zig build test won't compile. This branch reproduces the issue with none
of these changes:
https://github.com/karlseguin/browser/tree/broken_test_build
(or, as a diff to main):
https://github.com/lightpanda-io/browser/compare/main...karlseguin:broken_test_build
CDP is now an struct which contains its own state a browser and a session.
When a client connection is made and successfully upgrades, the client creates
the CDP instance. There is now a cleaner separation betwen Server, Client and
CDP.
Removed a number of allocations, especially when writing results/events from
CDP to the client. Improved input message parsing. Tried to remove some usage
of undefined.
Move more logic into the reader. Avoid copying partial messages in
cases where we know that the buffer is large enough.
This is mostly groundwork for trying to add support for continuation
frames.
Adding HTTP & websocket awareness to the TCP server.
HTTP server handles `GET /json/version` and websocket upgrade requests.
Conceptually, websocket handling is the same code as before, but receiving
data will parse the websocket frames and writing data will wrap it in
a websocket frame.
The previous `Ctx` was split into a `Server` and a `Client`. This was
largely done to make it easy to write unit tests, since the `Client` is
a generic, all its dependencies (i.e. the server) can be mocked out. This
also makes it a bit nicer to know if there is or isn't a client (via the
server's client optional).
Added a MemoryPool for the Send object (I thought that was a nice touch!)
Removed MacOS hack on accept/conn completion usage.
Known issues:
- When framing an outgoing message, the entire message has to be duped. This
is no worse than how it was before, but it should be possible to eliminate
this in the future. Probably not part of this PR.
- Websocket parsing will reject continuation frames. I don't know of a single
client that will send a fragmented message (websocket has its own
message fragmentation), but we should probably still support this just in
case.
- I don't think the receive, timeout and close completions can safely be
re-used like we're doing. I believe they need to be associated with a specific
client socket.
- A new connection creates a new browser session. I think this is right (??),
but for the very first, we're throwing out a perfectly usable session. I'm
thinking this might be a change to how Browser/Sessions work.
- zig build test won't compile. This branch reproduces the issue with none
of these changes:
https://github.com/karlseguin/browser/tree/broken_test_build
(or, as a diff to main):
https://github.com/lightpanda-io/browser/compare/main...karlseguin:broken_test_build