WebMCP Proposal

2026-02-16 17:08 15381 webmachinelearning.github.io

WebMCP API is a new JavaScript interface that allows web developers to expose their web application functionality as “tools” - JavaScript functions with natural language descriptions and structured schemas that can be invoked by agents, browser’s agents, and assistive technologies. Web pages that use WebMCP can be thought of as Model Context Protocol [MCP] servers that implement tools in client-side script instead of on the backend. WebMCP enables collaborative workflows where users and agents work together within the same web interface, leveraging existing application logic while maintaining shared context and user control.

An agent is an autonomous assistant that can understand a user’s goals and take actions on the user’s behalf to achieve them. Today, these are typically implemented by large language model (LLM) based AI platforms, interacting with users via text-based chat interfaces.

A browser’s agent is an agent provided by or through the browser that could be built directly into the browser or hosted by it, for example, via an extension or plug-in.

An AI platform is a provider of agentic assistants such as OpenAI’s ChatGPT, Anthropic’s Claude, or Google’s Gemini.

A model context is a struct with the following items:

tool map

a map whose keys are strings and whose values are tool definition structs.

A tool definition is a struct with the following items:

name

a string uniquely identifying a tool registered within a model context’s tool map; it is the same as the key identifying this object.

description

a string.

input schema

a string.

Note: For tools registered by the imperative form of this API (i.e., registerTool()), this is the stringified representation of inputSchema. For tools registered declaratively, this will be a stringified JSON Schema object created by the synthesize a declarative JSON Schema object algorithm. [JSON-SCHEMA]

execute steps

a set of steps to invoke the tool.

Note: For tools registered imperatively, these steps will simply invoke the supplied ToolExecuteCallback callback. For tools registered declaratively, this will be a set of "internal" steps that have not been defined yet, that describe how to fill out a form and its form-associated elements.

read-only hint

a boolean, initially false.

The Navigator interface is extended to provide access to the ModelContext.

partial interface Navigator {
  [SecureContext, SameObject] readonly attribute ModelContext modelContext;
};

Each Navigator object has an associated modelContext, which is a ModelContext instance created alongside the Navigator.

The getter steps are to return this’s modelContext.
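Because the attribute is exposed only in secure contexts and the API is still a proposal, a page would probably feature-detect it before use. The following is a minimal sketch; the helper name getModelContext is hypothetical and not part of the proposal.

```javascript
// Hypothetical helper (not part of the proposal): returns the ModelContext
// exposed by a navigator-like object, or null when it is unavailable.
function getModelContext(nav = globalThis.navigator) {
  // In insecure contexts or non-supporting browsers the attribute is absent.
  if (nav && "modelContext" in nav && nav.modelContext) {
    return nav.modelContext;
  }
  return null;
}
```

A caller can then skip tool registration entirely when the helper returns null.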

4.2. ModelContext Interface

The ModelContext interface provides methods for web applications to register and manage tools that can be invoked by agents.

[Exposed=Window, SecureContext]
interface ModelContext {
  undefined registerTool(ModelContextTool tool);
  undefined unregisterTool(DOMString name);
};

Each ModelContext object has an associated internal context, which is a model context struct created alongside the ModelContext.

navigator.modelContext.registerTool(tool)

Registers a single tool without clearing the existing set of tools. The method throws an error if a tool with the same name already exists or if the inputSchema is invalid.

navigator.modelContext.unregisterTool(name)

Removes the tool with the specified name from the registered set.
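As an illustration of how a page might pair the two methods, here is a small sketch. The helper registerTools is hypothetical; it accepts any object that behaves like navigator.modelContext, and its choice to log and continue on failure is an assumption rather than something the proposal prescribes.

```javascript
// Hypothetical helper: registers several tools and returns a cleanup function.
// `modelContext` is expected to behave like navigator.modelContext.
function registerTools(modelContext, tools) {
  const registered = [];
  for (const tool of tools) {
    try {
      modelContext.registerTool(tool);
      registered.push(tool.name);
    } catch (err) {
      // registerTool() throws on duplicate names or an invalid inputSchema;
      // this sketch logs and continues instead of aborting the whole batch.
      console.warn(`Could not register tool "${tool.name}":`, err);
    }
  }
  // Call the returned function (for example on view teardown) to remove
  // only the tools that this call actually registered.
  return () => {
    for (const name of registered) modelContext.unregisterTool(name);
  };
}
```

Returning a cleanup function keeps registration and removal together, which suits pages whose available tools change with page state.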

The registerTool(tool) method steps are:

  1. Let tool map be this’s internal context’s tool map.

  2. Let tool name be tool’s name.

  3. If tool map[tool name] exists, then throw an InvalidStateError DOMException.

  4. If either tool name or tool’s description is the empty string, then throw an InvalidStateError DOMException.

  5. Let stringified input schema be the empty string.

  6. If tool’s inputSchema exists, then set stringified input schema to the result of serializing a JavaScript value to a JSON string, given tool’s inputSchema.

    The serialization algorithm above throws exceptions in the following cases:

    1. Throws a new TypeError when the backing "JSON.stringify()" yields undefined, e.g., "inputSchema: { toJSON() {return HTMLDivElement;}}", or "inputSchema: { toJSON() {return undefined;}}".

    2. Re-throws exceptions thrown by "JSON.stringify()", e.g., when "inputSchema" is an object with a circular reference, etc.

  7. Let read-only hint be true if tool’s annotations exists and its readOnlyHint is true. Otherwise, let it be false.

  8. Let tool definition be a new tool definition, with the following items:

    name

    tool name

    description

    tool’s description

    input schema

    stringified input schema

    execute steps

    steps that invoke tool’s execute

    read-only hint

    read-only hint

  9. Set tool map[tool name] to tool definition.
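The registration steps above can be sketched in plain JavaScript. This is an illustration only, not how a browser would implement it: the class name ModelContextSketch, the helper invalidStateError, and the inspection method getToolDefinition are hypothetical, and the Web IDL conversion of the dictionary (for example, enforcing required members) is not modeled.

```javascript
// Builds an InvalidStateError; falls back to a plain Error where the
// DOMException global is unavailable.
function invalidStateError(message) {
  return typeof DOMException === "function"
    ? new DOMException(message, "InvalidStateError")
    : Object.assign(new Error(message), { name: "InvalidStateError" });
}

class ModelContextSketch {
  // Stands in for the internal context's tool map (step 1).
  #toolMap = new Map();

  registerTool(tool) {
    const toolName = tool.name; // step 2
    // Step 3: reject duplicate names.
    if (this.#toolMap.has(toolName)) {
      throw invalidStateError(`Tool "${toolName}" is already registered`);
    }
    // Step 4: reject an empty name or description.
    if (toolName === "" || tool.description === "") {
      throw invalidStateError("Tool name and description must be non-empty");
    }
    // Steps 5-6: stringify the schema if present. Exceptions thrown by
    // JSON.stringify() (e.g. circular references) propagate unchanged, and
    // an undefined result becomes a TypeError.
    let stringifiedInputSchema = "";
    if (tool.inputSchema !== undefined) {
      const json = JSON.stringify(tool.inputSchema);
      if (json === undefined) {
        throw new TypeError("inputSchema cannot be serialized to JSON");
      }
      stringifiedInputSchema = json;
    }
    // Step 7: true only when annotations exist and readOnlyHint is true.
    const readOnlyHint = tool.annotations?.readOnlyHint === true;
    // Steps 8-9: build the tool definition and store it under its name.
    this.#toolMap.set(toolName, {
      name: toolName,
      description: tool.description,
      inputSchema: stringifiedInputSchema,
      execute: (input, client) => tool.execute(input, client),
      readOnlyHint,
    });
  }

  // Hypothetical inspection helper, used here only to examine the result.
  getToolDefinition(name) {
    return this.#toolMap.get(name);
  }
}
```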

The unregisterTool(name) method steps are:

The ModelContextTool dictionary describes a tool that can be invoked by agents.

dictionary ModelContextTool {
  required DOMString name;
  required DOMString description;
  object inputSchema;
  required ToolExecuteCallback execute;
  ToolAnnotations annotations;
};

dictionary ToolAnnotations {
  boolean readOnlyHint = false;
};

callback ToolExecuteCallback = Promise<any> (object input, ModelContextClient client);

tool["name"]

A unique identifier for the tool. This is used by agents to reference the tool when making tool calls.

tool["description"]

A natural language description of the tool’s functionality. This helps agents understand when and how to use the tool.

tool["inputSchema"]

A JSON Schema [JSON-SCHEMA] object describing the expected input parameters for the tool.

tool["execute"]

A callback function that is invoked when an agent calls the tool. The function receives the input parameters and a ModelContextClient object.

The function can be asynchronous and return a promise, in which case the agent will receive the result once the promise is resolved.

tool["annotations"]

Optional annotations providing additional metadata about the tool’s behavior.
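To make the dictionary members concrete, here is a sketch of a tool a page might define. Everything specific in it is an assumption for illustration: the add_todo tool, the createAddTodoTool factory, the in-page todos array, and the shape of the returned result (the proposal only says the callback returns a promise of any value).

```javascript
// Hypothetical example tool; `todos` is an in-page array used as app state.
function createAddTodoTool(todos) {
  return {
    name: "add_todo",
    description: "Adds an item to the user's to-do list and returns the new list length.",
    // A JSON Schema object describing the expected input.
    inputSchema: {
      type: "object",
      properties: {
        title: { type: "string", description: "Text of the to-do item" },
      },
      required: ["title"],
    },
    // Receives the input parameters and a ModelContextClient (unused here).
    async execute(input, client) {
      if (typeof input.title !== "string" || input.title.trim() === "") {
        throw new TypeError("title must be a non-empty string");
      }
      todos.push({ title: input.title.trim(), done: false });
      return { count: todos.length };
    },
  };
}
```

In a supporting browser the result would be passed to navigator.modelContext.registerTool(). The callback validates its own input because nothing in the text above says that the agent's arguments are checked against inputSchema before execute runs.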

The ToolAnnotations dictionary provides optional metadata about a tool:

readOnlyHint, of type boolean, defaulting to false

If true, indicates that the tool does not modify any state and only reads data. This hint can help agents make decisions about when it is safe to call the tool.
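One way an agent might act on this hint is sketched below. The policy and the function name needsUserConfirmation are assumptions for illustration; the proposal does not define how agents should use the hint, and because it is only a hint, an agent cannot treat it as a guarantee.

```javascript
// Hypothetical agent-side policy: tools that declare themselves read-only
// run without a prompt; everything else asks the user first.
function needsUserConfirmation(tool) {
  // readOnlyHint defaults to false when annotations are absent.
  const readOnly = tool.annotations?.readOnlyHint ?? false;
  return !readOnly;
}
```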

4.2.2. ModelContextClient Interface

The ModelContextClient interface represents an agent executing a tool provided by the site through the ModelContext API.

[Exposed=Window, SecureContext]
interface ModelContextClient {
  Promise<any> requestUserInteraction(UserInteractionCallback callback);
};

callback UserInteractionCallback = Promise<any> ();

client.requestUserInteraction(callback)

Asynchronously requests user input during the execution of a tool.

The callback function is invoked to perform the user interaction (e.g., showing a confirmation dialog), and the promise resolves with the result of the callback.
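A tool that changes state could use this method to confirm with the user before acting. The sketch below relies only on the behavior described above (the callback is invoked and the promise resolves with its result). The names createDeleteExecute and confirmWithUser and the Map of items are hypothetical.

```javascript
// Hypothetical execute callback for a destructive tool. `confirmWithUser` is
// an app-supplied function (for example, one that shows a dialog) returning a
// promise of true or false; `items` is a Map holding the page's data.
function createDeleteExecute(items, confirmWithUser) {
  return async function execute(input, client) {
    // Ask the user through the client before changing anything.
    const confirmed = await client.requestUserInteraction(() =>
      confirmWithUser(`Delete "${input.id}"?`)
    );
    if (!confirmed) return { deleted: false };
    // Map.prototype.delete returns true only if the key existed.
    return { deleted: items.delete(input.id) };
  };
}
```

Keeping the confirmation inside execute keeps the user in control of the final decision even when the agent initiates the call.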

The requestUserInteraction(callback) method steps are:

This section is entirely a TODO. For now, refer to the explainer draft.

The synthesize a declarative JSON Schema object algorithm, given a form element form, runs the following steps. They return a map representing a JSON Schema object. [JSON-SCHEMA]

Thanks to Brandon Walderman, Leo Lee, Andrew Nolan, David Bokan, Khushal Sagar, Hannah Van Opstal, Sushanth Rajasankar for the initial explainer, proposals and discussions that established the foundation for this specification.

Also many thanks to Alex Nahas and Jason McGhee for sharing early implementation experience.

Finally, thanks to the participants of the Web Machine Learning Community Group for feedback and suggestions.


Comments

  • By nozzlegear 2026-02-16 22:03 (3 replies)

    The fact that the "Security and privacy considerations" and the "Accessibility considerations" sections are completely blank in this proposal is delightful meta commentary on the state of the AI hype cycle. I know it's just a draft so far, but it got a laugh out of me.

    • By notepad0x90 2026-02-16 23:22 (1 reply)

      I'm struggling to think of a good entry under those sections, what did you have in mind?

      For accessibility, that's a client consideration typically, the agent using the MCP server would be responsible for making its output accessible. I don't think the intention is to let webapps define how their output is displayed to end users, but to define outputs for agents instead.

      For security, other than what the MCP protocol itself provides, what should be defined?

      I think it's a draft, there is still discussion about it, they might not have reached a point where there is consensus for those categories. But I'm curious to hear your thoughts.

      • By dfabulich 2026-02-17 7:09 (1 reply)

        > For security, other than what the MCP protocol itself provides, what should be defined?

        The MCP protocol itself provides no security at all.

        The MCP specification includes no specified method of authorization, and no specified security rules. It lists a handful of "principles," and then the specification simply gives up on discussing the problem further.

        https://modelcontextprotocol.io/specification/2025-11-25#sec...

            3.2 Implementation Guidelines
        
            While MCP itself cannot enforce these security principles at the protocol
            level, implementors **SHOULD**:
        
            1. Build robust consent and authorization flows into their applications
            2. Provide clear documentation of security implications
            3. Implement appropriate access controls and data protections
            4. Follow security best practices in their integrations
            5. Consider privacy implications in their feature designs

        • By notepad0x90 2026-02-17 18:20 (1 reply)

          It's just an HTTP or stdio server; would there be considerations beyond those of any other HTTP server or CLI app? Shouldn't the security depend on deployment details? For example, you wouldn't require OAuth if it is deployed on localhost only, or if there is a reverse proxy handling that bit.

          There is a reason it cannot enforce those principles: an MCP server is a web service. It could use SQL as a backend for some reason, or use static pages. It might be best to use mTLS, or it might make sense to make it open to the public with no authentication or authorization whatsoever, and your only concern might be availability (429 thresholds). The spec can't and shouldn't account for wildly varying implementation possibilities, right?

          • By davidcrowe 2026-02-17 21:54

            The difference is that MCP introduces a third party: the agent isn't the user and isn't the service, but it's acting on behalf of one to call the other. Standard HTTP auth assumes two parties. That's the gap the spec needs to address.

    • By ryanmcbride 2026-02-16 22:43

      don't worry in a few weeks they'll have AI generate some policies for them to skim!

    • By ohyoutravel 2026-02-16 22:58

      This stuck out to me. What a joke.

  • By gavmor 2026-02-16 19:02 (2 replies)

    This seems backwards, somehow. Like you're asking for an nth view and an nth API, and services are being asked to provide accessibility bridges redundant with our extant offerings.

    Sites are now expected to duplicate effort by manually defining schemas for the same actions — like re-describing a button's purpose in JSON when it's already semantically marked up?

    • By foota 2026-02-16 19:11 (1 reply)

      No, I don't think you're thinking about this right. It's more like hacker news would expose an MCP when you visit it that would present an alternative and parallel interface to the page, not "click button" tools.

      • By cush 2026-02-16 20:13 (1 reply)

        You're both right. The page can expose MCP tools via a form element, which is as simple as adding an attribute to an existing form and completely aligns with existing semantic HTML, e.g., submitting an HN "comment". Additionally, the page can define tools in JavaScript that aren't in forms, e.g., YouTube could provide a transcript MCP tool defined in JS which fetches the video's transcript.

        https://developer.chrome.com/blog/webmcp-epp

        • By znpy 2026-02-16 22:09 (1 reply)

          I think that rest and html could probably be already used for this purpose BUT html is often littered with elements used for visual structure rather than semantics.

          In an ideal world html documents should be very simple and everything visual should be done via css, with JavaScript being completely optional.

          In such a world agents wouldn’t really need a dedicated protocol (and websites would be much faster to load and render, besides being much lighter on cpu and battery)

          • By cush 2026-02-17 7:56

            > html could probably be already used for this purpose

            You’re right, and it already is, and tools like playwright MCP can easily parse a webpage to use it and get things done with existing markup today.

            > BUT html is often littered with elements used for visual structure rather than semantics.

            This actually doesn’t make much of a difference to a tool like playwright because it uses a snapshot of the accessibility tree, which only looks at semantic markup, ignoring any presentation

            > In such a world agents wouldn’t really need a dedicated protocol

            They still do though, because they can work better when given specific tools. WebMCP could provide tools not available on the page. Like an agent hits the dominoes.com landing page. The page could provide an order_pizza tool that the agent could interact with, saving a bunch of navigation, clicks and scrolling and whatnot. It calls the order_pizza tool with “Two large pepperoni pizzas for John at <address>”, and the whole process is done.

    • By jauntywundrkind 2026-02-16 22:31

      I see two totally different things from where we are today

      1. This is a contextual API built into each page. Historically, sites can offer an API, but that API is a parallel experience, a separate machine-to-machine channel that doesn't augment or extend the actual user session. The MCP API offered here is one offered by the page (not the server/site), in a fully dynamic manner (what's offered can reflect what the state of the page is), that layers atop the user session. That's totally different.

      2. This opens an expectation that sites have a standard means of control available. This has two subparts:

      2a. There are dozens of different API systems available to pick from to expose your site. GitHub got halfway from REST to GraphQL then turned back. Some sites use ttrpc or capnweb or gproto. There hasn't actually been one accepted way for machines to talk to your site; there's been a fractal maze of offerings on the web. This is one consistent offering mirroring what everyone is already using now anyways.

      2b. Offering APIs for your site has gone out of favor in general. It often has had high walls and barriers when it is available. But now the people putting their fingers in that leaky dam are patently clearly Not Going To Make It; the LLMs will script & control the browser if they have to, and it's much much less pain to just lean in to what users want to do, and to expose a good WebMCP API that your users can enjoy to be effective & get shit done, like they have wanted to do all along. If WebMCP takes off at all, it will reset expectations: that the internet is for end users, and that their agency & their ability to work your site as they please via their preferred modalities is king. WebMCP directs us towards an RFC 8890 compliant future, by directly enabling site agency. https://datatracker.ietf.org/doc/html/rfc8890

  • By cadamsdotcom 2026-02-16 18:17 (5 replies)

    Great to see people thinking about this. But it feels like a step on the road to something simpler.

    For example, web accessibility has potential as a starting point for making actions automatable, with the advantage that the automatable things are visible to humans, so are less likely to drift / break over time.

    Any work happening in that space?

    • By jayd16 2026-02-16 18:36 (1 reply)

      In theory you could use a protocol like this, one where the tools are specified in the page, to build a human readable but structured dashboard of functionality.

      I'm not sure if this is really all that much better than, say, a swagger API. The js interface has the double edge of access to your cookies and such.

    • By egeozcan 2026-02-16 18:34 (1 reply)

      As someone heavily involved in a11y testing and improvement, the status quo, for better or worse, is to do it the other way around. Most people use automated, LLM based tooling with Playwright to improve accessibility.

      • By cadamsdotcom 2026-02-16 18:53

        I certainly do - it’s wonderful that making your site accessible is a single prompt away!

    • By jauntywundrkind 2026-02-16 22:35

      Chris Shank & Orion Reed are doing some very nice work with accessibility trees. https://bsky.app/profile/chrisshank.com/post/3m3q23xpzkc2u

      I tried to play along at home some, play with rust accesskit crate. But man I just could not get Orcas or other basic tools to run, could not get a starting point. Highly discouraging. I thought for sure my browser would expose accessibility trees I could just look at & tweak! But I don't even know if that's true or not yet! Very sad personal experience with this.

    • By bavandersloot 2026-02-17 0:13

      There is a proposed extension in the repo that is getting some traction that automatically converts forms into tools. There is trouble in linking this to a11y though, since that could incentivize sites to make really bad decisions for human consumers of those surfaces.

    • By thevinter 2026-02-16 19:12

      We're building an app that automatically generates machine- and human-readable JSON by parsing semantic HTML tags; using a reverse proxy, we then serve that JSON instead of HTML to agents.

HackerNews