WebDriver — CUCH

WebDriver
W3C Working Draft
01 April 2026
Editors:
Simon Stewart
Apple
David Burns
BrowserStack
Feedback:
GitHub w3c/webdriver
pull requests
new issue
open issues
Channel
#webdriver on irc.w3.org
Copyright © 2026 World Wide Web Consortium. W3C liability, trademark and permissive document license rules apply.
Abstract
WebDriver is a remote control interface
that enables introspection and control of user agents.
It provides a platform- and language-neutral wire protocol
as a way for out-of-process programs
to remotely instruct the behavior of web browsers.
Provided is a set of interfaces
to discover and manipulate DOM elements in web documents
and to control the behavior of a user agent.
It is primarily intended to allow web authors to write tests
that automate a user agent from a separate controlling process,
but may also be used in such a way as to allow in-browser scripts
to control a — possibly separate — browser.
Status of This Document
This section describes the status of this document at the time of its publication. A list of current W3C publications and the latest revision of this technical report can be found in the W3C standards and drafts index.
This document was published by the Browser Testing and Tools Working Group as a Working Draft using the Recommendation track.
Publication as a Working Draft does not imply endorsement by W3C and its Members.
This is a draft document and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to cite this document as other than a work in progress.
This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent that the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This document is governed by the 18 August 2025 W3C Process Document.
1. Design
This section is non-normative.
The WebDriver standard attempts to follow a number of design goals:
1.1 Compatibility
This specification is derived from the popular
Selenium WebDriver
browser automation framework.
Selenium is a long-lived project,
and due to its age and breadth of use
it has a wide range of expected functionality.
This specification uses these expectations to inform its design.
Where improvements or clarifications have been made,
they have been made with care to allow existing users of Selenium WebDriver
to avoid unexpected breakages.
1.2 Simplicity
The largest intended group of users of this specification
are software developers and testers
writing automated tests and other tooling,
such as monitoring or load testing, that relies on automating a browser.
As such, care has been taken to provide commands
that simplify common tasks such as typing into and clicking elements.
1.3 Extensions
WebDriver provides a mechanism for others to define extensions to the protocol
for the purposes of automating functionality that cannot be implemented entirely
in ECMAScript. This allows other
web standards to support the automation of new platform features. It also
allows vendors to expose functionality that is specific to their browser.
2. Conformance
As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.
Conformance requirements phrased as algorithms
or specific steps may be implemented in any manner,
so long as the end result is equivalent.
Algorithms in this document are typically written with readability,
rather than performance, in mind.
3. Terminology
In equations, all numbers are integers,
addition is represented by “+”,
subtraction by “−”,
division by “÷”,
and bitwise OR by “|”.
The characters “(” and “)” are used to provide logical grouping in these contexts.
The mathematical function min(value, value[, value]) returns the smallest item of two or more values.
Conversely, the function max(value, value[, value]) returns the largest item of two or more values.
The mathematical function floor(value) produces the largest integer, closest to positive infinity,
that is not larger than value.
A Universally Unique Identifier (UUID) is a 128-bit URN that requires no central registration process.
Generating a UUID means creating a UUID from truly random or pseudo-random numbers
and converting it to the string representation. [RFC4122]
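As a rough illustration of generating a UUID, the following Python sketch uses the standard library's `uuid.uuid4()`, which produces a random (version 4) UUID. The helper name `generate_uuid` is hypothetical and not part of this specification.

```python
import uuid


def generate_uuid() -> str:
    """Create a random (version 4) UUID and return its string representation."""
    return str(uuid.uuid4())


# Hypothetical usage: a remote end might use such a value as a session id.
session_id = generate_uuid()
```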
The Unix Epoch is a value that approximates the number of seconds
that have elapsed since the Epoch,
as described by The Open Group Base Specifications Issue 7
section 4.15 (IEEE Std 1003.1).
An integer is a Number that is unchanged under the ToInteger operation.
The initial value of an ECMAScript property
is the value defined by the platform for that property,
i.e. the value it would have in the absence of any shadowing by content script.
The browser chrome is a non-normative term
to refer to the representation through which the user
interacts with the user agent itself,
as distinct from the accessed web content.
Examples of browser chrome elements include, but are not limited to,
toolbars (such as the bookmark toolbar),
menus (such as the file or context menu),
buttons (such as the back and forward buttons),
door hangers (such as security and certificate indicators),
and decorations (such as operating system widget borders).
4. Interface
The webdriver-active flag is set to true when the user agent is under remote control.
It is initially false.
WebIDL
interface mixin NavigatorAutomationInformation {
  readonly attribute boolean webdriver;
};
Navigator includes NavigatorAutomationInformation;
Note
The NavigatorAutomationInformation interface should not be exposed on WorkerNavigator.
webdriver
Returns true if webdriver-active flag is set, false otherwise.
Example
For web authors (non-normative):
navigator.webdriver
Defines a standard way for co-operating user agents
to inform the document that it is controlled by WebDriver,
for example so that alternate code paths can be triggered during automation.
It is acknowledged that this is complementary to the Evil Bit [RFC3514].
5. Nodes
The WebDriver protocol consists of communication between:
Local end
The local end represents the client side of the protocol,
which is usually in the form of language-specific libraries
providing an API on top of the WebDriver protocol.
This specification does not place any restrictions on the details of those libraries
above the level of the wire protocol.
Remote end
The remote end hosts the server side of the protocol.
Defining the behavior of a remote end in response to the WebDriver protocol forms the
largest part of this specification.
For remote ends the standard defines two broad conformance
classes, known as node types:
Intermediary node
Intermediary nodes are those that act as proxies, implementing
both the local end and remote end of the protocol.
However they are not expected to implement remote end steps directly.
Nodes between a specific intermediary node and an endpoint node are
said to be upstream of the endpoint node.
Endpoint node
An endpoint node is the final remote end in a chain of nodes that is not an intermediary node.
The endpoint node is implemented by a user agent or a similar program.
All remote end node types must be black-box indistinguishable
from a remote end, from the point of view of the local end,
and so are bound by the requirements on a remote end in terms
of the wire protocol.
The readiness state of a remote end indicates
whether it is free to accept new connections. It must be false if the
implementation is an endpoint node and the list of active HTTP sessions
is not empty, or otherwise if the remote end is known to
be in a state in which attempting to create new sessions would
fail. In all other cases it must be true.
If the intermediary node is a multiplexer that
manages multiple endpoint nodes, this
might indicate its ability to purvey more sessions, for
example if it has hit its maximum capacity.
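The readiness state rule above can be sketched as a small predicate. This is only an illustration: the function name and its three parameters are assumptions made for this example, and how an implementation learns that session creation would fail is left open by the text.

```python
def readiness_state(is_endpoint_node: bool,
                    active_session_count: int,
                    session_creation_would_fail: bool = False) -> bool:
    """Return True if the remote end is free to accept new connections.

    All three parameters are hypothetical inputs chosen for this sketch.
    """
    # An endpoint node with a non-empty list of active HTTP sessions is not ready.
    if is_endpoint_node and active_session_count > 0:
        return False
    # Any remote end known to be unable to create new sessions is not ready.
    if session_creation_would_fail:
        return False
    # In all other cases the readiness state is true.
    return True
```

An intermediary node that multiplexes several endpoint nodes could, for instance, pass `session_creation_would_fail=True` once it has hit its maximum capacity.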
6. Protocol
WebDriver remote ends must provide an HTTP compliant wire protocol
where the endpoints map to different commands.
As this standard only defines the remote end protocol,
it places no demands on how local ends should be implemented.
Local ends are only expected to be compatible to the extent
that they can speak the remote end's protocol;
no requirements are made upon their exposed user-facing API.
6.1 Algorithms
Various parts of this specification are written in terms of step-by-step algorithms.
The details of these algorithms do not have any normative significance;
implementations are free to adopt any implementation strategy
that produces equivalent output to the specification.
In particular, algorithms in this document are optimized
for readability rather than performance.
Where algorithms that return values are fallible,
they are written in terms of returning either success or error.
A success value has an associated data field
which encapsulates the value returned,
whereas an error value has an associated error code.
When calling a fallible algorithm,
the construct “Let result be the result of trying to call algorithm”
is equivalent to:
1. Let temp be the result of calling algorithm.
2. If temp is an error, return temp; otherwise let result be temp's data field.
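One way to picture the success/error convention and the "trying" construct is the Python sketch below. The `Success` and `Error` classes and both example algorithms are hypothetical names invented for this illustration; the specification does not prescribe any particular representation.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass
class Success:
    data: Any = None  # the value returned by the algorithm


@dataclass
class Error:
    error_code: str  # e.g. "invalid argument"


Result = Union[Success, Error]


def parse_positive(text: str) -> Result:
    """A made-up fallible algorithm: succeed only for positive integers."""
    if text.isdigit() and int(text) > 0:
        return Success(int(text))
    return Error("invalid argument")


def double_positive(text: str) -> Result:
    """Let result be the result of trying to call parse_positive."""
    temp = parse_positive(text)
    if isinstance(temp, Error):
        return temp           # propagate the error to our own caller
    result = temp.data        # otherwise unwrap the data field
    return Success(result * 2)
```

The early return is the important part: an error produced by the inner algorithm becomes the return value of the calling algorithm without further processing.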
The result of getting a property with name from object
is defined as being the same as the result of
calling Object.[[GetOwnProperty]](name) on object.
The result of getting a property with default
with arguments name and default from object
is defined as being the same as the result of
calling Object.[[GetOwnProperty]](name) on object
if that results in a value other than undefined,
and default otherwise.
Setting a property with arguments name and value on object
is defined as being the same as calling Object.[[Put]](name, value) on object.
The result of JSON serialization with object of type JSON Object
is defined as the result of calling JSON.stringify(object).
The result of JSON deserialization with text
is defined as the result of calling JSON.parse(text).
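The property and JSON helpers above are defined in ECMAScript terms. As a loose analogy only, a Python remote end might model a JSON Object as a `dict` and use a private sentinel to stand in for `undefined`. All function names below are hypothetical.

```python
import json
from typing import Any

_UNDEFINED = object()  # stand-in for ECMAScript's undefined (an assumption of this sketch)


def get_property(obj: dict, name: str) -> Any:
    """Return the own property called name, or the undefined sentinel."""
    return obj.get(name, _UNDEFINED)


def get_property_with_default(obj: dict, name: str, default: Any) -> Any:
    """Return the property value unless it is undefined, in which case return default."""
    value = get_property(obj, name)
    return default if value is _UNDEFINED else value


def json_serialize(obj: Any) -> str:
    """Rough counterpart of JSON serialization."""
    return json.dumps(obj)


def json_deserialize(text: str) -> Any:
    """Rough counterpart of JSON deserialization."""
    return json.loads(text)


params = json_deserialize('{"script": 30000, "pageLoad": null}')
```

Note that a JSON `null` is a value other than undefined, so `get_property_with_default(params, "pageLoad", 5)` yields `None` rather than the default.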
6.2 Commands
The WebDriver protocol is organized into commands.
Each HTTP request with a method and template defined in this specification
represents a single command,
and therefore each command produces a single HTTP response.
In response to a command, a remote end will run a series of actions
known as remote end steps.
These provide the sequences of actions that a remote end takes
when it receives a particular command.
6.3 Processing model
The remote end is an HTTP server
reading requests from the client and writing responses,
typically over a TCP socket.
For the purposes of this specification we model the data transmission between
a particular local end and remote end with a connection
to which the remote end may write bytes and read bytes.
However the exact details of how this connection works
and how it is established are out of scope.
After a connection is established, the remote end must run
the following steps. While the connection is not closed:
1. Read bytes from the connection until a complete HTTP request can be constructed from the data. Let request be a request constructed from the received data, according to the requirements of [RFC7230]. If it is not possible to construct a complete HTTP request, the remote end must either close the connection, return an HTTP response with status code 500, or return an error with error code unknown error.
2. Let request match be the result of the algorithm to match a request with request's method and URL as arguments.
3. If request match is of type error, send an error with request match's error code and continue.
   Otherwise, let command and URL variables be request match's data.
4. Let session be null.
5. If URL variables contains "session id":
   Note
   This condition is intended to exclude the New Session and Status commands and any extension commands which do not operate on a particular session.
   1. Let session id be URL variables["session id"].
   2. For each active session in the list of active sessions: if active session's session ID is equal to session id, then let session be active session, and break.
   3. If the session is null, send an error with error code invalid session id, then continue.
6. Enqueue a task on the remote end's request queue to run the following steps:
   1. If session is no longer in the list of active sessions, then send an error with error code invalid session id and return.
   2. Let parameters be null.
   3. If request's method is POST:
      1. Let parse result be the result of parsing as JSON with request's body as the argument. If this process throws an exception, return an error with error code invalid argument and jump back to step 1 in this overall algorithm.
      2. If parse result is not an Object, send an error with error code invalid argument and jump back to step 1 in this overall algorithm.
      3. Otherwise, let parameters be parse result.
   4. Let navigate result be the result of wait for navigation to complete with session.
   5. If navigate result is an error, send an error with error code equal to navigate result's error code and return.
   6. Let response result be the return value obtained by running the remote end steps for command with session, URL variables, and parameters.
   7. If response result is an error, send an error with error code equal to response result's error code and return.
      Assert: response result is a success.
   8. Let response data be response result's data.
   9. Send a response with status 200 and response data.
When required to send an error, with error code and an optional error data dictionary,
a remote end must run the following steps:
1. Let status and name be the error response data for error code.
2. Let message be an implementation-defined string containing a human-readable description of the reason for the error.
3. Let stacktrace be an implementation-defined string containing a stack trace report of the active stack frames at the time when the error occurred.
4. Let body be a new JSON Object initialized with the following properties:
   "error": name
   "message": message
   "stacktrace": stacktrace
5. If the error data dictionary contains any entries, set the "data" field on body to a new JSON Object populated with the dictionary.
6. Send a response with status and body as arguments.
When required to send a response, with arguments status and data,
a remote end must run the following steps:
1. Let response be a new response.
2. Set response's HTTP status to status, and status message to the string corresponding to the description of status in the status code registry.
3. Set the response's header with name and value with the following values:
   Content-Type: "application/json; charset=utf-8"
   Cache-Control: "no-cache"
4. Let response's body be the UTF-8 encoded JSON serialization of a JSON Object with a key "value" set to data.
5. Let response bytes be the byte sequence resulting from serializing response according to the rules in [RFC7230].
6. Write response bytes to the connection.
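The two procedures above can be sketched in Python as functions that build the bytes to be written to the connection. This is a minimal sketch under stated assumptions: the function names are hypothetical, `ERROR_RESPONSE_DATA` holds only three rows of the error table, and the `Content-Length` header is an addition of this example (needed for HTTP message framing) rather than one of the headers listed in the steps.

```python
import json
from http import HTTPStatus
from typing import Any, Dict, Optional

# Small excerpt of the error table: error code -> (HTTP status, JSON error code).
ERROR_RESPONSE_DATA = {
    "invalid argument": (400, "invalid argument"),
    "invalid session id": (404, "invalid session id"),
    "unknown error": (500, "unknown error"),
}


def serialize_response(status: int, data: Any) -> bytes:
    """Build the response bytes for 'send a response' with status and data."""
    body = json.dumps({"value": data}).encode("utf-8")
    reason = HTTPStatus(status).phrase  # status message from the status code registry
    head = (
        f"HTTP/1.1 {status} {reason}\r\n"
        "Content-Type: application/json; charset=utf-8\r\n"
        "Cache-Control: no-cache\r\n"
        f"Content-Length: {len(body)}\r\n"  # assumption: added for framing
        "\r\n"
    )
    return head.encode("ascii") + body


def serialize_error(error_code: str, message: str = "", stacktrace: str = "",
                    error_data: Optional[Dict[str, Any]] = None) -> bytes:
    """Build the response bytes for 'send an error' with error code."""
    status, name = ERROR_RESPONSE_DATA[error_code]
    body: Dict[str, Any] = {"error": name, "message": message, "stacktrace": stacktrace}
    if error_data:  # only add "data" when the dictionary has entries
        body["data"] = dict(error_data)
    return serialize_response(status, body)


raw = serialize_error("invalid session id", "No active session with ID 1234")
```

A real remote end would write `raw` to the connection; here it is simply kept in a variable.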
6.4 Routing requests
Request routing is the process of going from an HTTP request to the
series of steps needed to implement the command represented by that request.
A remote end has an associated URL prefix,
which is used as a prefix on all WebDriver-defined URLs on that remote end.
This must either be undefined or a path-absolute URL.
Example
For example a remote end wishing
to run alongside other services on example.com
might set its URL prefix to /wd
so that a new session command would be invoked
by sending a POST request to /wd/session,
rather than /session.
In order to match a request given method and URL,
the following steps must be taken:
1. Let endpoints be a list containing each row in the table of endpoints.
2. Remove each entry from endpoints for which the concatenation of the URL prefix and the entry's URI template does not have a valid expansion equal to URL's path.
3. If there are no entries in endpoints, return error with error code unknown command.
4. Remove each entry in endpoints for which the method column is not equal to method.
5. If there are no entries in endpoints, return error with error code unknown method.
6. There is now exactly one entry in endpoints; let entry be this entry.
7. Let URI template be the concatenation of URL prefix with entry's URI template.
8. Let command be entry's command.
9. Let URL variables be a map with one entry for each variable defined in URI template, with the entry name equal to the template variable name, and the entry value being the variable value required to expand the URI template to match URL's path.
10. Return success with data command and URL variables.
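The matching steps above can be illustrated with a short Python sketch. It is a simplification, not a full URI Template implementation: each `{variable}` is assumed to match exactly one non-empty path segment, percent-decoding is ignored, and `ENDPOINTS` contains only a handful of rows from the table of endpoints. The names `compile_template` and `match_request` are hypothetical.

```python
import re
from typing import Dict, List, Pattern, Tuple

# A few rows of the table of endpoints: (method, URI template, command).
ENDPOINTS = [
    ("POST", "/session", "New Session"),
    ("DELETE", "/session/{session id}", "Delete Session"),
    ("GET", "/status", "Status"),
    ("GET", "/session/{session id}/url", "Get Current URL"),
    ("POST", "/session/{session id}/url", "Navigate To"),
    ("GET", "/session/{session id}/element/{element id}/text", "Get Element Text"),
]


def compile_template(template: str) -> Tuple[Pattern[str], List[str]]:
    """Turn a URI template into a regex plus the ordered list of variable names."""
    parts = re.split(r"\{([^}]+)\}", template)  # literals at even, variables at odd indexes
    pattern, names = "", []
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pattern += re.escape(part)
        else:
            names.append(part)
            pattern += "([^/]+)"  # assumption: one non-empty path segment per variable
    return re.compile("^" + pattern + "$"), names


def match_request(method: str, path: str, url_prefix: str = ""):
    """Return ("success", (command, URL variables)) or ("error", error code)."""
    candidates = []
    for entry_method, template, command in ENDPOINTS:
        regex, names = compile_template(url_prefix + template)
        found = regex.match(path)
        if found:
            url_variables: Dict[str, str] = dict(zip(names, found.groups()))
            candidates.append((entry_method, command, url_variables))
    if not candidates:
        return ("error", "unknown command")
    candidates = [entry for entry in candidates if entry[0] == method]
    if not candidates:
        return ("error", "unknown method")
    _, command, url_variables = candidates[0]
    return ("success", (command, url_variables))
```

Filtering by path first and by method second mirrors the order of the steps, which is what distinguishes an unknown command from an unknown method.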
6.5 Endpoints
The following table of endpoints lists the method and URI template
for each endpoint node command.
Extension commands are implicitly appended to this table.
Method | URI Template | Command
POST | /session | New Session
DELETE | /session/{session id} | Delete Session
GET | /status | Status
GET | /session/{session id}/timeouts | Get Timeouts
POST | /session/{session id}/timeouts | Set Timeouts
POST | /session/{session id}/url | Navigate To
GET | /session/{session id}/url | Get Current URL
POST | /session/{session id}/back | Back
POST | /session/{session id}/forward | Forward
POST | /session/{session id}/refresh | Refresh
GET | /session/{session id}/title | Get Title
GET | /session/{session id}/window | Get Window Handle
DELETE | /session/{session id}/window | Close Window
POST | /session/{session id}/window | Switch To Window
GET | /session/{session id}/window/handles | Get Window Handles
POST | /session/{session id}/window/new | New Window
POST | /session/{session id}/frame | Switch To Frame
POST | /session/{session id}/frame/parent | Switch To Parent Frame
GET | /session/{session id}/window/rect | Get Window Rect
POST | /session/{session id}/window/rect | Set Window Rect
POST | /session/{session id}/window/maximize | Maximize Window
POST | /session/{session id}/window/minimize | Minimize Window
POST | /session/{session id}/window/fullscreen | Fullscreen Window
GET | /session/{session id}/element/active | Get Active Element
GET | /session/{session id}/element/{element id}/shadow | Get Element Shadow Root
POST | /session/{session id}/element | Find Element
POST | /session/{session id}/elements | Find Elements
POST | /session/{session id}/element/{element id}/element | Find Element From Element
POST | /session/{session id}/element/{element id}/elements | Find Elements From Element
POST | /session/{session id}/shadow/{shadow id}/element | Find Element From Shadow Root
POST | /session/{session id}/shadow/{shadow id}/elements | Find Elements From Shadow Root
GET | /session/{session id}/element/{element id}/selected | Is Element Selected
GET | /session/{session id}/element/{element id}/attribute/{name} | Get Element Attribute
GET | /session/{session id}/element/{element id}/property/{name} | Get Element Property
GET | /session/{session id}/element/{element id}/css/{property name} | Get Element CSS Value
GET | /session/{session id}/element/{element id}/text | Get Element Text
GET | /session/{session id}/element/{element id}/name | Get Element Tag Name
GET | /session/{session id}/element/{element id}/rect | Get Element Rect
GET | /session/{session id}/element/{element id}/enabled | Is Element Enabled
GET | /session/{session id}/element/{element id}/computedrole | Get Computed Role
GET | /session/{session id}/element/{element id}/computedlabel | Get Computed Label
POST | /session/{session id}/element/{element id}/click | Element Click
POST | /session/{session id}/element/{element id}/clear | Element Clear
POST | /session/{session id}/element/{element id}/value | Element Send Keys
GET | /session/{session id}/source | Get Page Source
POST | /session/{session id}/execute/sync | Execute Script
POST | /session/{session id}/execute/async | Execute Async Script
GET | /session/{session id}/cookie | Get All Cookies
GET | /session/{session id}/cookie/{name} | Get Named Cookie
POST | /session/{session id}/cookie | Add Cookie
DELETE | /session/{session id}/cookie/{name} | Delete Cookie
DELETE | /session/{session id}/cookie | Delete All Cookies
POST | /session/{session id}/actions | Perform Actions
DELETE | /session/{session id}/actions | Release Actions
POST | /session/{session id}/alert/dismiss | Dismiss Alert
POST | /session/{session id}/alert/accept | Accept Alert
GET | /session/{session id}/alert/text | Get Alert Text
POST | /session/{session id}/alert/text | Send Alert Text
GET | /session/{session id}/screenshot | Take Screenshot
GET | /session/{session id}/element/{element id}/screenshot | Take Element Screenshot
POST | /session/{session id}/print | Print Page
6.6 Errors
Errors are represented in the WebDriver protocol
by an HTTP response with an HTTP status in the 4xx or 5xx range,
and a JSON body containing details of the error.
The body is a JSON Object and has a field named "value"
whose value is an object bearing three, and sometimes four, fields:
"error", containing a string indicating the error code.
"message", containing an implementation-defined string
with a human readable description of the kind of error that occurred.
"stacktrace", containing an implementation-defined string
with a stack trace report of the active stack frames at the time when the error occurred.
Optionally "data", which is a JSON Object with additional error data
helpful in diagnosing the error.
Example
A GET request to /session/1234/url,
where 1234 is not the session id of a session,
would return an HTTP response
with the status 404 and a body of the form:
{
  "value": {
    "error": "invalid session id",
    "message": "No active session with ID 1234",
    "stacktrace": ""
  }
}
Certain commands may also annotate errors with additional error data.
Notably, this is the case for commands
which invoke the user prompt handler,
where the user prompt message
may be included in a "text" field:
{
  "value": {
    "error": "unexpected alert open",
    "message": "",
    "stacktrace": "",
    "data": {
      "text": "Message from window.alert"
    }
  }
}
The following table lists each error code,
its associated HTTP status, JSON error code,
and a non-normative description of the error.
The error response data for a particular error code
is the values of the HTTP Status and JSON Error Code
columns for the row corresponding to that error code.
Error Code | HTTP Status | JSON Error Code | Description
element click intercepted | 400 | element click intercepted | The Element Click command could not be completed because the element receiving the events is obscuring the element that was requested clicked.
element not interactable | 400 | element not interactable | A command could not be completed because the element is not pointer- or keyboard interactable.
insecure certificate | 400 | insecure certificate | Navigation caused the user agent to hit a certificate warning, which is usually the result of an expired or invalid TLS certificate.
invalid argument | 400 | invalid argument | The arguments passed to a command are either invalid or malformed.
invalid cookie domain | 400 | invalid cookie domain | An illegal attempt was made to set a cookie under a different domain than the current page.
invalid element state | 400 | invalid element state | A command could not be completed because the element is in an invalid state, e.g. attempting to clear an element that isn't both editable and resettable.
invalid selector | 400 | invalid selector | Argument was an invalid selector.
invalid session id | 404 | invalid session id | Occurs if the given session id is not in the list of active sessions, meaning the session either does not exist or that it's not active.
javascript error | 500 | javascript error | An error occurred while executing JavaScript supplied by the user.
move target out of bounds | 500 | move target out of bounds | The target for mouse interaction is not in the browser's viewport and cannot be brought into that viewport.
no such alert | 404 | no such alert | An attempt was made to operate on a modal dialog when one was not open.
no such cookie | 404 | no such cookie | No cookie matching the given path name was found amongst the associated cookies of session's current browsing context's active document.
no such element | 404 | no such element | An element could not be located on the page using the given search parameters.
no such frame | 404 | no such frame | A command to switch to a frame could not be satisfied because the frame could not be found.
no such window | 404 | no such window | A command to switch to a window could not be satisfied because the window could not be found.
no such shadow root | 404 | no such shadow root | The element does not have a shadow root.
script timeout error | 500 | script timeout | A script did not complete before its timeout expired.
session not created | 500 | session not created | A new session could not be created.
stale element reference | 404 | stale element reference | A command failed because the referenced element is no longer attached to the DOM.
detached shadow root | 404 | detached shadow root | A command failed because the referenced shadow root is no longer attached to the DOM.
timeout | 500 | timeout | An operation did not complete before its timeout expired.
unable to set cookie | 500 | unable to set cookie | A command to set a cookie's value could not be satisfied.
unable to capture screen | 500 | unable to capture screen | A screen capture was made impossible.
unexpected alert open | 500 | unexpected alert open | A modal dialog was open, blocking this operation.
unknown command | 404 | unknown command | A command could not be executed because the remote end is not aware of it.
unknown error | 500 | unknown error | An unknown error occurred in the remote end while processing the command.
unknown method | 405 | unknown method | The requested command matched a known URL but did not match any method for that URL.
unsupported operation | 500 | unsupported operation | Indicates that a command that should have executed properly cannot be supported for some reason.
An error data dictionary
is a mapping of string keys to JSON serializable values
that can optionally be included with error objects.
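From the local end's perspective, the error format described in this section might be consumed roughly as follows. This is an illustrative sketch only: the specification places no requirements on local end APIs, and the `WebDriverError` class and `error_from_response` function are hypothetical.

```python
import json
from typing import Any, Dict, Optional


class WebDriverError(Exception):
    """Hypothetical local-end exception carrying the fields of an error body."""

    def __init__(self, error: str, message: str, stacktrace: str,
                 data: Optional[Dict[str, Any]] = None) -> None:
        super().__init__(f"{error}: {message}")
        self.error = error            # the JSON error code
        self.message = message
        self.stacktrace = stacktrace
        self.data = data              # optional error data dictionary


def error_from_response(status: int, body_text: str) -> Optional[WebDriverError]:
    """Return a WebDriverError for a 4xx/5xx response, or None otherwise."""
    if status < 400:
        return None
    value = json.loads(body_text)["value"]
    return WebDriverError(value["error"], value["message"],
                          value["stacktrace"], value.get("data"))


example_body = json.dumps({
    "value": {
        "error": "unexpected alert open",
        "message": "",
        "stacktrace": "",
        "data": {"text": "Message from window.alert"},
    }
})
example_error = error_from_response(500, example_body)
```

The `"data"` field is read with `.get()` because it is only sometimes present.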
6.7 Extensions
Using the terminology defined in this section, others may define additional
commands that seamlessly integrate with the standard protocol. This allows
vendors to expose functionality that is specific to their user agent, and it
also allows other web standards to define commands for automating new platform
features.
Commands defined in this way are called extension commands
and behave no differently than other commands;
each has a dedicated HTTP endpoint and a set of remote end steps.
Each extension command has an associated extension command URI Template
that is a URI Template string,
and which should bear some resemblance to what the command performs.
This value, along with the HTTP method and extension command,
is added to the table of endpoints
and thus follows the same rules for request routing
as that of other built-in commands.
In order to avoid potential resource conflicts with other implementations,
vendor-specific extension command URI Templates must begin with one
or more path segments which uniquely identify the vendor and UA.
It is suggested that vendors use their vendor prefixes
without additional characters as outlined in [CSS21],
notably in section 4.1.2.2 on vendor keywords,
as the name for this path element,
and include a vendor-chosen UA identifier.
Note
If the extension command URI Template includes a variable named
session id, the value of this variable will be used to define the
session during command processing.
Example
This might lead to a URL of the form
/session/5d376174-36f0-11e5-9b9a-6bdf200a3f7f/ms/edge/context
where session/{session id} associates the request
with the specified session,
ms/edge identifies the command as
specific to the Edge browser distributed by Microsoft,
and context describes the functionality
that, in the context of Edge, allows a local end
to switch between browser-specific contexts.
Requesting this URL will call the extension command's remote end steps.
Other specifications may define additional WebDriver capabilities.
Each defined capability must have a capability name, which
is a string not containing a ":" (colon) character,
and an additional capability deserialization algorithm,
which is a set of steps taking a single argument value
which has a JSON type, returning
either success wrapping the deserialized capability value
or error.
An additional WebDriver capability may also define a
matched capability serialization algorithm,
which is a set of steps used to determine if a capability is matched
by the current implementation and provide any computed value to return
to the user. This set of steps takes a single argument value,
which is the output of the
corresponding additional capability deserialization algorithm,
and returns either null to indicate the capability
is not matched, or a non-null JSON-serializable value if the
capability is matched.
Other specifications may also define WebDriver new session algorithms,
which are called just after a new
session is created, and before the new session response is sent
to the local end. These algorithms are called
with session, representing the WebDriver session that will
be established, and capabilities, the capabilities object that will be returned
to the local end. It is permitted for such an algorithm to
modify any entry in the capabilities object with a name that's an
additional WebDriver capability defined by the same
specification.
Remote ends may also introduce extension capabilities
that are extra capabilities
used to provide configuration or fulfill other vendor-specific needs.
Extension capabilities' key must contain a ":" (colon) character,
denoting an implementation specific namespace.
The value can be arbitrary JSON types.
As with extension commands,
it is suggested that the key used to denote
the extension capability namespace
is based on the vendor keywords listed in [CSS21]
and precedes the first ":" character in the string.
Example
Extension capabilities are typically used
to provide UA or intermediary node specific configuration
that is not handled by the table of standard capabilities.
An example new session request body
might look like this:
{
  "capabilities": {
    "alwaysMatch": {
      // browser specific configuration
      ":browserOptions": {
        "binary": "/usr/bin/browser-binary",
        "args": ["--start-page=https://example.com"]
      }
    }
  }
}
7. Capabilities
WebDriver capabilities
are used to communicate the features supported by a given implementation.
The local end may use capabilities
to define which features it requires the remote end
to satisfy when creating a new session.
Likewise, the remote end uses capabilities
to describe the full feature set for a session.
The following table of standard capabilities
enumerates the capabilities each implementation must support.
An implementation may define additional extension capabilities.
Example
As an example, Mozilla could elect to hide new features behind capabilities
with a "moz:" prefix:
{
  "browserName": "firefox",
  "browserVersion": "1234",
  "moz:experimental-webdriver": true
}
Capability | Key | Value Type | Description
Browser name | browserName | string | Identifies the user agent.
Browser version | browserVersion | string | Identifies the version of the user agent.
Platform name | platformName | string | Identifies the operating system of the endpoint node.
Accept insecure TLS certificates | acceptInsecureCerts | boolean | Indicates whether untrusted and self-signed TLS certificates are implicitly trusted on navigation for the duration of the session.
Page load strategy | pageLoadStrategy | string | Defines the session's page load strategy.
Proxy configuration | proxy | JSON Object | Defines the session's proxy configuration.
Window dimensioning/positioning | setWindowRect | boolean | Indicates whether the remote end supports all of the resizing and repositioning commands.
Session timeouts | timeouts | JSON Object | Describes the timeouts imposed on certain session operations.
Strict file interactability | strictFileInteractability | boolean | Defines the session's strict file interactability.
Unhandled prompt behavior | unhandledPromptBehavior | string | Describes the session's user prompt handler. Defaults to "dismiss and notify".
User Agent | userAgent | string | Identifies the default User-Agent value of the endpoint node.
7.1
Proxy
The
proxy configuration
capability
is a JSON
Object
nested
within the primary
capabilities
Implementations may define additional proxy configuration options,
but they must not alter the semantics of those listed below.
Key | Value Type | Description | Valid values
proxyType | string | Indicates the type of proxy configuration. | "pac", "direct", "autodetect", "system", or "manual".
proxyAutoconfigUrl | string | Defines the URL for a proxy autoconfiguration file if proxyType is equal to "pac". | Any URL.
httpProxy | string | Defines the proxy host for HTTP traffic when the proxyType is "manual". | A host and optional port for scheme "http".
noProxy | array | Lists the addresses for which the proxy should be bypassed when the proxyType is "manual". | A List containing any number of Strings.
sslProxy | string | Defines the proxy host for encrypted TLS traffic when the proxyType is "manual". | A host and optional port for scheme "https".
socksProxy | string | Defines the proxy host for a SOCKS proxy when the proxyType is "manual". | A host and optional port with an undefined scheme.
socksVersion | number | Defines the SOCKS proxy version when the proxyType is "manual". | Any integer between 0 and 255 inclusive.
A host and optional port for a scheme is defined as being a valid host, optionally followed by a colon and a valid port. The host may include credentials. If the port is omitted and scheme has a default port, this is the implied port. Otherwise, the port is left undefined.
A proxyType of "direct" indicates that the browser should not use a proxy at all.
A proxyType of "system" indicates that the browser should use the various proxies configured for the underlying Operating System.
A proxyType of "autodetect" indicates that the proxy to use should be detected in an implementation-specific way.
The remote end steps to deserialize as a proxy argument parameter are:
1. If parameter is not a JSON Object, return an error with error code invalid argument.
2. Let proxy be a new, empty proxy configuration object.
3. For each enumerable own property in parameter, run the following substeps:
   1. Let key be the name of the property.
   2. Let value be the result of getting a property named key from parameter.
   3. If there is no matching key for key in the proxy configuration table, return an error with error code invalid argument.
   4. If value is not one of the valid values for that key, return an error with error code invalid argument.
   5. Set a property key to value on proxy.
4. If proxy does not have an own property for "proxyType", return an error with error code invalid argument.
5. If the result of getting a property named "proxyType" from proxy equals "pac", and proxy does not have an own property for "proxyAutoconfigUrl", return an error with error code invalid argument.
6. If proxy has an own property for "socksProxy" and does not have an own property for "socksVersion", return an error with error code invalid argument.
7. Return success with data proxy.
A proxy configuration object
is a
JSON
Object
where each of its
own properties
matching
keys in the
proxy configuration
meets the validity criteria for
that key.
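The proxy deserialization steps can be sketched in Python as follows. This is a minimal illustration, not a normative implementation: `InvalidArgument`, `VALIDATORS`, and `deserialize_proxy` are hypothetical names, and the host, port, and URL checks are reduced to simple type checks.

```python
from typing import Any, Callable, Dict


class InvalidArgument(Exception):
    """Stands in for an error with error code "invalid argument"."""


PROXY_TYPES = {"pac", "direct", "autodetect", "system", "manual"}


def _is_str(value: Any) -> bool:
    return isinstance(value, str)


# Simplified validity checks per key; real host/port/URL parsing is omitted.
VALIDATORS: Dict[str, Callable[[Any], bool]] = {
    "proxyType": lambda v: _is_str(v) and v in PROXY_TYPES,
    "proxyAutoconfigUrl": _is_str,
    "httpProxy": _is_str,
    "sslProxy": _is_str,
    "socksProxy": _is_str,
    "noProxy": lambda v: isinstance(v, list) and all(_is_str(i) for i in v),
    "socksVersion": lambda v: isinstance(v, int)
    and not isinstance(v, bool)
    and 0 <= v <= 255,
}


def deserialize_proxy(parameter: Any) -> dict:
    if not isinstance(parameter, dict):
        raise InvalidArgument("proxy must be a JSON Object")
    proxy = {}
    for key, value in parameter.items():
        validator = VALIDATORS.get(key)
        if validator is None:
            raise InvalidArgument(f"unknown proxy key: {key}")
        if not validator(value):
            raise InvalidArgument(f"invalid value for {key}")
        proxy[key] = value
    if "proxyType" not in proxy:
        raise InvalidArgument("proxyType is required")
    if proxy["proxyType"] == "pac" and "proxyAutoconfigUrl" not in proxy:
        raise InvalidArgument("pac requires proxyAutoconfigUrl")
    if "socksProxy" in proxy and "socksVersion" not in proxy:
        raise InvalidArgument("socksProxy requires socksVersion")
    return proxy
```

For instance, `deserialize_proxy({"proxyType": "pac"})` raises `InvalidArgument` in this sketch, because the "pac" type is given without a "proxyAutoconfigUrl".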
7.2
Processing capabilities
To process capabilities given parameters, and session configuration flags flags, the endpoint node must take the following steps:
1. Let capabilities request be the result of getting the property "capabilities" from parameters.
2. If capabilities request is not a JSON Object, return error with error code invalid argument.
3. Let required capabilities be the result of getting the property "alwaysMatch" from capabilities request.
4. If required capabilities is undefined, set the value to an empty JSON Object.
5. Let required capabilities be the result of trying to validate capabilities with arguments required capabilities and flags.
6. Let all first match capabilities be the result of getting the property "firstMatch" from capabilities request.
7. If all first match capabilities is undefined, set the value to a List with a single entry of an empty JSON Object.
8. If all first match capabilities is not a List with one or more entries, return error with error code invalid argument.
9. Let validated first match capabilities be an empty List.
10. For each first match capabilities corresponding to an indexed property in all first match capabilities:
   1. Let validated capabilities be the result of trying to validate capabilities with arguments first match capabilities and flags.
   2. Append validated capabilities to validated first match capabilities.
11. Let merged capabilities be an empty List.
12. For each first match capabilities corresponding to an indexed property in validated first match capabilities:
   1. Let merged be the result of trying to merge capabilities with required capabilities and first match capabilities as arguments.
   2. Append merged to merged capabilities.
13. For each capabilities corresponding to an indexed property in merged capabilities:
   1. Let matched capabilities be the result of trying to match capabilities with capabilities as an argument.
   2. If matched capabilities is not null, return success with data matched capabilities.
14. Return success with data null.
When required to validate capabilities with argument capabilities:
1. If capabilities is not a JSON Object, return an error with error code invalid argument.
2. Let result be an empty JSON Object.
3. For each enumerable own property in capabilities, run the following substeps:
   1. Let name be the name of the property.
   2. Let value be the result of getting a property named name from capabilities.
   3. Run the substeps of the first matching condition:
      - value is null: Let deserialized be set to null.
      - name equals "acceptInsecureCerts": If value is not a boolean, return an error with error code invalid argument. Otherwise, let deserialized be set to value.
      - name equals "browserName", name equals "browserVersion", or name equals "platformName": If value is not a string, return an error with error code invalid argument. Otherwise, let deserialized be set to value.
      - name equals "pageLoadStrategy": Let deserialized be the result of trying to deserialize as a page load strategy with argument value.
      - name equals "proxy": Let deserialized be the result of trying to deserialize as a proxy with argument value.
      - name equals "strictFileInteractability": If value is not a boolean, return an error with error code invalid argument. Otherwise, let deserialized be set to value.
      - name equals "timeouts": Let deserialized be the result of trying to deserialize as timeouts configuration with value.
      - name equals "unhandledPromptBehavior": Let deserialized be the result of trying to deserialize as an unhandled prompt behavior with argument value.
      - name is the name of an additional WebDriver capability: Let deserialized be the result of trying to run the additional capability deserialization algorithm for the extension capability corresponding to name, with argument value.
      - name is the key of an extension capability: If name is known to the implementation, let deserialized be the result of trying to deserialize value in an implementation-specific way. Otherwise, let deserialized be set to value.
      - The remote end is an endpoint node: Return an error with error code invalid argument.
   4. If deserialized is not null, set a property on result with name name and value deserialized.
4. Return success with data result.
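A reduced Python sketch of the validation steps may help make the dispatch concrete. It assumes that an extension capability key is any name containing ":" and covers only the boolean, string, and extension branches; the dedicated deserializers for "pageLoadStrategy", "proxy", "timeouts", and "unhandledPromptBehavior", as well as additional WebDriver capabilities, are deliberately left out. All names are hypothetical.

```python
from typing import Any


class InvalidArgument(Exception):
    """Stands in for an error with error code "invalid argument"."""


BOOLEAN_CAPABILITIES = {"acceptInsecureCerts", "strictFileInteractability"}
STRING_CAPABILITIES = {"browserName", "browserVersion", "platformName"}
# Standard capabilities whose own deserializers are omitted from this sketch.
DELEGATED_CAPABILITIES = {
    "pageLoadStrategy", "proxy", "timeouts", "unhandledPromptBehavior",
}


def validate_capabilities(capabilities: Any) -> dict:
    if not isinstance(capabilities, dict):
        raise InvalidArgument("capabilities must be a JSON Object")
    result = {}
    for name, value in capabilities.items():
        if value is None:
            deserialized = None
        elif name in BOOLEAN_CAPABILITIES:
            if not isinstance(value, bool):
                raise InvalidArgument(f"{name} must be a boolean")
            deserialized = value
        elif name in STRING_CAPABILITIES:
            if not isinstance(value, str):
                raise InvalidArgument(f"{name} must be a string")
            deserialized = value
        elif name in DELEGATED_CAPABILITIES:
            raise NotImplementedError(f"{name} is outside this sketch")
        elif ":" in name:
            # Assumption: unknown extension capabilities pass through as-is.
            deserialized = value
        else:
            raise InvalidArgument(f"unknown capability: {name}")
        # Null values are dropped rather than copied to the result.
        if deserialized is not None:
            result[name] = deserialized
    return result
```

Note that a capability explicitly set to null is accepted but omitted from the result, which is why a later merge step never sees it.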
When merging capabilities with JSON Object arguments primary and secondary, an endpoint node must take the following steps:
1. Let result be a new JSON Object.
2. For each enumerable own property in primary, run the following substeps:
   1. Let name be the name of the property.
   2. Let value be the result of getting a property named name from primary.
   3. Set a property on result with name name and value value.
3. If secondary is undefined, return result.
4. For each enumerable own property in secondary, run the following substeps:
   1. Let name be the name of the property.
   2. Let value be the result of getting a property named name from secondary.
   3. Let primary value be the result of getting the property name from primary.
   4. If primary value is not undefined, return an error with error code invalid argument.
   5. Set a property on result with name name and value value.
5. Return result.
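The merge steps are small enough to sketch directly. The Python below is an illustration only; `merge_capabilities` and `InvalidArgument` are hypothetical names, and Python's `None` stands in for an undefined secondary argument.

```python
from typing import Optional


class InvalidArgument(Exception):
    """Stands in for an error with error code "invalid argument"."""


def merge_capabilities(primary: dict, secondary: Optional[dict] = None) -> dict:
    # Copy every property of primary into a fresh result object.
    result = dict(primary)
    if secondary is None:
        return result
    for name, value in secondary.items():
        # A name present in both objects is an error, not an override.
        if name in primary:
            raise InvalidArgument(f"{name} appears in both capability sets")
        result[name] = value
    return result


# Merging one required set with each firstMatch candidate in turn.
required = {"platformName": "linux"}
first_match = [{"browserName": "chrome"}, {"browserName": "edge"}]
merged = [merge_capabilities(required, candidate) for candidate in first_match]
```

The design point worth noticing is that merging never lets a firstMatch entry override a required capability: a duplicated name fails the whole request.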
Note
The algorithm outlined in
matching capabilities
blithely ignores real-world problems
that make implementation less than perfectly straightforward,
particularly since capabilities can interact in unforeseen ways.
As an example, an implementation could have a capability
that gives the path to the browser binary to use.
This could cause both
browserName
and
browserVersion
to be impossible to match against until the browser process is started.
When
matching capabilities
given
JSON
Object
capabilities
, and a
session
configuration flags
flags
, an
endpoint node
must take the following steps:
Let
matched capabilities
be a JSON
Object
with the following entries:
browserName
ASCII Lowercase
name of the user agent as a
string
browserVersion
The user agent version, as a
string
platformName
ASCII Lowercase
name of the current platform as a
string
acceptInsecureCerts
Boolean initially set to false, indicating the session will not implicitly trust untrusted or self-signed TLS certificates on navigation.
strictFileInteractability
Boolean initially set to false, indicating that interactability checks will be applied to <input type=file>.
setWindowRect
Boolean indicating whether the
remote end
supports all
of the
resizing and
positioning
commands
userAgent
String containing the
default User-Agent value
If
flags
contains "
http
", add the
following entries to
matched capabilities
strictFileInteractability
Boolean initially set to false, indicating that interactability checks will be applied to <input type=file>.
Optionally add
extension capabilities
as entries
to
matched capabilities
. The values of these may be
elided, and there is no requirement that all
extension capabilities
be added.
Note
This allows a
remote end
to add information that might be
useful to a
local end
without unnecessarily bloating the
response sent back to the user with (e.g.) an entire browser
profile.
For example, an implementation could choose to indicate that a
screenshot will be taken when returning an error by setting the
capability
se:screenshot-on-error
to
true
For each
name
and
value
corresponding
to
capabilities
's
own properties
Let
match value
equal
value
Run the substeps of the first matching
name
browserName
If
value
is not a string equal to
the "
browserName
" entry in
matched capabilities
, return
success
with data
null
Note
There is a chance the
remote end
will need
to start a browser process to correctly determine
the
browserName
. Lightweight checks are preferred
before this is done.
browserVersion
Compare value to the "browserVersion" entry in matched capabilities using an implementation-defined comparison algorithm. The comparison is to accept a value that places constraints on the version using the "<", "<=", ">", and ">=" operators.
If the two values do not match, return success with data null.
Note
Version comparison is left as an implementation detail
since each user agent will likely have conflicting methods
of encoding the user agent version,
and standardizing these schemes is beyond the scope of this standard.
Note
There is a chance the
remote end
will need
to start a browser process to correctly determine
the
browserVersion
. Lightweight checks are preferred
before this is done.
platformName
If
value
is not a string equal
to the "
platformName
" entry in
matched capabilities
return
success
with data
null
Note
The following platform names are in common usage with
well-understood semantics and, when
matching
capabilities
for
platform name
, greatest interoperability can be achieved by
honoring them as valid synonyms for well-known Operating
Systems:
Key | System
linux | Any server or desktop system based upon the Linux kernel.
mac | Any version of Apple's macOS.
windows | Any version of Microsoft Windows, including desktop and mobile versions.
This list is not exhaustive.
When returning
capabilities
from
New Session
it is valid to return a more
specific
platformName
, allowing users to
correctly identify the Operating System the WebDriver
implementation is running on.
acceptInsecureCerts
If the accept insecure TLS flag is set and not equal to value, return success with data null.
Note
If the
endpoint node
does not
support
insecure TLS certificates
and this is the reason
no match is ultimately made, it is useful to provide this
information to the
local end
proxy
If the
has proxy configuration
flag is set, or if
the proxy configuration defined in
value
is not one
that passes the
endpoint node
's implementation-specific
validity checks, return
success
with
data
null
Note
A local end would only send this capability if it expected it to be honored and the configured proxy used. The intent is that if this is not possible a new session will not be established.
unhandledPromptBehavior
If
check user prompt handler matches
with
value
is false, return
success
with
data
null
Otherwise
If
name
is the name of an
additional
WebDriver capability
which defines a
matched
capability serialization algorithm
, let
match
value
be the result of running the
matched
capability serialization algorithm
for
capability
name
with
arguments
value
, and
flags
Otherwise, if
name
is the key of an
extension capability
, let
match value
be the
result of
trying
implementation-specific steps to
match on
name
with
value
. If the
match is not successful, return
success
with
data
null
If
match value
is not null,
set a property
on
matched capabilities
with name
name
and
value
match value
Return
success
with data
matched capabilities
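Because the "browserVersion" comparison is implementation-defined, any example is necessarily one choice among many. The Python sketch below shows one possible scheme, assuming versions are dotted sequences of integers and that a requested value is an optional operator followed by a version; `version_matches` is a hypothetical helper, not something defined by this specification.

```python
import re

# Optional constraint operator followed by a dotted integer version.
_CONSTRAINT = re.compile(r"^\s*(<=|>=|<|>)?\s*(\d+(?:\.\d+)*)\s*$")


def _parse(version: str) -> tuple:
    return tuple(int(part) for part in version.split("."))


def version_matches(requested: str, actual: str) -> bool:
    """Return True if the actual version satisfies the requested constraint."""
    match = _CONSTRAINT.match(requested)
    if match is None:
        return False
    operator, wanted = match.group(1), _parse(match.group(2))
    have = _parse(actual)
    # Pad the shorter version with zeros so "121" compares equal to "121.0.0".
    width = max(len(wanted), len(have))
    wanted += (0,) * (width - len(wanted))
    have += (0,) * (width - len(have))
    if operator == "<":
        return have < wanted
    if operator == "<=":
        return have <= wanted
    if operator == ">":
        return have > wanted
    if operator == ">=":
        return have >= wanted
    return have == wanted
```

Under these assumptions, a requested value of ">=120" would match a user agent reporting "121.0.2", while "<120" would not.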
8.
Sessions
A WebDriver
session
represents the
logical connection between a
local end
and a
specific
remote end
. The
session
object holds state
specific to that connection.
An
intermediary node
will maintain an
associated
session
for each active
session
. This is
the
session
on the
upstream
neighbor that is created
when the
intermediary node
executes the
New
Session
command
. Closing a
session
on
an
intermediary node
will also
close the session
of
the
associated session
A session has a session ID, which is the string representation of a UUID used to uniquely identify the session. This is set when creating the session.
A session has a boolean HTTP flag which is set when the session is created. A session with this flag set is an HTTP session.
A remote end has an associated list of active sessions, which is a list of all sessions that are currently started.
A remote end has an associated list of active HTTP sessions, which is a list of all HTTP sessions that are currently started.
Note
The limitation of a single HTTP session for
endpoint
node
s means that the first entry in the list of
active HTTP
sessions
will be the only entry.
An HTTP session has an associated current browsing context, which is the browsing context against which commands will run, an associated current parent browsing context, which is set to the parent of the current browsing context when changing browsing contexts, and an associated current top-level browsing context, which is set to the top-level browsing context ancestor of the current browsing context when changing browsing contexts.
An
HTTP session
has an associated
session
timeouts
which is a
timeouts configuration
. This is
initially set to a new
timeouts configuration
An
HTTP session
has an associated
page loading
strategy
, which is one of the keywords from the
table of page
load strategies
. This is initially set to
normal
An
HTTP session
has an associated
strict file
interactability
state which is a boolean. This is initially set
to false.
A session has an associated browsing context input state map, which is a weak map with top-level browsing contexts as keys, and input state objects as values. This is initially set to an empty map.
An
HTTP session
has an associated
request queue
which is a
queue
of
requests
that are currently awaiting
processing.
When a session is created, a set of session configuration flags is provided that defines the features of the session. This specification always creates sessions with "http" in session configuration flags, which corresponds to the HTTP flag. External specifications may define additional flags, or create sessions without the HTTP flag.
8.1
Global State
In addition to per-session state, a
remote end
that is
an
endpoint node
also has additional state that is global
across all sessions.
An
endpoint node
has an associated
accept insecure
TLS
flag that indicates whether untrusted or self-signed TLS
certificates are treated as trusted. The default value of the flag is
false if the endpoint doesn't support accepting insecure TLS
connections, or unset otherwise.
An
endpoint node
has an associated
has proxy
configuration
flag that indicates whether the proxy is already
configured. The default value of the flag is true if the endpoint
doesn't support proxy configuration, or false otherwise.
To create a session, given a JSON Object capabilities, and session configuration flags flags:
1. Let session id be the result of generating a UUID.
2. Let session be a new session with session ID session id, and HTTP flag set to true if flags contains "http", and false otherwise.
3. Let proxy be the result of getting property "proxy" from capabilities and run the substeps of the first matching statement:
   - proxy is a proxy configuration object: Take implementation-defined steps to set the user agent proxy using the extracted proxy configuration. If the defined proxy cannot be configured, return error with error code session not created. Otherwise set the has proxy configuration flag to true.
   - Otherwise: Set a property of capabilities with name "proxy" and a value that is a new JSON Object.
4. If capabilities has a property named "acceptInsecureCerts", set the endpoint node's accept insecure TLS flag to the result of getting a property named "acceptInsecureCerts" from capabilities.
5. Let user prompt handler capability be the result of getting property "unhandledPromptBehavior" from capabilities.
6. If user prompt handler capability is not undefined, update the user prompt handler with user prompt handler capability.
7. Let serialized user prompt handler be serialize the user prompt handler.
8. Set a property on capabilities with the name "unhandledPromptBehavior", and the value serialized user prompt handler.
9. If flags contains "http":
   1. Let strategy be the result of getting property "pageLoadStrategy" from capabilities. If strategy is a string, set the session's page loading strategy to strategy. Otherwise, set the page loading strategy to normal and set a property of capabilities with name "pageLoadStrategy" and value "normal".
   2. Let strictFileInteractability be the result of getting property "strictFileInteractability" from capabilities. If strictFileInteractability is a boolean, set session's strict file interactability to strictFileInteractability.
   3. Let timeouts be the result of getting a property "timeouts" from capabilities. If timeouts is not undefined, set session's session timeouts to timeouts.
   4. Set a property on capabilities with name "timeouts" and value serialize the timeouts configuration with session's session timeouts.
10. Process any extension capabilities in capabilities in an implementation-defined manner.
11. Run any WebDriver new session algorithm defined in external specifications, with arguments session, capabilities, and flags.
12. Append session to active sessions.
13. If flags contains "http", append session to active HTTP sessions.
14. Set the webdriver-active flag to true.
To close the session, given session, a remote end must take the following steps:
If
session
's
HTTP flag
is set,
remove
session
from
active HTTP sessions
Remove
session
from
active sessions
Perform the following substeps based on the
remote end
's
type:
Remote end
is an
endpoint node
If the list of
active sessions
is empty:
Set the
webdriver-active flag
to false
Set the
user prompt handler
to null.
Unset the
accept insecure TLS
flag.
Reset the
has proxy configuration
flag to its
default value.
Optionally,
close
all
top-level browsing contexts
without
prompting to unload
Remote end
is an
intermediary node
Close
the
associated session
. If this causes
an
error
to occur, complete the remainder of this
algorithm before returning the
error
Perform any implementation-specific cleanup steps.
If an
error
has occurred in any of the steps above,
return the
error
, otherwise return
success
with
data
null
Closing a
session
might cause the associated browser process to be killed.
It is assumed that any implementation-specific cleanup steps
are performed
after
the response has been sent back to the client
so that the
connection
is not prematurely closed.
8.2
New Session
HTTP Method | URI Template
POST | /session
The
New Session
command
creates a new WebDriver
session
with the
endpoint node
If the creation fails, a
session not created
error
is returned.
If the
remote end
is an
intermediary node
it may use the result of the
capabilities processing
algorithm
to route the
new session
request to the appropriate
endpoint node
An
intermediary node
is free to define
extension capabilities
to assist in this process, however, these specific capabilities
must not be forwarded to the
endpoint node
If the
intermediary node
requires additional information unrelated to user agent features,
it is recommended that this information be passed as top-level parameters,
and not as part of the requested
capabilities
An
intermediary node
must forward custom,
top-level parameters (i.e. non-
capabilities
) to subsequent
remote end
nodes.
Example
An
intermediary node
might require authentication
on
creating a new session
This authentication is an argument to the
New Session
command
itself and not the user agent's
capabilities
Therefore, the authentication should be passed
as a top-level parameter and not embedded in
capabilities
{
  "user": "alice",
  "password": "hunter2",
  "capabilities": {…}
}
However, because an
intermediary node
cannot forward
extension capabilities
specific to that implementation to an
endpoint node
the following is also permitted by this specification:
{
  "capabilities": {
    "alwaysMatch": {
      "cloud:user": "alice",
      "cloud:password": "hunter2",
      "platformName": "linux"
    },
    "firstMatch": [
      {"browserName": "chrome"},
      {"browserName": "edge"}
    ]
  }
}
Once all
capabilities are merged
from this example,
an
endpoint node
would receive
New Session
capabilities identical to:
[
  {
    "browserName": "chrome",
    "platformName": "linux"
  },
  {
    "browserName": "edge",
    "platformName": "linux"
  }
]
The
remote end steps
, given
session
URL
variables
and
parameters
are:
If the implementation is an
endpoint node
, and the
list of
active HTTP sessions
is not empty, or otherwise if
the implementation is unable to start an additional session,
return
error
with
error code
session not
created
If the
remote end
is an
intermediary node
, take
implementation-defined steps that either result in returning
an
error
with
error code
session not created
or in returning a
success
with data that is isomorphic to
that returned by
remote ends
according to the rest of this
algorithm. If an
error
is not returned, the
intermediary
node
must retain a reference to the
session
created on
the
upstream
node as the
associated session
such that
commands may be forwarded to this
associated session
on
subsequent commands.
Note
How this is done is entirely up to the implementation,
but typically the
sessionId
, and
URL
and
URL prefix
of the
upstream
remote end
will need
to be tracked.
Let
flags
be a set containing "
http
".
Let
capabilities
be the result of
trying
to
process capabilities
with
parameters
and
flags
If capabilities is null, return error with error code session not created.
Let
session
be the result of
create a
session
, with
capabilities
, and
flags
Let
body
be a JSON
Object
initialized with:
sessionId
session
's
session ID
capabilities
capabilities
Set session's current top-level browsing context to one of the endpoint node's top-level browsing contexts, preferring the top-level browsing context that has system focus, or otherwise preferring any top-level browsing context whose visibility state is visible.
Note
WebDriver implementations typically start a completely new browser instance, but this specification does not require it (nor does it require that WebDriver be used to automate only web browsers). Implementations might choose to use an existing browser instance, e.g. by selecting the window that currently has focus.
Set the
request queue
to a new
queue
Return
success
with data
body
8.3
Delete Session
HTTP Method | URI Template
DELETE | /session/{session id}
The
remote end steps
, given
session
URL
variables
and
parameters
are:
If
session
is an
active HTTP session
try
to
close the session
with
session
Return
success
with data
null
8.4
Status
HTTP Method | URI Template
GET | /status
Note
Status
returns information
about whether a
remote end
is in a state in which it can create
new sessions
but may additionally include arbitrary meta information
that is specific to the implementation.
The
remote end
's
readiness state
is represented
by the
ready
property of the body,
which is false if an attempt to
create a session
at the current time would fail.
However, the value true does not guarantee
that a
New Session
command will succeed.
Implementations may optionally include
additional meta information as part of the body,
but the top-level properties
ready
and
message
are reserved and must not be overwritten.
The
remote end steps
, given
session
URL
variables
and
parameters
are:
Let
body
be a new JSON
Object
with the following properties:
ready
The
remote end
's
readiness state
message
An implementation-defined string
explaining the
remote end
's
readiness state
Return
success
with data
body
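As an illustration of the reserved "ready" and "message" properties, a remote end could assemble the Status body along the following lines. This Python sketch is hypothetical: the function name `status_body` and the "build" key used in the usage line are invented for the example.

```python
RESERVED_KEYS = {"ready", "message"}


def status_body(ready: bool, message: str, **meta) -> dict:
    """Build a Status body, refusing meta information that would
    overwrite the reserved top-level properties."""
    clashes = RESERVED_KEYS & meta.keys()
    if clashes:
        raise ValueError(f"reserved properties: {sorted(clashes)}")
    body = {"ready": ready, "message": message}
    body.update(meta)
    return body


# Hypothetical usage: an endpoint node that already has an active HTTP session.
body = status_body(False, "a session is already active", build="example-build")
```

Keeping the reserved keys out of the keyword arguments means implementation-specific meta information can be added freely without changing the meaning of the readiness state.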
9.
Timeouts
A timer is a struct. It has a timeout fired flag, which is a boolean, initially false.
To
start the timer
given
timer
and
timeout
Assert:
timeout
is not null.
Run the following steps
in parallel
Wait for at least
timeout
milliseconds to pass.
Set
timer
's
timeout fired flag
to true.
A timeouts configuration is a struct representing the timeouts for script evaluation, navigation, and element retrieval. It has a script timeout item which is an integer or null and is initially set to 30,000, a page load timeout item which is an integer or null and is initially set to 300,000, and an implicit wait timeout item which is an integer or null and is initially set to 0.

To
deserialize as timeouts configuration
given
timeouts
Set
timeouts
to the result of
converting a
JSON-derived JavaScript value to an Infra value
with
timeouts
Let
configuration
be a new
timeouts
configuration
For each
key
value
in
timeouts
If «"
script
", "
pageLoad
",
implicit
"» does not
contain
key
then continue.
If
value
is neither null nor a number greater than
or equal to 0 and less than or equal to the
maximum safe
integer
return
error
with
error code
invalid
argument
Run the substeps matching
key
script
Set
configuration
's
script timeout
to
value
pageLoad
Set
configuration
's
page load timeout
to
value
implicit
Set
configuration
's
implicit wait timeout
to
value
Return
success
with data
configuration
To
serialize the timeouts configuration
given
timeouts
Let
serialized
be an empty
map
Set
serialized
["
script
"]
to
timeouts
script timeout
Set
serialized
["
pageLoad
"]
to
timeouts
page load timeout
Set
serialized
["
implicit
"]
to
timeouts
implicit wait timeout
Return
convert an Infra value to a JSON-compatible
JavaScript value
with
serialized
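The timeouts configuration and its two conversion algorithms map naturally onto a small data class. The Python below is a sketch under stated assumptions: `TimeoutsConfiguration`, `deserialize_timeouts`, `serialize_timeouts`, and `InvalidArgument` are hypothetical names, and the input is assumed to have already been converted to a Python dict.

```python
from dataclasses import dataclass
from typing import Optional


class InvalidArgument(Exception):
    """Stands in for an error with error code "invalid argument"."""


MAX_SAFE_INTEGER = 2**53 - 1


@dataclass
class TimeoutsConfiguration:
    script: Optional[int] = 30_000      # script timeout, in milliseconds
    page_load: Optional[int] = 300_000  # page load timeout, in milliseconds
    implicit: Optional[int] = 0         # implicit wait timeout, in milliseconds


# Wire-format key to attribute name.
_FIELDS = {"script": "script", "pageLoad": "page_load", "implicit": "implicit"}


def deserialize_timeouts(timeouts: dict) -> TimeoutsConfiguration:
    configuration = TimeoutsConfiguration()
    for key, value in timeouts.items():
        if key not in _FIELDS:
            continue  # unknown keys are skipped, not rejected
        if value is not None:
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if not is_number or not (0 <= value <= MAX_SAFE_INTEGER):
                raise InvalidArgument(f"invalid timeout for {key}")
        setattr(configuration, _FIELDS[key], value)
    return configuration


def serialize_timeouts(configuration: TimeoutsConfiguration) -> dict:
    return {
        "script": configuration.script,
        "pageLoad": configuration.page_load,
        "implicit": configuration.implicit,
    }
```

Keys that are absent from the input keep their initial values, so a request that sets only "pageLoad" still serializes with the default script and implicit wait timeouts.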
9.1
Get Timeouts
HTTP Method | URI Template
GET | /session/{session id}/timeouts
The
remote end steps
, given
session
URL
variables
and
parameters
are:
Let
timeouts
be
serialize the timeouts
configuration
with
session
's
timeouts
configuration
Return
success
with data
timeouts
9.2
Set Timeouts
HTTP Method | URI Template
POST | /session/{session id}/timeouts
The
remote end steps
, given
session
URL
variables
and
parameters
are:
Let
timeouts
be the result of
trying
to
deserialize as timeouts configuration
with
parameters
Set
session
's
timeouts configuration
to
timeouts
Return
success
with data
null
10.
Navigation
The
commands
in this section allow navigation of
the
session
's
current top-level browsing context
to new
URLs and introspection of the document currently loaded in
this
browsing context
For
commands
that cause a new document to load,
the point at which the command returns
is determined by the session's
page loading strategy
The
normal
state causes it
to return after the
load
event fires
on the new page,
eager
causes it to return
after the
DOMContentLoaded
event fires
and
none
causes it to return immediately.
Navigation actions are also affected by the value of
the
page load timeout
which determines the maximum time that commands will block
before returning with a
timeout
error
The following is the
table of page load strategies
that links the
pageLoadStrategy
capability
keyword
to a
page loading strategy
state,
and shows which
document readiness
state
that corresponds to it:
Keyword | Page load strategy state | Document readiness state
none | none | (empty)
eager | eager | interactive
normal | normal | complete
When asked to
deserialize as a page load strategy
with
argument
value
If
value
is not a
string
return
an
error
with
error code
invalid argument
If there is no entry in the
table of page load
strategies
with
keyword
value
return
an
error
with
error code
invalid argument
Return
success
with data
value
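The page load strategy table and its deserialization steps can be sketched as a lookup. In this Python illustration the names `PAGE_LOAD_STRATEGIES`, `deserialize_page_load_strategy`, `readiness_target`, and `InvalidArgument` are hypothetical, and `None` represents the empty document readiness cell for "none".

```python
from typing import Any, Optional


class InvalidArgument(Exception):
    """Stands in for an error with error code "invalid argument"."""


# Keyword -> document readiness state to wait for (None: do not wait).
PAGE_LOAD_STRATEGIES = {
    "none": None,
    "eager": "interactive",
    "normal": "complete",
}


def deserialize_page_load_strategy(value: Any) -> str:
    if not isinstance(value, str):
        raise InvalidArgument("pageLoadStrategy must be a string")
    if value not in PAGE_LOAD_STRATEGIES:
        raise InvalidArgument(f"unknown page load strategy: {value}")
    return value


def readiness_target(strategy: str) -> Optional[str]:
    """Look up the document readiness state associated with a strategy."""
    return PAGE_LOAD_STRATEGIES[strategy]
```

Keeping the table as data means a navigation wait only needs the looked-up readiness state, with the "none" strategy short-circuiting before any waiting happens.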
To
wait for navigation to complete
given
session
and optional
timer
(default
null):
If
session
's
page loading strategy
is
none
, return
success
with
data
null
If
session
's
current browsing context
is
no longer open
, return
success
with
data
null
Let
timeout
be
session
timeouts
page load timeout
If
timer
is null:
Set
timer
to a new
timer
If
timeout
is not null:
Start the timer
with
timer
and
timeout
Run these steps, but
abort when
timer
's
timeout fired flag
is set:
If there is an ongoing attempt to
navigate
session
's
current browsing context
that has not
yet
matured
, wait for navigation to
mature
Let
readiness target
be the
document
readiness
state associated with the
session
's
page
loading strategy
, which can be found in the
table of page
load strategies
Wait for
session
's
current browsing
context
's
document readiness
state to reach
readiness target
If aborted
return an
error
with
error
code
timeout
Return
success
with data
null
When asked to run the
post-navigation checks
run the substeps of the first matching statement:
response
is a network error
Return
error
with
error code
unknown error
Note
A "network error" in this case is not an
HTTP response with a status code indicating an unsuccessful result,
but could be a problem occurring lower in the OSI model, or a
failed DNS lookup.
response
is
blocked by content security policy
If the remote end's accept insecure TLS state is true, take implementation-specific steps to ensure the navigation is not aborted and that the untrusted or invalid TLS certificate error that would normally occur under these circumstances is suppressed.
Otherwise return
error
with
error code
insecure certificate
response
's
HTTP status
is 401
Otherwise
Irrespective of how a possible authentication challenge is handled,
return
success
with data
null
10.1
Navigate To
HTTP Method | URI Template
POST | /session/{session id}/url
Note
The command causes the user agent to
navigate
the
session
's
current top-level browsing context
to a
new location.
If the
remote end
's
accept insecure TLS
flag is true, no
certificate errors that would normally cause the user agent to abort
and show a security warning are to hinder navigation to the requested
address.
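As an illustration, a local end could assemble the Navigate To request along the following lines. This is a minimal sketch: the helper name build_navigate_request and the base address http://localhost:4444 are assumptions made for the example and are not defined by this specification. The request is only constructed, not sent.

```python
import json
from urllib.request import Request


def build_navigate_request(base_url: str, session_id: str, url: str) -> Request:
    """Build (but do not send) a Navigate To request.

    base_url is a hypothetical remote end address; session_id fills the
    URI template variable and url becomes the "url" parameter.
    """
    body = json.dumps({"url": url}).encode("utf-8")
    return Request(
        f"{base_url}/session/{session_id}/url",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json; charset=utf-8"},
    )


request = build_navigate_request("http://localhost:4444", "1", "https://example.com")
```

Keeping construction separate from sending makes the method, path and body easy to inspect before any network traffic occurs.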
Example
To navigate the
current top-level browsing context
of
the
session
with ID
1
to
https://example.com
, the
local end
would POST
to
/session/1/url
with the body:
{"url": "https://example.com"}
The
remote end steps
, given
session
URL
variables
and
parameters
are:
Let
URL
be the result of
getting a property
named "
url
" from
parameters
If
session
's
current top-level browsing
context
is
no longer open
, return
error
with
error code
no such window
If
URL
is not an
absolute URL
or is not an
absolute URL with fragment
or not a
local scheme
, return
error
with
error
code
invalid argument
Try
to
handle any user prompts
with
session
Let
timeout
be
session
's
session
timeouts
page load timeout
Let
current URL
be
session
's
current
top-level browsing context
's
active document
's
URL
If
current URL
and
URL
do not have the
same
absolute URL
and
timeout
is not null:
Set
timer
to a new
timer
Start the timer
with
timer
and
timeout
Run these steps, but
abort when
timer
's
timeout fired flag
is set:
Navigate
session
's
current top-level
browsing context
to
URL
If
URL
is special
except for
file
and
current URL
and
URL
do not have the same
absolute URL
Try
to
wait for navigation to complete
with
session
and
timer
Try
to run the
post-navigation checks
Set the current browsing context
with
session
and
current top-level browsing
context
While
session
's
current top-level browsing context
contains
refresh state pragma directive
of
time
1 second
or less, run the following steps:
Set
current URL
to
session
's
current top-level browsing context
's
active
document
's
URL
Wait until the refresh timeout has elapsed and
new
navigate
of
session
's
current top-level
browsing context
has begun.
Set
URL
to the destination URL of
session
's
current top-level browsing context
's
active
document
's ongoing navigation.
If
URL
is special
except for
file
and
current URL
and
URL
do not have the same
absolute URL
Try
to
wait for navigation to complete
with
session
and
timer
Try
to run the
post-navigation checks
If aborted
return an
error
with
error
code
timeout
Return
success
with data
null
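The two URL comparisons in the steps above decide whether the remote end waits for navigation to complete. A rough sketch of that decision follows. It assumes that "special" refers to the special schemes of the URL Standard (ftp, file, http, https, ws, wss) and that two URLs count as the same absolute URL when they match once any fragment is dropped. The helper names are hypothetical.

```python
from urllib.parse import urldefrag, urlsplit

# Special schemes as listed in the URL Standard, with "file" left out.
SPECIAL_SCHEMES_EXCEPT_FILE = {"ftp", "http", "https", "ws", "wss"}


def same_absolute_url(a: str, b: str) -> bool:
    """Compare two URLs while ignoring their fragments (an assumption)."""
    return urldefrag(a).url == urldefrag(b).url


def should_wait_for_navigation(current_url: str, url: str) -> bool:
    """True when url is special (except file) and differs from current_url."""
    scheme = urlsplit(url).scheme.lower()
    return scheme in SPECIAL_SCHEMES_EXCEPT_FILE and not same_absolute_url(
        current_url, url
    )
```

Under these assumptions a fragment-only change, or a navigation to a file URL, does not trigger the wait.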
10.2
Get Current URL
HTTP Method
URI Template
GET
/session/{
session id
}/url
The
remote end steps
, given
session
URL
variables
and
parameters
are:
If
session
's
current top-level browsing
context
is
no longer open
, return
error
with
error code
no such window
Try
to
handle any user prompts
with
session
Let
URL
be the
serialization
of
session
's
current top-level browsing context
's
active document
's
URL
Return
success
with data
URL
10.3
Back
HTTP Method
URI Template
POST
/session/{
session id
}/back
Note
This command causes the browser to traverse
one step backward in the
joint session history
of
session
's
current top-level browsing context
This is equivalent to pressing the back button in the
browser chrome
or invoking
window.history.back
The
remote end steps
, given
session
URL
variables
and
parameters
are:
If
session
's
current top-level browsing
context
is
no longer open
, return
error
with
error code
no such window
Try
to
handle any user prompts
with
session
Let
timeout
be
session
's
session
timeouts
page load timeout
Let
timer
be a new
timer
If
timeout
is not null:
Start the timer
with
timer
and
timeout
Traverse the history by a delta
–1
for
session
's
current browsing context
If the previous step results in a
pageHide
event firing
, wait
until the
pageShow
event fires
or
timer
's
timeout fired flag
is set,
whichever occurs first.
If
timer
's
timeout fired flag
is set:
Handle any user prompts
Return
error
with
error code
timeout
Return
success
with data
null
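The wait in the steps above ("until pageShow fires or the timeout fired flag is set, whichever occurs first") can be sketched with a threading.Event standing in for the pageShow notification. The function name and the tuple-shaped results are assumptions for illustration only.

```python
import threading
from typing import Optional, Tuple


def wait_for_page_show(
    page_show: threading.Event, timeout_ms: Optional[float]
) -> Tuple[str, Optional[str]]:
    """Wait until page_show is set or the page load timeout elapses.

    A timeout of None models the case where no timer is started, so the
    wait is unbounded.
    """
    timeout_s = None if timeout_ms is None else timeout_ms / 1000
    if page_show.wait(timeout_s):
        return ("success", None)
    return ("error", "timeout")
```

Event.wait returns True only when the event was set before the timeout, which maps directly onto the success and timeout outcomes.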
10.4
Forward
HTTP Method
URI Template
POST
/session/{
session id
}/forward
Note
This command causes the browser
to traverse one step forwards in the
joint session history
of
session
's
current top-level browsing context
This is equivalent to pressing the forward button in the
browser chrome
or invoking
window.history.forward
The
remote end steps
, given
session
URL
variables
and
parameters
are:
If
session
's
current top-level browsing
context
is
no longer open
, return
error
with
error code
no such window
Try
to
handle any user prompts
with
session
Let
timeout
be
session
's
session
timeouts
page load timeout
Let
timer
be a new
timer
If
timeout
is not null:
Start the timer
with
timer
and
timeout
Traverse the history by a delta
1
for
session
's
current browsing context
If the previous step results in a
pageHide
event
firing, wait until the
pageShow
event fires
or
timer
's
timeout fired flag
is set, whichever occurs first.
If
timer
's
timeout fired flag
is set:
Handle any user prompts
Return
error
with
error code
timeout
Return
success
with data
null
10.5
Refresh
HTTP Method
URI Template
POST
/session/{
session id
}/refresh
Note
This command causes the browser to reload the page
in
session
's
current top-level browsing context
The
remote end steps
, given
session
URL
variables
and
parameters
are:
If
session
's
current top-level browsing
context
is
no longer open
, return
error
with
error code
no such window
Try
to
handle any user prompts
with
session
Initiate
an overridden reload
of
session
's
current top-level browsing
context
's
active document
If
URL
is special
except for
file
Try
to
wait for navigation to complete
with
session
Try
to run the
post-navigation checks
Set the current browsing context
with
session
and
session
's
current
top-level browsing context
Return
success
with data
null
10.6
Get Title
HTTP Method
URI Template
GET
/session/{
session id
}/title
Note
This command returns the document title
of
session
's
current top-level browsing context
equivalent to calling
document.title
The
remote end steps
, given
session
URL
variables
and
parameters
are:
If
session
's
current top-level browsing
context
is
no longer open
, return
error
with
error code
no such window
Try
to
handle any user prompts
with
session
Let
title
be
the
session
's
current top-level browsing
context
's
active document
's
title
Return
success
with data
title
11.
Contexts
Many WebDriver
commands
happen in the context of either
session
's
current browsing context
or
current
top-level browsing context
session
's
current top-level browsing context
is
represented in the protocol by its associated
window handle
When a
top-level browsing context
is selected using
the
Switch To Window
command, a specific
browsing
context
can be selected using the
Switch to Frame
command.
Note
The use of the term “window” to
refer to a
top-level browsing context
is legacy and doesn't correspond with either
the operating system notion of a “window”
or the DOM
Window
object.
A
browsing context
is said to be
no longer open
if its
navigable
has been destroyed.
Each
browsing context
has an associated
window handle
which uniquely
identifies it. This must be a
String
and must not be "
current
".
A
web frame
is an abstraction used to identify a
frame
or
iframe
when it is transported via the
protocol
between
remote
and
local
ends.
The
web frame identifier
is the string constant "
frame-075b-4da1-b6ba-e579c2d3230a
".
An ECMAScript
Object
represents a web frame
if it has a
web frame identifier
own property
A
web window
is an abstraction used to identify a
window
when it is transported via the
protocol
between
remote
and
local
ends.
The
web window identifier
is the string constant "
window-fcc6-11e5-b4f8-330a88ab9d7f
".
An ECMAScript
Object
represents a web window
if it has a
web window identifier
own property
The
WindowProxy
reference object
with
WindowProxy
object
window
is
given by:
Let
identifier
be the
web window identifier
if the associated
browsing context
of
window
is a
top-level browsing context
Otherwise let it be the
web frame identifier
Return a JSON
Object
initialized with the following properties:
identifier
Associated
window handle
of the
window
's
browsing context
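The WindowProxy reference object can be sketched as a one-property JSON Object. The two identifier constants are taken from this section; the function name and the is_top_level flag are assumptions made for the example.

```python
WEB_WINDOW_IDENTIFIER = "window-fcc6-11e5-b4f8-330a88ab9d7f"
WEB_FRAME_IDENTIFIER = "frame-075b-4da1-b6ba-e579c2d3230a"


def window_proxy_reference(window_handle: str, is_top_level: bool) -> dict:
    """Serialize a window as {identifier: window handle}.

    Top-level browsing contexts use the web window identifier; all other
    browsing contexts use the web frame identifier.
    """
    identifier = WEB_WINDOW_IDENTIFIER if is_top_level else WEB_FRAME_IDENTIFIER
    return {identifier: window_handle}
```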
To
deserialize a web frame
by a
JSON
Object
object
that
represents a web frame
If
object
has no
own property
web frame identifier
return
error
with
error code
invalid argument
Let
reference
be the result of
getting
the
web frame identifier
property
from
object
If
reference
is not a
String
return an
error
with
error code
invalid argument
Let
browsing context
be the
browsing context
whose
window handle
is
reference
, or null if no such
browsing context
exists.
If
browsing context
is null or a
top-level browsing context
return
error
with
error code
no such frame
Return
success
with data
browsing context
's associated window.
To
deserialize a web window
by a
JSON
Object
object
that
represents a web
window
If
object
has no
own property
web window identifier
return
error
with
error code
invalid argument
Let
reference
be the result of
getting
the
web window identifier
property
from
object
If
reference
is not a
String
return an
error
with
error code
invalid argument
Let
browsing context
be the
browsing context
whose
window handle
is
reference
, or null if no such
browsing context
exists.
If
browsing context
is null or not a
top-level browsing context
return
error
with
error code
no such window
Return
success
with data
browsing context
's associated window.
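The two deserialization algorithms above differ only in the identifier they look up and in which kind of browsing context they accept. A minimal sketch, assuming the remote end's browsing contexts are modelled as a mapping from window handle to a flag saying whether the context is top-level, and returning the handle in place of the associated window:

```python
from typing import Any, Mapping, Tuple

WEB_WINDOW_IDENTIFIER = "window-fcc6-11e5-b4f8-330a88ab9d7f"
WEB_FRAME_IDENTIFIER = "frame-075b-4da1-b6ba-e579c2d3230a"


def _deserialize(obj, identifier, contexts, want_top_level, missing_code):
    if identifier not in obj:
        return ("error", "invalid argument")
    reference = obj[identifier]
    if not isinstance(reference, str):
        return ("error", "invalid argument")
    # contexts maps window handle -> True for top-level, False otherwise;
    # a missing handle models "no such browsing context exists".
    is_top_level = contexts.get(reference)
    if is_top_level is None or is_top_level != want_top_level:
        return ("error", missing_code)
    return ("success", reference)


def deserialize_web_frame(obj: Mapping[str, Any], contexts: Mapping[str, bool]) -> Tuple[str, str]:
    return _deserialize(obj, WEB_FRAME_IDENTIFIER, contexts, False, "no such frame")


def deserialize_web_window(obj: Mapping[str, Any], contexts: Mapping[str, bool]) -> Tuple[str, str]:
    return _deserialize(obj, WEB_WINDOW_IDENTIFIER, contexts, True, "no such window")
```

Note that a handle naming a top-level browsing context is rejected by the frame variant, and a handle naming a nested context is rejected by the window variant.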
When required to
set the current browsing context
given
session
and
context
, an implementation must
follow the following steps:
Set
session
's
current browsing
context
to
context
Set the
session
's
current parent browsing
context
to the
parent browsing context
of
context
, if that context exists, or
null
otherwise.
When required to
set the current top-level browsing
context
given
session
and
context
, an
implementation must:
Assert:
context
is a
top-level browsing context
Set
session
's
current top-level browsing
context
to
context
Set the current browsing context
with
session
and
context
Note
In accordance with
the
focus
section of the [
HTML
] specification,
commands are unaffected by whether the operating system window has focus or not.
11.1
Get Window Handle
HTTP Method
URI Template
GET
/session/{
session id
}/window
The
remote end steps
, given
session
URL
variables
and
parameters
are:
If
session
's
current top-level browsing
context
is
no longer open
, return
error
with
error code
no such window
Return
success
with data being the
window handle
associated with
session
's
current top-level browsing context
11.2
Close Window
HTTP Method
URI Template
DELETE
/session/{
session id
}/window
The
remote end steps
, given
session
URL
variables
and
parameters
are:
If
session
's
current top-level browsing
context
is
no longer open
, return
error
with
error code
no such window
Try
to
handle any user prompts
with
session
Close
session
's
current top-level
browsing context
If there are no more open
top-level browsing contexts
then
try
to
close the session
Return the result of running the
remote end steps
for
the
Get Window Handles
command
with
session
URL variables
and
parameters
11.3
Switch To Window
HTTP Method
URI Template
POST
/session/{
session id
}/window
Note
Switching window will select
session
's
current top-level
browsing context
used as the target for all
subsequent
commands
. In a tabbed browser, this will typically
make the tab containing the
browsing context
the selected tab.
The
remote end steps
, given
session
URL
variables
and
parameters
are:
Let
handle
be the result of
getting the property
handle
from
parameters
If
handle
is
undefined
return
error
with
error code
invalid argument
If there is an active
user prompt
, that prevents the
focusing of another
top-level browsing context
return
error
with
error code
unexpected alert open
If
handle
is equal to the associated
window
handle
for some
top-level browsing context
let
context
be that browsing context, and
set the current top-level browsing context
with
session
and
context
Otherwise, return
error
with
error code
no such window
Update any implementation-specific state that would result
from the user selecting
session
's
current browsing context
for
interaction, without altering OS-level focus.
Return
success
with data
null
11.4
Get Window Handles
HTTP Method
URI Template
GET
/session/{
session id
}/window/handles
The order in which the window handles are returned is arbitrary.
The
remote end steps
, given
session
URL
variables
and
parameters
are:
Let
handles
be a
List
For each
top-level browsing context
in the
remote end
push the associated
window handle
onto
handles
Return
success
with data
handles
Example
In order to determine whether or not a particular interaction
with the browser opens a new window,
one can obtain the set of window handles before the interaction is performed
and compare it with the set after the action is performed.
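The comparison described in this example amounts to a set difference. A minimal sketch, with a hypothetical helper name and made-up handle values:

```python
from typing import Iterable, Set


def newly_opened_handles(before: Iterable[str], after: Iterable[str]) -> Set[str]:
    """Return the window handles present after an interaction but not before.

    Sets are used because the order of returned handles is arbitrary.
    """
    return set(after) - set(before)
```

An empty result suggests that the interaction did not open a new window.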
11.5
New Window
HTTP Method
URI Template
POST
/session/{
session id
}/window/new
Create a new
top-level browsing context
The
remote end steps
, given
session
URL
variables
and
parameters
are:
If the implementation does not support creating new top-level
browsing contexts, return
error
with
error code
unsupported operation
If
session
's
current top-level browsing
context
is
no longer open
, return
error
with
error code
no such window
Try
to
handle any user prompts
with
session
Let
type hint
be the result of
getting the property "
type
" from
parameters
Create a new
top-level browsing context
by running
the
window open steps
with
URL
set to "
about:blank
",
target
set to the empty
string, and
features
set to "
noopener
" and
the user agent configured to create a new browsing context. This must
be done without invoking the
focusing steps
for the created browsing
context. If
type hint
has the value "
tab
",
and the implementation supports multiple browsing contexts in the
same OS window, the new browsing context should share an OS window
with
session
's
current browsing context
. If
type hint
is "
window
", and the implementation supports multiple
browsing contexts in separate OS windows, the created browsing
context should be in a new OS window. In all other cases the details
of how the browsing context is presented to the user are
implementation defined.
Let
handle
be the
associated
window handle
of the newly created window.
Let
type
be "
tab
" if the newly created
window shares an OS-level window with
session
's
current browsing
context
, or "
window
" otherwise.
Let
result
be a new JSON
Object
initialized with:
handle
The value of
handle
type
The value of
type
Return
success
with data
result
11.6
Switch To Frame
HTTP Method
URI Template
POST
/session/{
session id
}/frame
Note
The
Switch To Frame
command is used to select
session
's
current top-level browsing context
or
child browsing context
of
session
's
current
browsing context
to use as
session
's
current
browsing context
for subsequent
commands

The
remote end steps
, given
session
URL
variables
and
parameters
are:
Let
id
be the result of
getting the property
id
from
parameters
If
id
is not
null
, a
Number
object,
or an
Object
that
represents a web element
return
error
with
error code
invalid argument
Run the substeps of the first matching condition:
id
is
null
If
session
's
current top-level browsing
context
is
no longer open
, return
error
with
error code
no such window
Try
to
handle any user prompts
with
session
Set the current browsing context
with
session
and
session
's
current
top-level browsing context
id
is a
Number
object
If
id
is less than 0 or greater than 2^16 – 1,
return
error
with
error code
invalid argument
If
session
's
current browsing context
is
no
longer open
, return
error
with
error code
no
such window
Try
to
handle any user prompts
with
session
Let
window
be the
associated window
of
session
's
current browsing context
's
active document
If
id
is not a
supported property index
of
window
return
error
with
error code
no such frame
Let
child window
be
the
WindowProxy
object obtained by
calling
window
.[[GetOwnProperty]](
id
).
Set the current browsing context
with
session
and
child window
's
browsing context
id
represents a web element
If
session
's
current browsing context
is
no
longer open
, return
error
with
error code
no
such window
Try
to
handle any user prompts
with
session
Let
element
be the result
of
trying
to
get a known element
with
session
and
id
If
element
is not a
frame
or
iframe
element,
return
error
with
error code
no such frame
Set the current browsing context
with
session
and
element
's
content navigable
's
active browsing context
Update any implementation-specific state that would result
from the user selecting
session
's
current browsing context
for
interaction, without altering OS-level focus.
Return
success
with data
null
Note
WebDriver is not bound by the same origin policy,
so it is always possible to switch into child browsing contexts,
even if they are different origin to the current browsing context.
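The up-front classification of the id parameter in Switch To Frame can be sketched as follows. The web element identifier is the constant defined in the Elements section; the function name and the returned labels are assumptions. The sketch only decides which branch applies and performs no frame lookup.

```python
from typing import Any, Tuple

WEB_ELEMENT_IDENTIFIER = "element-6066-11e4-a52e-4f735466cecf"
MAX_FRAME_INDEX = 2**16 - 1


def classify_frame_id(frame_id: Any) -> Tuple[str, str]:
    """Return which Switch To Frame branch applies, or an invalid argument error."""
    if frame_id is None:
        return ("success", "top-level")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(frame_id, (int, float)) and not isinstance(frame_id, bool):
        if frame_id < 0 or frame_id > MAX_FRAME_INDEX:
            return ("error", "invalid argument")
        return ("success", "index")
    if isinstance(frame_id, dict) and WEB_ELEMENT_IDENTIFIER in frame_id:
        return ("success", "element")
    return ("error", "invalid argument")
```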
11.7
Switch To Parent Frame
HTTP Method
URI Template
POST
/session/{
session id
}/frame/parent
Note
The
Switch to Parent Frame
command
sets
session
's
current browsing context
for future
commands
to the parent of
session
's
current browsing context
The
remote end steps
, given
session
URL
variables
and
parameters
are:
If
session
's
current browsing context
is already the
top-level browsing context
If
session
's
current browsing context
is
no longer open
, return
error
with
error
code
no such window
Return
success
with data
null
If
session
's
current parent browsing context
is
no longer open
, return
error
with
error
code
no such window
Try
to
handle any user prompts
with
session
If
session
's
current parent browsing context
is
not
null
set the current browsing context
with
session
and
current parent browsing context
Update any implementation-specific state that would result
from the user selecting
session
's
current browsing context
for
interaction, without altering OS-level focus.
Return
success
with data
null
11.8
Resizing and positioning windows
WebDriver provides
commands
for interacting with the operating system window
containing
session
's
current top-level browsing context
Because different operating systems' window managers provide different abilities,
not all of the commands in this section can be supported by all
remote ends
Support for these
commands
is determined by the
window
dimensioning/positioning
capability
Where a
command
is not supported,
an
unsupported operation
error
is returned.
The
top-level browsing context
has an associated
window state
which describes what visibility state its OS widget window is in.
It can be in one of the following states:
State | Keyword | Default | Description
Maximized window state | "maximized" | | The window is maximized.
Minimized window state | "minimized" | | The window is iconified.
Normal window state | "normal" | ✓ | The window is shown normally.
Fullscreen window state | "fullscreen" | | The window is in full screen mode.
If for whatever reason the
top-level browsing context
's
OS window cannot enter any of the
window states
, or if this
concept is not applicable on the current system, the default state
must be
normal
The
WindowRect object
for
WindowProxy
window
is an
Object
initialized
with the following properties:
x
window
's
screenX
attribute.
y
window
's
screenY
attribute.
width
window
's
outerWidth
attribute.
height
window
's
outerHeight
attribute.
To
maximize the window
given an operating system level window
with an associated
top-level browsing context
run the implementation-specific steps
to transition the operating system level window
into the
maximized window state
If the window manager supports window resizing
but does not have a concept of window maximization,
the window dimensions must be increased
to the maximum available size
permitted by the window manager
for the current screen.
Return when the window has completed the transition,
or within an implementation-defined timeout.
To
iconify the window
given an operating system level window
with an associated
top-level browsing context
run implementation-specific steps
to transition the operating system level window
into the
minimized window state
Do not return from this operation
until the
visibility state
of the
top-level browsing context
's
active document
has reached the
hidden
state,
or until the operation times out.
To
restore the window
given an operating system level window
with an associated
top-level browsing context
run implementation-specific steps
to restore or unhide the window
to the visible screen.
Do not return from this operation
until the
visibility state
of the
top-level browsing context
's
active document
has reached the
visible
state,
or until the operation times out.
11.8.1
Get Window Rect
HTTP Method
URI Template
GET
/session/{session id}/window/rect
Note
The
Get Window Rect
command
returns the size and position on the screen
of the operating system window corresponding
to
session
's
current top-level browsing context
The
remote end steps
, given
session
URL
variables
and
parameters
are:
If
session
's
current top-level browsing context
is
no longer open
return
error
with
error code
no such window
Try
to
handle any user prompts
with
session
Return
success
with data set to the
WindowRect
object
for the
session
's
current top-level browsing
context
11.8.2
Set Window Rect
HTTP Method
URI Template
POST
/session/{
session id
}/window/rect
Note
The
Set Window Rect
command
alters the size and the position of the operating system window
corresponding to
session
's
current top-level browsing context
The
remote end steps
, given
session
URL
variables
and
parameters
are:
Let
width
be the result of
getting a property
named "
width
" from
parameters
If
width
is
undefined
, let
width
be null.
Let
height
be the result of
getting a property
named "
height
" from
parameters
If
height
is
undefined
, let
height
be null.
Let
x
be the result of
getting a property
named "
x
" from
parameters
If
x
is
undefined
, let
x
be null.
Let
y
be the result of
getting a property
named "
y
" from
parameters
If
y
is
undefined
, let
y
be null.
If
width
or
height
is neither null, nor
Number
from 0 to 2^31 − 1, return
error
with
error code
invalid argument
If
x
or
y
is neither null, nor
Number
from −(2^31) to 2^31 − 1,
return
error
with
error code
invalid argument
If the
remote end
does not support
the
Set Window Rect
command
for
session
's
current top-level browsing context
for any reason, return
error
with
error
code
unsupported operation
Note
In case the
Set Window Rect
command is
partially supported (i.e. some combinations of arguments are
supported but not others), the implementation is expected to continue
with the remaining steps.
If
session
's
current top-level browsing
context
is
no longer open
, return
error
with
error code
no such window
Try
to
handle any user prompts
with
session
Fully exit fullscreen
Restore the window
Let
window
be the operating system window containing
session
's
current top-level browsing context
If the implementation is able to set the dimensions
of
window
If
width
is not null, set the width,
in
CSS pixels
, of
window
, including
any
browser chrome
and externally drawn window decorations,
to a value that is as close as possible to
width
If
height
is not null, set the height,
in
CSS pixels
, of
window
, including
any
browser chrome
and externally drawn window decorations,
to a value that is as close as possible to
height
Note
The specification does not guarantee
that the resulting window size will exactly match that which was requested.
In particular the implementation is expected to clamp values
that are larger than the physical screen dimensions,
or smaller than the minimum window size.
Particular implementations may have other limitations
such as not being able to resize in single-pixel increments.
This is intended to mutate the value
of
session
's
current top-level browsing
context
's
WindowProxy
's
outerWidth
and
outerHeight
properties. Specifically, the value of
outerWidth
should
be as close as possible to
width
and the value
of
outerHeight
should be as close as possible
to
height
If the implementation is able to set the position of
window
If
x
is not null, set the x-coordinate of the
left edge of
window
to a value that is as close as
possible to
x
If
y
is not null, set the y-coordinate of the top
edge of
window
to a value that is as close as possible
to
y
Note
The specification does not guarantee
that the resulting window position will match that which was requested.
This step is similar to calling the
moveTo(x,
y)
method on the
WindowProxy
object
associated with
session
's
current top-level browsing
context
, but without the
security
restrictions
that you
cannot move a window or tab that was not created by
window.open
cannot move a window or tab when it is in a window with more than one tab.
Return
success
with data set to the
WindowRect
object
for the
session
's
current top-level browsing
context
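The parameter handling at the start of the Set Window Rect steps can be sketched as below. The position properties are named x and y here, matching the moveTo(x, y) comparison in the steps. The function name and result tuples are assumptions, and the sketch covers only argument validation, not the resizing or positioning itself.

```python
from typing import Any, Mapping, Tuple

MAX_INT32 = 2**31 - 1
MIN_INT32 = -(2**31)


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so it is excluded explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_window_rect(parameters: Mapping[str, Any]) -> Tuple[str, Any]:
    """Normalize missing properties to None and range-check the rest."""
    rect = {name: parameters.get(name) for name in ("width", "height", "x", "y")}
    for name in ("width", "height"):
        value = rect[name]
        if value is not None and not (_is_number(value) and 0 <= value <= MAX_INT32):
            return ("error", "invalid argument")
    for name in ("x", "y"):
        value = rect[name]
        if value is not None and not (
            _is_number(value) and MIN_INT32 <= value <= MAX_INT32
        ):
            return ("error", "invalid argument")
    return ("success", rect)
```

Because every property may be null, a request can resize without moving the window, or move it without resizing.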
11.8.3
Maximize Window
HTTP Method
URI Template
POST
/session/{
session id
}/window/maximize
Note
The
Maximize Window
command invokes the window
manager-specific “maximize” operation, if any, on the window
containing
session
's
current top-level browsing
context
. This typically increases the window to the maximum
available size without going full-screen.
The
remote end steps
, given
session
URL
variables
and
parameters
are:
If the
remote end
does not support the
Maximize
Window
command for
session
's
current top-level
browsing context
for any reason, return
error
with
error code
unsupported operation
If
session
's
current top-level browsing
context
is
no longer open
, return
error
with
error code
no such window
Try
to
handle any user prompts
with
session
Fully exit fullscreen
Restore the window
Maximize the window
of
session
's
current top-level browsing context
Return
success
with data set to the
WindowRect
object
for the
session
's
current top-level browsing
context
11.8.4
Minimize Window
HTTP Method
URI Template
POST
/session/{
session id
}/window/minimize
Note
The
Minimize Window
command invokes the window
manager-specific “minimize” operation, if any, on the window
containing
session
's
current top-level browsing
context
. This typically hides the window in the system tray.
The
remote end steps
, given
session
URL
variables
and
parameters
are:
If the
remote end
does not support the
Minimize
Window
command for
session
's
current top-level
browsing context
for any reason, return
error
with
error code
unsupported operation
If
session
's
current top-level browsing
context
is
no longer open
, return
error
with
error code
no such window
Try
to
handle any user prompts
with
session
Fully exit fullscreen
Iconify the window
Return
success
with data set to the
WindowRect
object
for the
session
's
current top-level browsing
context
11.8.5
Fullscreen Window
HTTP Method
URI Template
POST
/session/{
session id
}/window/fullscreen
The
remote end steps
, given
session
URL
variables
and
parameters
are:
If the
remote end
does not
support fullscreen
return
error
with
error code
unsupported operation
If
session
's
current top-level browsing
context
is
no longer open
, return
error
with
error code
no such window
Try
to
handle any user prompts
with
session
Restore the window
Call
fullscreen an element
with
session
's
current top-level browsing context
's
active document
's
document element
Note
The window is now in the
Fullscreen window state
Return
success
with data set to the
WindowRect
object
for the
session
's
current top-level browsing
context
12.
Elements
A
web element
is an abstraction used to identify an
element
when it is transported via the
protocol
between
remote
and
local
ends.
The
web element identifier
is the string constant "
element-6066-11e4-a52e-4f735466cecf
".
An ECMAScript
Object
represents a web element
if it has a
web element identifier
own property
The
WebDriver node id
is a globally unique string
representing a handle to a DOM node in a specific
WebDriver
session
A
weak map
is a
map
in which keys are held
weakly, i.e. items are removed if the key object is garbage collected, and
presence in the map does not prevent garbage collection. This acts as an
alternative to defining properties directly on the key objects.
Note
Unlike the ECMAScript
WeakMap
weak map
can participate in the full set of operations available for
a Map.
A WebDriver
session
has a
browsing context group node
map
, which is a
weak map
between a
browsing context group
and a
node id map
A
node id map
is a
weak map
between nodes and their
corresponding
WebDriver node id
A WebDriver
session
has a
navigable seen nodes map
which is a
weak map
between a
navigable
and a set.
To
get a node
given
session
browsing context
, and
reference
Let
browsing context group node map
be
session
's
browsing context group node map
Let
browsing context group
be
browsing
context
's
browsing context group
If
browsing context group node map
does not
contain
browsing context group
, return null.
Let
node id map
be
browsing context group node
map
browsing context group
].
Let
node
be the entry in
node id map
whose
value is
reference
, if such an entry exists, or null
otherwise.
Return
node
To
get or create a node reference
given
session
browsing context
, and
node
Let
browsing context group node map
be
session
's
browsing context group node map
Let
browsing context group
be
browsing
context
's
browsing context group
If
browsing context group node map
does not
contain
browsing context group
, set
browsing context
group node map
browsing context group
] to a new
weak map
Let
node id map
be
browsing context group node
map
browsing context group
].
If
node id map
does not contain
node
Let
node id
be a new globally unique string.
Set
node id map
node
] to
node id
Let
navigable
be
browsing
context
's
active document
's
node navigable
Let
navigable seen nodes map
be
session
's
navigable seen nodes map
If
navigable seen nodes map
does not
contain
navigable
, set
navigable seen nodes
map
navigable
] to an empty set.
Append
node id
to
navigable seen nodes
map
navigable
].
Return
node id map
node
].
A
node reference is known
given
session
browsing
context
, and
reference
if the following steps return true:
Let
navigable
be
browsing
context
's
active document
's
node navigable
Let
navigable seen nodes map
be
session
's
navigable seen nodes map
If
navigable seen nodes map
contains
navigable
and
navigable seen nodes
map
navigable
contains
reference
return true, otherwise return false.
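The three node-bookkeeping algorithms above can be sketched with Python's weakref.WeakKeyDictionary standing in for a weak map. This is an illustration under simplifying assumptions: Node, Group and Navigable are placeholder classes, the browsing context group and navigable are passed in directly instead of being derived from a browsing context, and uuid4 is one possible source of a globally unique string.

```python
import uuid
import weakref


class Node:
    """Stand-in for a DOM node; any weak-referenceable object would do."""


class Group:
    """Stand-in for a browsing context group."""


class Navigable:
    """Stand-in for a navigable."""


class Session:
    def __init__(self):
        # browsing context group -> node id map (node -> WebDriver node id)
        self.group_node_map = weakref.WeakKeyDictionary()
        # navigable -> set of node ids seen in that navigable
        self.navigable_seen_nodes = weakref.WeakKeyDictionary()


def get_or_create_node_reference(session, group, navigable, node):
    node_id_map = session.group_node_map.setdefault(
        group, weakref.WeakKeyDictionary()
    )
    if node not in node_id_map:
        node_id = str(uuid.uuid4())
        node_id_map[node] = node_id
        session.navigable_seen_nodes.setdefault(navigable, set()).add(node_id)
    return node_id_map[node]


def get_node(session, group, reference):
    node_id_map = session.group_node_map.get(group)
    if node_id_map is None:
        return None
    for node, node_id in node_id_map.items():
        if node_id == reference:
            return node
    return None


def node_reference_is_known(session, navigable, reference):
    return reference in session.navigable_seen_nodes.get(navigable, set())
```

Because the keys are held weakly, neither map keeps a node, group or navigable alive once nothing else references it.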
To
get a known element
given
session
and
reference
If not
node reference is known
with
session
session
's
current browsing context
and
reference
return
error
with
error code
no such element
Let
node
be the result of
get a node
with
session
session
's
current browsing
context
, and
reference
If
node
is not null and
node
does not implement
Element
return
error
with
error code
no such element
If
node
is null or
node
is stale
return
error
with
error code
stale element reference
Return
success
with data
node
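The order of checks in get a known element determines which error a local end sees. A small sketch, assuming the outcome of "node reference is known" and the looked-up node are supplied by the caller; FakeNode and its two flags are hypothetical stand-ins for "implements Element" and "is stale":

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass
class FakeNode:
    is_element: bool = True
    is_stale: bool = False


def get_known_element(
    reference_known: bool, node: Optional[FakeNode]
) -> Tuple[str, Union[str, FakeNode]]:
    if not reference_known:
        return ("error", "no such element")
    if node is not None and not node.is_element:
        return ("error", "no such element")
    if node is None or node.is_stale:
        return ("error", "stale element reference")
    return ("success", node)
```

A reference that was once seen but whose node has since been collected therefore yields stale element reference, not no such element.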
To
get or create a web element reference
given
session
and
element
Assert:
element
implements
Element
Return the result of
trying
to
get or create a node
reference
given
session
session
's
current
browsing context
, and
element
The
web element reference object
for
session
and
element
is:
Let
identifier
be the
web element identifier
Let
reference
be the result of
get or create a
web element reference
with
session
and
element
Return a JSON
Object
initialized with a property with
name
identifier
and value
reference
To deserialize a web element by a JSON Object object that represents a web element:
If object has no own property web element identifier, return error with error code invalid argument.
Let reference be the result of getting the web element identifier property from object.
If reference is not a String, return an error with error code invalid argument.
Let element be the result of trying to get a known element with session and reference.
Return success with data element.
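As a rough illustration of the wire shape, the following Python sketch builds a web element reference object and performs the validation part of deserialization. The constant is the web element identifier string defined elsewhere in this specification. The function names and the `InvalidArgument` exception are hypothetical, and the final lookup through get a known element is deliberately left out.

```python
import json

# The web element identifier defined elsewhere in this specification.
WEB_ELEMENT_IDENTIFIER = "element-6066-11e4-a52e-4f735466cecf"


class InvalidArgument(Exception):
    """Hypothetical exception for the 'invalid argument' error code."""


def web_element_reference_object(reference):
    # A JSON Object with a single property: identifier -> reference.
    return {WEB_ELEMENT_IDENTIFIER: reference}


def reference_from_object(obj):
    # Validation steps of deserialization only; resolving the
    # reference to an element is not modelled here.
    if WEB_ELEMENT_IDENTIFIER not in obj:
        raise InvalidArgument("missing web element identifier")
    reference = obj[WEB_ELEMENT_IDENTIFIER]
    if not isinstance(reference, str):
        raise InvalidArgument("reference is not a String")
    return reference


# Round trip through JSON text, as it would travel over the wire.
payload = json.dumps(web_element_reference_object("node-id-123"))
restored = reference_from_object(json.loads(payload))
```

Here `"node-id-123"` is a made-up reference; real references are the globally unique strings produced when a node reference is created.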
An element is stale if its node document is not the active document or if it is not connected.
To scroll into view an element, perform the following steps only if the element is not already in view:
Let options be the following ScrollIntoViewOptions:
  "behavior": "instant"
  Logical scroll position "block": "end"
  Logical scroll position "inline": "nearest"
Run Function.[[Call]](scrollIntoView, options) with element as the this value.
Editable elements are those that can be used for typing and clearing, and they fall into two subcategories:
Mutable form control elements
Denotes input elements that are mutable (e.g. that are not read only or disabled) and whose type attribute is in one of the following states:
Text and Search
URL
Telephone
Email
Password
Date
Month
Week
Time
Local Date and Time
Number
Range
Color
File Upload
The textarea element is also a mutable form control element.
Mutable elements
Denotes elements that are editing hosts or content editable.
An element is said to have pointer events disabled if the resolved value of its "pointer-events" style property is "none".
An element is to be considered read only if it is an input element whose readonly attribute is set.
12.1 Interactability
To determine whether an element can be interacted with using pointer actions, WebDriver performs hit-testing to check whether the interaction can reach the requested element.
An interactable element is an element which is either pointer-interactable or keyboard-interactable.
A pointer-interactable element is defined to be the first element, defined by the paint order found at the center point of its rectangle that is inside the viewport, excluding the size of any rendered scrollbars.
A keyboard-interactable element is any element that has a focusable area, is a body element, or is the document element.
An element's in-view center point is the origin position of the rectangle that is the intersection between the element's first DOMRect of getClientRects() and the initial viewport. It can be calculated this way:
Let rectangle be the first object of the DOMRect collection returned by calling getClientRects() on element.
Let left be max(0, min(x coordinate, x coordinate + width dimension)).
Let right be min(innerWidth, max(x coordinate, x coordinate + width dimension)).
Let top be max(0, min(y coordinate, y coordinate + height dimension)).
Let bottom be min(innerHeight, max(y coordinate, y coordinate + height dimension)).
Let x be floor((left + right) ÷ 2.0).
Let y be floor((top + bottom) ÷ 2.0).
Return the pair of (x, y).
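The arithmetic above can be checked with a small Python sketch. The function name and its parameters are assumptions made for the example: `x`, `y`, `width`, and `height` stand for the first client rectangle's coordinates and dimensions, and `inner_width` and `inner_height` for the viewport's innerWidth and innerHeight.

```python
import math


def in_view_center_point(x, y, width, height, inner_width, inner_height):
    # Clamp the rectangle's edges to the viewport. min/max on both
    # edges also copes with a negative width or height.
    left = max(0, min(x, x + width))
    right = min(inner_width, max(x, x + width))
    top = max(0, min(y, y + height))
    bottom = min(inner_height, max(y, y + height))
    # The center of the clamped rectangle, rounded down.
    return (
        math.floor((left + right) / 2.0),
        math.floor((top + bottom) / 2.0),
    )


# A 200x40 rectangle hanging 50px off the left edge of an 800x600 viewport:
# only the visible part (0..150 horizontally) contributes to the center.
partly_visible = in_view_center_point(-50, 10, 200, 40, 800, 600)  # (75, 30)
```

For an element that lies fully inside the viewport, the result is simply the floored center of its first client rectangle.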
An element element is disabled if the following steps return true:
If element is an option element or element is an optgroup element:
  For each inclusive ancestor ancestor of element:
    If ancestor is an optgroup element or ancestor is a select element, and ancestor is actually disabled, return true.
  Return false.
Return element is actually disabled.
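A minimal Python sketch of these steps follows. The `El` class and its `actually_disabled` flag are hypothetical stand-ins for real DOM elements and the HTML definition of actually disabled. The sketch also assumes the ancestor condition groups as "(optgroup or select) and actually disabled", which is one reading of the step as worded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class El:
    """Hypothetical element: tag name, disabled flag, parent link."""
    tag: str
    actually_disabled: bool = False
    parent: Optional["El"] = None


def is_disabled(element):
    if element.tag in ("option", "optgroup"):
        # Walk the inclusive ancestors: the element itself, then parents.
        ancestor = element
        while ancestor is not None:
            if ancestor.tag in ("optgroup", "select") and ancestor.actually_disabled:
                return True
            ancestor = ancestor.parent
        return False
    # Every other element is judged on its own state only.
    return element.actually_disabled


# An option inherits the disabled state of its select.
select = El("select", actually_disabled=True)
option = El("option", parent=select)
option_is_disabled = is_disabled(option)
```

The walk starts at the element itself, so an optgroup that is actually disabled is reported as disabled without consulting its parent.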
An element is in view if it is a member of its own pointer-interactable paint tree, given the pretense that its pointer events are not disabled.
An element is obscured if the pointer-interactable paint tree at its center point is empty, or the first element in this tree is not an inclusive descendant of itself.
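The obscured check reduces to a test on the first entry of the paint tree, which the following Python sketch illustrates. Computing a real pointer-interactable paint tree needs a rendering engine, so the sketch assumes it is handed in as a list ordered topmost first. `Node`, `inclusive_descendants`, and `is_obscured` are hypothetical names.

```python
class Node:
    """Hypothetical element with child elements."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def inclusive_descendants(node):
    # The node itself, followed by all of its descendants.
    yield node
    for child in node.children:
        yield from inclusive_descendants(child)


def is_obscured(element, paint_tree):
    # paint_tree: elements at the element's center point, topmost first
    # (assumed to be supplied by the rendering engine).
    if not paint_tree:
        return True
    topmost = paint_tree[0]
    return all(topmost is not d for d in inclusive_descendants(element))


label = Node("span")
button = Node("button", [label])
overlay = Node("div")

covered = is_obscured(button, [overlay, button])  # overlay is on top
reachable = is_obscured(button, [label, button])  # own descendant is on top
```

With these inputs `covered` is true and `reachable` is false, since a descendant of the button on top still counts as reaching the button.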
Example 10
This ascertains whether the element's in-view center point can be interacted with.
For example, the paint tree at this button's center point, the red square, is not itself the button or a descendant of the button. In other words, it is not an inclusive descendant. This makes the button obscured.
On the other hand, the center point of the following select list is the third option element, because unlike a drop-down list,