Public API

The only public yarl class is URL:

>>> from yarl import URL
class yarl.URL(arg, *, encoded=False)

Represents URL as

[scheme:]//[user[:password]@]host[:port][/path][?query][#fragment]

for absolute URLs and

[/path][?query][#fragment]

for relative ones (Absolute and relative URLs).

The URL structure is:

 http://user:pass@example.com:8042/over/there?name=ferret#nose
 \__/   \__/ \__/ \_________/ \__/\_________/ \_________/ \__/
  |      |    |        |       |      |           |        |
scheme  user password host    port   path       query   fragment

Internally all data are stored as percent-encoded strings for user, path, query and fragment URL parts and IDNA-encoded (RFC 5891) for host.

Constructor and modification operators perform encoding for all parts automatically. The library assumes all data uses UTF-8 for percent-encoded tokens.

>>> URL('http://example.com/path/to/?arg1=a&arg2=b#fragment')
URL('http://example.com/path/to/?arg1=a&arg2=b#fragment')

If the URL contains only ASCII characters, it is left unchanged.

For non-ASCII characters, encoding is applied:

>>> str(URL('http://εμπορικόσήμα.eu/путь/這裡'))
'http://xn--jxagkqfkduily1i.eu/%D0%BF%D1%83%D1%82%D1%8C/%E9%80%99%E8%A3%A1'

The same is true for user, password, query and fragment parts of URL.
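As a rough illustration of the two encodings described above, the following sketch uses only the Python standard library rather than yarl itself. The helper names percent_encode and idna_encode are hypothetical, and Python's built-in idna codec implements the older RFC 3490 rules, so it may differ from yarl in edge cases.

```python
from urllib.parse import quote


def percent_encode(text: str) -> str:
    """Percent-encode text as UTF-8 bytes, leaving '/' unescaped."""
    return quote(text, safe="/", encoding="utf-8")


def idna_encode(host: str) -> str:
    """Encode a host name using Python's built-in 'idna' codec."""
    return host.encode("idna").decode("ascii")


encoded_path = percent_encode("/путь")
encoded_host = idna_encode("хост.домен")

print(encoded_path)  # /%D0%BF%D1%83%D1%82%D1%8C
print(encoded_host)  # an ASCII form in which each label starts with "xn--"
```

Plain ASCII input passes through both helpers unchanged, which matches the behaviour described for ASCII-only URLs.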

An already encoded URL is not changed:

>>> URL('http://xn--jxagkqfkduily1i.eu')
URL('http://xn--jxagkqfkduily1i.eu')

Use URL.human_repr() to get a human-readable representation:

>>> url = URL('http://εμπορικόσήμα.eu/путь/這裡')
>>> str(url)
'http://xn--jxagkqfkduily1i.eu/%D0%BF%D1%83%D1%82%D1%8C/%E9%80%99%E8%A3%A1'
>>> url.human_repr()
'http://εμπορικόσήμα.eu/путь/這裡'

Note

Sometimes the encoding performed by yarl is not acceptable to a particular web server.

Passing encoded=True prevents URL autoencoding; the user is then responsible for the correctness of the URL.

Don’t use this option unless there is no other way to keep URL attributes untouched.

URL manipulations don’t guarantee that the encoding is preserved: URL parts may be requoted even if the encoded parameter was explicitly set.

URL properties

There are two kinds of properties: decoded and encoded (with raw_ prefix):

URL.scheme

Scheme for absolute URLs, empty string for relative URLs or URLs starting with ‘//’ (Absolute and relative URLs).

>>> URL('http://example.com').scheme
'http'
>>> URL('//example.com').scheme
''
>>> URL('page.html').scheme
''
URL.user

Decoded user part of URL, None if user is missing.

>>> URL('http://john@example.com').user
'john'
>>> URL('http://андрей@example.com').user
'андрей'
>>> URL('http://example.com').user is None
True
URL.raw_user

Encoded user part of URL, None if user is missing.

>>> URL('http://андрей@example.com').raw_user
'%D0%B0%D0%BD%D0%B4%D1%80%D0%B5%D0%B9'
>>> URL('http://example.com').raw_user is None
True
URL.password

Decoded password part of URL, None if password is missing.

>>> URL('http://john:pass@example.com').password
'pass'
>>> URL('http://андрей:пароль@example.com').password
'пароль'
>>> URL('http://example.com').password is None
True
URL.raw_password

Encoded password part of URL, None if password is missing.

>>> URL('http://user:пароль@example.com').raw_password
'%D0%BF%D0%B0%D1%80%D0%BE%D0%BB%D1%8C'
URL.host

Decoded host part of URL, None for relative URLs (Absolute and relative URLs).

Brackets are stripped for IPv6. The host is converted to lowercase; an IP address is validated and converted to compressed form.

>>> URL('http://example.com').host
'example.com'
>>> URL('http://хост.домен').host
'хост.домен'
>>> URL('page.html').host is None
True
>>> URL('http://[::1]').host
'::1'
URL.raw_host

IDNA-encoded host part of URL, None for relative URLs (Absolute and relative URLs).

>>> URL('http://хост.домен').raw_host
'xn--n1agdj.xn--d1acufc'
URL.port

Port part of URL.

None for relative URLs (Absolute and relative URLs), or for URLs that have no explicit port and whose URL.scheme has no default port substitution.

>>> URL('http://example.com:8080').port
8080
>>> URL('http://example.com').port
80
>>> URL('page.html').port is None
True
URL.path

Decoded path part of URL, '/' for absolute URLs without path part.

>>> URL('http://example.com/path/to').path
'/path/to'
>>> URL('http://example.com/путь/сюда').path
'/путь/сюда'
>>> URL('http://example.com').path
'/'
URL.path_qs

Decoded path part of URL and query string, '/' for absolute URLs without path part.

>>> URL('http://example.com/path/to?a1=a&a2=b').path_qs
'/path/to?a1=a&a2=b'
URL.raw_path_qs

Encoded path part of URL and query string, '/' for absolute URLs without path part.

>>> URL('http://example.com/путь/сюда?ключ=знач').raw_path_qs
'/%D0%BF%D1%83%D1%82%D1%8C/%D1%81%D1%8E%D0%B4%D0%B0?%D0%BA%D0%BB%D1%8E%D1%87=%D0%B7%D0%BD%D0%B0%D1%87'

New in version 0.15.

URL.raw_path

Encoded path part of URL, '/' for absolute URLs without path part.

>>> URL('http://example.com/путь/сюда').raw_path
'/%D0%BF%D1%83%D1%82%D1%8C/%D1%81%D1%8E%D0%B4%D0%B0'
URL.query_string

Decoded query part of URL, empty string if query is missing.

>>> URL('http://example.com/path?a1=a&a2=b').query_string
'a1=a&a2=b'
>>> URL('http://example.com/path?ключ=знач').query_string
'ключ=знач'
>>> URL('http://example.com/path').query_string
''
URL.raw_query_string

Encoded query part of URL, empty string if query is missing.

>>> URL('http://example.com/path?ключ=знач').raw_query_string
'%D0%BA%D0%BB%D1%8E%D1%87=%D0%B7%D0%BD%D0%B0%D1%87'
URL.fragment

Decoded fragment part of URL, empty string if fragment is missing.

>>> URL('http://example.com/path#fragment').fragment
'fragment'
>>> URL('http://example.com/path#якорь').fragment
'якорь'
>>> URL('http://example.com/path').fragment
''
URL.raw_fragment

Encoded fragment part of URL, empty string if fragment is missing.

>>> URL('http://example.com/path#якорь').raw_fragment
'%D1%8F%D0%BA%D0%BE%D1%80%D1%8C'

For path and query yarl supports additional helpers:

URL.parts

A tuple containing decoded path parts, ('/',) for absolute URLs if path is missing.

>>> URL('http://example.com/path/to').parts
('/', 'path', 'to')
>>> URL('http://example.com/путь/сюда').parts
('/', 'путь', 'сюда')
>>> URL('http://example.com').parts
('/',)
URL.raw_parts

A tuple containing encoded path parts, ('/',) for absolute URLs if path is missing.

>>> URL('http://example.com/путь/сюда').raw_parts
('/', '%D0%BF%D1%83%D1%82%D1%8C', '%D1%81%D1%8E%D0%B4%D0%B0')
URL.name

The last part of parts.

>>> URL('http://example.com/path/to').name
'to'
>>> URL('http://example.com/путь/сюда').name
'сюда'
>>> URL('http://example.com/path/').name
''
URL.raw_name

The last part of raw_parts.

>>> URL('http://example.com/путь/сюда').raw_name
'%D1%81%D1%8E%D0%B4%D0%B0'
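The relationship between the path, its parts and the name can be sketched with the standard library alone. This is only an approximation of the documented behaviour: split_path is a hypothetical helper, and treating an empty path as '/' is an assumption that applies to absolute URLs.

```python
from urllib.parse import unquote


def split_path(raw_path: str) -> tuple:
    """Split an encoded path into decoded parts, pathlib-style."""
    if not raw_path:
        # Assumption: an absolute URL without a path is treated as '/'.
        return ("/",)
    if raw_path.startswith("/"):
        segments = raw_path[1:].split("/")
        return ("/",) + tuple(unquote(s) for s in segments)
    return tuple(unquote(s) for s in raw_path.split("/"))


parts = split_path("/%D0%BF%D1%83%D1%82%D1%8C/%D1%81%D1%8E%D0%B4%D0%B0")
print(parts)                           # ('/', 'путь', 'сюда')
print(repr(parts[-1]))                 # 'сюда', the "name"
print(repr(split_path("/path/")[-1]))  # '' because the path ends with '/'
```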
URL.query

A multidict.MultiDictProxy representing parsed query parameters in decoded representation. Empty value if URL has no query part.

>>> URL('http://example.com/path?a1=a&a2=b').query
<MultiDictProxy('a1': 'a', 'a2': 'b')>
>>> URL('http://example.com/path?ключ=знач').query
<MultiDictProxy('ключ': 'знач')>
>>> URL('http://example.com/path').query
<MultiDictProxy()>
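To show what "parsed query parameters in decoded representation" means, here is a small standard-library sketch. It returns a plain list of (key, value) pairs as a stand-in for the multidict, which likewise keeps every value of a repeated key; query_pairs is a hypothetical helper, not part of yarl.

```python
from urllib.parse import parse_qsl, urlsplit


def query_pairs(url: str) -> list:
    """Return decoded (key, value) pairs from the query part of url."""
    raw_query = urlsplit(url).query
    return parse_qsl(raw_query, keep_blank_values=True)


print(query_pairs("http://example.com/path?a1=a&a2=b"))
# [('a1', 'a'), ('a2', 'b')]
print(query_pairs("http://example.com/path?a=1&a=2"))
# [('a', '1'), ('a', '2')]
print(query_pairs("http://example.com/path"))
# []
```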

Absolute and relative URLs

The module supports both absolute and relative URLs.

An absolute URL should start with either a scheme or '//'.

URL.is_absolute()

A check for absolute URLs.

Return True for absolute ones (having scheme or starting with //), False otherwise.

>>> URL('http://example.com').is_absolute()
True
>>> URL('//example.com').is_absolute()
True
>>> URL('/path/to').is_absolute()
False
>>> URL('path').is_absolute()
False
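The rule stated above (a scheme or a leading '//') can be written down directly. The function below is a hypothetical standard-library sketch of that rule, not yarl's actual check, and it may disagree with yarl on unusual inputs.

```python
from urllib.parse import urlsplit


def looks_absolute(url: str) -> bool:
    """True if url has a scheme or starts with '//'."""
    return bool(urlsplit(url).scheme) or url.startswith("//")


for candidate in ("http://example.com", "//example.com", "/path/to", "path"):
    print(candidate, looks_absolute(candidate))
```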

New URL generation

URL is an immutable object; every operation described in this section generates a new URL instance.

URL.build(*, scheme, user, password, host, port, path, query, query_string, fragment, strict=False)

Creates and returns a new URL:

>>> URL.build(scheme="http", host="example.com")
URL('http://example.com')

>>> URL.build(scheme="http", host="example.com", query={"a": "b"})
URL('http://example.com/?a=b')

>>> URL.build(scheme="http", host="example.com", query_string="a=b")
URL('http://example.com/?a=b')

>>> URL.build()
URL('')

Calling the build method without arguments is equivalent to calling __init__ without arguments.

Note

When both scheme and host are passed, the new URL will be “absolute”. If only one of scheme or host is passed, AssertionError will be raised.

Note

Only one of query or query_string should be passed; if both are passed, AssertionError will be raised.

URL.with_scheme(scheme)

Return a new URL with scheme replaced:

>>> URL('http://example.com').with_scheme('https')
URL('https://example.com')
URL.with_user(user)

Return a new URL with user replaced, autoencode user if needed.

Clear user/password if user is None.

>>> URL('http://user:pass@example.com').with_user('new_user')
URL('http://new_user:pass@example.com')
>>> URL('http://user:pass@example.com').with_user('вася')
URL('http://%D0%B2%D0%B0%D1%81%D1%8F:pass@example.com')
>>> URL('http://user:pass@example.com').with_user(None)
URL('http://example.com')
URL.with_password(password)

Return a new URL with password replaced, autoencode password if needed.

Clear password if None is passed.

>>> URL('http://user:pass@example.com').with_password('пароль')
URL('http://user:%D0%BF%D0%B0%D1%80%D0%BE%D0%BB%D1%8C@example.com')
>>> URL('http://user:pass@example.com').with_password(None)
URL('http://user@example.com')
URL.with_host(host)

Return a new URL with host replaced, autoencode host if needed.

Changing host for relative URLs is not allowed, use URL.join() instead.

>>> URL('http://example.com/path/to').with_host('python.org')
URL('http://python.org/path/to')
>>> URL('http://example.com/path').with_host('хост.домен')
URL('http://xn--n1agdj.xn--d1acufc/path')
URL.with_port(port)

Return a new URL with port replaced.

Reset the port to the default if None is passed.

>>> URL('http://example.com:8888').with_port(9999)
URL('http://example.com:9999')
>>> URL('http://example.com:8888').with_port(None)
URL('http://example.com')
URL.with_path(path)

Return a new URL with path replaced, encode path if needed.

>>> URL('http://example.com/').with_path('/path/to')
URL('http://example.com/path/to')
URL.with_query(query)
URL.with_query(**kwargs)

Return a new URL with query part replaced.

Unlike update_query() the method replaces all query parameters.

Accepts any Mapping (e.g. dict, MultiDict instances) or str, autoencode the argument if needed.

A sequence of (key, value) pairs is supported as well.

Also it can take an arbitrary number of keyword arguments.

Clear query if None is passed.

>>> URL('http://example.com/path?a=b').with_query('c=d')
URL('http://example.com/path?c=d')
>>> URL('http://example.com/path?a=b').with_query({'c': 'd'})
URL('http://example.com/path?c=d')
>>> URL('http://example.com/path?a=b').with_query({'кл': 'зн'})
URL('http://example.com/path?%D0%BA%D0%BB=%D0%B7%D0%BD')
>>> URL('http://example.com/path?a=b').with_query(None)
URL('http://example.com/path')
>>> URL('http://example.com/path?a=b&b=1').with_query(b='2')
URL('http://example.com/path?b=2')
>>> URL('http://example.com/path?a=b&b=1').with_query([('b', '2')])
URL('http://example.com/path?b=2')
URL.update_query(query)
URL.update_query(**kwargs)

Returns a new URL with query part updated.

Unlike with_query() the method does not replace query completely.

The returned URL keeps the existing query parameters, with values replaced or added from the passed query (or from the parsed query string).

Accepts any Mapping (e.g. dict, MultiDict instances) or str, autoencode the argument if needed.

A sequence of (key, value) pairs is supported as well.

Also it can take an arbitrary number of keyword arguments.

Clear query if None is passed.

>>> URL('http://example.com/path?a=b').update_query('c=d')
URL('http://example.com/path?a=b&c=d')
>>> URL('http://example.com/path?a=b').update_query({'c': 'd'})
URL('http://example.com/path?a=b&c=d')
>>> URL('http://example.com/path?a=b').update_query({'кл': 'зн'})
URL('http://example.com/path?a=b&%D0%BA%D0%BB=%D0%B7%D0%BD')
>>> URL('http://example.com/path?a=b&b=1').update_query(b='2')
URL('http://example.com/path?a=b&b=2')
>>> URL('http://example.com/path?a=b&b=1').update_query([('b', '2')])
URL('http://example.com/path?a=b&b=2')
>>> URL('http://example.com/path?a=b&c=e&c=f').update_query(c='d')
URL('http://example.com/path?a=b&c=d')
>>> URL('http://example.com/path?a=b').update_query('c=d&c=f')
URL('http://example.com/path?a=b&c=d&c=f')

Changed in version 1.0: All key/value pairs, including repeated keys, are applied to the multi-dictionary.
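The difference between replacing and updating a query can be sketched with plain lists of pairs. This is a simplified standard-library model, not yarl's implementation: replace_pairs and merge_pairs are hypothetical names, and the position of an updated key in the result is an assumption that may differ from the real multidict ordering.

```python
from urllib.parse import parse_qsl, urlencode


def replace_pairs(current: str, new: str) -> str:
    """with_query-like behaviour: the current query is discarded."""
    return urlencode(parse_qsl(new, keep_blank_values=True))


def merge_pairs(current: str, new: str) -> str:
    """update_query-like behaviour: keys in new override, others stay."""
    new_pairs = parse_qsl(new, keep_blank_values=True)
    new_keys = {key for key, _ in new_pairs}
    kept = [
        (key, value)
        for key, value in parse_qsl(current, keep_blank_values=True)
        if key not in new_keys
    ]
    # Simplification: updated keys are appended after the kept ones.
    return urlencode(kept + new_pairs)


print(replace_pairs("a=b", "c=d"))      # c=d
print(merge_pairs("a=b", "c=d"))        # a=b&c=d
print(merge_pairs("a=b&c=e&c=f", "c=d"))  # a=b&c=d
print(merge_pairs("a=b", "c=d&c=f"))    # a=b&c=d&c=f
```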

URL.with_fragment(fragment)

Return a new URL with fragment replaced, autoencode fragment if needed.

Clear the fragment if None is passed.

>>> URL('http://example.com/path#frag').with_fragment('anchor')
URL('http://example.com/path#anchor')
>>> URL('http://example.com/path#frag').with_fragment('якорь')
URL('http://example.com/path#%D1%8F%D0%BA%D0%BE%D1%80%D1%8C')
>>> URL('http://example.com/path#frag').with_fragment(None)
URL('http://example.com/path')
URL.with_name(name)

Return a new URL with name (last part of path) replaced and with the query and fragment parts removed.

Name is encoded if needed.

>>> URL('http://example.com/path/to?arg#frag').with_name('new')
URL('http://example.com/path/new')
>>> URL('http://example.com/path/to').with_name('имя')
URL('http://example.com/path/%D0%B8%D0%BC%D1%8F')
URL.parent

A new URL with the last part of path removed and with the query and fragment parts removed.

>>> URL('http://example.com/path/to?arg#frag').parent
URL('http://example.com/path')
URL.origin()

A new URL with scheme, host and port parts only. user, password, path, query and fragment are removed.

>>> URL('http://example.com/path/to?arg#frag').origin()
URL('http://example.com')
>>> URL('http://user:pass@example.com/path').origin()
URL('http://example.com')
URL.relative()

A new relative URL with path, query and fragment parts only. scheme, user, password, host and port are removed.

>>> URL('http://example.com/path/to?arg#frag').relative()
URL('/path/to?arg#frag')
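Which components origin() and relative() keep can be illustrated with urllib.parse. The two functions below are hypothetical stand-ins that work on strings; they are a sketch under the assumption of a simple host name (IPv6 brackets, for instance, are not restored).

```python
from urllib.parse import urlsplit, urlunsplit


def origin(url: str) -> str:
    """Keep scheme, host and port; drop user, password, path, query, fragment."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError("URL should be absolute")
    netloc = parts.hostname
    if parts.port is not None:
        netloc = f"{netloc}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, "", "", ""))


def relative(url: str) -> str:
    """Keep path, query and fragment; drop scheme and network location."""
    parts = urlsplit(url)
    return urlunsplit(("", "", parts.path, parts.query, parts.fragment))


print(origin("http://user:pass@example.com/path"))      # http://example.com
print(relative("http://example.com/path/to?arg#frag"))  # /path/to?arg#frag
```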

The division (/) operator creates a new URL with the given path parts appended and with the query and fragment parts removed.

The path is encoded if needed.

>>> url = URL('http://example.com/path?arg#frag') / 'to/subpath'
>>> url
URL('http://example.com/path/to/subpath')
>>> url.parts
('/', 'path', 'to', 'subpath')
>>> url = URL('http://example.com/path?arg#frag') / 'сюда'
>>> url
URL('http://example.com/path/%D1%81%D1%8E%D0%B4%D0%B0')
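The effect of the division operator (append to the path, encode if needed, drop query and fragment) can be approximated with the standard library. append_path is a hypothetical helper working on strings, and its handling of trailing slashes is an assumption.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def append_path(url: str, name: str) -> str:
    """Append name to the path of url, dropping query and fragment."""
    parts = urlsplit(url)
    # Percent-encode the new segment(s) as UTF-8, keeping '/' separators.
    new_path = parts.path.rstrip("/") + "/" + quote(name, safe="/")
    return urlunsplit((parts.scheme, parts.netloc, new_path, "", ""))


print(append_path("http://example.com/path?arg#frag", "to/subpath"))
# http://example.com/path/to/subpath
print(append_path("http://example.com/path?arg#frag", "сюда"))
# http://example.com/path/%D1%81%D1%8E%D0%B4%D0%B0
```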
URL.join(url)

Construct a full (“absolute”) URL by combining a “base URL” (self) with another URL (url). Informally, this uses components of the base URL, in particular the addressing scheme, the network location and (part of) the path, to provide missing components in the relative URL, e.g.:

>>> base = URL('http://example.com/path/index.html')
>>> base.join(URL('page.html'))
URL('http://example.com/path/page.html')

Note

If url is an absolute URL (that is, starting with // or scheme://), the url’s host name and/or scheme will be present in the result, e.g.:

>>> base = URL('http://example.com/path/index.html')
>>> base.join(URL('//python.org/page.html'))
URL('http://python.org/page.html')
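The standard library's urljoin follows the same resolution rules on plain strings, so it is a convenient way to explore how different kinds of references resolve against a base. Whether yarl delegates to urljoin internally is an assumption here; the sketch only demonstrates the rules.

```python
from urllib.parse import urljoin

base = "http://example.com/path/index.html"

# A relative reference replaces the last path segment of the base.
print(urljoin(base, "page.html"))               # http://example.com/path/page.html
# A reference starting with '/' replaces the whole path.
print(urljoin(base, "/root.html"))              # http://example.com/root.html
# A reference starting with '//' keeps only the scheme of the base.
print(urljoin(base, "//python.org/page.html"))  # http://python.org/page.html
# A reference with its own scheme ignores the base entirely.
print(urljoin(base, "https://python.org/"))     # https://python.org/
```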

Human readable representation

All URL data is stored in encoded form internally. This works well for passing str(url) wherever a URL string is accepted, but the result is hard for humans to read.

URL.human_repr()

Return a decoded, human-readable string representation of the URL.

>>> url = URL('http://εμπορικόσήμα.eu/這裡')
>>> str(url)
'http://xn--jxagkqfkduily1i.eu/%E9%80%99%E8%A3%A1'
>>> url.human_repr()
'http://εμπορικόσήμα.eu/這裡'
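Decoding is the reverse of the two encodings used for storage. The sketch below is a hypothetical standard-library approximation, not yarl's implementation: it relies on Python's built-in idna codec and, for brevity, ignores the user and password parts.

```python
from urllib.parse import unquote, urlsplit, urlunsplit


def human_readable(url: str) -> str:
    """Decode the host from IDNA and the other parts from percent-encoding."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Decode the ASCII ("xn--") form of the host with the built-in codec.
    netloc = host.encode("ascii").decode("idna")
    if parts.port is not None:
        netloc = f"{netloc}:{parts.port}"
    return urlunsplit((
        parts.scheme,
        netloc,
        unquote(parts.path),
        unquote(parts.query),
        unquote(parts.fragment),
    ))


print(human_readable("http://example.com/%D0%BF%D1%83%D1%82%D1%8C"))
# http://example.com/путь
```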

Default port substitution

yarl is aware of the following scheme -> port translations:

scheme   port
'http'   80
'https'  443
'ws'     80
'wss'    443

URL.is_default_port()

A check for default port.

Return True if the URL’s port is the default for its scheme, False otherwise.

Relative URLs have no default port.

>>> URL('http://example.com').is_default_port()
True
>>> URL('http://example.com:80').is_default_port()
True
>>> URL('http://example.com:8080').is_default_port()
False
>>> URL('/path/to').is_default_port()
False
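The substitution table and the check above amount to a dictionary lookup. The following standard-library sketch is one way to express it; DEFAULT_PORTS, effective_port and uses_default_port are hypothetical names chosen for illustration.

```python
from typing import Optional
from urllib.parse import urlsplit

# Scheme -> default port, as listed in the table above.
DEFAULT_PORTS = {"http": 80, "https": 443, "ws": 80, "wss": 443}


def effective_port(url: str) -> Optional[int]:
    """Explicit port if present, else the scheme's default, else None."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)


def uses_default_port(url: str) -> bool:
    """True if the port is absent or equals the default for the scheme."""
    parts = urlsplit(url)
    default = DEFAULT_PORTS.get(parts.scheme)
    if default is None:
        # Relative URLs and unknown schemes have no default port.
        return False
    return parts.port is None or parts.port == default


print(effective_port("http://example.com"))          # 80
print(effective_port("http://example.com:8080"))     # 8080
print(uses_default_port("http://example.com:80"))    # True
print(uses_default_port("/path/to"))                 # False
```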

References

yarl stands on the shoulders of giants: several RFC documents and the low-level urllib.parse module, which performs almost all of the gory work.

The module borrows its design from pathlib wherever possible.

See also

RFC 5891 - Internationalized Domain Names in Applications (IDNA): Protocol
Document describing non-ascii domain name encoding.
RFC 3987 - Internationalized Resource Identifiers
This specifies conversion rules for non-ascii characters in URL.
RFC 3986 - Uniform Resource Identifiers
This is the current standard (STD66). Any changes to yarl module should conform to this. Certain deviations could be observed, which are mostly for backward compatibility purposes and for certain de-facto parsing requirements as commonly observed in major browsers.
RFC 2732 - Format for Literal IPv6 Addresses in URL’s.
This specifies the parsing requirements of IPv6 URLs.
RFC 2396 - Uniform Resource Identifiers (URI): Generic Syntax
Document describing the generic syntactic requirements for both Uniform Resource Names (URNs) and Uniform Resource Locators (URLs).
RFC 2368 - The mailto URL scheme.
Parsing requirements for mailto URL schemes.
RFC 1808 - Relative Uniform Resource Locators
This Request For Comments includes the rules for joining an absolute and a relative URL, including a fair number of “Abnormal Examples” which govern the treatment of border cases.
RFC 1738 - Uniform Resource Locators (URL)
This specifies the formal syntax and semantics of absolute URLs.