source: uri/docs/source/user/validating.rst@ 230

Last change on this file since 230 was 230, checked in by wouter, 4 years ago

#91 clone https://pypi.org/project/rfc3986/

.. _validating:

=================
 Validating URIs
=================

While not as difficult as `validating an email address`_, validating URIs is
tricky. Different parts of a URI allow different characters. Those sets
sometimes overlap and at other times they do not, which makes hand-written
checks inconvenient. Luckily, |rfc3986| makes validating URIs far simpler.

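To make the overlap concrete, here is a minimal sketch that uses only the
standard library. The character classes are transcribed from the ABNF in
:rfc:`3986`; the names ``USERINFO``, ``SEGMENT``, ``QUERY`` and ``allowed_in``
are made up for this illustration and are not part of |rfc3986|.

```python
import re

# Building blocks from the RFC 3986 ABNF.
UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="
PCT_ENCODED = r"%[0-9A-Fa-f]{2}"

# Hypothetical patterns for this sketch, one per component.
USERINFO = re.compile(r"(?:[%s%s:]|%s)*" % (UNRESERVED, SUB_DELIMS, PCT_ENCODED))
SEGMENT = re.compile(r"(?:[%s%s:@]|%s)*" % (UNRESERVED, SUB_DELIMS, PCT_ENCODED))
QUERY = re.compile(r"(?:[%s%s:@/?]|%s)*" % (UNRESERVED, SUB_DELIMS, PCT_ENCODED))


def allowed_in(pattern, text):
    """Return True if the whole of ``text`` fits ``pattern``."""
    return pattern.fullmatch(text) is not None


# "@" is fine in a path segment or a query, but not in the userinfo.
print(allowed_in(USERINFO, "user@name"))  # False
print(allowed_in(SEGMENT, "user@name"))   # True
print(allowed_in(QUERY, "user@name"))     # True

# "/" and "?" may appear in a query, but not inside a single path segment.
print(allowed_in(SEGMENT, "a/b?c"))       # False
print(allowed_in(QUERY, "a/b?c"))         # True
```

Even this reduced version needs three slightly different patterns, which is
the kind of bookkeeping the library is meant to take over.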

Example Usage
=============

First we need to create an instance of
:class:`~rfc3986.validators.Validator`, which takes no parameters. After that
we can call methods on the instance to indicate what we want to validate.

Allowing Only Trusted Domains and Schemes
-----------------------------------------

Let's assume that we're building something that takes user input for a URL and
we want to ensure that the URL only ever uses a specific domain with
``https``. In that case, our code would look like this:

>>> from rfc3986 import validators, uri_reference
>>> user_url = 'https://github.com/sigmavirus24/rfc3986'
>>> validator = validators.Validator().allow_schemes(
...     'https',
... ).allow_hosts(
...     'github.com',
... )
>>> validator.validate(uri_reference(user_url))
>>> validator.validate(uri_reference(
...     'https://github.com/'
... ))
>>> validator.validate(uri_reference(
...     'http://example.com'
... ))
Traceback (most recent call last):
  ...
rfc3986.exceptions.UnpermittedComponentError

First notice that we can easily reuse our validator object for each URL, so
users do not have to reconstruct a ``Validator`` for each bit of user input.
Next, we have three different URLs that we validate:

#. ``https://github.com/sigmavirus24/rfc3986``
#. ``https://github.com/``
#. ``http://example.com``

As it stands, our validator will allow the first two URLs to pass but will
fail the third. This is specifically because we only allow URLs using
``https`` as a scheme and ``github.com`` as the domain name.


Preventing Leaks of User Credentials
------------------------------------

Next, let's imagine that we want to prevent leaking user credentials. To do
that, we want to ensure that there is no password in the user information
portion of the authority. Our new validator would look like this:

>>> from rfc3986 import validators, uri_reference
>>> user_url = 'https://github.com/sigmavirus24/rfc3986'
>>> validator = validators.Validator().allow_schemes(
...     'https',
... ).allow_hosts(
...     'github.com',
... ).forbid_use_of_password()
>>> validator.validate(uri_reference(user_url))
>>> validator.validate(uri_reference(
...     'https://github.com/'
... ))
>>> validator.validate(uri_reference(
...     'http://example.com'
... ))
Traceback (most recent call last):
  ...
rfc3986.exceptions.UnpermittedComponentError
>>> validator.validate(uri_reference(
...     'https://sigmavirus24@github.com'
... ))
>>> validator.validate(uri_reference(
...     'https://sigmavirus24:not-my-real-password@github.com'
... ))
Traceback (most recent call last):
  ...
rfc3986.exceptions.PasswordForbidden

Note that a user name on its own is still accepted. Only a URL whose user
information carries a password is rejected.


Requiring the Presence of Components
------------------------------------

Up until now, we have assumed that we will get a URL that has the appropriate
components for validation. For example, we assume that we will have a URL that
has a scheme and hostname. However, our current validator does not require
that those components exist, so both of the following pass:

>>> from rfc3986 import validators, uri_reference
>>> validator = validators.Validator().allow_schemes(
...     'https',
... ).allow_hosts(
...     'github.com',
... ).forbid_use_of_password()
>>> validator.validate(uri_reference('//github.com'))
>>> validator.validate(uri_reference('https:/'))

In the first case, we have a host name but no scheme and in the second we have
a scheme and a path but no host. If we want to ensure that those components
are there and that they are *always* what we allow, then we must add one last
item to our validator:

>>> from rfc3986 import validators, uri_reference
>>> user_url = 'https://github.com/sigmavirus24/rfc3986'
>>> validator = validators.Validator().allow_schemes(
...     'https',
... ).allow_hosts(
...     'github.com',
... ).forbid_use_of_password(
... ).require_presence_of(
...     'scheme', 'host',
... )
>>> validator.validate(uri_reference('//github.com'))
Traceback (most recent call last):
  ...
rfc3986.exceptions.MissingComponentError
>>> validator.validate(uri_reference('https:/'))
Traceback (most recent call last):
  ...
rfc3986.exceptions.MissingComponentError
>>> validator.validate(uri_reference('https://github.com'))
>>> validator.validate(uri_reference(user_url))


Checking the Validity of Components
-----------------------------------

As of version 1.1.0, |rfc3986| allows users to check the validity of a URI
Reference using a :class:`~rfc3986.validators.Validator`. Along with the above
examples we can also check that a URI is valid per :rfc:`3986`. The validation
of the components is pre-determined so all we need to do is specify which
components we want to validate:

>>> from rfc3986 import validators, uri_reference
>>> valid_uri = uri_reference('https://github.com/')
>>> validator = validators.Validator().allow_schemes(
...     'https',
... ).allow_hosts(
...     'github.com',
... ).forbid_use_of_password(
... ).require_presence_of(
...     'scheme', 'host',
... ).check_validity_of(
...     'scheme', 'host', 'path',
... )
>>> validator.validate(valid_uri)
>>> invalid_uri = valid_uri.copy_with(path='/#invalid/path')
>>> validator.validate(invalid_uri)
Traceback (most recent call last):
  ...
rfc3986.exceptions.InvalidComponentsError

Paths are not allowed to contain a ``#`` character unless it's
percent-encoded. This is why our ``invalid_uri`` raises an exception when we
attempt to validate it.

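If the ``#`` really is meant to be part of the path, it has to be
percent-encoded first. One way to do that, sketched here with the standard
library's ``urllib.parse.quote`` rather than with |rfc3986| itself, is shown
below; the variable names are illustrative only.

```python
from urllib.parse import quote

raw_path = '/#invalid/path'

# Keep "/" as the segment separator and percent-encode every other
# character that quote() does not treat as safe, including "#".
encoded_path = quote(raw_path, safe='/')

print(encoded_path)  # /%23invalid/path
```

The encoded string could then be passed as ``path=`` to ``copy_with`` in place
of the raw one.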

.. links
.. _validating an email address:
    http://haacked.com/archive/2007/08/21/i-knew-how-to-validate-an-email-address-until-i.aspx/