===============
 Parsing a URI
===============

There are two ways to parse a URI with |rfc3986|:

#. :func:`rfc3986.api.uri_reference`

   This is best when you're **not** replacing existing usage of
   :mod:`urllib.parse`. It also provides convenience methods for safely
   normalizing URIs passed into it.

#. :func:`rfc3986.api.urlparse`

   This is best suited to completely replace :func:`urllib.parse.urlparse`.
   It returns a class that should be indistinguishable from
   :class:`urllib.parse.ParseResult`.

Let's look at some code samples.


Some Examples
=============

First we'll parse the URL that points to the repository for this project.

.. testsetup:: *

    import rfc3986
    url = rfc3986.urlparse('https://github.com/sigmavirus24/rfc3986')
    uri = rfc3986.uri_reference('https://github.com/sigmavirus24/rfc3986')

.. code-block:: python

    import rfc3986
    url = rfc3986.urlparse('https://github.com/sigmavirus24/rfc3986')


Then we'll replace parts of that URL with new values:

.. testcode:: ex0

    print(url.copy_with(
        userinfo='username:password',
        port='443',
    ).unsplit())

.. testoutput:: ex0

    https://username:password@github.com:443/sigmavirus24/rfc3986

This, however, does not change the current ``url`` instance of
:class:`~rfc3986.parseresult.ParseResult`. As the method name suggests,
we're copying that instance and then overriding certain attributes.
In fact, we can make as many copies as we like and the original stays the same.

.. testcode:: ex1

    print(url.copy_with(
        scheme='ssh',
        userinfo='git',
    ).unsplit())

.. testoutput:: ex1

    ssh://git@github.com/sigmavirus24/rfc3986

.. testcode:: ex1

    print(url.scheme)

.. testoutput:: ex1

    https

We can do similar things with URI References as well.

.. code-block:: python

    uri = rfc3986.uri_reference('https://github.com/sigmavirus24/rfc3986')

.. testcode:: ex2

    print(uri.copy_with(
        authority='username:password@github.com:443',
        path='/sigmavirus24/github3.py',
    ).unsplit())

.. testoutput:: ex2

    https://username:password@github.com:443/sigmavirus24/github3.py

However, because URI References follow :rfc:`3986` strictly, some inputs
parse in ways you might not expect. The SSH URL example later on this page
shows one such case.

Finally, if you want to remove a component from a URI, pass ``None`` for that
component. For example:

.. testcode:: ex3

    print(uri.copy_with(path=None).unsplit())

.. testoutput:: ex3

    https://github.com

This works on both URI References and Parse Results.


And Now For Something Slightly Unusual
======================================

If you are familiar with GitHub, GitLab, or a similar service, you may have
interacted with the "SSH URL" for some projects. For this project,
the SSH URL is:

.. code::

    git@github.com:sigmavirus24/rfc3986


Let's see what happens when we parse this.

.. code-block:: pycon

    >>> rfc3986.uri_reference('git@github.com:sigmavirus24/rfc3986')
    URIReference(scheme=None, authority=None,
    path=u'git@github.com:sigmavirus24/rfc3986', query=None, fragment=None)

There's no scheme present, but it is apparent to our (human) eyes that
``git@github.com`` should not be part of the path. This is one of the areas
where :mod:`rfc3986` suffers slightly due to its strict conformance to
:rfc:`3986`. In the RFC, an authority must be preceded by ``//``. Let's see
what happens when we add that to our URI:

.. code-block:: pycon

    >>> rfc3986.uri_reference('//git@github.com:sigmavirus24/rfc3986')
    URIReference(scheme=None, authority=u'git@github.com:sigmavirus24',
    path=u'/rfc3986', query=None, fragment=None)

Somewhat better, but not much: ``sigmavirus24`` now sits in the authority,
where the RFC expects a port after the colon, and only ``/rfc3986`` is left
as the path.
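
To make the splitting rule concrete, here is a minimal standard-library
sketch. It is not how :mod:`rfc3986` is implemented; the pattern is adapted
from the regular expression in Appendix B of :rfc:`3986`, with the scheme
restricted to the characters the RFC's grammar allows. The names
``URI_REFERENCE`` and ``split_reference`` are hypothetical and exist only for
this illustration.

.. code-block:: python

    import re

    # Simplified pattern adapted from RFC 3986, Appendix B. Unlike the
    # appendix, the scheme must start with a letter and may only contain
    # letters, digits, "+", "-" and ".".
    URI_REFERENCE = re.compile(
        r'^(?:(?P<scheme>[A-Za-z][A-Za-z0-9+.-]*):)?'
        r'(?://(?P<authority>[^/?#]*))?'  # only recognised after "//"
        r'(?P<path>[^?#]*)'
        r'(?:\?(?P<query>[^#]*))?'
        r'(?:#(?P<fragment>.*))?$'
    )


    def split_reference(text):
        """Split ``text`` into its five generic components (hypothetical helper)."""
        return URI_REFERENCE.match(text).groupdict()


    without_slashes = split_reference('git@github.com:sigmavirus24/rfc3986')
    with_slashes = split_reference('//git@github.com:sigmavirus24/rfc3986')

    print(without_slashes['authority'], without_slashes['path'])
    # None git@github.com:sigmavirus24/rfc3986
    print(with_slashes['authority'], with_slashes['path'])
    # git@github.com:sigmavirus24 /rfc3986

Without a leading ``//`` nothing can be an authority, so everything lands in
the path. With it, the authority runs up to the first ``/``, which is why
``sigmavirus24`` ends up there.
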

.. note::

    The maintainers of :mod:`rfc3986` are working to discern better ways to
    parse these less common URIs in a reasonable and sensible way without
    losing conformance to the RFC.
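
In the meantime, one possible workaround is to rewrite the SSH URL into an
``ssh://`` URI before parsing it. The sketch below is an assumption-laden
illustration, not part of :mod:`rfc3986`: ``SCP_LIKE`` and
``scp_like_to_ssh_uri`` are hypothetical names, and the rewrite assumes the
server treats ``user@host:path`` and ``ssh://user@host/path`` as the same
repository, which may not hold everywhere.

.. code-block:: python

    import re

    # Matches the scp-like form ``user@host:path``. This pattern is an
    # assumption made for this sketch; it is not taken from rfc3986.
    SCP_LIKE = re.compile(
        r'^(?P<user>[^@/:]+)@(?P<host>[^:/]+):(?P<path>[^/].*)$'
    )


    def scp_like_to_ssh_uri(text):
        """Rewrite ``user@host:path`` as ``ssh://user@host/path``."""
        match = SCP_LIKE.match(text)
        if match is None:
            # Not an scp-like SSH URL; return it unchanged.
            return text
        return 'ssh://{user}@{host}/{path}'.format(**match.groupdict())


    print(scp_like_to_ssh_uri('git@github.com:sigmavirus24/rfc3986'))
    # ssh://git@github.com/sigmavirus24/rfc3986

The result has the same shape as the ``ssh://`` URL produced with
``copy_with`` earlier on this page, so it has a scheme and an authority
introduced by ``//``.
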