# urllib

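* urllib is a package in Python's standard library for working with URLs and sending HTTP requests; it is often used to write crawlers
* The most commonly used submodules are urllib.request and urllib.parse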

Typical workflow:

* Specify the URL
* Send the request with urllib's request submodule
* Read the data from the response
* Persist the data to storage
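
The four steps above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the function name `fetch_and_save`, the `User-Agent` value, and the default timeout are assumptions made for illustration.

```python
from urllib.request import Request, urlopen


def fetch_and_save(url, path, timeout=10):
    """Fetch `url` and write the raw response body to `path`.

    Hypothetical helper for illustration; returns the number of bytes saved.
    """
    # Step 1 is the `url` argument itself.
    # Step 2: build the request and send it (the User-Agent value is an assumption).
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        # Step 3: read the response body as bytes.
        data = response.read()
    # Step 4: persist the bytes to a local file.
    with open(path, "wb") as f:
        f.write(data)
    return len(data)
```

A call such as `fetch_and_save("http://www.baidu.com", "baidu.html")` would save the page to a local file. The body is written as bytes, so no character-set guess is needed at this stage.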

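### Proxies

* Forward proxy: fetches data on behalf of the client. It shields the client by hiding its identity from the target server, so requests cannot be traced back to it.
* Reverse proxy: serves data on behalf of the server. It protects the server or handles load balancing.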
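
On the client side, a forward proxy can be configured in urllib with `ProxyHandler`. The sketch below is illustrative only: the helper name `build_proxy_opener` and the proxy address `http://127.0.0.1:8888` are assumptions, and a working proxy must be listening there for real requests to succeed.

```python
from urllib.request import ProxyHandler, build_opener


def build_proxy_opener(proxy_url):
    """Return an opener that routes HTTP and HTTPS traffic through `proxy_url`.

    Hypothetical helper for illustration.
    """
    handler = ProxyHandler({"http": proxy_url, "https": proxy_url})
    return build_opener(handler)


# The proxy address is a placeholder; replace it with a real proxy.
opener = build_proxy_opener("http://127.0.0.1:8888")
# opener.open("http://www.baidu.com", timeout=10) would send the request via the proxy.
```

A reverse proxy is configured on the server side, so nothing changes in the client code for that case.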

{% hint style="info" %}
Sample code: <https://github.com/ni-ning/LearnPython/blob/master/29Spider/001urllib.ipynb>
{% endhint %}

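```python
from urllib.parse import (
    quote, unquote, urlparse, urlunparse, urlsplit, urlunsplit,
    parse_qs, parse_qsl, urlencode,
)

# Percent-encode and decode a string
quote('abc def')        # -> 'abc%20def'
unquote('abc%20def')    # -> 'abc def'

# Parse a URL into 6 components:
# <scheme>://<netloc>/<path>;<params>?<query>#<fragment>
parts = urlparse('http://www.baidu.com/path?key=value#comments')
# -> ParseResult(scheme='http', netloc='www.baidu.com', path='/path',
#                params='', query='key=value', fragment='comments')
urlunparse(parts)       # -> 'http://www.baidu.com/path?key=value#comments'

# urlsplit(url) parses a URL into 5 components (no params);
# urlunsplit(components) reassembles them into a URL
urlunsplit(urlsplit('http://www.baidu.com/path?key=value#comments'))

# Convert between query strings and Python objects
parse_qs('a=1&b=2')          # -> {'a': ['1'], 'b': ['2']}  (dict of lists)
parse_qsl('a=1&b=2')         # -> [('a', '1'), ('b', '2')]  (list of tuples)
urlencode({'a': 1, 'b': 2})  # -> 'a=1&b=2'
```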
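
The functions above are often combined to attach query parameters to a base URL before requesting it. The following is a small sketch under assumptions: the helper name `build_url`, the path `/s`, and the parameter names `wd` and `pn` are made up for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def build_url(base, params):
    """Return `base` with its query string replaced by the encoded `params`.

    Hypothetical helper for illustration.
    """
    scheme, netloc, path, _old_query, fragment = urlsplit(base)
    return urlunsplit((scheme, netloc, path, urlencode(params), fragment))


url = build_url('http://www.baidu.com/s', {'wd': 'python urllib', 'pn': 10})
print(url)                          # http://www.baidu.com/s?wd=python+urllib&pn=10
print(parse_qs(urlsplit(url).query))  # {'wd': ['python urllib'], 'pn': ['10']}
```

Note that `urlencode` encodes a space as `+` by default, whereas `quote` produces `%20`; `parse_qs` decodes both back to a space.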

