Uploading files with Python requests: implementation steps

Author: 冷冰若水  Date: 2023-09-25 15:14:50

Official documentation: https://2.python-requests.org//en/master/

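At work I needed to upload an attachment to an HTTP endpoint. The interface is specified as follows:

The attachment is submitted with an HTTP POST in multipart/form-data format. URL: http://test.com/flow/upload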


Fields:
md5:       // MD5 hash of "<random value>_<current timestamp>"
filesize:  // file size
file:      // file content (must include the file name)
Return value:
{"success":true,"uploadName":"tmp.xml","uploadPath":"uploads\/201311\/758e875fb7c7a508feef6b5036119b9f"}
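
To make the field list concrete, here is a minimal sketch of how the two non-file fields might be computed. The token format (decimal integers joined by an underscore) is an assumption, since the interface description only says "random value_current timestamp". The helper name build_upload_fields and its rand and now parameters are hypothetical and exist only so the result can be reproduced.

```python
import hashlib
import random
import time


def build_upload_fields(content, rand=None, now=None):
    """Return the md5 and filesize form fields for `content` (bytes).

    `rand` and `now` are hypothetical hooks for reproducibility; by default
    a random integer and the current Unix timestamp are used.
    """
    if rand is None:
        rand = random.randint(0, 999999)
    if now is None:
        now = int(time.time())
    token = '%d_%d' % (rand, now)  # assumed format: "<random>_<timestamp>"
    return {
        'md5': hashlib.md5(token.encode('utf-8')).hexdigest(),
        'filesize': len(content),  # size in bytes
    }


fields = build_upload_fields(b'<xml/>', rand=0, now=1564630000)
print(fields)
```

Whether the server actually verifies the md5 value against anything is not stated in the interface description, so treat the field as an opaque token.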

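Since I mostly work in Python and the project already used the requests library, I planned to implement this with requests. I expected it to be a trivial task, but it took a lot of time: the example in the official requests documentation only covers uploading a file and does not show how to send extra form fields alongside it: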

https://2.python-requests.org//en/master/user/quickstart/#post-a-multipart-encoded-file


>>> url = 'https://httpbin.org/post'
>>> files = {'file': ('report.xls', open('report.xls', 'rb'), 'application/vnd.ms-excel', {'Expires': '0'})}

>>> r = requests.post(url, files=files)
>>> r.text
{
...
"files": {
 "file": "<censored...binary...data>"
},
...
}

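When extra fields have to be sent too, two arguments of requests come into play: data and files. Pass the file to upload in files and the other fields in data; requests merges the two into a single multipart body and sends it to the server.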
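
One way to check this merging without contacting any server is to build the request and prepare it locally. The sketch below uses requests.Request and its prepare method; the field values and file content are made up for illustration.

```python
import requests

# Hypothetical values, for illustration only.
req = requests.Request(
    'POST',
    'http://test.com/flow/upload',
    data={'md5': 'd41d8cd98f00b204e9800998ecf8427e', 'filesize': 5},
    files={'file': ('tmp.xml', b'<a/>\n')},
)
prepared = req.prepare()  # builds headers and body; nothing is sent

content_type = prepared.headers['Content-Type']
body = prepared.body

print(content_type)          # multipart/form-data with a generated boundary
print(body.decode('utf-8'))  # the md5, filesize and file parts in one body
```

Printing the body shows the form fields and the file as separate parts of the same multipart payload.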

The code I ended up with looks like this:


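import hashlib
import time

import requests

# file_name, request_url and MyLogger come from the surrounding project.
with open(file_name, 'rb') as f:  # binary mode, so len() is the size in bytes
    content = f.read()
request_data = {
    # MD5 of "<random value>_<timestamp>"; 0 stands in for the random value
    'md5': hashlib.md5(('%d_%d' % (0, int(time.time()))).encode('utf-8')).hexdigest(),
    'filesize': len(content),
}
# Reuse the bytes already read instead of opening the file a second time.
files = {'file': (file_name, content)}
MyLogger().getlogger().info('url:%s' % request_url)
resp = requests.post(request_url, data=request_data, files=files)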
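
A slightly more defensive variant of the snippet above might wrap the call in a function and check the JSON reply. This is only a sketch: the function name upload_attachment, the timeout value and the error handling are assumptions, and the success and uploadPath keys are taken from the sample return value in the interface description.

```python
import hashlib
import time

import requests


def upload_attachment(request_url, file_name, content, timeout=10):
    """Hypothetical helper: POST `content` as `file_name`, return uploadPath."""
    token = '%d_%d' % (0, int(time.time()))
    data = {
        'md5': hashlib.md5(token.encode('utf-8')).hexdigest(),
        'filesize': len(content),
    }
    files = {'file': (file_name, content)}
    resp = requests.post(request_url, data=data, files=files, timeout=timeout)
    resp.raise_for_status()  # surface HTTP-level failures
    result = resp.json()
    if not result.get('success'):
        raise RuntimeError('upload rejected: %r' % (result,))
    return result['uploadPath']
```

Passing a timeout is a design choice worth keeping: without one, requests.post can wait indefinitely on an unresponsive server.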

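The final code may look simple, but it took me considerable effort to confirm that it is correct, including reading the requests source. The walk through the source is recorded below.

First, find the implementation of post, in requests/api.py: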


def post(url, data=None, json=None, **kwargs):
    r"""Sends a POST request.

    :param url: URL for the new :class:`Request` object.
    :param data: (optional) Dictionary, list of tuples, bytes, or file-like
        object to send in the body of the :class:`Request`.
    :param json: (optional) json data to send in the body of the :class:`Request`.
    :param \*\*kwargs: Optional arguments that ``request`` takes.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response
    """

    return request('post', url, data=data, json=json, **kwargs)

post simply calls request, so we follow request next, also in requests/api.py:


def request(method, url, **kwargs):
    """Constructs and sends a :class:`Request <Request>`.

    :param method: method for the new :class:`Request` object: ``GET``, ``OPTIONS``, ``HEAD``, ``POST``, ``PUT``, ``PATCH``, or ``DELETE``.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary, list of tuples or bytes to send
        in the query string for the :class:`Request`.
    :param data: (optional) Dictionary, list of tuples, bytes, or file-like
        object to send in the body of the :class:`Request`.
    :param json: (optional) A JSON serializable Python object to send in the body of the :class:`Request`.
    :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
    :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
    :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.
        ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``
        or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content-type'`` is a string
        defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers
        to add for the file.
    :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
    :param timeout: (optional) How many seconds to wait for the server to send data
        before giving up, as a float, or a :ref:`(connect timeout, read
        timeout) <timeouts>` tuple.
    :type timeout: float or tuple
    :param allow_redirects: (optional) Boolean. Enable/disable GET/OPTIONS/POST/PUT/PATCH/DELETE/HEAD redirection. Defaults to ``True``.
    :type allow_redirects: bool
    :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
    :param verify: (optional) Either a boolean, in which case it controls whether we verify
        the server's TLS certificate, or a string, in which case it must be a path
        to a CA bundle to use. Defaults to ``True``.
    :param stream: (optional) if ``False``, the response content will be immediately downloaded.
    :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response

    Usage::

      >>> import requests
      >>> req = requests.request('GET', 'https://httpbin.org/get')
      <Response [200]>
    """

    # By using the 'with' statement we are sure the session is closed, thus we
    # avoid leaving sockets open which can trigger a ResourceWarning in some
    # cases, and look like a memory leak in others.
    with sessions.Session() as session:
        return session.request(method=method, url=url, **kwargs)
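
The files entry in the docstring above lists several accepted shapes for each value. The sketch below prepares one request per shape, without sending anything, so the resulting multipart parts can be compared. The file name a.txt, the content and the Expires header are arbitrary illustration values, and raw bytes stand in for an open file object.

```python
import requests

url = 'https://httpbin.org/post'  # never contacted; requests are only prepared

variants = {
    'bare': {'file': b'hello'},
    '2-tuple': {'file': ('a.txt', b'hello')},
    '3-tuple': {'file': ('a.txt', b'hello', 'text/plain')},
    '4-tuple': {'file': ('a.txt', b'hello', 'text/plain', {'Expires': '0'})},
}

bodies = {}
for label, files in variants.items():
    prepared = requests.Request('POST', url, files=files).prepare()
    bodies[label] = prepared.body
    print('---', label)
    print(prepared.body.decode('utf-8'))
```

With a bare value that has no name attribute, the form field name doubles as the file name, which is why the interface's requirement that the file part carry a file name is easiest to satisfy with the tuple forms.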

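The docstring of request is long, and it already shows that the files parameter is used to send files, but it still does not say what to do when form fields and a file have to be sent together. Follow session.request next, in requests/sessions.py: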


def request(self, method, url,
        params=None, data=None, headers=None, cookies=None, files=None,
        auth=None, timeout=None, allow_redirects=True, proxies=None,
        hooks=None, stream=None, verify=None, cert=None, json=None):
    """Constructs a :class:`Request <Request>`, prepares it and sends it.
    Returns :class:`Response <Response>` object.

    :param method: method for the new :class:`Request` object.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary or bytes to be sent in the query
        string for the :class:`Request`.
    :param data: (optional) Dictionary, list of tuples, bytes, or file-like
        object to send in the body of the :class:`Request`.
    :param json: (optional) json to send in the body of the
        :class:`Request`.
    :param headers: (optional) Dictionary of HTTP Headers to send with the
        :class:`Request`.
    :param cookies: (optional) Dict or CookieJar object to send with the
        :class:`Request`.
    :param files: (optional) Dictionary of ``'filename': file-like-objects``
        for multipart encoding upload.
    :param auth: (optional) Auth tuple or callable to enable
        Basic/Digest/Custom HTTP Auth.
    :param timeout: (optional) How long to wait for the server to send
        data before giving up, as a float, or a :ref:`(connect timeout,
        read timeout) <timeouts>` tuple.
    :type timeout: float or tuple
    :param allow_redirects: (optional) Set to True by default.
    :type allow_redirects: bool
    :param proxies: (optional) Dictionary mapping protocol or protocol and
        hostname to the URL of the proxy.
    :param stream: (optional) whether to immediately download the response
        content. Defaults to ``False``.
    :param verify: (optional) Either a boolean, in which case it controls whether we verify
        the server's TLS certificate, or a string, in which case it must be a path
        to a CA bundle to use. Defaults to ``True``.
    :param cert: (optional) if String, path to ssl client cert file (.pem).
        If Tuple, ('cert', 'key') pair.
    :rtype: requests.Response
    """
    # Create the Request.
    req = Request(
        method=method.upper(),
        url=url,
        headers=headers,
        files=files,
        data=data or {},
        json=json,
        params=params or {},
        auth=auth,
        cookies=cookies,
        hooks=hooks,
    )
    prep = self.prepare_request(req)

    proxies = proxies or {}

    settings = self.merge_environment_settings(
        prep.url, proxies, stream, verify, cert
    )

    # Send the request.
    send_kwargs = {
        'timeout': timeout,
        'allow_redirects': allow_redirects,
    }
    send_kwargs.update(settings)
    resp = self.send(prep, **send_kwargs)

    return resp

Skimming this method: it first prepares the request, and the last step calls send, which presumably transmits it. So the place to look is prepare_request, also in requests/sessions.py:


def prepare_request(self, request):
    """Constructs a :class:`PreparedRequest <PreparedRequest>` for
    transmission and returns it. The :class:`PreparedRequest` has settings
    merged from the :class:`Request <Request>` instance and those of the
    :class:`Session`.

    :param request: :class:`Request` instance to prepare with this
        session's settings.
    :rtype: requests.PreparedRequest
    """
    cookies = request.cookies or {}

    # Bootstrap CookieJar.
    if not isinstance(cookies, cookielib.CookieJar):
        cookies = cookiejar_from_dict(cookies)

    # Merge with session cookies
    merged_cookies = merge_cookies(
        merge_cookies(RequestsCookieJar(), self.cookies), cookies)

    # Set environment's basic authentication if not explicitly set.
    auth = request.auth
    if self.trust_env and not auth and not self.auth:
        auth = get_netrc_auth(request.url)

    p = PreparedRequest()
    p.prepare(
        method=request.method.upper(),
        url=request.url,
        files=request.files,
        data=request.data,
        json=request.json,
        headers=merge_setting(request.headers, self.headers, dict_class=CaseInsensitiveDict),
        params=merge_setting(request.params, self.params),
        auth=merge_setting(auth, self.auth),
        cookies=merged_cookies,
        hooks=merge_hooks(request.hooks, self.hooks),
    )
    return p
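
The same prepare step can be driven by hand, which is handy for inspecting what a session would send. In the sketch below, Session.prepare_request merges session-level headers with the request's own and builds the multipart body. The header names X-Demo and X-Request are invented for the example, and trust_env is switched off only so that a local .netrc file cannot influence the result.

```python
import requests

session = requests.Session()
session.trust_env = False                    # ignore .netrc and proxy env vars
session.headers['X-Demo'] = 'from-session'   # hypothetical session-wide header

req = requests.Request(
    'post',
    'http://test.com/flow/upload',
    headers={'X-Request': 'from-request'},   # hypothetical per-request header
    data={'filesize': 3},
    files={'file': ('tmp.xml', b'abc')},
)
prepared = session.prepare_request(req)      # nothing is sent

print(prepared.method)
print(prepared.headers['X-Demo'], prepared.headers['X-Request'])
print(prepared.headers['Content-Type'])
```

The prepared object could then be passed to session.send, which is the last step of session.request shown earlier.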

prepare_request creates a PreparedRequest object and calls its prepare method. Follow prepare, in requests/models.py:


def prepare(self,
        method=None, url=None, headers=None, files=None, data=None,
        params=None, auth=None, cookies=None, hooks=None, json=None):
    """Prepares the entire request with the given parameters."""

    self.prepare_method(method)
    self.prepare_url(url, params)
    self.prepare_headers(headers)
    self.prepare_cookies(cookies)
    self.prepare_body(data, files, json)
    self.prepare_auth(auth, url)

    # Note that prepare_auth must be last to enable authentication schemes
    # such as OAuth to work on a fully prepared request.

    # This MUST go after prepare_auth. Authenticators could add a hook
    self.prepare_hooks(hooks)
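
The ordering comment in prepare can be observed directly: because prepare_auth runs after prepare_body, an auth callable sees the finished body. The sketch below uses a made-up signing scheme (a SHA-256 digest of the body in an invented X-Body-SHA256 header) purely to show that ordering; it is not an authentication method requests provides.

```python
import hashlib

import requests


def sign_body(prepared):
    """Hypothetical auth callable: add a digest of the already-built body."""
    body = prepared.body or b''
    if isinstance(body, str):
        body = body.encode('utf-8')
    prepared.headers['X-Body-SHA256'] = hashlib.sha256(body).hexdigest()
    return prepared


prepared = requests.Request(
    'POST',
    'http://test.com/flow/upload',  # only prepared, never contacted
    data={'a': '1'},
    auth=sign_body,
).prepare()

print(prepared.body)
print(prepared.headers['X-Body-SHA256'])
```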

prepare calls a series of prepare_* methods. Only the one that handles data, files and json matters here, so follow prepare_body, also in requests/models.py:


def prepare_body(self, data, files, json=None):
    """Prepares the given HTTP body data."""

    # Check if file, fo, generator, iterator.
    # If not, run through normal process.

    # Nottin' on you.
    body = None
    content_type = None

    if not data and json is not None:
        # urllib3 requires a bytes-like body. Python 2's json.dumps
        # provides this natively, but Python 3 gives a Unicode string.
        content_type = 'application/json'
        body = complexjson.dumps(json)
        if not isinstance(body, bytes):
            body = body.encode('utf-8')

    is_stream = all([
        hasattr(data, '__iter__'),
        not isinstance(data, (basestring, list, tuple, Mapping))
    ])

    try:
        length = super_len(data)
    except (TypeError, AttributeError, UnsupportedOperation):
        length = None

    if is_stream:
        body = data

        if getattr(body, 'tell', None) is not None:
            # Record the current file position before reading.
            # This will allow us to rewind a file in the event
            # of a redirect.
            try:
                self._body_position = body.tell()
            except (IOError, OSError):
                # This differentiates from None, allowing us to catch
                # a failed `tell()` later when trying to rewind the body
                self._body_position = object()

        if files:
            raise NotImplementedError('Streamed bodies and files are mutually exclusive.')

        if length:
            self.headers['Content-Length'] = builtin_str(length)
        else:
            self.headers['Transfer-Encoding'] = 'chunked'
    else:
        # Multi-part file uploads.
        if files:
            (body, content_type) = self._encode_files(files, data)
        else:
            if data:
                body = self._encode_params(data)
                if isinstance(data, basestring) or hasattr(data, 'read'):
                    content_type = None
                else:
                    content_type = 'application/x-www-form-urlencoded'

        self.prepare_content_length(body)

        # Add content-type if it wasn't explicitly provided.
        if content_type and ('content-type' not in self.headers):
            self.headers['Content-Type'] = content_type

    self.body = body
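
The branches of prepare_body can be exercised one by one by preparing requests locally and looking at the Content-Type that results. The helper name content_type_for is hypothetical; the inputs are chosen to hit the json branch, the form-encoded branch, the raw-string branch, the multipart branch, and the stream-plus-files error.

```python
import io

import requests

URL = 'http://test.com/flow/upload'  # only prepared, never contacted


def content_type_for(**kwargs):
    """Prepare a POST request and return the Content-Type requests chose."""
    prepared = requests.Request('POST', URL, **kwargs).prepare()
    return prepared.headers.get('Content-Type')


json_type = content_type_for(json={'a': 1})
form_type = content_type_for(data={'a': 1})
raw_type = content_type_for(data='a=1')  # a string body gets no default type
multipart_type = content_type_for(data={'a': 1}, files={'file': ('t.txt', b'x')})

# A file-like object passed as `data` is treated as a stream, and a streamed
# body cannot be combined with `files`.
try:
    content_type_for(data=io.BytesIO(b'x'), files={'file': ('t.txt', b'x')})
    stream_error = None
except NotImplementedError as exc:
    stream_error = str(exc)

print(json_type, form_type, raw_type, multipart_type, stream_error, sep='\n')
```

The last case is the practical takeaway for uploads: extra fields must go in data as a dict or list of tuples, not as an open file or other iterable stream.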

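prepare_body is fairly long. The part to focus on is the "if files:" branch under the comment "# Multi-part file uploads.", which calls _encode_files. Follow that method: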


def _encode_files(files, data):
    """Build the body for a multipart/form-data request.

    Will successfully encode files when passed as a dict or a list of
    tuples. Order is retained if data is a list of tuples but arbitrary
    if parameters are supplied as a dict.
    The tuples may be 2-tuples (filename, fileobj), 3-tuples (filename, fileobj, contentype)
    or 4-tuples (filename, fileobj, contentype, custom_headers).
    """
    if (not files):
        raise ValueError("Files must be provided.")
    elif isinstance(data, basestring):
        raise ValueError("Data must not be a string.")

    new_fields = []
    fields = to_key_val_list(data or {})
    files = to_key_val_list(files or {})

    for field, val in fields:
        if isinstance(val, basestring) or not hasattr(val, '__iter__'):
            val = [val]
        for v in val:
            if v is not None:
                # Don't call str() on bytestrings: in Py3 it all goes wrong.
                if not isinstance(v, bytes):
                    v = str(v)

                new_fields.append(
                    (field.decode('utf-8') if isinstance(field, bytes) else field,
                     v.encode('utf-8') if isinstance(v, str) else v))

    for (k, v) in files:
        # support for explicit filename
        ft = None
        fh = None
        if isinstance(v, (tuple, list)):
            if len(v) == 2:
                fn, fp = v
            elif len(v) == 3:
                fn, fp, ft = v
            else:
                fn, fp, ft, fh = v
        else:
            fn = guess_filename(v) or k
            fp = v

        if isinstance(fp, (str, bytes, bytearray)):
            fdata = fp
        elif hasattr(fp, 'read'):
            fdata = fp.read()
        elif fp is None:
            continue
        else:
            fdata = fp

        rf = RequestField(name=k, data=fdata, filename=fn, headers=fh)
        rf.make_multipart(content_type=ft)
        new_fields.append(rf)

    body, content_type = encode_multipart_formdata(new_fields)

    return body, content_type

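That is the end of the trail. After reading this code carefully, the roles of the data and files arguments of requests.post are clear: _encode_files first turns each entry of data into a plain form field, then turns each entry of files into a file part, and encodes them all together as the multipart body of the POST.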
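
The field handling in _encode_files can be confirmed from the outside by preparing a request and listing the part names in the resulting body. The field names tag and skipped are invented for the example; the list value and the None value are there to show the two loops at work.

```python
import requests

prepared = requests.Request(
    'POST',
    'http://test.com/flow/upload',  # only prepared, never contacted
    data={'tag': ['a', 'b'], 'skipped': None, 'filesize': 3},
    files={'file': ('tmp.xml', b'abc')},
).prepare()

# Collect the name="..." value of every part, in the order they appear.
part_names = [
    line.split(b'name="')[1].split(b'"')[0].decode('utf-8')
    for line in prepared.body.split(b'\r\n')
    if line.startswith(b'Content-Disposition')
]
print(part_names)  # ['tag', 'tag', 'filesize', 'file']
```

A list value becomes one part per element, a None value is dropped, and the file parts follow the data fields.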

Source: https://www.cnblogs.com/lit10050528/p/11285600.html
