Python Httplib2

Learn to work with the Python httplib2 module. The Hypertext Transfer Protocol (HTTP) is an application protocol for distributed, collaborative, hypermedia information systems. HTTP is the foundation of data communication for the World Wide Web.

The Python httplib2 module provides methods for accessing web resources via HTTP. It supports many features, such as HTTP and HTTPS, authentication, caching, redirects, and compression.

$ service nginx status
 * nginx is running

We run the nginx web server on localhost. Some of our examples connect to PHP scripts served by this locally running nginx server.

Table of Contents

Check httplib2 Library Version
Use httplib2 to Read Web Page
Send HTTP HEAD Request
Send HTTP GET Request
Send HTTP POST Request
Send User Agent Information
Add Username/Password to Request

Check httplib2 Library Version

The first program prints the version of the library, its copyright, and the documentation string.

#!/usr/bin/python3

import httplib2

print(httplib2.__version__)
print(httplib2.__copyright__)
print(httplib2.__doc__)

The httplib2.__version__ attribute gives the version of the httplib2 library, httplib2.__copyright__ gives its copyright notice, and httplib2.__doc__ gives its documentation string.

$ ./version.py 
0.8
Copyright 2006, Joe Gregorio

httplib2

A caching http interface that supports ETags and gzip
to conserve bandwidth.

Requires Python 3.0 or later

Changelog:
2009-05-28, Pilgrim: ported to Python 3
2007-08-18, Rick: Modified so it's able to use a socks proxy if needed.

This is a sample output of the example.
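A script that depends on a particular release may want to compare the installed version against a minimum. The following is a minimal sketch of that idea. The helper name version_tuple and the (0, 9) threshold are hypothetical, chosen only for illustration, and the parser assumes a purely numeric dotted version string like the one printed above.

```python
def version_tuple(version):
    """Turn a dotted numeric version string such as '0.8' into a tuple of ints."""
    return tuple(int(piece) for piece in version.split("."))


installed = "0.8"  # in a real script this would be httplib2.__version__
minimum = (0, 9)   # hypothetical minimum version, for illustration only

print(version_tuple(installed))             # (0, 8)
print(version_tuple(installed) >= minimum)  # False
```

Comparing tuples of integers avoids the pitfall of comparing the strings directly, where "0.10" would sort before "0.8".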

Use httplib2 to Read Web Page

In the following example we show how to grab HTML content from a website called www.something.com.

#!/usr/bin/python3

import httplib2

http = httplib2.Http()
content = http.request("http://www.something.com")[1]

print(content.decode())

An HTTP client is created with httplib2.Http(). A new HTTP request is created with the request() method; by default, it is a GET request. The return value is a tuple of response and content. The content is returned as bytes, so we call decode() before printing it.

$ ./get_content.py 
<html><head><title>Something.</title></head>
<body>Something.</body>
</html>

This is the output of the example.
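Since request() returns a (response, content) tuple, both parts can be unpacked in one step, and the response headers can be used to pick the right character set for decoding. The sketch below assumes a client object with an httplib2-style request() method; the helper names charset_from and fetch_text are hypothetical and not part of httplib2.

```python
from email.message import Message


def charset_from(content_type, default="utf-8"):
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["content-type"] = content_type
    return msg.get_content_charset() or default


def fetch_text(http, url):
    """Fetch url with an httplib2-style client and return (status, text)."""
    resp, content = http.request(url)  # unpack the (response, content) tuple
    charset = charset_from(resp.get("content-type", ""))
    # content is bytes; decode it with the charset announced by the server
    return resp.status, content.decode(charset, errors="replace")
```

With httplib2 installed, it could be called as fetch_text(httplib2.Http(), "http://www.something.com").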

Stripping HTML tags

The following program gets a small web page and strips its HTML tags.

#!/usr/bin/python3

import httplib2
import re

http = httplib2.Http()
content = http.request("http://www.something.com")[1]

stripped = re.sub('<[^<]+?>', '', content.decode())
print(stripped)

A simple regular expression is used to strip the HTML tags. Note that we are only stripping tags from the data; we are not sanitizing it. (These are two different things.)

$ ./strip_tags.py 
Something.
Something.

The script prints the web page’s title and content.
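A regular expression works for a tiny page like this one, but it can be confused by more complex markup. As an alternative sketch, the standard library's html.parser module can collect the text nodes instead. The class name TextExtractor and the function strip_tags are hypothetical names used for this illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text between tags, ignoring the tags themselves."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(markup):
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


page = ("<html><head><title>Something.</title></head>\n"
        "<body>Something.</body>\n</html>")
print(strip_tags(page))
```

For the sample page, this prints the same two lines as the regular expression version. Character references such as &amp;amp; are also converted back to plain characters.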

Check Response Status

The response object contains a status property which gives the status code of the response.

#!/usr/bin/python3

import httplib2

http = httplib2.Http()

resp = http.request("http://www.something.com")[0]
print(resp.status)

resp = http.request("http://www.something.com/news/")[0]
print(resp.status)

We perform two HTTP requests with the request() method and check for the returned status.

$ ./get_status.py 
200
404

200 is the standard response for successful HTTP requests, and 404 indicates that the requested resource could not be found.
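To turn a numeric status into something more readable, the standard library's http.HTTPStatus enumeration can be used. This is a small sketch; describe_status and is_success are hypothetical helper names, and the code value would come from resp.status in the script above.

```python
from http import HTTPStatus


def describe_status(code):
    """Return the code together with its standard reason phrase."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:  # code is not a registered HTTP status
        phrase = "Unknown"
    return f"{code} {phrase}"


def is_success(code):
    """2xx codes signal that the request succeeded."""
    return 200 <= code < 300


for code in (200, 404):
    print(describe_status(code), is_success(code))
# 200 OK True
# 404 Not Found False
```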

Send HTTP HEAD Request

The HTTP HEAD method retrieves only the headers of a document, without its body. The headers consist of fields such as date, server, content type, and last modification time.

#!/usr/bin/python3

import httplib2

http = httplib2.Http()

resp = http.request("http://www.something.com", "HEAD")[0]

print("Server: " + resp['server'])
print("Last modified: " + resp['last-modified'])
print("Content type: " + resp['content-type'])
print("Content length: " + resp['content-length'])

The example prints the server, last modification time, content type, and content length of the www.something.com web page.

$ ./do_head.py 
Server: Apache/2.4.12 (FreeBSD) OpenSSL/1.0.1l-freebsd mod_fastcgi/mod_fastcgi-SNAP-0910052141
Last modified: Mon, 25 Oct 1999 15:36:02 GMT
Content type: text/html
Content length: 72

This is the output of the program. It shows that the web page is served by an Apache web server running on FreeBSD. The document was last modified in 1999. The web page is an HTML document whose length is 72 bytes.
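Indexing the response with resp['last-modified'] raises a KeyError when a server does not send that header. The sketch below reads the same fields defensively from any dict-like response and converts two of them to richer types. The function name summarize_headers is hypothetical, and the sample dictionary simply mirrors the output shown above (with a shortened server string).

```python
from email.utils import parsedate_to_datetime


def summarize_headers(resp):
    """Read a few header fields from a dict-like response without KeyError."""
    last_modified = resp.get("last-modified")
    length = resp.get("content-length")
    return {
        "server": resp.get("server", "unknown"),
        "content_type": resp.get("content-type", "unknown"),
        "content_length": int(length) if length is not None else None,
        "last_modified": (parsedate_to_datetime(last_modified)
                          if last_modified else None),
    }


sample = {
    "server": "Apache/2.4.12 (FreeBSD)",
    "last-modified": "Mon, 25 Oct 1999 15:36:02 GMT",
    "content-type": "text/html",
    "content-length": "72",
}
info = summarize_headers(sample)
print(info["content_length"])      # 72
print(info["last_modified"].year)  # 1999
```

The header names are written in lowercase, as in the example above.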

Send HTTP GET Request

The HTTP GET method requests a representation of the specified resource. For this example, we are also going to use the greet.php script:

<?php

echo "Hello " . htmlspecialchars($_GET['name']);

?>

Inside the /usr/share/nginx/html/ directory, we have this greet.php file. The script returns the value of the name variable, which was retrieved from the client.

The htmlspecialchars() function converts special characters to HTML entities; e.g. & to &amp;.
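For readers more familiar with Python than PHP, the standard library's html.escape() function performs a similar conversion. The following is only a rough Python counterpart of the PHP script, written for comparison; the function name greet is hypothetical.

```python
import html


def greet(name):
    # Rough counterpart of: echo "Hello " . htmlspecialchars($_GET['name']);
    return "Hello " + html.escape(name)


print(greet("Peter"))              # Hello Peter
print(greet("<b>Peter</b> & co"))  # Hello &lt;b&gt;Peter&lt;/b&gt; &amp; co
```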

#!/usr/bin/python3

import httplib2

http = httplib2.Http()
content = http.request("http://localhost/greet.php?name=Peter", 
                       method="GET")[1]

print(content.decode())

The script sends a variable with a value to the PHP script on the server. The variable is specified directly in the URL.

$ ./mget.py 
Hello Peter

This is the output of the example.

$ tail -1 /var/log/nginx/access.log
127.0.0.1 - - [21/Aug/2016:17:32:31 +0200] "GET /greet.php?name=Peter HTTP/1.1" 200 42 "-" 
"Python-httplib2/0.8 (gzip)"

We examine the nginx access log.
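Writing the query string by hand works for a simple value like Peter, but values containing spaces or characters such as & need to be encoded. One possible approach is to build the URL with urllib.parse.urlencode(); the helper name build_url below is hypothetical.

```python
from urllib.parse import urlencode


def build_url(base, params):
    """Append URL-encoded query parameters to a base URL."""
    return base + "?" + urlencode(params)


print(build_url("http://localhost/greet.php", {"name": "Peter Pan"}))
# http://localhost/greet.php?name=Peter+Pan
print(build_url("http://localhost/greet.php", {"name": "Tom & Jerry"}))
# http://localhost/greet.php?name=Tom+%26+Jerry
```

The resulting string can be passed to http.request() in place of the hard-coded URL.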

Send HTTP POST Request

The POST request method requests that a web server accept and store the data enclosed in the body of the request message. It is often used when uploading a file or submitting a completed web form.

<?php

echo "Hello " . htmlspecialchars($_POST['name']);

?>

On our local web server, we have this target.php file. It simply prints the posted value back to the client.

#!/usr/bin/python3

import httplib2
import urllib.parse

http = httplib2.Http()

body = {'name': 'Peter'}

content = http.request("http://localhost/target.php", 
                       method="POST", 
                       headers={'Content-type': 'application/x-www-form-urlencoded'},
                       body=urllib.parse.urlencode(body) )[1]

print(content.decode())

The script sends a request with a name key whose value is Peter. The data is encoded with the urllib.parse.urlencode() function and sent in the body of the request.

$ ./mpost.py 
Hello Peter

This is the output of the mpost.py script.

$ tail -1 /var/log/nginx/access.log
127.0.0.1 - - [23/Aug/2016:12:21:07 +0200] "POST /target.php HTTP/1.1" 
    200 37 "-" "Python-httplib2/0.8 (gzip)"

With the POST method, the value is not sent in the request URL.
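When several forms are posted from one script, the header and the encoding step can be wrapped in a small helper. The sketch below assumes a client with an httplib2-style request() method; post_form is a hypothetical name and not part of httplib2.

```python
from urllib.parse import urlencode


def post_form(http, url, fields):
    """Send fields as a URL-encoded form body with an httplib2-style client."""
    return http.request(
        url,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        body=urlencode(fields),  # e.g. {'name': 'Peter'} becomes 'name=Peter'
    )
```

With httplib2 installed, the call from the example would become resp, content = post_form(httplib2.Http(), "http://localhost/target.php", {'name': 'Peter'}).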

Send User Agent Information

In this section, we specify the name of the user agent.

<?php 

echo $_SERVER['HTTP_USER_AGENT'];

?>

Inside the nginx document root, we have the agent.php file. It returns the name of the user agent.

#!/usr/bin/python3

import httplib2

http = httplib2.Http()
content = http.request("http://localhost/agent.php", method="GET", 
                  headers={'user-agent': 'Python script'})[1]

print(content.decode())

This script creates a simple GET request to the agent.php script. In the headers dictionary, we specify the user agent. This is read by the PHP script and returned to the client.

$ ./user_agent.py 
Python script

The server responded with the name of the agent that we have sent with the request.
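A user agent string often identifies both the application and the runtime. As a small sketch, the headers dictionary can be assembled by a helper; the function name default_headers and the application name my-script are hypothetical, chosen only for illustration.

```python
import platform


def default_headers(app_name, app_version, extra=None):
    """Build a headers dictionary with a descriptive user agent."""
    agent = f"{app_name}/{app_version} (Python {platform.python_version()})"
    headers = {"user-agent": agent}
    if extra:
        headers.update(extra)  # caller-supplied headers are added on top
    return headers


print(default_headers("my-script", "1.0"))
```

The returned dictionary can be passed as the headers argument of request().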

Add Username/Password to Request

The client’s add_credentials() method sets the name and password to be used for a realm. A security realm is a mechanism used for protecting web application resources.

$ sudo apt-get install apache2-utils
$ sudo htpasswd -c /etc/nginx/.htpasswd user7
New password: 
Re-type new password: 
Adding password for user user7

We use the htpasswd tool to create a user name and a password for basic HTTP authentication.

location /secure {

        auth_basic "Restricted Area";
        auth_basic_user_file /etc/nginx/.htpasswd;
}

Inside the nginx /etc/nginx/sites-available/default configuration file, we create a secured page. The name of the realm is “Restricted Area”.

<!DOCTYPE html>
<html lang="en">
<head>
<title>Secure page</title>
</head>

<body>

<p>
This is a secure page.
</p>

</body>

</html>

Inside the /usr/share/nginx/html/secure directory, we have the above HTML file.

#!/usr/bin/python3

import httplib2

user = 'user7'
passwd = '7user'

http = httplib2.Http()
http.add_credentials(user, passwd)
content = http.request("http://localhost/secure/")[1]

print(content.decode())

The script connects to the secure webpage; it provides the user name and the password necessary to access the page.

$ ./credentials.py 
<!DOCTYPE html>
<html lang="en">
<head>
<title>Secure page</title>
</head>

<body>

<p>
This is a secure page.
</p>
</body>

</html>

With the right credentials, the script returns the secured page.
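With HTTP Basic authentication, the user name and password travel in an Authorization header as the Base64 encoding of "user:password". The sketch below only illustrates what that header value looks like; the helper name basic_auth_header is hypothetical, and in the script above the header is produced by the library, not by hand.

```python
import base64


def basic_auth_header(user, passwd):
    """Build the value of an Authorization header for HTTP Basic auth."""
    raw = f"{user}:{passwd}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


header = basic_auth_header("user7", "7user")
scheme, token = header.split(" ", 1)
print(scheme)                                   # Basic
print(base64.b64decode(token).decode("utf-8"))  # user7:7user
```

Base64 is an encoding, not encryption, so anyone who can read the request can recover the password. Over plain HTTP, as in this local example, the credentials are therefore not protected in transit.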

In this tutorial, we have explored the Python httplib2 module.

The tutorial was written by Jan Bodnar who runs zetcode.com, which specializes in programming tutorials.
