October 29, 2014

Doing the OAuth2 Dance in 57 Lines of Python

I’m currently involved in the development of an application that will use multiple third-party storage providers as its file storage backend. One of those providers is Box. Box makes it easy to develop applications that access the files in a user’s Box account via its API, but obtaining an access token requires the user to log in via OAuth. Using Box as the backend for an application necessitates that the application be able to authenticate with Box whenever it needs to. Furthermore, a Box account used as a backend isn’t owned by a person; it belongs to the application, so no one is available to do the OAuth dance in the first place.

As a learning endeavour, I set out to write a Python script capable of automating the OAuth2 authentication process. Eventually, it became clear that by retrieving just one access token from Box, I could save the associated refresh token and use it to renew my authenticated session with Box every 45 minutes (the access token is good for 1 hour). This removes the need to use the script below in the application itself. Doing so would be a security risk, because login credentials are hard-coded in the script, and an availability risk, because a single change to Box’s login process would prevent retrieval of a new access token.
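The renewal step is separate from the login script and is not part of it. A minimal sketch of what it might look like, assuming the standard OAuth2 refresh_token grant against the same token endpoint the script uses. The names needs_refresh, refresh_payload and renew are hypothetical, as is the post callable, which stands in for whatever HTTP client the application uses (for example, a thin wrapper around requests.post that returns the decoded JSON). The 15-minute margin reproduces the 45-minute renewal point for a 1-hour token.

```python
import time

# Same token endpoint the login script posts to.
TOKEN_URL = "https://app.box.com/api/oauth2/token"


def needs_refresh(issued_at, expires_in=3600, margin=900, now=None):
    """Return True once the token is within `margin` seconds of expiring.

    With the defaults (1-hour lifetime, 15-minute margin) this becomes
    True 45 minutes after the token was issued.
    """
    if now is None:
        now = time.time()
    return now >= issued_at + expires_in - margin


def refresh_payload(refresh_token, client_id, client_secret):
    """Form fields for an OAuth2 refresh_token grant."""
    return {
        'grant_type': 'refresh_token',
        'refresh_token': refresh_token,
        'client_id': client_id,
        'client_secret': client_secret,
    }


def renew(tokens, client_id, client_secret, post, now=None):
    """Return fresh tokens if the current ones are close to expiry.

    `tokens` is a dict with 'access_token', 'refresh_token', 'issued_at'
    and 'expires_in'. `post(url, data)` is a hypothetical callable that
    sends a form-encoded POST and returns the decoded JSON reply.
    """
    if not needs_refresh(tokens['issued_at'], tokens.get('expires_in', 3600), now=now):
        return tokens
    reply = post(TOKEN_URL, refresh_payload(tokens['refresh_token'], client_id, client_secret))
    return {
        'access_token': reply['access_token'],
        # Assumption: the server may rotate the refresh token on each renewal,
        # so keep whichever one comes back and fall back to the old one.
        'refresh_token': reply.get('refresh_token', tokens['refresh_token']),
        'expires_in': reply.get('expires_in', 3600),
        'issued_at': time.time() if now is None else now,
    }
```

Persisting the returned dict after every call matters: if the refresh token is rotated and the new one is lost, the only way back in is the full login dance.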

The outstanding lxml.html Python module makes it easy to fill in HTML forms and submit them. By combining lxml.html functions with a single HTTP session that persists cookies across the OAuth requests, I can run the brief program below to make any API request I wish for any Box account, provided I know the username and password for that account:

import os
import sys
import json
import urllib
import urllib2
import hashlib
import requests
from urlparse import urlparse, parse_qs
from lxml.html import fromstring, tostring, submit_form

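def make_open_http():
     # One opener with a cookie jar, shared by every request, so the login
     # session persists across the OAuth steps.
     opener = urllib2.build_opener(urllib2.HTTPCookieProcessor())
     opener.addheaders = []
     def open_http(method, url, values=None):
          data = urllib.urlencode(values or {})
          if method == "GET":
               # Passing any body (even an empty one) makes urllib2 send a POST.
               if data:
                    url += ("&" if "?" in url else "?") + data
               return opener.open(url)
          return opener.open(url, data)
     return open_http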

CLIENT_ID = '<your_box_app_client_id_here>'
CSRF_RAND_TOKEN = hashlib.sha1(os.urandom(128)).hexdigest()
REDIRECT_URI = '<your_domain_here>'

open_http = make_open_http()

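# urlencode percent-encodes each value, so a redirect URI containing ':' or '/' survives intact
leg1_url = "https://app.box.com/api/oauth2/authorize?" + urllib.urlencode({'response_type': 'code', 'client_id': CLIENT_ID, 'state': CSRF_RAND_TOKEN, 'redirect_uri': REDIRECT_URI})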

login_page = fromstring(open_http("GET", leg1_url).read())
login_form = login_page.forms[0]
login_form.fields['login'] = '<your_box_login_here>'
login_form.fields['password'] ='<your_box_pswd_here>'

grant_resp = submit_form(login_form, open_http=open_http)
grant_page = fromstring(grant_resp.read())
grant_form = grant_page.forms[0]

leg1_resp = submit_form(grant_form, open_http=open_http)
leg1 = parse_qs(urlparse(leg1_resp.geturl()).query)

# TODO: Throw exception rather than exiting if CSRF token differs
if leg1['state'][0] != CSRF_RAND_TOKEN:
     print "CSRF token doesn't match; exiting."
     sys.exit(1)

GRANT_CODE = leg1['code'][0]
CLIENT_SECRET = '<your_box_app_client_secret_here>'

leg2_payload = {'grant_type': 'authorization_code', 'code': GRANT_CODE, 'client_id': CLIENT_ID, 'client_secret': CLIENT_SECRET, 'redirect_uri': REDIRECT_URI}
leg2_resp = requests.post("https://app.box.com/api/oauth2/token", data=leg2_payload)
leg2_json = leg2_resp.json()

# TODO: During application development, make use of the refresh_token
ACCESS_TOKEN = leg2_json['access_token']
REFRESH_TOKEN = leg2_json['refresh_token']

BOX_AUTH_HEADERS = {'Authorization': 'Bearer ' + ACCESS_TOKEN}
folder_resp = requests.get("https://www.box.com/api/2.0/folders/0", headers=BOX_AUTH_HEADERS)
folder_json = folder_resp.json()
print json.dumps(folder_json, indent=4)
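
The form handling in the script relies on the open_http hook that submit_form accepts: lxml collects the form's values and hands the method, URL and values to whatever callable is supplied. The sketch below shows that mechanism in isolation, with no network access. The HTML page, the field names and the recording_open_http function are made up for illustration and are not Box's actual login form.

```python
from lxml.html import fromstring, submit_form

# A stand-in login page; the real Box form will differ.
PAGE = """
<html><body>
<form action="https://example.com/login" method="post">
  <input type="text" name="login">
  <input type="password" name="password">
  <input type="hidden" name="request_token" value="abc123">
  <input type="submit" value="Authorize">
</form>
</body></html>
"""

captured = {}


def recording_open_http(method, url, values):
    """Record what lxml would send instead of performing the request."""
    captured['method'] = method
    captured['url'] = url
    captured['values'] = dict(values)


page = fromstring(PAGE)
form = page.forms[0]

# Only the visible fields need filling in; hidden fields keep the
# values the server put in the page.
form.fields['login'] = 'user@example.com'
form.fields['password'] = 'not-a-real-password'

submit_form(form, open_http=recording_open_http)
```

After the call, captured holds the form's method, its action URL, and all three named fields, including the hidden request_token. That carry-over of hidden fields is why the script only has to set the login and password before submitting.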

Note that the CLIENT_ID and CLIENT_SECRET values are obtained by registering your application with Box, as discussed in the Box API documentation.
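
The two parts of the first leg that do not involve an HTML form, building the authorization URL and validating the redirect that comes back, can be pulled out into small functions. This is a minimal sketch rather than a drop-in replacement for the script: build_authorize_url and extract_grant_code are hypothetical names, and the import fallback is only there so the snippet runs on both Python 2 and Python 3.

```python
import hmac

try:
    from urllib.parse import urlencode, urlparse, parse_qs  # Python 3
except ImportError:
    from urllib import urlencode                             # Python 2
    from urlparse import urlparse, parse_qs

AUTHORIZE_URL = "https://app.box.com/api/oauth2/authorize"


def build_authorize_url(client_id, state, redirect_uri):
    """Build the leg-1 URL with every parameter percent-encoded."""
    query = urlencode([
        ('response_type', 'code'),
        ('client_id', client_id),
        ('state', state),
        ('redirect_uri', redirect_uri),
    ])
    return AUTHORIZE_URL + "?" + query


def extract_grant_code(callback_url, expected_state):
    """Return the authorization code from the redirect URL.

    Raises ValueError if the state does not match the one that was sent,
    or if the redirect carries no code at all.
    """
    params = parse_qs(urlparse(callback_url).query)
    state = params.get('state', [''])[0]
    # Constant-time comparison of the returned state with the expected one.
    if not hmac.compare_digest(state.encode('utf-8'), expected_state.encode('utf-8')):
        raise ValueError("state mismatch; possible CSRF")
    if 'code' not in params:
        raise ValueError("redirect did not include an authorization code")
    return params['code'][0]
```

Raising an exception instead of printing a message means a failed check cannot be ignored by accident, which is what the TODO in the script is asking for.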

Useful References

I made use of the following helpful resources: