A Study In Scarlet
Exploiting Common Vulnerabilities in PHP Applications

“A reprint of reminiscences from the Blackhat Briefings Asia 2001”


— < Table of Contents > ————————————————–

1. Introduction
2. Caveats and Scope
3. Global Variables
4. Remote Files
5. File Upload
6. Library Files
7. Session Files
8. Loose Typing And Associative Arrays
9. Target Functions
10. Protecting PHP
11. Responsibility – Language vs Programmer
12. Other

“I could imagine his giving a friend a little pinch of the latest vegetable
alkaloid, not out of malevolence, you understand, but simply out of a spirit
of inquiry in order to have an accurate idea of the effects.” – Stamford

— < 1. Introduction > —————————————————-

This paper is based on my speech during the Blackhat briefings in Singapore and Hong Kong in April 2001. The speech was entitled “Breaking In Through the Front Door – The impact of Web Applications and Application Service Provision on Traditional Security Models”. It initially discussed the trend towards Web Applications (and ASP) and the holes in traditional security methodology exposed by this trend. However, that’s a long and boring discussion so I’ll save it for the policy makers.

The rest of the speech was spent talking about PHP. For those reading this
paper who don’t know what PHP is, PHP stands for “PHP Hypertext
Preprocessor”. It’s a programming language (designed specifically for the
Web) in which PHP code is embedded in web pages. When a client requests a
page, the Web Server first passes the page to the language interpreter so
the code can be executed; the resulting page is then returned to the client.

Obviously this approach is much more suited to the page by page nature of
web transactions than traditional CGI languages such as Perl and C. PHP (and
to some extent other Web Languages) has the following characteristics:
+ Interpreted
+ Fast Execution – The interpreter is embedded in the web server, no fork()
or setup overhead
+ Feature Rich – Hundreds of non trivial builtin functions
+ Simple Syntax – Non declared and loosely typed variables, ‘wordy’
function names

Over the course of this paper I’m going to try to explain why I feel the
last two characteristics make applications written in PHP easy to attack and
hard to defend. Then I’ll finish off with a rant about distribution of
‘blame’ when it comes to software security.

“You must study him, then … you’ll find him a knotty problem, though. I’ll
wager he learns more about you than you about him.” – Stamford

— < 2. Caveats and Scope > ———————————————–

Almost all the observations in this paper refer to a default install of PHP
4.0.4pl1 (with MySQL, PostgreSQL, IMAP and OpenSSL support enabled) running
as a module under Apache 1.3.19 on a Linux machine. This of course means
that your mileage may vary. In particular, there have been many versions of
PHP and they sometimes exhibit vastly different behaviour given the same
input.

Also, proponents of PHP tend to defend the language based on its extreme
configurability. I feel very confident the vast majority of users will not
modify the default PHP configuration at all, lest some of the amazing array
of freely available PHP software stop working. Thus I don’t feel pressured
to defend my position based on configuration options. Nonetheless I’ve
included a section about how to go about defending PHP applications using
these configuration options.

Finally, some people deride this kind of work as ‘trivial’ or ‘obvious’,
particularly since I won’t be discussing any specific vulnerabilities in
particular pieces of PHP software. To prove the risks are real and that even
programmers who try hard fall into these traps, four detailed advisories
regarding specific pieces of vulnerable software will be released shortly
after this paper.

“I have to be careful … for I dabble with poisons a good deal.” – Sherlock

— < 3. Global Variables > ————————————————

As mentioned earlier, variables in PHP don’t have to be declared, they’re
automatically created the first time they are used. Nor are they
specifically typed, they’re typed automatically based on the context in
which they are used. This is an extremely convenient way to do things from a
programmer’s perspective (and is obviously a useful feature in a rapid
application development language). Once a variable is created it can be
referenced anywhere in the program (except in functions, where it must be
explicitly brought into the namespace with the ‘global’ statement). The
result of these characteristics is that variables are rarely initialized by
the programmer; after all, when they’re first created they are empty (i.e.
“”).

Obviously the main function of a PHP based web application is usually to
take in some client input (form variables, uploaded files, cookies etc),
process the input and return output based on that input. In order to make it
as simple as possible for the PHP script to access this input, it’s actually
provided in the form of PHP global variables. Take as an example an HTML
form that submits to test.php and contains a single text input named
‘hello’.

Such a form displays a text box and a submit button. When the user presses
the submit button the PHP script test.php will be run to process the input.
When it runs, the variable $hello will contain the text the user entered
into the text box. It’s important to note the implication: a remote attacker
can create any variable they wish and have it declared in the global
namespace. If instead of using the form to call test.php an attacker calls
it directly with a url like “http://server/test.php?hello=hi&setup=no”, not
only will $hello = “hi” when the script is run but $setup will be “no” also.
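
The effect can be imitated on a PHP version that no longer has
register_globals. What follows is only a minimal sketch, not the mechanism
the interpreter itself used: the helper name simulate_register_globals() is
hypothetical, and it leans on the standard parse_str() function to turn a
query string into named values, roughly the way request input used to land in
the global namespace.

```php
<?php
// Hypothetical helper (not part of PHP): returns the variables that a
// register_globals-style interpreter would have created for a query string.
function simulate_register_globals(string $queryString): array
{
    $variables = [];
    // parse_str() decodes "a=1&b=2" into ['a' => '1', 'b' => '2'].
    parse_str($queryString, $variables);
    return $variables;
}

// The attacker's URL from the text: test.php?hello=hi&setup=no
$created = simulate_register_globals('hello=hi&setup=no');

// Every name the client chose shows up, not just the one the form defines.
foreach ($created as $name => $value) {
    echo '$' . $name . ' = "' . $value . '"' . PHP_EOL;
}
```

The point of the sketch is that the script author never mentioned ‘setup’
anywhere; the name comes entirely from the request.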

An example of how this can be a real problem might be a script that was
designed to authenticate a user before displaying some important
information. Picture a script that sets $auth to 1 when the submitted
password is correct and, further down, shows the important information if
$auth is set.

In normal operation such code will check the password to decide if the
remote user has successfully authenticated, then later check if they are
authenticated and show them the important information. The problem is that
the code incorrectly assumes that the variable $auth will be empty unless it
sets it. Remembering that an attacker can create variables in the global
namespace, a url like ‘http://server/test.php?auth=1’ will fail the password
check but the script will still believe the attacker has successfully
authenticated.
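
As a rough illustration of that flaw and one way around it, here is a sketch
written for a current PHP interpreter. It is not the code from the original
talk: the function names and the password ‘hello’ are made up for the
example, and extract() is used only as a stand-in for what register_globals
did automatically.

```php
<?php
// Vulnerable pattern: $auth is only assigned on success, so any value that
// arrives from the client survives. extract() stands in for register_globals.
function is_authenticated_vulnerable(array $clientInput): bool
{
    extract($clientInput);
    if (isset($pass) && $pass === 'hello') {   // 'hello' is a made-up password
        $auth = 1;
    }
    return !empty($auth);
}

// Safer pattern: initialise $auth explicitly and read input by name only.
function is_authenticated_fixed(array $clientInput): bool
{
    $auth = 0;
    if (isset($clientInput['pass']) && $clientInput['pass'] === 'hello') {
        $auth = 1;
    }
    return $auth === 1;
}

// The attacker's request from the text: test.php?auth=1
var_dump(is_authenticated_vulnerable(['auth' => '1']));
var_dump(is_authenticated_fixed(['auth' => '1']));
```

The only difference that matters is the explicit `$auth = 0;`, which is
exactly the initialisation that loosely declared variables tempt a programmer
to leave out.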

To summarize the above, a PHP script _cannot trust ANY variable it has not
EXPLICITLY set_. When you’ve got a rather large number of variables, this
can be a much harder task than it may sound.

One common approach to protecting a script is to check that the variable is
not in the array HTTP_GET_VARS[] or HTTP_POST_VARS[] (depending on the
method normally used to submit the form, GET or POST). When PHP is
configured with track_vars enabled (as it is by default) variables submitted
by the user are available both as global variables and as elements in the
arrays mentioned above. However, it’s important to note that there are FOUR
different arrays for remote user input:
+ HTTP_GET_VARS for variables submitted in the URL of the GET request
+ HTTP_POST_VARS for variables submitted in the POST section of a HTTP
request
+ HTTP_COOKIE_VARS for variables submitted as part of the cookie headers in
the HTTP request
+ HTTP_POST_FILES, to a limited degree (in more recent versions of PHP)

It is entirely the end user’s choice which method they use to submit
variables. One request can easily place variables in all four arrays, so a
secure script needs to check all four (though again, the HTTP_POST_FILES
array shouldn’t be an issue except in exceptional circumstances).
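
A check across all four sources might be sketched as below. The function name
supplied_by_client() is hypothetical, and the arrays are passed in as
parameters so the sketch does not depend on any particular PHP version; on
the PHP described in this paper they would be the HTTP_*_VARS arrays, while
later versions expose the same data as $_GET, $_POST, $_COOKIE and $_FILES.

```php
<?php
// Hypothetical helper: true if $name appears in ANY of the given input
// arrays. Checking only the array the form normally uses is not enough.
function supplied_by_client(string $name, array ...$sources): bool
{
    foreach ($sources as $source) {
        if (array_key_exists($name, $source)) {
            return true;
        }
    }
    return false;
}

// Example request: the form normally uses POST, but the attacker hides
// the variable in a cookie instead.
$get    = [];
$post   = ['pass' => 'guess'];
$cookie = ['auth' => '1'];
$files  = [];

var_dump(supplied_by_client('auth', $get, $post));                  // POST/GET only
var_dump(supplied_by_client('auth', $get, $post, $cookie, $files)); // all four
```

A script that consulted only the GET and POST arrays would miss the cookie
entirely, which is the gap the paragraph above warns about.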

“No man burdens his mind with small matters unless he has some very good
reason for doing so.” – John Watson

— < 4. Remote Files > —————————————————-

I’m going to repeat this frequently during this document but it bears
repeating, PHP is an extremely feature rich language. It ships with an
amazing amount of functionality out of the box and tries hard to make life
as easy as possible for the coder (or web designer as the case so often is).
From a security perspective, the more superfluous functionality offered by a
language and the less intuitive the possibilities, the more difficult it is
to secure applications written in it. An excellent example of this is the
Remote Files functionality of PHP.

Consider a piece of PHP code designed to open a file. It attempts to open
the file specified in the variable $filename for reading and, if it fails,
displays an error. Obviously this could be a simple security issue if the
user can set $filename and get the script to expose /etc/passwd, for
example, but one non-intuitive thing this code could end up doing is reading
data from another web/ftp site. The remote files functionality means that
the majority of PHP’s file handling functions can work transparently on
remote files via HTTP and FTP. If $filename were to contain (for example)
“http://target/scripts/..%c1%1c../winnt/system32/cmd.exe?/c+dir” PHP will
actually make a HTTP request to the server “target”, in this case trying to
exploit the unicode flaw.
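
One possible guard, shown purely as a sketch, is to refuse anything that
looks like a URL before it reaches fopen(). The helper name is_local_path()
is an assumption for illustration; it only addresses the remote-file case and
does nothing about local paths such as /etc/passwd.

```php
<?php
// Hypothetical helper: false for anything that starts with a URL scheme
// such as "http://" or "ftp://", true otherwise.
function is_local_path(string $filename): bool
{
    // A scheme is a letter followed by letters, digits, '+', '.' or '-',
    // then "://". The check is case-insensitive.
    return preg_match('#^[a-z][a-z0-9+.\-]*://#i', $filename) !== 1;
}

$candidates = [
    '/var/www/data/report.txt',
    'http://target/scripts/..%c1%1c../winnt/system32/cmd.exe?/c+dir',
    'FTP://example.invalid/file',
];

foreach ($candidates as $candidate) {
    echo $candidate . ' => ' . (is_local_path($candidate) ? 'local' : 'remote') . PHP_EOL;
}
```

A check like this is a stop-gap at best; the configuration options discussed
later in the paper switch the behaviour off at the source.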

This gets more interesting in the context of four other file functions that
support remote file functionality (*** except under Windows ***): include(),
require(), include_once() and require_once(). These functions take in a
filename, read that file and parse it as PHP code. They’re typically used
to support the concept of code libraries, where common bits of PHP code are
stored in files and included as needed. Now take a piece of code that
include()s the file languages.php from the directory named in $libdir.

Presumably $libdir is a configuration variable that is meant to be set
earlier in script execution to the directory where the library files are
stored. If the attacker can cause the variable not to be set by the script
(which is typically not a tremendously difficult task) and instead submit it
themselves, they can modify the start of the path. This would normally gain
them nothing since they still end up only being able to access languages.php
in a directory of their choosing (poison null attacks like those possible on
Perl don’t work under PHP) but with remote files the attacker can submit any
code they wish to be executed. For example, if the attacker places a file
called languages.php on a web server, containing PHP code that lists the
/etc directory, then sets $libdir to “http://evilhost/”, upon encountering
the include statement PHP will make a HTTP request to evilhost, retrieve the
attacker’s code and execute it, returning a listing of /etc to the
attacker’s web browser. Note that the attacking webserver (evilhost) can’t
be running PHP or the code will be run on the attacking machine rather than
the target machine (see the “Other” section and its reference to SRADV00006
for an example of code which survives being on a PHP enabled attacking
machine).
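
A defensive counterpart could look something like the following sketch. The
constant LIB_DIR and the function library_path() are hypothetical names; the
idea is simply that the directory is fixed in code rather than held in a
variable a request might supply, and that the library name is restricted to
harmless characters before a path is built.

```php
<?php
// Assumed layout: libraries live in a "libdir" directory next to this file.
// A constant cannot be overwritten by request input the way $libdir can.
const LIB_DIR = __DIR__ . '/libdir';

// Hypothetical helper: builds the path for a named library, or throws if
// the name contains anything other than letters, digits and underscores.
function library_path(string $name): string
{
    if (preg_match('/^[a-z0-9_]+$/i', $name) !== 1) {
        throw new InvalidArgumentException('Illegal library name: ' . $name);
    }
    return LIB_DIR . '/' . $name . '.php';
}

echo library_path('languages') . PHP_EOL;

try {
    library_path('http://evilhost/languages');
} catch (InvalidArgumentException $e) {
    echo 'Rejected: ' . $e->getMessage() . PHP_EOL;
}
```

The resulting path would then be handed to include() or require(); the
sketch stops short of that so it can run without any library files present.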

“There are no crimes and no criminals in these days” – Sherlock Holmes

— < 5. File Upload > —————————————————–

As if PHP hadn’t already provided enough to make life easier for the
attacker, the language provides automatic support for RFC 1867 based file
upload. Take a form with a file input named ‘hello’ and a MAX_FILE_SIZE
field of 10 kb.

This form will allow the web browser user to select a file from their local
machine; when they click submit the file will be uploaded to the remote
web server. This is obviously useful functionality but it is PHP’s response
that makes this dangerous. When PHP first receives the request, before it
has even BEGUN to parse the PHP script being called, it will automatically
receive the file from the remote user. It will then check that the file is
no larger than specified in the $MAX_FILE_SIZE variable (10 kb in this case)
and the maximum file size set in the PHP configuration file. If it passes
these tests the file is SAVED on the local disk in a temporary directory.
Please read that again if that doesn’t make you blink: a remote user can
send any file they wish to a PHP enabled machine and, before a script has
even specified whether or not it accepts file uploads, that file is SAVED on
the local disk.

I’m going to ignore any resource exhaustion attacks that may or may not be
possible using file upload functionality, I think they’re fairly limited if
not impossible in any case.

First let’s consider a script that IS designed to receive file uploads. As
described above the file is received and saved on the local disk (in the
location specified in the configuration for uploaded files, typically /tmp)
with a random filename (e.g. “phpxXuoXG”). The PHP script then needs
information regarding the uploaded file to be able to process it. This is
actually provided in two different ways: one has been in use since early
versions of PHP 3, the other was introduced following our Advisory regarding
the issue I’m about to describe with the former method. Suffice to say the
problem is still alive and well, since most scripts continue to use the old
method. PHP sets four global variables to describe the uploaded file, for
example (given the upload form above):

$hello = Filename on local machine (e.g “/tmp/phpxXuoXG”)
$hello_size = Size in bytes of file (e.g 1024)
$hello_name = The original name of the file on the remote system (e.g
$hello_type = Mime type of uploaded file (e.g “text/plain”)

The PHP script then proceeds to work on the file as located via the $hello
variable. The problem is that it isn’t immediately obvious that $hello need
not really be a PHP set variable and can simply be set by a remote attacker.
Take, for example, a request that supplies hello, hello_size, hello_type and
hello_name as ordinary form variables in the URL.

That results in the following global PHP variables (of course POST, or even
cookies, could be used instead):

$hello = “/etc/passwd”
$hello_size = 10240
$hello_type = “text/plain”
$hello_name = “hello.txt”

This form input will provide exactly the variables the PHP script expects
to be set by PHP, but instead of working on an uploaded file the script will
in fact be working on /etc/passwd (usually resulting in its content being
exposed). This attack can be used to expose the contents of all sorts of
sensitive files (in particular configuration files containing database and
other third tier server credentials).

I noted above that newer versions of PHP provide different methods for
determining the uploaded files (it’s done via the HTTP_POST_FILES[] array
mentioned earlier). It also provides numerous functions to avoid this
problem, for example a function to determine if a particular file is
actually one that has been uploaded. These methods well and truly fix the
problem but there is certainly no shortage of scripts out there still using
the old method and still vulnerable to this sort of attack.
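
The newer approach might be sketched as follows. The wrapper name
uploaded_file_path() is hypothetical; is_uploaded_file() is the standard PHP
function referred to above, and the files array is passed in as a parameter
(it would be HTTP_POST_FILES on the PHP of this paper, $_FILES on later
versions). The ‘tmp_name’ key is the one PHP uses in that array for the
temporary file location.

```php
<?php
// Hypothetical wrapper: returns the temporary path for an upload field only
// if PHP itself received that file via an HTTP POST upload, else null.
function uploaded_file_path(array $files, string $field): ?string
{
    $tmpName = $files[$field]['tmp_name'] ?? null;
    if (!is_string($tmpName) || !is_uploaded_file($tmpName)) {
        return null;
    }
    return $tmpName;
}

// An attacker-shaped entry pointing at a file that was never uploaded.
$forged = ['hello' => ['tmp_name' => '/etc/passwd', 'size' => 10240]];

var_dump(uploaded_file_path($forged, 'hello'));
var_dump(uploaded_file_path([], 'hello'));
```

Because /etc/passwd was not received as an upload in the current request,
is_uploaded_file() rejects it no matter what the other fields claim.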

As an alternate attack assisted by file upload consider the following
example PHP code:

If the attacker can control $theme they can obviously use this to read any
file on the remote system (except that content inside PHP tags e.g

— < 6. Library Files > —————————————————

I’ve mentioned the include() and require() functions earlier, I also said
that they’re generally used to support the concept of code libraries. What I
mean by that is that common bits of code are put into a separate file and
when needed in the application simply include()ed from the file. include()
and require() will take any specified filename and read the file and parse
its contents as PHP code.

Initially when people started developing and distributing PHP applications
they chose to distinguish library and main application code by giving
library files the ‘.inc’ extension. However they quickly found this was a
bad move in general since such files aren’t normally parsed as PHP code by
the PHP interpreter. If requested from the web server they will generally
have the full source code returned. This is because the PHP interpreter
(when used as an Apache module) determines which files to parse for PHP code
based on the file’s extension. The extensions to be interpreted can be
chosen by the administrator but usually a combination of the extensions
‘.php’, ‘.php4’ and ‘.php3’ is chosen. This is a real problem when sensitive
configuration data (e.g database credentials) is placed in PHP files that
don’t have an appropriate extension since a remote attacker can easily get
the source.

The simplest solution (and the one that has since become favored) is simply
to give EVERY file a PHP parsed extension. This prevents a request to the
web server ever returning the raw source for a file that contains PHP code.
The problem here is that though the source will no longer be returned, by
requesting the file a remote attacker can have the code that is meant to be
used in a framework of other code executed out of context. This can lead to
all of the attacks I’ve described earlier.

An obvious example might be a main.php that sets up $langDir and $userLang
and then includes libdir/loadlanguage.php, which relies on both variables.

When libdir/loadlanguage.php is called in the defined context of main.php it
is perfectly safe. But because libdir/loadlanguage.php has the extension
.php (it doesn’t have to have that extension, include() works on any file)
it can be requested and executed by a remote attacker. When out of context
an attacker can set $langDir and $userLang to whatever they wish.
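
One widely used way to stop a library file running out of context is to have
the entry script define a marker that the library checks for. The sketch
below is an assumption about how that could be wired up, not something taken
from the applications discussed here; the constant name APP_ENTRY_POINT and
the function name are invented for the example.

```php
<?php
// Hypothetical check: true only once the application's entry script
// (main.php in the text) has defined the marker constant.
function included_from_application(): bool
{
    return defined('APP_ENTRY_POINT');
}

// Requested directly, as an attacker would: the marker is missing.
$directRequest = included_from_application();

// What main.php would do before including any library file.
define('APP_ENTRY_POINT', true);
$viaMain = included_from_application();

var_dump($directRequest, $viaMain);

// At the top of libdir/loadlanguage.php one might then write:
//   if (!included_from_application()) { exit('No direct access'); }
```

A constant is used rather than a variable because, unlike $langDir or
$userLang, it cannot be supplied through request input.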

“You know a conjuror gets no credit when once he has explained his trick and
if I show you too much of my method of working, you will come to the
conclusion that I am a very ordinary individual after all” – Sherlock Holmes

— < 7. Session Files > —————————————————

Later versions of PHP (4 and above) provide built-in support for ‘sessions’.
Their basic purpose is to be able to save state information from page to
page in a PHP application. For example, when a user logs in to a web site,
the fact that they are logged in (and who they are logged in as) could be
saved in the session. When they move around the site this information will
be available to all other PHP pages. What actually happens is that when a
session is started (it’s typically set in the configuration file to be
automatically started on first request) a random session id is generated;
the session persists as long as the remote browser always submits this
session id with requests. This is most easily achieved with a cookie but can
also be achieved by putting a form variable (containing the session id) on
every page. The session is a variable store: a PHP application can choose to
register a particular variable with the session, its value is then stored in
a session file at the end of every PHP script and loaded into the variable
at the start of every script. A trivial example is a script that registers
the variable $session_auth with the session and sets it to “shaun”.

Any later PHP scripts will automatically have the variable $session_auth set
to “shaun”, if they modify it later scripts will receive the modified value.
This is obviously a very handy facility to have in a stateless environment
like the web but caution is also necessary.

One obvious problem is with ensuring that variables actually come from the
session. For example, given the above code, suppose a later script grants
access whenever $session_auth is not empty.

This code makes the assumption that if $session_auth is set, it must have
come from the session and not from remote input. If an attacker specifies
$session_auth in form input they can gain access to the site. Note that the
attacker must use this attack before the variable is registered with the
session; once a variable is in a session it will override any form input.
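
A script can sidestep the ambiguity by reading the value from the session
store itself rather than from a global of the same name. The following is a
minimal sketch under that assumption; session_user() is a hypothetical
helper, and the session contents are passed in as an array (on later PHP
versions this would be $_SESSION).

```php
<?php
// Hypothetical helper: returns the logged-in user recorded in the session
// data, or null. Request input never reaches this function.
function session_user(array $sessionData): ?string
{
    $user = $sessionData['session_auth'] ?? null;
    return (is_string($user) && $user !== '') ? $user : null;
}

// Attacker supplies session_auth as form input, but the session is empty.
$formInput   = ['session_auth' => 'shaun'];
$sessionData = [];

var_dump(session_user($sessionData));                 // nothing registered yet
var_dump(session_user(['session_auth' => 'shaun']));  // genuinely logged in
```

The forged form value in $formInput is simply never consulted, so it cannot
be mistaken for session state.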

Session data is saved in a file (in a configurable location, usually /tmp)
named ‘sess_’ followed by the session id. This file contains the names of
the variables in the session, their loose type, value and other data. On
multi host systems this can be an issue since the files are saved as the
user running the web server (typically nobody): a malicious site owner can
easily create a session file granting themselves access on another site or
even examine the session files looking for sensitive information.

The session mechanism also supplies another convenient place where an
attacker can have their input saved into a file on the remote machine. For
examples above where the attacker needed PHP code in a file on the remote
machine, if they cannot use file upload they can often use the application
and have a session variable set to a value of their choosing. They can then
guess the location of the session file: they know the filename (it is built
from their session id), they just have to guess the directory, usually /tmp.

Finally an issue I haven’t found a use for is that an attacker can specify
any session id they wish (e.g ‘hello’) and have a session file created with
that id (for the example ‘/tmp/sess_hello’). The id can only contain
alphanumeric characters but this might well be useful in some situations.

“It is a mistake to confound strangeness with mystery” – Sherlock Holmes

— < 8. Loose Typing And Associative Arrays > —————————–

Just a quick note about these factors.

PHP is a loosely typed language, that is, a variable has different values
depending on the context in which it is being evaluated. For example, the
variable $hello set to the empty string “” when evaluated as a number has
the value 0. This can sometimes lead to non intuitive results (a factor that
was important in the exploitation of phpMyAdmin in SRADV00008). If $hello is
set to “000” it is NOT the same string as “0”, nor will the function empty()
return true for it.

PHP arrays are associative, that is, the index to the array can be set to
any string value. A key such as “000” is kept as a STRING and is not
numerically evaluated. This means that the array entry $hello[“000”] is NOT
the same as the array entry $hello[0].

Applications need to be careful to validate user input with thought to the
above factors and to do so consistently, i.e. don’t test if something is
equal to 0 in one place and then validate it using empty() somewhere else.
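
The behaviour is easy to check directly. The snippet below is a small sketch
run against a current PHP interpreter (results on the version this paper
describes may differ in detail); the variable names are arbitrary. It
contrasts the different ways “000” can be tested and shows how array keys are
treated.

```php
<?php
$hello = "000";
$zero  = "0";

$checks = [
    'identical to "0" (===)' => ($hello === $zero), // compares as strings
    'loosely equal to "0"'   => ($hello == $zero),  // both numeric, compared as numbers
    'empty("000")'           => empty($hello),
    'empty("0")'             => empty($zero),
    'as integer equals 0'    => ((int) $hello === 0),
];

foreach ($checks as $label => $result) {
    echo $label . ': ' . var_export($result, true) . PHP_EOL;
}

// "000" stays a string key, so it does not collide with integer key 0 ...
$distinct = ["000" => 'string key', 0 => 'integer key'];
// ... whereas "0" is converted to the integer 0 and is overwritten.
$collapsed = ["0" => 'first', 0 => 'second'];

echo count($distinct) . ' vs ' . count($collapsed) . PHP_EOL;
```

Four different tests give four different opinions on whether “000” counts as
zero, which is why mixing them within one application invites trouble.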

“We want something more than mere preaching now” – Mr. Gregson

— < 9. Target Functions > ————————————————

When looking for holes in PHP applications (when you have the source code)
it’s useful to have a list of functions that are frequently misused or are
good targets if they happen to be used in a vulnerable manner in the target
application. If a remote user can affect the parameters to these functions
exploitation is often possible. The following is a non exhaustive breakdown.

PHP Code Execution:
require() and include() – Both these functions read a specified file and
interpret the contents as PHP code
eval() – Interprets a given string as PHP code
preg_replace() – When used with the /e modifier this function interprets the
replacement string as PHP code

Command Execution:
exec() – Executes a specified command and returns the last line of the
program’s output
passthru() – Executes a specified command and returns all of the output
directly to the remote browser
`` (backticks) – Executes the specified command and returns all the output
as a string
system() – Much the same as passthru() but doesn’t handle binary data
popen() – Executes a specified command and connects its output or input
stream to a PHP file descriptor

File Disclosure:
fopen() – Opens a file and associates it with a PHP file descriptor
readfile() – Reads a file and writes its contents directly to the remote
browser
file() – Reads an entire file into an array

“There is mystery about this which stimulates the imagination; where there
is no imagination there is no horror” – Sherlock Holmes

— < 10. Protecting PHP > ————————————————–

All of the attacks I’ve described above work perfectly on a default
installation of PHP 4. However as I’ve mentioned numerous times PHP is
endlessly configurable and many of these attacks can be defeated using those
configuration options. There is always a price for security though, so I’ve
classified the following configuration options according to their
painfulness:
* = Mostly painless
** = Vaguely painful
*** = Seriously hurts
**** = Chinese Water Torture

Obviously my ratings are subjective so don’t flame me for them. I will say
one thing though, if you use all of the options you’ll have a very secure
PHP installation, even third party code will be reliably secure, it’s just
that most of it won’t work :)

**** – Set register_globals off
This option will stop PHP creating global variables for user input. That is,
if a user submits the form variable ‘hello’ PHP won’t set $hello, only
HTTP_GET_VARS[‘hello’] or HTTP_POST_VARS[‘hello’]. This is the mother of all
other options and is the best single option for PHP security; it will also
kill basically every third party application available and makes programming
PHP a whole lot less convenient.
*** – Set safe_mode on
I’d love to describe exactly what safe_mode does but it isn’t documented
completely. It introduces a large variety of restrictions including:
– The ability to restrict which commands can be executed (by exec() etc)
– The ability to restrict which functions can be used
– Restricts file access based on ownership of script and target file
– Kills file upload completely
This is a great option for ISP environments (for which it is designed) but
it can also greatly improve the security of normal PHP environments given
proper configuration. It can also be a complete pain in the neck.

** – Set open_basedir
This option prevents any file operations on files outside specified
directories. This can effectively kill a variety of local include() and
remote file attacks. Caution is still required in regards to file upload and
session files.

** – Set display_errors off, log_errors on
This prevents PHP error messages being displayed in the returned web page.
This can effectively limit an attacker’s exploration of the function of the
script they are attacking. It can also make debugging very frustrating.

* – Set allow_url_fopen off
This stops remote files functionality. Very few sites really need this
functionality, I absolutely recommend every site set this option.

There may well be other great options I’m missing, please consult the PHP
documentation.
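
To see how an installation compares with the options above, a script can
read the live values with ini_get(). The sketch below is one possible shape
for such a check; the constant and function names are invented, safe_mode is
left out because its effect depends on further settings, and options that a
given PHP version does not know are simply treated as switched off.

```php
<?php
// Desired state for the boolean options discussed above:
// true = should be enabled, false = should be disabled.
const RECOMMENDED = [
    'register_globals' => false,
    'allow_url_fopen'  => false,
    'display_errors'   => false,
    'log_errors'       => true,
];

// Hypothetical helper: names of settings that differ from RECOMMENDED,
// plus open_basedir when it has not been set at all.
function deviations(array $settings): array
{
    $found = [];
    foreach (RECOMMENDED as $name => $wanted) {
        // Accepts "1", "On", "0", "Off", "" and false as returned by ini_get().
        $actual = filter_var($settings[$name] ?? false, FILTER_VALIDATE_BOOLEAN);
        if ($actual !== $wanted) {
            $found[] = $name;
        }
    }
    if ((string) ($settings['open_basedir'] ?? '') === '') {
        $found[] = 'open_basedir';
    }
    return $found;
}

// Collects the live values from the running interpreter.
function current_settings(): array
{
    $values = [];
    foreach (array_merge(array_keys(RECOMMENDED), ['open_basedir']) as $name) {
        $values[$name] = ini_get($name);
    }
    return $values;
}

print_r(deviations(current_settings()));
```

What it prints depends entirely on the local configuration; an empty list
means every option checked here matches the recommendations in this section.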

“Our ideas must be as broad as nature if we are to interpret nature” –
Sherlock Holmes

— < 11. Responsibility – Language Vs Programmer > ————————-

I contend that it is very hard to write a secure PHP application (in the
default configuration of PHP), even if you try. It’s not that PHP is a bad
language, it’s amazingly easy to program in and has more builtin features
than any other language I know. However PHP has such emphasis on rapid
development and feature richness that two things happen:
– Web designers and other non coders end up writing PHP applications. They
have no understanding whatsoever of the security implications of the code
they are writing. Partly this is because the mindset isn’t what it should
be. A PHP application typically runs in the most exposed environment
possible, a universally accessible page on a web server. This means the
mindset should be of coding a network daemon that will be routinely
attacked, or of a setuid root application. Instead the mindset is
functionality at all costs like it would be while writing an unprivileged
local application. If your web server is penetrated it provides a gateway to
the third tier, it is always a bad thing, even if the access is as nobody
(as penetrating a PHP application will typically provide).
– Code behaviour becomes unpredictable. An include() statement that
postfixes a user variable with “image.php” would normally be perfectly safe,
the user can only specify which directory to retrieve that file from (and
presumably cannot create a file image.php on the remote machine). When
remote files functionality is allowed it becomes a nightmare. This is
completely non intuitive.

A lot of people blame programmers for the code they write. I personally
feel that if a language makes it hard for a programmer to write good code
(particularly by being counterintuitive) the language must itself take some
of the blame for the situation. It’s not good enough to just say the
programmer should know better. In almost every PHP application I’ve audited
the programmers have _tried_ to get it right and only been let down by
their understanding of the intricacies of PHP. In its search for the
ultimate functionality PHP has undermined the programmer’s ability to
understand the workings of their code in all situations.

“I have all the facts in my journal, and the public shall know them” – John
Watson

— < 12. Other > ———————————————————–

This is just a section for various other resources.

At a time when I thought no-one else was interested in PHP security, a few
great posts/advisories/papers have popped up:
– Rain Forest Puppy
RFP 2101 – “RFPlutonium to fuel your PHP-Nuke”
– João Gouveia
Many posts to Bugtraq, check them all out, but as a selection
– Jouko Pynnonen

There are many others, sorry I didn’t list them all.

SecureReality have released a number of advisories regarding PHP
applications which should serve to illustrate the problems I’ve outlined in
this paper fairly well:
– SRADV00001 – Arbitrary File Disclosure through PHP File Upload
– SRADV00003 – Arbitrary File Disclosure through IMP
– SRADV00006 – Remote command execution vulnerabilities in phpGroupWare
– SRADV00008 – Remote command execution vulnerabilities in phpMyAdmin and
– SRADV00009 – Remote command execution vulnerabilities in phpSecurePages
– SRADV00010 – Remote command execution vulnerabilities in SquirrelMail
– SRADV00011 – Remote command execution vulnerabilities in WebCalendar

The last four were presented during my speech at the BlackHat Briefings in
Singapore and Hong Kong in 2001. Audio/Video of the speech will (at some
stage) be made available. For anyone interested in security, I can’t suggest
more strongly that you go to the briefings.

Finally, in case anyone wondered where the title came from and all those
quotes at the end of each section, they’re from the novel “A Study In
Scarlet” by Sir Arthur Conan Doyle, which was also the first story in which
the character Sherlock Holmes appeared.

“I must thank you for it all. I might not have gone but for you, and so have
missed the finest study I ever came across: a study in scarlet eh?” –
Sherlock Holmes