Securing your site against code injections

All Internet applications have to secure their inner workings against attacks from outside. We all know sites that were successfully attacked and modified, often due to some inventive usage of input parameters. The challenge is to prevent that happening to your website.

How do we solve this problem? Here I show a well-known solution: filter all input by default, and make it harder to get at the unfiltered input.

This solution is now part of anyMeta.

Tainted values and PHP

Ruby has the concept of tainted variables: values that come from outside are untrusted until you explicitly mark them otherwise. An example:

require 'cgi';

$SAFE = 1

cgi = CGI::new("html4")

expr = cgi["field"].to_s

if expr =~ %r{\A[-+*/\d\seE.()]*\z}
  expr.untaint
  result = eval(expr)
  # display result back to user...
else
  # display error message...
end

This looks OK at first sight. However, I am a bit worried about the .untaint method. Who can be sure that I didn't make a slight error in the filter? And who protects against lazy programmers who untaint all variables by default? Ruby at least guards against this with its safety levels. In PHP we don't have anything similar.

What to do for PHP?

A solution for PHP is the filter extension developed by Rasmus Lerdorf and Derick Rethans. When installed and enabled, this extension populates $_POST, $_GET and the other superglobals with filtered data. The unfiltered data is only accessible through a special API.
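
As a rough illustration, here is a minimal sketch of the explicit calls the filter extension offers in current PHP releases. The input strings are made up for the example, and which filter you pick for a given field is a design decision this sketch does not settle.

```php
<?php
// Validation filters return the typed value, or false when the input
// does not conform to the requested kind.
$id    = filter_var('42', FILTER_VALIDATE_INT);
$bad   = filter_var('42; DROP TABLE x', FILTER_VALIDATE_INT);
$email = filter_var('someone@example.com', FILTER_VALIDATE_EMAIL);

// Sanitizing filters always return a string, with the dangerous
// characters encoded or removed.
$text = filter_var('<b>Hello</b>', FILTER_SANITIZE_SPECIAL_CHARS);

// Request data is read by name through filter_input(), for example:
// $q = filter_input(INPUT_GET, 'q', FILTER_SANITIZE_SPECIAL_CHARS);

var_dump($id, $bad, $email, $text);
```

Note that false doubles as the failure marker, so results should be compared with === rather than tested loosely.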

Of course, here too we can't protect against dangerously sloppy programmers who dig in, skip the input filters, and always fetch the unsafe-raw-very-dangerous-dont-use-this data.

I think the filter solution is brilliant, and a good answer to a problem that needs solving.

Why don't I use it?

Well... we have to host on different machines, and we don't always control the configuration of those servers. So if we rely on the filter extension, how do we protect our site when the extension is not there?

I will sketch the solution we arrived at, which relies on the new object-oriented features of PHP 5.

Wrapping the super globals

The idea starts with replacing the superglobals ($_GET, $_POST etc.) with object wrappers. The wrappers behave like arrays: they implement iteration and indexing. All array values are wrapped in objects. By default all values are filtered to strings. When you need the raw value, or a differently formatted version of a variable, you have to call special methods.

Once we have the wrappers we can simply overwrite the superglobals:

$_GET  = new TaintedArray($_GET);

// This will echo a filtered value of the argument
echo $_GET['q'];

// The TaintedArray wrapper behaves as an array.
// (array_key_exists() no longer accepts objects as of PHP 8; isset() does.)
echo 'Does index "a" exist? ', isset($_GET['a']) ? 'Yes' : 'No';

On requesting the URL test.php?q=<b>Hello</b> our script will just echo Hello, effectively preventing the HTML injection.

Introducing TaintedValue and TaintedArray

We will need two classes, one to wrap around a single value, and another to wrap around arrays. We will call them respectively TaintedValue and TaintedArray.

The TaintedValue object also has methods to fetch filtered versions of the wrapped data. These are the asSomething methods below. An asSomething method returns false when the wrapped value doesn't conform to the kind of value you are requesting.

class TaintedValue
{
    const ALLOWQUOTES    = 1;
    const ALLOWHTML      = 2;
    const CHECKDOMAIN    = 4;
    const SCHEMEREQUIRED = 8;
    const HOSTREQUIRED   = 16;
    
    protected $safe;
    protected $raw;
    
    public function __construct ( $value )

    public function asFilepath ()
    public function asFilename ()
    public function asText ( $flags = 0 )
    public function asLine ( $flags = 0 )
    public function asNumber ()
    public function asInt ()
    public function asBoolean ()
    public function asRegexp ( $regexp )
    public function asRegexpReplace ( $regexp, $replace = '' )
    public function asUrl ( $flags = 0 )
    public function asEmail ( $flags = 0 )

    public function filterUrl ( $flags = 0 )
    public function filterEmail ()

    public function __toString ()
    public function get ()
    public function getRawUnsafe ()
}
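
To make the "return false when the value doesn't conform" contract concrete, here is a minimal sketch of three of the asSomething methods. The class name TaintedValueSketch and all method bodies are assumptions for illustration; the real anyMeta implementation is not shown in this article and may differ.

```php
<?php
// Hypothetical sketch: method bodies are assumptions, not anyMeta code.
// Each method returns false when the raw value does not conform.
class TaintedValueSketch
{
    protected $raw;

    public function __construct($value)
    {
        $this->raw = $value;
    }

    public function asInt()
    {
        // Returns an int, or false for anything that is not an integer
        return filter_var($this->raw, FILTER_VALIDATE_INT);
    }

    public function asNumber()
    {
        // Returns a float, or false for anything that is not numeric
        return filter_var($this->raw, FILTER_VALIDATE_FLOAT);
    }

    public function asRegexp($regexp)
    {
        // Returns the value unchanged only when it matches the pattern
        $s = (string) $this->raw;
        return preg_match($regexp, $s) === 1 ? $s : false;
    }
}

$v = new TaintedValueSketch('42abc');
var_dump($v->asInt());
var_dump($v->asRegexp('/\A[0-9a-z]+\z/'));
```

Patterns passed to asRegexp should be anchored with \A and \z; with ^ and $ a trailing newline can slip through.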

A TaintedArray is used to store TaintedValue objects. The TaintedArray must behave like an array, otherwise we lose the transparent application of our wrappers. We get an almost completely transparent replacement by extending the ArrayObject class of PHP 5. The only thing we need to add is some knowledge about tainted values, so that we are able to access the data in different ways.

class TaintedArray extends ArrayObject
{
    protected $raw;
    protected $tainted;
    
    public function __construct ( $array )

    public function __get ( $key )
    public function __set ( $key, $value )

    public function get ( $key )

    public function offsetSet ( $key, $value )
    public function offsetGet ( $key )
    public function exchangeArray ( $array )
    public function offsetUnset ( $key )
    public function append ( $value )

    public function getRawUnsafe ( $key )
}

I added the __set and __get methods so that we can access the stored TaintedValue objects as if they were attributes of the tainted array. The dangerous method is getRawUnsafe: it returns the stored data as-is, unfiltered and possibly filled with every injection you can think of.

A more complete example demonstrates what we can do with the Tainted objects.

$_REQUEST  = new TaintedArray($_REQUEST);

// Even the keys are now safe to echo!
foreach ($_REQUEST as $key=> $value)
{
    echo '[',$key, '] = "' . $value . '" ';
    echo 'raw="', nl2br(htmlspecialchars($_REQUEST->$key->getRawUnsafe())), '"<br/>';
}

// Get two request vars, line and url, make sure they are what the names suggest..
$line  = $_REQUEST->line->asLine();
$url   = $_REQUEST->url->asUrl(TaintedValue::SCHEMEREQUIRED|TaintedValue::CHECKDOMAIN);

echo '<br/>line: "', nl2br(htmlentities($line)), '"';
echo '<br/>url: "', nl2br(htmlentities($url)), '"';

TaintedArray is almost an array

We now have some wrappers. Are we done? Are they completely transparent for our existing code?

Almost. We need one change...

$_REQUEST  = new TaintedArray($_REQUEST);

if (is_array($_REQUEST))  echo "array! ";
if (is_a($_REQUEST, 'ArrayObject')) echo "ArrayObject";

This will echo ArrayObject, because an object is not an array. (Yes, the PHP developers know about this one; it is a feature, not a bug!) So we need a simple wrapper:

function any_is_array ( $a )
{
    return is_array($a) || (is_object($a) && is_a($a, 'ArrayObject'));
}

Now, replace all your is_array() calls with any_is_array() and you are up and running with our Tainted objects!

Building the code for TaintedValue and TaintedArray

We start with a simple wrapper around a value:

class TaintedValue
{
    protected $safe;
    protected $raw;

    public function __construct ( $value )
    {
        $this->raw  = $value;
        $this->safe = some_strict_filter($value);
    }

    public function __toString ()
    {
        return $this->safe;
    }

    public function get ()
    {
        return $this->safe;
    }

    public function getRawUnsafe ()
    {
        return $this->raw;
    }
}
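
The article does not show the body of some_strict_filter(). The following is one possible stand-in, assuming that "strict" means: no tags, no control characters, and no unencoded quotes. Every step is an assumption, chosen so that the earlier <b>Hello</b> example comes out as Hello.

```php
<?php
// Hypothetical stand-in for some_strict_filter(); the steps below are
// assumptions, not the filter used in anyMeta.
function some_strict_filter($value)
{
    // Remove HTML and PHP tags
    $s = strip_tags((string) $value);
    // Remove ASCII control characters (this includes tabs and newlines)
    $s = preg_replace('/[\x00-\x1F\x7F]/', '', $s);
    // Encode &, <, > and both kinds of quotes
    return htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
}

echo some_strict_filter('<b>Hello</b>'), "\n";
```

A filter this strict is lossy by design. Code that needs line breaks or markup has to ask for them explicitly, which is what the asText and asLine flags are for.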

The TaintedValue objects are stored inside a TaintedArray object, which simulates the behaviour of the super global arrays.

class TaintedArray extends ArrayObject
{
    protected $raw;
    protected $tainted;
    
    public function __construct ( $array )
    {
        parent::__construct(array());

        $this->raw     = array();
        $this->tainted = array();
        foreach ($array as $key => $value)
        {
            $this->offsetSet($key, $value);
        }
    }

}

In the $raw property we store the raw data, in case our script needs access to the unfiltered data. In the $tainted property we store the values wrapped in a TaintedValue or TaintedArray object. Finally, in the ArrayObject itself we store the filtered version of all the arguments, so that when we request a variable we always get a filtered version, unless we do something special.

Now we can add the offsetSet method for initializing our array. It is pretty straightforward, though we need some extra methods for filtering the array keys and wrapping the array values:

    public function offsetSet ( $key, $value )
    {
        $val = $this->safeValue($value);
        $k   = $this->safeKey($key);

        $this->raw[$k]     = $value;
        $this->tainted[$k] = $val;

        if (get_class($val) == 'TaintedValue')
        {
            parent::offsetSet($k, $val->get());
        }
        else
        {
            parent::offsetSet($k, $val);
        }
    }

    protected function safeKey ( $key )
    {
        if (is_numeric($key))
        {
            $k = $key;
        }
        else
        {
            $k = preg_replace('/[^a-zA-Z0-9_\-\.]/', '_', $key);
        }
        return $k;
    }
    
    protected function safeValue ( $value )
    {
        if (is_object($value))
        {
            $class = get_class($value);
            if ($class != 'TaintedValue' && $class != 'TaintedArray')
            {
                trigger_error('only tainted objects allowed', E_USER_ERROR);
            }
            $val = $value;
        }
        else if (is_array($value))
        {
            $val = new TaintedArray($value);
        }
        else
        {
            $val = new TaintedValue($value);
        }
        return $val;
    }
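
To see what the key filter does in practice, here is a standalone copy of the safeKey logic as a plain function. The name safe_key_sketch and the sample keys are made up for this illustration.

```php
<?php
// Standalone copy of the safeKey() logic above, for illustration only.
function safe_key_sketch($key)
{
    if (is_numeric($key)) {
        return $key;
    }
    // Anything outside letters, digits, '_', '-' and '.' becomes '_'
    return preg_replace('/[^a-zA-Z0-9_\-\.]/', '_', $key);
}

foreach (array('user.name', 'a<b>', 'x y"z', 12) as $key) {
    echo safe_key_sketch($key), "\n";
}
```

Keys that are already harmless pass through unchanged, so existing lookups such as $_GET['user.name'] keep working.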

When accessing the information stored in our arrays we need to distinguish between the safe filtered values, the wrapped tainted values and the unwrapped raw data. The last kind is the one we want to keep out of our code.

    public function __set ( $key, $value )
    {
        $this->offsetSet($key, $value);
    }

    public function __get ( $key )
    {
        return $this->get($key);
    }
 
    public function get ( $key )
    {
        if (isset($this->tainted[$key]))
        {
            return $this->tainted[$key];
        }
        else
        {
            trigger_error('unknown index ' . htmlspecialchars($key), E_USER_NOTICE);
        }
    }

    public function getRawUnsafe ( $key )
    {
        return $this->raw[$key];
    }

Now we just have to fill in the other methods to keep our three arrays in sync.
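
As a sketch of what keeping the stores in sync involves, the reduced class below tracks only two stores (raw and filtered) instead of three, and uses strip_tags() as a stand-in filter. The class name SyncedArraySketch and the method bodies are assumptions, not the anyMeta code. The return types follow the ArrayObject signatures in recent PHP versions.

```php
<?php
// Reduced, hypothetical stand-in for TaintedArray: two stores only.
class SyncedArraySketch extends ArrayObject
{
    protected $raw = array();

    public function __construct(array $array = array())
    {
        parent::__construct(array());
        $this->exchangeArray($array);
    }

    // Assumption: strip_tags() stands in for the real filter
    protected function filter($value)
    {
        return strip_tags((string) $value);
    }

    public function offsetSet($key, $value): void
    {
        if ($key === null) {
            // $a[] = ... arrives here with a null key
            $this->append($value);
            return;
        }
        $this->raw[$key] = $value;
        parent::offsetSet($key, $this->filter($value));
    }

    public function append($value): void
    {
        // Let $raw pick the next key, then reuse it for the filtered store
        $this->raw[] = $value;
        $key = array_key_last($this->raw);
        parent::offsetSet($key, $this->filter($value));
    }

    public function offsetUnset($key): void
    {
        unset($this->raw[$key]);
        if (parent::offsetExists($key)) {
            parent::offsetUnset($key);
        }
    }

    public function exchangeArray($array): array
    {
        // Empty both stores, then refill them through offsetSet()
        $old = $this->getArrayCopy();
        $this->raw = array();
        parent::exchangeArray(array());
        foreach ($array as $key => $value) {
            $this->offsetSet($key, $value);
        }
        return $old;
    }

    public function getRawUnsafe($key)
    {
        return $this->raw[$key];
    }
}

$a = new SyncedArraySketch(array('q' => '<b>Hello</b>'));
$a[] = '<i>more</i>';
unset($a['q']);
```

The point of routing everything through offsetSet() is that there is only one place where a value enters the stores, so a raw value can never end up in the filtered store by accident.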

The methods of the TaintedValue object are pretty straightforward, with the possible exception of the Email and Url filters, though example code for those can easily be found on the Internet.

MarcWorrell.com/ created on 2006-04-06 22:17:57/ modified on 2007-08-31 16:49:21/ mail me at