Different ways of preventing form resubmission

HTTP is stateless: every request is independent of the previous one. In the simplest case, a connection is established, there is a request and a response, and the connection ends. That's all. One common problem in web programming is when the user, after drinking more coffee than he can remember, compulsively and possessed by an extreme urge, repeatedly clicks the send button until the mouse wants to commit suicide. The server then receives all these requests without knowing that the user (the client) is sending the same information again and again. A similar situation arises when users, after submitting data to the server via a web form, decide they want to go back and click the browser's back button. The browser then pops up a message warning that the data will be resubmitted. If they don't care and accept it, there is our server, receiving the same information again. A variant of this one is when the user refreshes the page after sending the POST. (Forms that include file uploads, or a laggy server, can easily trigger the user's impatience.) How to deal with those requests is something that we, as programmers, must take care of.

Now the solutions! First I will show you how to prevent the form from being resent when the user navigates backwards or refreshes the page. The nicest solution is to use two pages; let's call them A and B. In page A the user submits the form and a POST request is sent to the server. Our code must process this request and, if it is valid, redirect the browser to page B with a GET request. This pattern is commonly known as Post/Redirect/Get (PRG). Let's see the server-side code for page A in one of the nicest PHP frameworks out there, CakePHP:

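public function admin_proxy() {
    // Page A: receives the POSTed form data.
    $params = array();
    foreach ($this->data['User'] as $key => $value) {
        // Keep only the fields the user filled in, URL-encoded.
        if (!empty($value)) {
            $params[$key] = rawurlencode($value);
        }
    }
    // The extra keys become named parameters of the admin_index URL (page B).
    $destination = array_merge(array('controller' => 'users', 'action' => 'admin_index'), $params);
    // Redirect, so the browser requests page B with GET.
    $this->redirect($destination);
}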

First we encode the parameters before passing them to the "admin_index" action, page B, which receives them via GET. In case you are curious, or you aren't used to building applications for languages with accented characters: the following will not work for languages like Spanish, Catalan or French, because htmlentities() turns accented letters into HTML entities such as &eacute; before they are encoded:

urlencode(htmlentities($value, ENT_NOQUOTES, 'UTF-8'))

The website is in Spanish, which is why I use rawurlencode(). Next, the code performs a redirect. This sends an HTTP 302 status code to the browser, the usual way of redirecting, and browsers follow it with a GET regardless of the previous request's method. 303 See Other is the more standards-friendly choice, since it is the status code defined for exactly this case; CakePHP's redirect() accepts the status code as its second argument. Now page B receives the parameters, decodes them, and informs the user about the operation's results or performs any other logic we need:

public function admin_index() {
    // Page B: reached via GET, reads the named parameters back.
    if (!empty($this->params['named'])) {
        foreach ($this->params['named'] as $key => $value) {
            $this->data['User'][$key] = rawurldecode($value);
        }
    }

    /*
     * Some cool things to do here...
     */
}

If we want to do anything with the data the user sends, like storing a record in the database, sending an e-mail, defacing NASA's web page, whatever, we must do it before the redirect, in the logic for page A.
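To make the flow concrete outside CakePHP, here is a minimal, framework-free sketch of Post/Redirect/Get in JavaScript. The handler names handlePost and handleGet, the /users path, the { status, headers } response shape and the in-memory savedUsers array are all hypothetical; a real server would plug these into its own request handling. The side effect happens before the redirect, and the redirect uses 303 so the browser follows it with a GET.

```javascript
// Hypothetical in-memory "database" standing in for real persistence.
const savedUsers = [];

// Hypothetical page A handler: process the POSTed data, then redirect.
function handlePost(formData) {
  // Do the side effect (store, send mail, ...) before redirecting.
  savedUsers.push({ ...formData });

  // Keep only non-empty values; encodeURIComponent encodes spaces as %20,
  // like PHP's rawurlencode(), and handles accented UTF-8 text.
  const query = Object.entries(formData)
    .filter(([, value]) => value !== '')
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join('&');

  // 303 See Other: the browser fetches the Location with GET.
  return { status: 303, headers: { Location: `/users?${query}` } };
}

// Hypothetical page B handler: a plain GET that only reads parameters.
function handleGet(url) {
  const params = new URL(url, 'http://localhost').searchParams;
  return Object.fromEntries(params);
}

const response = handlePost({ name: 'José Año', city: '' });
console.log(response.headers.Location); // /users?name=Jos%C3%A9%20A%C3%B1o
console.log(handleGet(response.headers.Location)); // { name: 'José Año' }
```

After the redirect the browser shows page B's URL, so a refresh or a trip through the history only repeats the harmless GET.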

Now a quick and dirty solution to prevent the user from clicking the submit button multiple times: disable the button with client-side JavaScript after the form is submitted. Please read carefully: after the form is submitted, not after the button is clicked. What if the user clicks but the form isn't submitted because it has an error the user must correct? We can't leave the button disabled, or the user will have real trouble sending the form. Let's see a solution with jQuery and the jQuery Validation plugin:

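$(document).ready(function() {
    $('#MyForm').validate({
        rules: {
            // Some rules here.
        },
        messages: {
            // Messages for rules violation.
        },
        // Runs only when the form is valid and about to be submitted.
        submitHandler: function(form) {
            $(form).find(':submit').prop('disabled', true);
            // Use the native DOM submit(): calling $('#MyForm').submit()
            // would trigger validation and this handler again.
            form.submit();
        }
    });
});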

submitHandler is executed only after the user clicks the send button and the form is correctly filled in; only then does it disable the submit button. As stated before, this solution is quick, but what about users who have JavaScript disabled? It won't work for them.

A good solution is to use the session on the server side. When the form is created, we generate a token and store it both in the session and in a hidden input field of the form. When the user submits the form, we check whether the session holds a token and whether it matches the form's token. If it does, we discard the token from the session (or replace it with a new one) and go on processing the form's data; if it doesn't exist or they don't match, we refuse to process the request. The token only has to be unique for each form we render; it can be generated with a hash function like md5 or sha1, md5 being the faster one. Here is the idea implemented in plain PHP, without any framework:

<?php
    session_start();
    $aResults = [];

    if (!empty($_POST)) {
        $token = filter_input(INPUT_POST, 'token');
        // Accept only if this session issued a token and it matches exactly.
        // hash_equals() (PHP 5.6+) avoids loose == comparisons.
        if (isset($_SESSION['token']) && is_string($token)
                && hash_equals($_SESSION['token'], $token)) {
            // Process form data
            $aResults[] = 'I love your new submit!';
            $field = (string) filter_input(INPUT_POST, 'some-field');
            $aResults[] = 'You said: ' . htmlspecialchars($field);
        } else {
            $aResults[] = 'Seems you already sent that!';
        }
    }
    // A new token on every request: a resubmitted form still carries the
    // old one, so it no longer matches.
    $_SESSION['token'] = uniqid(md5(microtime()), true);
?>
<html>
<body>
    <?php
        if (!empty($aResults)):
            foreach ($aResults as $v):
                echo "$v <br />";
            endforeach;
        endif;
    ?>
    <form method="post">
        <input type="hidden" name="token" value="<?= htmlspecialchars($_SESSION['token']) ?>" />
        <input type="text" name="some-field" value="" />
        <input type="submit" value="Submit" />
    </form>
</body>
</html>
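The same check, reduced to its logic, can be sketched in JavaScript (Node.js). The session is modeled as a plain object, and the names issueToken and consumeToken are hypothetical. Unlike the PHP example, the token here comes from crypto's randomBytes instead of md5; that is my own assumption, since a random value is just as unique and harder to guess. The key step is that a valid token is deleted as soon as it is used, so a second submission of the same form fails.

```javascript
const { randomBytes, timingSafeEqual } = require('crypto');

// Hypothetical helper, called when the form is rendered: store a fresh
// token in the session and return it for the hidden input field.
function issueToken(session) {
  session.token = randomBytes(16).toString('hex');
  return session.token;
}

// Hypothetical helper, called on submit: accept the token once, then forget it.
function consumeToken(session, submitted) {
  const expected = session.token;
  if (typeof expected !== 'string' || typeof submitted !== 'string') {
    return false;
  }
  const a = Buffer.from(expected);
  const b = Buffer.from(submitted);
  // timingSafeEqual throws on buffers of different length, so check first.
  const ok = a.length === b.length && timingSafeEqual(a, b);
  if (ok) {
    delete session.token; // a resubmission now finds no token
  }
  return ok;
}

const session = {};
const token = issueToken(session);
console.log(consumeToken(session, token)); // true
console.log(consumeToken(session, token)); // false
```

The second click of a double click arrives with the same token as the first one and is rejected, with or without JavaScript in the browser.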

And what about a malicious user doing form tampering? First, let me explain what form tampering is: it's the name of an attack that consists of saving the form to a file, modifying it and sending it to the server again, with intentions that aren't always good. Thanks to CakePHP you don't need to worry about this: if you use the "Security" component and the "Form" helper, your site is automagically protected. Well, let me explain the magic: the Form helper adds hidden token fields and the Security component checks them. Among other things, form submissions are not accepted after a period of time that depends on the "Security.level" setting. The same idea can be implemented in any web script.
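As a rough illustration of that last sentence, here is a framework-free sketch in JavaScript (Node.js) of a form token that expires and is bound to the form's field names. It is not CakePHP's actual algorithm: the secret, the token format (issue time, a dot, an HMAC-SHA256 signature), the 10-minute lifetime and the function names are all assumptions. Because the signature covers the sorted field names and the issue time, a form that was saved, edited to add or remove fields and submitted again, or one submitted too late, is rejected.

```javascript
const { createHmac, timingSafeEqual } = require('crypto');

const SECRET = 'change-me';        // hypothetical server-side secret
const MAX_AGE_MS = 10 * 60 * 1000; // hypothetical lifetime: 10 minutes

// Sign the sorted field names together with the issue time.
function sign(fieldNames, issuedAt) {
  const payload = [...fieldNames].sort().join(',') + '|' + issuedAt;
  return createHmac('sha256', SECRET).update(payload).digest('hex');
}

// Hypothetical helper used when rendering the form.
function makeFormToken(fieldNames, now = Date.now()) {
  return `${now}.${sign(fieldNames, now)}`;
}

// Hypothetical helper used on submit: the submitted field names (excluding
// the token field itself) must match what was signed, within the lifetime.
function checkFormToken(token, submittedFieldNames, now = Date.now()) {
  const [issuedText, signature] = String(token).split('.');
  const issuedAt = Number(issuedText);
  if (!Number.isFinite(issuedAt) || !signature) return false;
  if (now < issuedAt || now - issuedAt > MAX_AGE_MS) return false;
  const expected = Buffer.from(sign(submittedFieldNames, issuedAt));
  const actual = Buffer.from(signature);
  return expected.length === actual.length && timingSafeEqual(expected, actual);
}

const formToken = makeFormToken(['name', 'email']);
console.log(checkFormToken(formToken, ['email', 'name']));         // true
console.log(checkFormToken(formToken, ['email', 'name', 'role'])); // false
```

On its own this does not stop the same form from being replayed within its lifetime; pairing it with the one-time session token described earlier covers that case.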

This weekend I was curious about how WordPress (3.3.1) solves this, because I'm developing some projects with this CMS, and I was surprised by what I found. The function wp_new_comment(), called from wp-comments-post.php, calls wp_allow_comment(); both functions are located in comment.php. Look at what this last function does:

// Simple duplicate check
// expected_slashed ($comment_post_ID, $comment_author, $comment_author_email, $comment_content)
$dupe = "SELECT comment_ID FROM $wpdb->comments WHERE comment_post_ID = '$comment_post_ID' AND comment_approved != 'trash' AND ( comment_author = '$comment_author' ";
if ( $comment_author_email )
    $dupe .= "OR comment_author_email = '$comment_author_email' ";
$dupe .= ") AND comment_content = '$comment_content' LIMIT 1";
if ( $wpdb->get_var($dupe) ) {
    do_action( 'comment_duplicate_trigger', $commentdata );
    if ( defined('DOING_AJAX') )
        die( __('Duplicate comment detected; it looks as though you’ve already said that!') );

    wp_die( __('Duplicate comment detected; it looks as though you’ve already said that!') );
}

It runs a query against the database, checking in the "comments" table whether the post the user is commenting on already has a comment that is not in the trash, belongs to the same user (matched by author name, or by email if one was given) and has exactly the same content (the comment's body). It works, and it prevents both multiple clicks and repeating yourself two years later, but what about performance? Does every comment a user posts require this extra work? If we examine the comments table, we can see that the comment_content field is of type "text". I'm not saying this query is slow, but is it necessary? Databases are usually the bottleneck in websites, so the philosophy of avoiding queries as much as possible is a good one. Furthermore, WordPress sites usually live on shared servers, where you can't expect great performance… But WordPress has come a long way, and its contributors are surely experienced developers, so I guess they have a good reason for doing it this way, and I would like to know it. Can anybody shed some light on this?
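For comparison, one hypothetical way to make such a check cheaper, sketched in JavaScript, is to store a short fixed-size digest of (post, author, content) with each comment and look it up, instead of comparing the full text column. This is not what WordPress does: the names are illustrative, a Set stands in for an indexed database column, and the trash condition is left out. Like the WordPress query, it treats a comment as a duplicate if either the author name or the email matches.

```javascript
const { createHash } = require('crypto');

// Digest of the fields that define "the same comment". JSON.stringify
// keeps the fields separate, so "ab"+"c" and "a"+"bc" never collide.
function commentKey(postId, who, content) {
  return createHash('sha256')
    .update(JSON.stringify([postId, who, content]))
    .digest('hex');
}

// Hypothetical store: in a database these digests would be an indexed column.
const seenKeys = new Set();

// Returns true if the comment is new (and records it), false if duplicate.
function acceptComment({ postId, author, email, content }) {
  const keys = [commentKey(postId, 'name:' + author, content)];
  if (email) keys.push(commentKey(postId, 'email:' + email, content));
  if (keys.some((k) => seenKeys.has(k))) return false;
  keys.forEach((k) => seenKeys.add(k));
  return true;
}

const comment = { postId: 7, author: 'Ana', email: 'ana@example.com', content: 'Nice post!' };
console.log(acceptComment(comment)); // true
console.log(acceptComment(comment)); // false
```

The trade-off is an extra column (and index) per comment in exchange for an equality lookup on a 64-character value instead of a comparison against a text column.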

Not everything that looks recursive is.

In the previous post I talked about recursion using Scheme and the Ackermann function. There is something curious, or at least surprising to me, about Scheme: the way it treats apparently recursive algorithms as iterative. As an example, here are two algorithms for computing the Fibonacci numbers, one recursive and the other iterative, even though it looks recursive. First, recall that these numbers are defined as follows:

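$$
\operatorname{Fib}(n) =
\begin{cases}
0 & \text{if } n = 0 \\
1 & \text{if } n = 1 \\
\operatorname{Fib}(n-1) + \operatorname{Fib}(n-2) & \text{if } n \neq 0 \text{ and } n \neq 1
\end{cases}
$$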

This definition leads directly to this recursive algorithm in Scheme:

(define (fib n)
  (cond ((= n 0) 0)
        ((= n 1) 1)
        (else (+ (fib (- n 1))
                 (fib (- n 2))))))

It works, but it is inefficient: the cost grows exponentially as the number we are looking for, n, grows, because this is tree recursion. This other algorithm is much more efficient:

(define (fib n)
  (fib-iter 1 0 n))

(define (fib-iter a b count)
  (if (= count 0)
      b
      (fib-iter (+ a b) a (- count 1))))
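The exponential cost of the first, tree-recursive version is easy to measure. This JavaScript transcription of it adds a call counter; the counter and the function names are additions for illustration, not part of the original algorithm.

```javascript
// Tree-recursive Fibonacci, counting how many times fibTree is invoked.
function fibTree(n, counter = { calls: 0 }) {
  counter.calls += 1;
  if (n === 0) return 0;
  if (n === 1) return 1;
  return fibTree(n - 1, counter) + fibTree(n - 2, counter);
}

// Hypothetical helper returning both the value and the number of calls.
function countCalls(n) {
  const counter = { calls: 0 };
  const value = fibTree(n, counter);
  return { value, calls: counter.calls };
}

console.log(countCalls(10)); // { value: 55, calls: 177 }
console.log(countCalls(20)); // { value: 6765, calls: 21891 }
```

The number of calls is 2·Fib(n + 1) − 1, so each increment of n multiplies the work by roughly the golden ratio (about 1.618), while fib-iter makes only n + 1 calls.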

If we want the Fibonacci number for 4, we call it like this:

(fib 4)

And it proceeds like this:

(fib 4)

(fib-iter 1 0 4)

(fib-iter 1 1 3)

(fib-iter 2 1 2)

(fib-iter 3 2 1)

(fib-iter 5 3 0)

3
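The trace above only ever needs the three state variables a, b and count. V8, the engine behind Node.js and Chrome, does not eliminate tail calls, so a faithful JavaScript transcription of fib-iter updates the same state in a loop. This sketch (the name fibIter and the recorded states array are my own additions) also keeps each state so it can be compared with the trace.

```javascript
// Same state transitions as fib-iter: (a, b, count) -> (a + b, a, count - 1).
// Assumes n is a non-negative integer.
function fibIter(n) {
  let a = 1;
  let b = 0;
  const states = [];
  for (let count = n; ; count--) {
    states.push([a, b, count]);
    if (count === 0) return { result: b, states };
    [a, b] = [a + b, a];
  }
}

console.log(fibIter(4).result); // 3
console.log(fibIter(4).states.map((s) => `(fib-iter ${s.join(' ')})`).join('\n'));
// (fib-iter 1 0 4)
// (fib-iter 1 1 3)
// (fib-iter 2 1 2)
// (fib-iter 3 2 1)
// (fib-iter 5 3 0)
```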

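Scheme runs it as an iterative process, not a recursive one: the call to fib-iter is the last thing the procedure does (a tail call), and Scheme implementations are required to be properly tail-recursive, so such calls don't make the stack grow. But what happens if we implement the same algorithm in a procedural language like C or Pascal? Let's see how it might look in PHP, the king of the web: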

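function fib($a, $b, $aCalc) {
    if ($aCalc == 0) {
        return $b;
    } else {
        $aCalc--;
        // The return matters: without it the innermost result is
        // discarded and fib(1, 0, 4) returns NULL.
        return fib($a + $b, $a, $aCalc);
    }
}

$result = fib(1, 0, 4);
echo $result; // 3

Note the return in front of the recursive call: without it, the value computed by the innermost call is thrown away and the script prints nothing. With it, the result is correct, but PHP, unlike Scheme, does not eliminate tail calls, so it runs the algorithm as a recursive process, not an iterative one. Every call stays on the stack, waiting for the call it made to return:

(fib 1 0 4)

(fib 1 0 4)(fib 1 1 3)

(fib 1 0 4)(fib 1 1 3)(fib 2 1 2)

(fib 1 0 4)(fib 1 1 3)(fib 2 1 2)(fib 3 2 1)

(fib 1 0 4)(fib 1 1 3)(fib 2 1 2)(fib 3 2 1)(fib 5 3 0)

Instead of finishing here, it now has to perform the returns, popping the stack one frame at a time while the value 3 travels back up:

(fib 1 0 4)(fib 1 1 3)(fib 2 1 2)(fib 3 2 1) ← 3

(fib 1 0 4)(fib 1 1 3)(fib 2 1 2) ← 3

(fib 1 0 4)(fib 1 1 3) ← 3

(fib 1 0 4) ← 3

3

The answer is right, but the stack grew to five frames for n = 4 and keeps growing linearly with n, while Scheme needed constant space. :-/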
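The same stack growth can be observed in JavaScript: V8, the engine behind Node.js and Chrome, does not eliminate tail calls either. This sketch is the tail-recursive fib from above with a hypothetical depth parameter that records how many nested calls are pending at once; without tail-call elimination, each of them is a live stack frame.

```javascript
// Tail-recursive fib; depth counts the nested calls currently pending.
function fibTail(a, b, count, depth = 1, stats = { maxDepth: 0 }) {
  stats.maxDepth = Math.max(stats.maxDepth, depth);
  if (count === 0) return b;
  return fibTail(a + b, a, count - 1, depth + 1, stats); // tail call, new frame in V8
}

// Hypothetical helper returning the value and the deepest nesting reached.
function maxStackDepth(n) {
  const stats = { maxDepth: 0 };
  const value = fibTail(1, 0, n, 1, stats);
  return { value, maxDepth: stats.maxDepth };
}

console.log(maxStackDepth(4));             // { value: 3, maxDepth: 5 }
console.log(maxStackDepth(1000).maxDepth); // 1001
```

maxDepth is always n + 1. With a large enough n, V8 eventually throws RangeError: Maximum call stack size exceeded; the exact limit depends on the engine and its stack size, which is why loops are the safe choice in these languages.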

So how do we make the compiler or interpreter of this kind of language "understand" that we want an iterative implementation? The only reliable option is to use the loop statements they provide: while, for, repeat, etc. (some C compilers can optimize tail calls, but the language doesn't guarantee it). This could be an example in PHP:

function fibonacci($n) {
    $f[0] = 0;
    $f[1] = 1;

    for ($i = 2; $i <= $n; $i++) {
        $f[$i] = $f[$i-1] + $f[$i-2];
    }

    echo $f[$n];
}
fibonacci(8);

This curiosity can also be expressed the other way around: Scheme runs as iterative some algorithms that, to those of us used to procedural languages, look recursive. That is probably the better way to put it.

The Ackermann function

For a couple of months I've been thoroughly enjoying MIT's book "Structure and Interpretation of Computer Programs". Although MIT gives free access to all of its content online, it is a great book that is really worth buying.

In one of its first chapters, where iterative processes are contrasted with recursive ones, an apparently innocent exercise casually introduces a procedure that computes a variant of the mathematical Ackermann function:

(define (A x y)
  (cond ((= y 0) 0)
        ((= x 0) (* 2 y))
        ((= y 1) 2)
        (else (A (- x 1)
                 (A x (- y 1))))))

This function is famous in the theory of computation. What is surprising about it, compared with the functions usually used as "model" examples to teach recursion, is that it is not primitive recursive: it is computable, yet it grows faster than any primitive recursive function. The typical example of a recursive function is the one that computes the factorial of a number n:

(define (factorial n)
    (if (= n 1)
    1
    (* n (factorial (- n 1)))))

The process for computing, for example, the factorial of 6 would be:

factorial(6)

6 * factorial(5)

6 * (5 * factorial(4))

6 * (5 * (4 * factorial(3)))

6 * (5 * (4 * (3 * factorial(2))))

6 * (5 * (4 * (3 * (2 * factorial(1)))))

This is the point where the stack is deepest; from here on, it's all "pops":

6 * (5 * (4 * (3 * (2 * 1))))

6 * (5 * (4 * (3 * 2)))

6 * (5 * (4 * 6))

6 * (5 * (24))

6 * (120)

720

"Typical" recursive functions follow this pattern: they dive down until they reach a turning point, after which they rise back up until the final result surfaces. The Ackermann function might seem to proceed the same way; for example, (A 1 6) would be:

(A 1 6)

(A 0 (A 1 5))

(A 0 (A 0 (A 1 4)))

(A 0 (A 0 (A 0 (A 1 3))))

(A 0 (A 0 (A 0 (A 0 (A 1 2)))))

(A 0 (A 0 (A 0 (A 0 (A 0 (A 1 1))))))

(A 0 (A 0 (A 0 (A 0 (A 0 2)))))

(A 0 (A 0 (A 0 (A 0 4))))

(A 0 (A 0 (A 0 8)))

(A 0 (A 0 16))

(A 0  32)

64

That is, $2^6$. In fact, for every (A 1 n) with n ≥ 1, the result will always be $2^n$. But let's see how the computation of (A 2 4) proceeds:

(A 2 4)

(A 1 (A 2 3))

(A 1 (A 1 (A 2 2)))

(A 1 (A 1 (A 1 (A 2 1))))

(A 1 (A 1 (A 1 2)))

(A 1 (A 1 (A 0 (A 1 1))))

(A 1 (A 1 (A 0 2)))

(A 1 (A 1 4))

(A 1 (A 0 (A 1 3)))

As you can see, it goes deeper, comes back up, and then dives down again. After a long recursive process, the final result is 65536, that is, $2^{16}$.
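Both results can be checked by unfolding the procedure. For n ≥ 2, (A 1 n) = (A 0 (A 1 (n−1))) = 2·(A 1 (n−1)), and (A 1 1) = 2, which gives $2^n$. Likewise (A 2 n) = (A 1 (A 2 (n−1))) = $2^{(A\ 2\ (n-1))}$ with (A 2 1) = 2, a tower of n twos, so (A 2 4) = $2^{2^{2^2}} = 2^{16}$. Below is a direct JavaScript transcription of the procedure to check this mechanically; the loop bound of 20 is just a choice for this sketch.

```javascript
// Direct transcription of the SICP procedure (A x y).
function A(x, y) {
  if (y === 0) return 0;
  if (x === 0) return 2 * y;
  if (y === 1) return 2;
  return A(x - 1, A(x, y - 1));
}

console.log(A(1, 6)); // 64
console.log(A(2, 4)); // 65536

// Check (A 1 n) = 2^n for n from 1 to 20.
for (let n = 1; n <= 20; n++) {
  if (A(1, n) !== 2 ** n) throw new Error(`(A 1 ${n}) is not 2^${n}`);
}
```

Note that (A 1 0) is 0, not $2^0$, because the (= y 0) clause is checked first; that is why the rule holds only for n ≥ 1.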