Inefficiencies due to CakePHP ORM implementation

First let me say this article is not a critic against CakePHP (except one point I will talk about later), Object Relational Mapping, far than a trivial thing, it’s a very complicated one.

Let’s begin explaining what this TLA (three-letter acronym) is 🙂 Object Relational Mapping is a programming technique for allowing objects interact with a relational database, trying to allow the programmer to create, insert, update and delete objects in the database without typing a single line of SQL. The technical difficulties are well know (by the name “Object-relational impedance mismatch”) and there is a lot of theory about them. In fact, some frameworks like CodeIgniter or Zend Framework avoid not only ORM but the models. This is not because these frameworks don’t follow the MVC pattern but because they follow a different philosophy: as it’s impossible to know how the datasource will be, better to don’t provide models and let them for the developer. They just provide some classes to interact with databases. The fact CakePHP implements ORM helps us to rapid develop an application, CRUD interfaces creation can be automatized, etc. CakePHP is RAD!

Now let’s see how the way CakePHP implements ORM creates some inefficiencies. I will follow the examples the documentation shows. If an ingredient can have many recipes and ingredients can belong to many recipes we need and intermediate table to join them, following the Normal Forms for designing the database, this intermediate table should look like:

ingredient_id, receipe_id

These two fields form the compound primary key and both have a foreign key to their respective tables. (From now on the primary key is underlined) We can add attributes for the relation: quantity or whatever. First inconvenience: CakePHP doesn’t support compound primary keys. The documentation says this relationship is “Has and Belongs to Many” and specifically:

The contents of the table should be two fields, each foreign keys (which should be integers) pointing to both of the primary keys of the involved models. To avoid any issues – don’t define a combined primary key for these two fields, if your application requires it you can define a unique index.

The inefficiencies start, following the “CakePHP way” for developing, the table should look:

id (autonumeric), ingredient_id, receipe_id

And add an unique key formed by the two fields. Despite what the documentation warns, I have a join table that follows Normal Forms (the two fields are the PK) and I still haven’t had any issues, but this is absolutely no possible when the relationship has attributes. For this cases CakePHP, following Ruby on Rails, has the relationship “hasMany through“. If students take courses, courses can be taken by many students and we want to store for every student the attended days and final grade, the table should look like:

student_id, course_id, days_attended, grade

Instead, CakePHP requires this table design:

id, student_id, course_id, days_attended, grade

To ensure the integrity we create an unique index formed by the fields student_id and course_id, because a student attends x days to a course and has only one grade for that course. When the user modifies a grade Cake doesn’t perform an update query, first deletes the record and then inserts it again with the new attributes, two queries instead of one. But it can be far worst. Imagine a situation where the user is viewing the student’s profile, for each students has a button for accessing a page to edit his/her grades. If the user edits a profile and saves the database will trigger an integrity violation error because Cake is trying to insert a new record instead of update the existing one (remember there is an unique index). This time Cake can’t automagically help us so we must type some code into the Model’s beforeSave method, something like:

public function beforeSave($options = array()) {
    if (empty($this->id)) {     // Pretends an insert.
        $conditions = array(
            'student_id' => $this->data['CoursesStudents']['student_id'],
            'course_id' => $this->data['CoursesStudents']['course_id']
        );
        $id = $this->find('first', array('conditions' => $conditions, 'fields' => 'id'));
        if (!empty($id['CoursesStudents']['id'])) {
            $this->id = $id['CoursesStudents']['id'];
        }
    }

    return true;
}

First Cake performs a query trying to know if it’s an insert or an update, then in beforeSave we perform another query to know if the record really exists, if it does we tell Cake to perform an update assigning a value to $this->id But this will not work! Before the execution arrives to the beforeSave method, CakePHP already determined if it is an insert or an update and it’s too late for us to change it. The solution is to delete the record, a slight but significant modification in the previous code:

public function beforeSave($options = array()) {
    if (empty($this->id)) {     // Pretends an insert.
        $conditions = array(
            'student_id' => $this->data['CoursesStudents']['student_id'],
            'course_id' => $this->data['CoursesStudents']['course_id']
        );
        $id = $this->find('first', array('conditions' => $conditions, 'fields' => 'id'));
        if (!empty($id['CoursesStudents']['id'])) {
            $this->id = $id['CoursesStudents']['id'];
            $this->delete();
        }
    }

    return true;

With the delete already 3 queries have been executed and with a final insert the process of updating a student’s grade will finish. Four queries instead of one makes CakePHP a bit chatty, don’t you think? Databases are usually the bottleneck in websites and we must avoid any unnecessary access. The fact that the primary key can’t be changed in beforeSave is something already mentioned as an improvement and maybe will be implemented in future versions. At the time I’m writing this the last version is 2.1 and it’s still pending. But seems that implementing compound primary keys it’s not one of CakePHP core developers objectives. Seems this feature has been requested since years ago but never came to fruition. Some argue Normal Forms aren’t efficient for highly transited sites. This debate it’s a complicated one and I would love to hear your opinions and learn!

Deja un comentario

Tu dirección de correo electrónico no será publicada. Los campos obligatorios están marcados con *

Este sitio usa Akismet para reducir el spam. Aprende cómo se procesan los datos de tus comentarios.