20 Dec 2013

Counting Lines

SensioLabsInsight counts the lines of code in the methods and classes of your projects. For instance, one of our rules checks that your controllers are not too fat, which requires the line count of each controller method. Another rule checks the overall class size, to verify that the code is decoupled and follows the single responsibility principle.

Our first approach to counting lines of code was naive. It brought us a lot of feedback about unfair "Symfony controller action method should not be too long" or "PHP classes should be short" violations. What if the code had a lot of comments? What if your developers followed a coding standard that adds more newline characters than PSR-2?

We've recently released a new line counting algorithm, but before explaining it, let's look at some code.

<?php
/**
 * @Method("GET")
 * @Route(
 *     "/books/{bookId}",
 *     name="project"
 * )
 */
public function bookAction($bookId)
{
    $book = $this->getDoctrine()->getManager()
        ->getRepository('Book')
        ->find($bookId);

    if (!$book) {
        throw $this->createNotFoundException('Book not found.');
    }

    return array(
        'book' => $book
    );
}

How many lines of code do you think this method has? 12? 14? 21? Let's look at the same code, written differently:

<?php
/**
 * @Method("GET")
 * @Route("/books/{bookId}", name="project")
 */
public function bookAction($bookId) {
    $book = $this->getDoctrine()->getManager()->getRepository('Book')->find($bookId);
    if (!$book) { throw $this->createNotFoundException('Book not found.'); }
    return array('book' => $book);
}

It's the exact same code, yet it has only 5 lines of code without comments, 9 with comments. And what about this version?

<?php
/**
 * @Method("GET")
 * @Route(
 *     "/books/{bookId}",
 *     name="project"
 * )
 */
public function bookAction($bookId)
{
    /** @var $book \Library\Model\Book */
    $book = $this->getDoctrine()
        ->getManager()
        ->getRepository('Book')
        ->find($bookId)
    ;

    if (!$book) {
        // FIXME: redirect to the book creation page instead
        throw $this->createNotFoundException('Book not found.');
    }

    return array(
        'book' => $book
    );
}

Now you understand why this is a difficult problem. Ignoring comments and blank lines isn't a big deal, but counting lines consistently across all coding styles is not that simple. Even third-party line counter scripts won't help us, since they count the three extracts above differently.
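As a rough illustration of why skipping comments and blank lines is not enough, here is a minimal sketch built on PHP's built-in token_get_all(). The helper name countCodeLines() and the two sample strings are made up for this example; this is not how Insight counts.

```php
<?php
// Hypothetical helper: count the physical lines that hold at least one
// token other than whitespace, a comment, or the opening tag.
function countCodeLines(string $source): int
{
    $lines = [];
    $line = 1; // current 1-based line number, tracked by hand
    foreach (token_get_all($source) as $token) {
        // Tokens are either [id, text, line] arrays or single-character strings.
        [$id, $text] = is_array($token) ? [$token[0], $token[1]] : [null, $token];
        $skip = in_array($id, [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT, T_OPEN_TAG], true);
        if (!$skip) {
            $lines[$line] = true;
        }
        $line += substr_count($text, "\n");
    }
    return count($lines);
}

// The same two statements, laid out in two different styles.
$spread = <<<'CODE'
<?php
// look up the book
$book = $repo
    ->find($id);

return $book;
CODE;

$compact = <<<'CODE'
<?php
$book = $repo->find($id); return $book;
CODE;

echo countCodeLines($spread), "\n";  // 3
echo countCodeLines($compact), "\n"; // 1
```

Comments and blank lines are ignored in both cases, yet the two layouts still produce different counts, which is exactly the inconsistency described above.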

SensioLabsInsight already parses your entire codebase to analyze it. The analysis engine turns PHP scripts into an Abstract Syntax Tree (AST), an object-oriented representation of all the statements, variables, and expressions in your code. That means that during the analysis, our engine has a normalized version of the code at hand. The only thing that we need to do is to serialize this AST in a consistent manner, and count the lines of the resulting string.

For the code extracts above, the normalized version serialized by the Insight engine looks like this:

<?php
public function bookAction($bookId)
{
    $book = $this->getDoctrine()->getManager()->getRepository('Book')->find($bookId);
    if (!$book) {
        throw $this->createNotFoundException('Book not found.');
    }
    return array('book' => $book);
}

Whatever coding style is used, Insight will always count 8 lines of code for this piece of code. So don't be surprised if the line counts displayed in a violation don't exactly reflect the actual number of lines in your code. Don't be surprised either if a few violations disappear during your next analysis: this new counting algorithm generally counts fewer lines than before, and therefore fewer methods and classes cross the threshold of our "method too long" or "class too long" violations.
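The normalize-then-count idea can be approximated without a full parser. The sketch below is a token-based stand-in, not Insight's AST serializer: it drops comments and layout with PHP's built-in token_get_all(), then re-emits the tokens with a line break after every `;`, `{` and `}`. The function name normalizedLineCount() is hypothetical, and the crude line-break rule would miscount constructs such as `for (;;)`.

```php
<?php
// Hypothetical stand-in for an AST serializer: discard comments and the
// original layout, then re-emit tokens with one fixed line-break rule.
function normalizedLineCount(string $source): int
{
    $out = '';
    foreach (token_get_all($source) as $token) {
        [$id, $text] = is_array($token) ? [$token[0], $token[1]] : [null, $token];
        if (in_array($id, [T_OPEN_TAG, T_COMMENT, T_DOC_COMMENT], true)) {
            continue; // comments never count
        }
        if ($id === T_WHITESPACE) {
            $out .= ' '; // collapse any run of spaces or newlines
            continue;
        }
        $out .= $text;
        // Single-character tokens come back as plain strings.
        if ($id === null && in_array($text, [';', '{', '}'], true)) {
            $out .= "\n";
        }
    }
    $lines = array_filter(
        array_map('trim', explode("\n", $out)),
        function ($l) { return $l !== ''; }
    );
    return count($lines);
}

$expanded = <<<'CODE'
<?php
/**
 * @Method("GET")
 */
public function bookAction($bookId)
{
    $book = $this->getDoctrine()->getManager()
        ->getRepository('Book')
        ->find($bookId);

    if (!$book) {
        throw $this->createNotFoundException('Book not found.');
    }

    return array(
        'book' => $book
    );
}
CODE;

$compact = <<<'CODE'
<?php
public function bookAction($bookId) {
    $book = $this->getDoctrine()->getManager()->getRepository('Book')->find($bookId);
    if (!$book) { throw $this->createNotFoundException('Book not found.'); }
    return array('book' => $book);
}
CODE;

echo normalizedLineCount($expanded), "\n"; // 7
echo normalizedLineCount($compact), "\n";  // 7
```

Both layouts yield the same count. The sketch reports 7 rather than Insight's 8 because its simplistic rule keeps the method's opening brace on the signature line; what matters is that the result no longer depends on the coding style.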

We'll be tweaking our serialization algorithm to better match PSR-2 in the future. And rest assured that when Insight counts the lines in your code, it does so in a fair way.