01/02/2022

(Re)discover XPath selectors

This article is also available in 🇫🇷 French: (Re)découverte des sélecteurs XPath.

Not everybody likes to write tests, but everybody loves having them. This is not an article to convince you to write them.

Basics

We have different types of tests:

Unit tests, which ensure that a piece of code works as expected;
Integration tests, which check that several classes work together;
Application or functional tests, which check the behavior of a complete application. They send HTTP requests and test the returned response.

Today we are going to talk about application tests on a classic (non-API) application/website.

The comfort zone

Today, this is how we do it:

We create a request, and we have great tools for that;
We interact with the page (for example by clicking on a link or submitting a form), and we have great tools for that too;
We test the response, again with great tools, especially the Symfony CssSelector component 💙

There is no doubt that the CSS selector component is nice to use. As a web developer, CSS speaks to us, and it feels natural to use this method to find and validate the presence of HTML elements in our responses.

So why bother? What is the problem?

Raiders of the Lost Selector

Today we are going to rediscover the value and the power of XPath selectors. Indeed, with modern ways of writing CSS, via utility-first frameworks (Tailwind CSS) or methodologies that favor utility classes over semantic ones (OOCSS, BEM), targeting particular elements becomes increasingly difficult.

Writing (and reading) CSS selectors remains easier, but they are also less powerful.

XPath still scares people, but we will see that this fear is far from justified, and that it can even be fun.

Demystifying XPath

XPath is a query language for selecting nodes from an XML document. It can also be applied to HTML, which is what interests us today.

To be honest, the learning curve is a little steep, but by advancing step by step we will appreciate its possibilities without suffering too much.

Let’s start with a first example, and remember that XPath is “only” a path, a bit like our file system.

<!DOCTYPE html>
<html lang="fr">
  <head>
    <meta charset="UTF-8" />
    <title>Symfony Demo application</title>
    <link rel="icon" type="image/x-icon" href="/favicon.ico" />
  </head>
  <body id="blog_index">
    <div class="mt-8 text-sm">
      <article class="flex mx-4">
        <p>This is my first p in Article</p>
        <p>This is a second P in Article</p>
      </article>
    </div>
  </body>
</html>

With this document, the expression: /html/body/div/article/p
Returns:

<p>This is my first p in Article</p>
<p>This is a second P in Article</p>

Just as for a directory on our file system, the path must be exact: if our <div> were itself inside another <div>, we would get no results.

Simple, isn’t it?

Now let’s avoid having to specify the entire path to find our node.

The expression: //div/article/p returns the same nodes.

We see that // saves us from starting at the root node /: it matches at any depth in the document. The expression asks for every path that contains p in article in div or, put the other way around, div containing article containing p.
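To try these two expressions outside a browser, here is a minimal sketch using PHP's built-in DOMDocument and DOMXPath classes (the dom extension). The $html string is a trimmed copy of the document above, and the variable names are only illustrative.

```php
<?php
// Trimmed copy of the sample document above.
$html = <<<'HTML'
<!DOCTYPE html>
<html lang="fr">
  <head><title>Symfony Demo application</title></head>
  <body id="blog_index">
    <div class="mt-8 text-sm">
      <article class="flex mx-4">
        <p>This is my first p in Article</p>
        <p>This is a second P in Article</p>
      </article>
    </div>
  </body>
</html>
HTML;

// libxml's HTML parser may complain about HTML5 tags such as <article>;
// collect those warnings silently instead of printing them.
libxml_use_internal_errors(true);
$document = new DOMDocument();
$document->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($document);

// Absolute path: every step from the root must match.
$absolute = $xpath->query('/html/body/div/article/p');
// "//" looks for div/article/p at any depth.
$anywhere = $xpath->query('//div/article/p');
// An absolute path that skips the <div> level matches nothing.
$broken = $xpath->query('/html/body/article/p');

foreach ($anywhere as $paragraph) {
    echo $paragraph->textContent, PHP_EOL;
}
```

Both $absolute and $anywhere contain the two paragraphs, while $broken is empty because one step of the path is missing.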

All paths? Yes.

The analogy with the file system ends there, because in XML/HTML several nodes can share the same path.

Example

<!DOCTYPE html>
<html lang="fr">
  <head>
    <meta charset="UTF-8" />
    <title>Symfony Demo application</title>
    <link rel="icon" type="image/x-icon" href="/favicon.ico" />
  </head>
  <body id="blog_index">
    <div class="mt-8 text-sm">
      <article class="flex mx-4 art-first">
        <p>This is my first p in first Article</p>
        <p>This is a second P in first Article</p>
      </article>
      <article class="flex mx-4 art-second">
        <p class="my-p">This is my first p in second Article</p>
        <p>This is a second P in second Article</p>
      </article>
    </div>
  </body>
</html>

This expression: //div/article/p
Returns: All 4 <p>.

This looks quite simple. What about real life?

Filter or select node position/index

//article[2] (2nd article, index starts at 1)
//article[last()] (last article)
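As a quick check of these positional filters, here is a small sketch with PHP's built-in DOMXPath, run against a trimmed copy of the second sample document. The variable names are assumptions made for the example.

```php
<?php
// Trimmed copy of the second sample document.
$html = <<<'HTML'
<html>
  <body id="blog_index">
    <div class="mt-8 text-sm">
      <article class="flex mx-4 art-first">
        <p>This is my first p in first Article</p>
        <p>This is a second P in first Article</p>
      </article>
      <article class="flex mx-4 art-second">
        <p class="my-p">This is my first p in second Article</p>
        <p>This is a second P in second Article</p>
      </article>
    </div>
  </body>
</html>
HTML;

libxml_use_internal_errors(true); // silence warnings about HTML5 tags
$document = new DOMDocument();
$document->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($document);

// Positions start at 1, not 0.
$second = $xpath->query('//article[2]')->item(0);
$last = $xpath->query('//article[last()]')->item(0);
// Positional filters can be chained along the path.
$firstOfSecond = $xpath->query('//article[2]/p[1]')->item(0);

echo $second->getAttribute('class'), PHP_EOL;
echo $firstOfSecond->textContent, PHP_EOL;
```

Note that //article[2] means "an article that is the second article child of its parent". To get the second article of the whole document, wrap the path in parentheses: (//article)[2]. In this sample both return the same node.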

Filter or select by attributes

//body[attribute::id='blog_index']
//body[@id='blog_index'] (the @ is syntactic sugar)
//p[@class='my-p']
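The sketch below runs these three attribute filters with PHP's built-in DOMXPath against a trimmed copy of the sample document. It is only an illustration, and the variable names are assumptions.

```php
<?php
// Trimmed copy of the second sample document.
$html = <<<'HTML'
<html>
  <body id="blog_index">
    <div class="mt-8 text-sm">
      <article class="flex mx-4 art-second">
        <p class="my-p">This is my first p in second Article</p>
        <p>This is a second P in second Article</p>
      </article>
    </div>
  </body>
</html>
HTML;

libxml_use_internal_errors(true); // silence warnings about HTML5 tags
$document = new DOMDocument();
$document->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($document);

// Long form (attribute axis) and short form (@) select the same node.
$long = $xpath->query("//body[attribute::id='blog_index']");
$short = $xpath->query("//body[@id='blog_index']");

// The comparison is exact: it matches because the class attribute is exactly "my-p".
$paragraphs = $xpath->query("//p[@class='my-p']");

echo $paragraphs->item(0)->textContent, PHP_EOL;
```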

XPath Functions

Let’s continue our exploration with examples. Here, because the class attribute consists of text that includes multiple class names, the expression: //article[@class='art-first'] will not work.
Indeed, we don’t have <article class="art-first"> but <article class="flex mx-4 art-first">. It is therefore necessary to use XPath functions.

These are utility functions built into the language that operate on values such as strings, numbers and node sets. In our example, we will use contains.

Our expression becomes:
//article[contains(@class, 'art-first')]
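One caveat is worth illustrating: contains compares substrings, not class names. In the sketch below (PHP's built-in DOMXPath), the third article and its art-first-draft class are not part of the original sample; they are added only to show the difference. The last expression is a common idiom for matching a whole class name, shown here as one possible approach.

```php
<?php
// The third <article> and its "art-first-draft" class are added for illustration only.
$html = <<<'HTML'
<html>
  <body>
    <div class="mt-8 text-sm">
      <article class="flex mx-4 art-first"><p>First</p></article>
      <article class="flex mx-4 art-second"><p>Second</p></article>
      <article class="flex mx-4 art-first-draft"><p>Draft</p></article>
    </div>
  </body>
</html>
HTML;

libxml_use_internal_errors(true); // silence warnings about HTML5 tags
$document = new DOMDocument();
$document->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($document);

// Exact comparison: the attribute value is "flex mx-4 art-first", so nothing matches.
$exact = $xpath->query("//article[@class='art-first']");
// Substring match: finds art-first, but also art-first-draft.
$substring = $xpath->query("//article[contains(@class, 'art-first')]");
// Whole-name match: pad the class list with spaces and look for " art-first ".
$token = $xpath->query(
    "//article[contains(concat(' ', normalize-space(@class), ' '), ' art-first ')]"
);

printf("%d exact, %d substring, %d token\n", $exact->length, $substring->length, $token->length);
```

This prints "0 exact, 2 substring, 1 token". When class names share a prefix, the padded form is the safer choice.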

Fetch node content and not the node itself

//article[2]/p[1]/text() gives the content of the first paragraph of the second article.
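Here is a small sketch of that expression with PHP's built-in DOMXPath. It assumes a trimmed copy of the second sample document; the string() variant at the end is an alternative shown for comparison, not something the expression above requires.

```php
<?php
// Trimmed copy of the second sample document.
$html = <<<'HTML'
<html>
  <body>
    <div>
      <article class="flex mx-4 art-first">
        <p>This is my first p in first Article</p>
      </article>
      <article class="flex mx-4 art-second">
        <p class="my-p">This is my first p in second Article</p>
        <p>This is a second P in second Article</p>
      </article>
    </div>
  </body>
</html>
HTML;

libxml_use_internal_errors(true); // silence warnings about HTML5 tags
$document = new DOMDocument();
$document->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($document);

// text() selects the text node inside the paragraph, not the <p> element.
$textNodes = $xpath->query('//article[2]/p[1]/text()');
$content = $textNodes->item(0)->nodeValue;

// evaluate() with string() returns a PHP string directly.
$sameContent = $xpath->evaluate('string(//article[2]/p[1])');

echo $content, PHP_EOL;
```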

Here is a concrete case of a selector used in a project:

$this->assertEquals(1, $crawler->filterXPath('//a[starts-with(@href, "/blog/post/")]')->count());
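The same selector can be tried without Symfony, using PHP's built-in DOMXPath. The markup below is hypothetical (the project's real pages are not shown in this article); it only illustrates why starts-with is stricter than contains here.

```php
<?php
// Hypothetical markup: only one link actually points to a blog post.
$html = <<<'HTML'
<html>
  <body>
    <a href="/">Home</a>
    <a href="/blog/post/xpath-selectors">Read the post</a>
    <a href="/archive/blog/post/old">Archived post</a>
  </body>
</html>
HTML;

libxml_use_internal_errors(true);
$document = new DOMDocument();
$document->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($document);

// Only hrefs that begin with "/blog/post/" match.
$postLinks = $xpath->query('//a[starts-with(@href, "/blog/post/")]');
// contains() would also accept "/archive/blog/post/old".
$looseLinks = $xpath->query('//a[contains(@href, "/blog/post/")]');

printf("%d strict, %d loose\n", $postLinks->length, $looseLinks->length);
```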

Use in functional testing

XPath filters are supported directly in the Symfony test framework through the DomCrawler component.

Here is how it can be used:

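use Symfony\Bundle\FrameworkBundle\Test\WebTestCase;

// HomepageTest is a placeholder class name.
class HomepageTest extends WebTestCase
{
    public function testHomepageHasLinkToBlog(): void
    {
        $client = static::createClient();
        $crawler = $client->request('GET', '/en/homepage');
        // Match links whose href starts with "/en/blog".
        $selector = '//a[starts-with(@href, "/en/blog")]';

        $this->assertEquals(1, $crawler->filterXPath($selector)->count());
    }
}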

Note that Crawler::filterXPath() returns a new Crawler instance, so you then have access to several methods that let you filter your node list again. It is even possible to apply business logic to this list of nodes in your tests if necessary.

For example, you can check that each node selected by the XPath expression itself respects a given structure.

$crawler->filterXPath('//a[starts-with(@href, "/post/")]')->each(function ($node) {
    // $node is a Crawler wrapping one matched element.
    // Your logic here
});

Conclusion

This is only a quick look at the basics of XPath selectors. The goal is not to describe them exhaustively, but to remind you that they exist and that they are very powerful.

Even if their syntax is unusual, or just new to you, don’t be afraid.

On the contrary, I have found that it is often quite satisfying to find the right selector, and as usual good tools are there to help.

In browser developer tools, there is usually a context menu on an HTML element to retrieve its XPath directly.
(Screenshot: web developer tools)

There is also the nice xpather.com, a very powerful tool that will help you test and find the selector of your dreams (yes, no less than that).