Introducing Elastically, our Elastica Ally

Sorry for the pun 😅

In March, I got the chance to share my knowledge about Elasticsearch and PHP with hundreds of developers at Symfony Live Paris. While building this talk, I tried to make sense of all the PHP implementations I came across, either while auditing third-party applications or while building from scratch for our clients.

In this article, I would like to introduce Elastically, a thin wrapper on top of Elastica we use to bootstrap our Elasticsearch implementations.

Building PHP and Elasticsearch applications

When a project needs Elasticsearch, most of the time we build our own indexing and search components on top of Elastica. This library is really convenient as it exposes every Query DSL clause and API endpoint as PHP classes, and it is very well maintained. Our experience also led us to a few good practices that we now impose on ourselves.

Do not tie mapping and document together

The JSON document you send to Elasticsearch and the actual Mapping – the fields in Lucene – should not be correlated.

The JSON document should contain:

  • the data needed for search;
  • the data needed for the view, the manipulation, etc.

But the Mapping only needs one of these:

  • the data needed for search.

As an example, if you index a product:

{"name": "WashWash 3000", "picture": "https://cdn.example.com/toothpaste-cropped.jpg"}

You obviously need the picture for display, so it makes sense to have it in the JSON. But you should not index this field, because you are never going to search for a product by its picture! And guess what: by default, Elasticsearch will index this data.

So first, you should not rely on dynamic mapping: it is a very good way to end up with wrongly typed fields and useless data in Lucene.

Second, your Mapping should consist of only one field here, the name. So it has to be written explicitly, and it is not the same as the data structure.

In Elastically, this is the default behavior.
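
To make the difference concrete, here is a minimal sketch in plain PHP. It assumes the Elasticsearch `dynamic: false` mapping option, which keeps undeclared fields in `_source` without indexing them. The helper name `buildProductIndexBody` is hypothetical, and this is not necessarily how Elastically does it internally.

```php
<?php

// Hypothetical helper: an explicit mapping that declares only the searchable field.
function buildProductIndexBody(): array
{
    return [
        'mappings' => [
            // Assumption: with "dynamic: false", undeclared fields are not indexed
            // but still stay in the stored JSON document (_source).
            'dynamic' => false,
            'properties' => [
                'name' => ['type' => 'text'],
            ],
        ],
    ];
}

// The JSON document carries more than the mapping declares.
$document = [
    'name' => 'WashWash 3000',
    'picture' => 'https://cdn.example.com/toothpaste-cropped.jpg',
];

$body = buildProductIndexBody();
$mapped = array_keys($body['mappings']['properties']);
$storedOnly = array_values(array_diff(array_keys($document), $mapped));

echo json_encode($body, JSON_PRETTY_PRINT), PHP_EOL;
echo 'Stored but not indexed: ', implode(', ', $storedOnly), PHP_EOL;
```

The picture is still returned with each hit for display, but it never reaches Lucene.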

Use YAML instead of JSON or array for configuration

Elasticsearch Mappings are JSON formatted – but as humans, writing JSON is just a massive pain.

{
  "settings": {
    "number_of_replicas": 1,
    "number_of_shards": 3,
    "analysis": {},
    "refresh_interval": "1s"
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "english"
      }
    }
  }
}

In Elastica, we can set up the index mapping as an array. PHP arrays are verbose and not much easier to write and maintain:

[
    'settings' => [
        'number_of_replicas' => 1,
        'number_of_shards' => 3,
        'analysis' => [],
        'refresh_interval' => '1s',
    ],
    'mappings' => [
        'properties' => [
            'title' => [
                'type' => 'text',
                'analyzer' => 'english',
            ],
        ],
    ],
];

So what we do now is always use YAML. This format has some downsides but also lots of perks:

  • comments;
  • anchors and merge keys (article in French) to reuse parts of the configuration in multiple places;
  • IDE support…

The same configuration in YAML:

settings:
  number_of_replicas: 1
  number_of_shards: 3
  analysis: {}
  refresh_interval: 1s
mappings:
  properties:
    title:
      type: text
      analyzer: english

In Elastically, the use of YAML is enforced.

Use DTOs: Data Transfer Objects

On top of Elastica, we add some logic to write and read DTOs in Elasticsearch, instead of plain old arrays.

The advantages are:

  • The code is easier to read and manipulate;
  • It’s closer to what we already do with Doctrine ODM;
  • Interoperability with other storage is easier to manage;
  • Data is always consistent, and we can pass the DTO as a type-hinted argument: there is no need to guess from an associative array.

As Elastica only talks JSON or arrays, Elastically introduces a custom Indexer and ResultBuilder that let you pass and retrieve PHP objects (via a Serializer).
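
As an illustration of the round trip, here is a minimal sketch in plain PHP (7.4+ for typed properties). The `Beer` properties and the two helper functions are hypothetical stand-ins for a real Serializer. They are not Elastically's Indexer or ResultBuilder.

```php
<?php

// Hypothetical DTO, for illustration only.
final class Beer
{
    public string $name = '';
    public float $abv = 0.0;
}

// Hypothetical stand-in for normalization: DTO -> array sent as the JSON document.
function beerToArray(Beer $beer): array
{
    // Called from outside the class, this returns the public properties only.
    return get_object_vars($beer);
}

// Hypothetical stand-in for denormalization: search hit source -> DTO.
function beerFromArray(array $source): Beer
{
    $beer = new Beer();
    $beer->name = (string) ($source['name'] ?? '');
    $beer->abv = (float) ($source['abv'] ?? 0.0);

    return $beer;
}

$beer = new Beer();
$beer->name = 'American Pale Ale';
$beer->abv = 5.6;

$source = beerToArray($beer);        // what would be indexed
$hydrated = beerFromArray($source);  // what a search result would give back

echo $hydrated->name, PHP_EOL;
```

The rest of the application only ever sees `Beer` objects, so a type hint is enough to know what a function receives.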

Indexes should be versioned

When talking to an Index, we do it via its name. That’s fine until we want to update the mapping of that index, because then we have to rebuild it. To avoid downtime, we use aliases on top of our indexes: the application talks to the alias, and the alias is switched to the newly built index once it is ready.

In Elastically, this is enforced and transparent.
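
The alias switch can be sketched in plain PHP as follows. The naming scheme and both function names are assumptions made for the example. Only the shape of the request body follows the Elasticsearch `_aliases` API, which applies all listed actions atomically.

```php
<?php

// Hypothetical naming scheme: "<alias>_<timestamp>".
function versionedIndexName(string $alias, DateTimeImmutable $now): string
{
    return sprintf('%s_%s', $alias, $now->format('Y-m-d-His'));
}

// Builds the body for POST /_aliases: detach the old indexes, attach the new one.
function buildAliasSwap(string $alias, array $oldIndexes, string $newIndex): array
{
    $actions = [];
    foreach ($oldIndexes as $old) {
        $actions[] = ['remove' => ['index' => $old, 'alias' => $alias]];
    }
    $actions[] = ['add' => ['index' => $newIndex, 'alias' => $alias]];

    return ['actions' => $actions];
}

$now = new DateTimeImmutable('2019-04-01 12:00:00', new DateTimeZone('UTC'));
$new = versionedIndexName('beers', $now);
$body = buildAliasSwap('beers', ['beers_2019-03-01-090000'], $new);

echo json_encode($body, JSON_PRETTY_PRINT), PHP_EOL;
```

Because search code only knows the `beers` alias, the rebuild can take as long as it needs before the switch happens.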

Tools for better integration

Some tools are also implemented (or on their way!) to ease application development:

  • The Indexer: to use the Bulk API properly;
  • (TBD) A reindexing command: leveraging the Reindex API to rebuild your entire index automagically when you update your Mapping configuration (think about deployment);
  • (TBD) An updater helper: to ease real-time updates even while the reindexing command is building the “next” index;
  • (TBD) A custom healthchecker: to get tailor-made insight into your cluster health (are there enough documents in that index?)…
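
For readers unfamiliar with the Bulk API, here is a rough sketch in plain PHP of the payload it expects: newline-delimited JSON with one action line followed by one source line per document. The function name `buildBulkBody` is hypothetical, and Elastica and Elastically build this for you.

```php
<?php

// Hypothetical helper: turns [id => source array] into a Bulk API payload.
function buildBulkBody(string $index, array $documents): string
{
    $lines = [];
    foreach ($documents as $id => $source) {
        // Action line: which operation, on which index, for which document id.
        $lines[] = json_encode(['index' => ['_index' => $index, '_id' => (string) $id]]);
        // Source line: the JSON document itself.
        $lines[] = json_encode($source);
    }

    // The payload is newline-delimited and must end with a newline.
    return implode("\n", $lines) . "\n";
}

$payload = buildBulkBody('beers', [
    '123' => ['name' => 'American Pale Ale'],
    '124' => ['name' => 'Stout'],
]);

echo $payload;
```

Sending many documents in a single request like this is usually much cheaper than one HTTP call per document, which is why an Indexer queues documents and flushes them in batches.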

How to use?

Elastically is not released yet as I still want to add some features, but you can already use it for the core functionalities (DTO, Indexer…).

composer require "jolicode/elastically:dev-master"

Then you can use JoliCode\Elastically\Client instead of the Elastica Client; they are 100% compatible, as our Client simply extends Elastica’s.

// Building the Index from a mapping config
use App\Dto\Beer;
use Elastica\Document;
use JoliCode\Elastically\Client;

// New Client object with new options
$client = new Client([
    // Where to find the mappings
    Client::CONFIG_MAPPINGS_DIRECTORY => __DIR__.'/configs',
    // What object to find in each index
    Client::CONFIG_INDEX_CLASS_MAPPING => [
        'beers' => App\Dto\Beer::class,
    ],
]);

// Class to build Indexes
$indexBuilder = $client->getIndexBuilder();

// Create the Index in Elasticsearch
$index = $indexBuilder->createIndex('beers');

// Set proper aliases
$indexBuilder->markAsLive($index, 'beers');

// Class to index DTO in an Index
$indexer = $client->getIndexer();

$dto = new Beer();
$dto->bar = 'American Pale Ale';
$dto->foo = 'Hops from Alsace, France';

// Add a document to the queue
$indexer->scheduleIndex('beers', new Document('123', $dto));
$indexer->flush();

// Force index refresh if needed
$indexer->refresh('beers');

The Serializer

By default, Elastically will leverage the ObjectNormalizer from Symfony to transform your DTO to an array. That’s easy and fast, but you can also set up your own.

At JoliCode, we use Jane PHP to generate super fast Normalizers based on a JSON Schema. We declare our model and Jane generates the PHP code: the DTO, the Normalizer and a factory.

Less time on the basics, more time on the business value

Elastically is not meant to be a full-featured implementation like FOSElasticaBundle, for example. I want it to be an opinionated framework to build Elasticsearch-based features in PHP applications.

I would be glad to hear about different approaches when dealing with Elasticsearch via PHP, so feel free to compare and share your experiences!

Code is available on GitHub as always: https://github.com/jolicode/elastically
