Unaccented Name Search with Postgres and Ecto

Hey folks, welcome back to another blog post!

Did you know that:

When you go into the bathroom, you're American.
When you come out of the bathroom, you're American.
But what are you IN the bathroom?

European!

Anyways, today’s blog post touches on a topic that I’ve written about plenty already. It’s about efficient name search in Postgres, but with a twist!

The Problem

In previous articles, I wrote about how to do an efficient name search in Postgres using GIN indexes. These work great, but once your user base becomes international, you have to deal with all the weird things other cultures add to their words. Like adding little ticks, arrows, dots, and waves above them, like José, être, Blömschen, and São.

The problem with these special symbols is that they break your text search. For example, if I want to search for José in my application, I might only type Jose (without the accent). But Postgres treats é and e differently and won’t return a user called José. For Postgres, Jose and José are not the same.

This is bad UX and not what we want. We want to return José even though the search term was Jose, so what can we do?

The Solution

Luckily, Postgres offers a function called unaccent for exactly this situation. It allows Postgres to treat words with and without accent equally (e.g. é and e). Let’s see how we can use it in Ecto.

The Setup

This is our search function without unaccent:

def search(name) do
  query =
    from(
      p in Person,
      where: ilike(p.name, ^search_term)
    )

  Repo.all(query)
end

This function won’t return a person with the name José if we search for Jose. Likewise, it won’t return a person with the name Jose (no accent) if we search for José (with an accent). Let’s fix the function.

The Migration

First, we need to add a new extension to Postgres. Add the following to a migration of your choosing:

  def change do
    execute(
      "CREATE EXTENSION IF NOT EXISTS unaccent;",
      "DROP EXTENSION IF EXISTS unaccent;"
    )
  end

If you run the migration, it will add the unaccent extension to your Postgres instance. From now on, we can use the function unaccent in our query.

The slow Solution

Now, let’s update our query function from above. We simply have to wrap the person’s name and the search term with unaccent and Postgres will take care of the rest. This is how our search function looks now:

def search(name) do
  query =
    from(
      p in Person,
      where: fragment(
        "unaccent(?) ILIKE unaccent(?)",
        p.name,
        ^search_term
      )
    )

  Repo.all(query)
end

From now on, our search/1 function will also return a person called José when we search for Jose, jose, José, or josé. It will also return a person called Jose (unaccented) if we search for Josè (accented). Great success!

The Fast Solution

Sadly, there is one problem with our solution above: It’s slooooooooow 🐌.

If you have read my post about efficient name search, you know that we can speed up our query significantly by adding a GIN index to it. Let’s add a GIN index but for the unaccented version of the names in our database.

First, we need to create an immutable version of Postgres’ unaccent function. If you use a function in a Postgres index, it must be IMMUTABLE. That means that the function cannot alter the database and will always return the same value given the same arguments.

The unaccent function is not IMMUTABLE, but SOLID. That means that it can’t change the database and will always return the same result given the same arguments, but its result might change across SQL statements. Imagine that I want to fetch the current time. That function would rely on the timezone of my database. If I fetch the time once, change my database timezone, and fetch the time again, the results will change. The same goes for the unaccent function because it depends on the dictionary that I use in my database. If I change my dictionary, the result of unaccent might change, and therefore the function is only SOLID and not IMMUTABLE.

That means that we can’t use unaccent directly when creating the index. However, we can create an IMMUTABLE version of the function if we’re certain to never change the dictionary of our database (which you probably won’t anyways). Let’s do that now.

In our migration after your CREATE EXTENSION ... call, add this:

  execute(
    """
    CREATE OR REPLACE FUNCTION f_unaccent(text)
        RETURNS text
        LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
      $func$
        SELECT public.unaccent('public.unaccent', $1)
      $func$;
    """,
    "DROP FUNCTION IF EXISTS f_unaccent(text);"
  )

We have now an immutable function called f_unaccent which we can use to generate a GIN index of our person.name field. Add this after the snippet above:

execute(
  "CREATE EXTENSION IF NOT EXISTS pg_trgm;",
  "DROP EXTENSION IF EXISTS pg_trgm;"
)

execute(
  """
  CREATE INDEX persons_name_unaccented_gin_trgm_idx
    ON persons
    USING gin (f_unaccent(name) gin_trgm_ops)
  """,
  "DROP INDEX IF EXISTS persons_name_unaccented_gin_trgm_idx;"
)

The first statement will add the Postgres Trigram extension to your Postgres database. We need this extension for creating the GIN index in the second statement.

The second statement creates a GIN index for the unaccented value in our person.name field. That means that our search/1 query above will now use a much faster GIN index for searching for the correct person.

As a last step, we need to update our search/1 function to use the new f_unaccent(text) function instead of the built-in unaccent function. This is the updated version of our function:

def search(name) do
  query =
    from(
      p in Person,
      where: fragment(
        "f_unaccent(?) ILIKE f_unaccent(?)",
        p.name,
        ^search_term
      )
    )

  Repo.all(query)
end

If you want to double-check that your function uses the new index, add this code to the search/1 function:

def search(name) do
  query = # build your query

  # Add this 👇
  Ecto.Adapters.SQL.explain(Repo, :all, query) |> IO.inspect()

  Repo.all(query)
end

When you run the search/1 function, you will see something like this in your terminal:

Bitmap Heap Scan on persons p0
  Recheck Cond: (f_unaccent((name)::text) ~~* '%jose%'::text)
    ->  Bitmap Index Scan on persons_name_unaccented_gin_trgm_idx
        Index Cond: (f_unaccent((name)::text) ~~* '%jose%'::text)

If you see this, congratulations! Your search function now uses the GIN index for searching for the unaccented version of the name.

The Caveats

As always, there are caveats to this approach. If you’re interested into the nitty-gritty details of how this works and what other solutions exist, I can recommend the following resources:

Conclusion

And that’s it! I hope you enjoyed this article! If you want to support me, you can buy my book or video course. Follow me on BlueSky or subscribe to my newsletter below if you want to get notified when I publish the next blog post. Until the next time! Cheerio 👋