Optimize Ecto's insert_all with Placeholders

- 5 min read

Before we start talking about using placeholders in Ecto’s insert_all/3, I need you to understand the following:

What do you call cheese that is not yours?

Nacho Cheese

Now, with the ceremonial dad joke out of the way, let’s focus on today’s topic: What are Ecto placeholders and how to use them when inserting a lot of data?

The Problem

Imagine we want to insert a lot of book data into our database. We decide to use Repo.insert_all/3 and write a script like the following:

books = [  %{
    title: "The Hobbit",
    content: "In a hole..."
  },
  %{
    title: "The Little Prince",
    content: "Once when I was..."
  }
]

now =
  NaiveDateTime.utc_now()
  |> NaiveDateTime.truncate(:second)

data =
  Enum.map(books,
    &Map.merge(&1, %{
      inserted_at: now,
      updated_at: now
    })
  )

Repo.insert_all(Book, data)

All looks well here. We knew that Repo.insert_all/3 doesn’t add the autogenerated timestamps and added them ourselves. We run the script and receive an output like this:

INSERT INTO "books"(
  "title", "content",
  "inserted_at","updated_at"
)
VALUES
($1,$2,$3,$4),($5,$6,$7,$8)
[
  "The Hobbit",
  "In a hole...",
  ~N[2022-10-02 10:36:34],
  ~N[2022-10-02 10:36:34],

  "The Little Prince",
  "Once when I was...",
  ~N[2022-10-02 10:36:34]
  ~N[2022-10-02 10:36:34],
]

Okay, this looks good as well. But have a closer look at the values and you might start to wonder: Why do we send the same timestamps for every entry, twice?

This seems unnecessary, doesn’t it? Especially since all this duplicate data is sent to our database over the network, causing unnecessary traffic, bloating up the payload, and slowing down the entire request.

If you now ask yourself: How can we send the timestamps only once? I have good news for you. Ecto’s placeholders got your back.

The Solution

Ecto allows us to specify placeholders in our Repo.insert_all/3 request for fields that should all receive the same value. In this case, we want to insert the same timestamp in every inserted_at and updated_at field. But we want to send the timestamp to the database only once. We can achieve this by moving the timestamp to a placeholder like this:

books = [...]
now =
  NaiveDateTime.utc_now()
  |> NaiveDateTime.truncate(:second)

data =
  Enum.map(books,
    &Map.merge(&1, %{
      inserted_at: {:placeholder, :now},
      updated_at: {:placeholder, :now}
    })
  )

placeholders = %{now: now}

Repo.insert_all(
  Book,
  data,
  placeholders: placeholders
)

We replaced the timestamps with {:placeholder, :now} and provided the value for :now in the placeholders option. If we run this query, the output shows that Postgres inserts the timestamp wherever we used the :now placeholder. Have a look at the output:

INSERT INTO "books"(
  "title", "content",
  "inserted_at","updated_at"
)
VALUES
($2,$3,$1,$1),($4,$5,$1,$1)
[
  ~N[2022-10-02 10:46:04],

  "The Hobbit",
  "In a hole...,

  "The Little Prince",
  "Once when I was..."
]

Ecto replaced all occurrences of our :now placeholder with a link to the first value $1, which is our timestamp. Instead of sending the same timestamp a thousand times, we only send it once and use it wherever needed. So much bandwidth saved!

Heads up: Placeholders are only supported in Postgres, and not in MySQL.

Placeholders are particularly useful when we insert a large chunk of data into multiple fields or across multiple entries. For example, imagine that we want to duplicate a book’s content for multiple users. Every user wants a copy of the data for their own editing. Using placeholders, we don’t have to send the book’s content for every copy. We can send it once and reuse it. Here’s how:

book =  Repo.get_by(
    Book,
    title: "The Hobbit"
  )

user_ids = [1, 2, 3]

now =
  NaiveDateTime.utc_now()
  |> NaiveDateTime.truncate(:second)

copies =
  Enum.map(users_ids, &%{
    user_id: &1,
    content: {:placeholder, :content},
    inserted_at: {:placeholder, :now},
    updated_at: {:placeholder, :now}
  })

placeholders = %{
  content: book.content,
  now: now
}

Repo.insert_all(
  Copy,
  copies,
  placeholders: placeholders
)

And that’s it! Ecto will replace our :content and :now placeholders wherever necessary. We successfully reduced the size of our database insert to a minimum.

Like what you read? Sign up for more!

Caveats

As mentioned in Ecto’s docs, placeholders have a few limitations:

  1. All fields where a placeholder is used must have the same type. For example, you can’t use the same placeholder for an integer and a text field. Keep this in mind, especially when inserting different types of DateTime data.
  2. You can’t use a placeholder nested inside a data structure. The value you provide for the placeholder must always be the entire value you want to insert. Have a look at this example:

This won’t work:

data = [  %{
    details: %{
      age: {:placeholder, :age}
    }
  },
  %{
    details: %{
      age: {:placeholder, :age}
    }
  }
]

placeholders = %{age: 21}

Repo.insert_all(
  Person,
  data,
  placeholders: placeholders
)

Instead, do this:

data = [  %{
    details: {:placeholder, :details}
  },
  %{
    details: {:placeholder, :details}
  }
]

placeholders = %{
  details: %{
    age: 21
  }
}

Repo.insert_all(
  Person,
  data,
  placeholders: placeholders
)

Conclusion

And that’s it! I hope you enjoyed this article! If you have questions or comments, let’s discuss them on BlueSky. Follow me on BlueSky or subscribe to my newsletter below if you want to get notified when I publish the next blog post. Until then, thanks for reading!

Liked this article? Sign up for more