Optimize Ecto's insert_all with Placeholders
Before we start talking about using placeholders in Ecto’s insert_all/3
, I need you to understand the following:
What do you call cheese that is not yours?
Nacho Cheese
Now, with the ceremonial dad joke out of the way, let’s focus on today’s topic: What are Ecto placeholders and how to use them when inserting a lot of data?
The Problem
Imagine we want to insert a lot of book data into our database. We decide to use Repo.insert_all/3
and write a script like the following:
books = [ %{
title: "The Hobbit",
content: "In a hole..."
},
%{
title: "The Little Prince",
content: "Once when I was..."
}
]
now =
NaiveDateTime.utc_now()
|> NaiveDateTime.truncate(:second)
data =
Enum.map(books,
&Map.merge(&1, %{
inserted_at: now,
updated_at: now
})
)
Repo.insert_all(Book, data)
All looks well here. We knew that Repo.insert_all/3
doesn’t add the autogenerated timestamps and added them ourselves. We run the script and receive an output like this:
INSERT INTO "books"(
"title", "content",
"inserted_at","updated_at"
)
VALUES
($1,$2,$3,$4),($5,$6,$7,$8)
[
"The Hobbit",
"In a hole...",
~N[2022-10-02 10:36:34],
~N[2022-10-02 10:36:34],
"The Little Prince",
"Once when I was...",
~N[2022-10-02 10:36:34]
~N[2022-10-02 10:36:34],
]
Okay, this looks good as well. But have a closer look at the values and you might start to wonder: Why do we send the same timestamps for every entry, twice?
This seems unnecessary, doesn’t it? Especially since all this duplicate data is sent to our database over the network, causing unnecessary traffic, bloating up the payload, and slowing down the entire request.
If you now ask yourself: How can we send the timestamps only once? I have good news for you. Ecto’s placeholders got your back.
The Solution
Ecto allows us to specify placeholders
in our Repo.insert_all/3
request for fields that should all receive the same value. In this case, we want to insert the same timestamp in every inserted_at
and updated_at
field. But we want to send the timestamp to the database only once. We can achieve this by moving the timestamp to a placeholder like this:
books = [...]
now =
NaiveDateTime.utc_now()
|> NaiveDateTime.truncate(:second)
data =
Enum.map(books,
&Map.merge(&1, %{
inserted_at: {:placeholder, :now},
updated_at: {:placeholder, :now}
})
)
placeholders = %{now: now}
Repo.insert_all(
Book,
data,
placeholders: placeholders
)
We replaced the timestamps with {:placeholder, :now}
and provided the value for :now
in the placeholders
option. If we run this query, the output shows that Postgres inserts the timestamp wherever we used the :now
placeholder. Have a look at the output:
INSERT INTO "books"(
"title", "content",
"inserted_at","updated_at"
)
VALUES
($2,$3,$1,$1),($4,$5,$1,$1)
[
~N[2022-10-02 10:46:04],
"The Hobbit",
"In a hole...,
"The Little Prince",
"Once when I was..."
]
Ecto replaced all occurrences of our :now
placeholder with a link to the first value $1
, which is our timestamp. Instead of sending the same timestamp a thousand times, we only send it once and use it wherever needed. So much bandwidth saved!
Heads up: Placeholders are only supported in Postgres, and not in MySQL.
Placeholders are particularly useful when we insert a large chunk of data into multiple fields or across multiple entries. For example, imagine that we want to duplicate a book’s content for multiple users. Every user wants a copy of the data for their own editing. Using placeholders, we don’t have to send the book’s content for every copy. We can send it once and reuse it. Here’s how:
book = Repo.get_by(
Book,
title: "The Hobbit"
)
user_ids = [1, 2, 3]
now =
NaiveDateTime.utc_now()
|> NaiveDateTime.truncate(:second)
copies =
Enum.map(users_ids, &%{
user_id: &1,
content: {:placeholder, :content},
inserted_at: {:placeholder, :now},
updated_at: {:placeholder, :now}
})
placeholders = %{
content: book.content,
now: now
}
Repo.insert_all(
Copy,
copies,
placeholders: placeholders
)
And that’s it! Ecto will replace our :content
and :now
placeholders wherever necessary. We successfully reduced the size of our database insert to a minimum.
Caveats
As mentioned in Ecto’s docs, placeholders have a few limitations:
- All fields where a placeholder is used must have the same type. For example, you can’t use the same placeholder for an
integer
and atext
field. Keep this in mind, especially when inserting different types of DateTime data. - You can’t use a placeholder nested inside a data structure. The value you provide for the placeholder must always be the entire value you want to insert. Have a look at this example:
This won’t work:
data = [ %{
details: %{
age: {:placeholder, :age}
}
},
%{
details: %{
age: {:placeholder, :age}
}
}
]
placeholders = %{age: 21}
Repo.insert_all(
Person,
data,
placeholders: placeholders
)
Instead, do this:
data = [ %{
details: {:placeholder, :details}
},
%{
details: {:placeholder, :details}
}
]
placeholders = %{
details: %{
age: 21
}
}
Repo.insert_all(
Person,
data,
placeholders: placeholders
)
Conclusion
And that’s it! I hope you enjoyed this article! If you have questions or comments, let’s discuss them on BlueSky. Follow me on BlueSky or subscribe to my newsletter below if you want to get notified when I publish the next blog post. Until then, thanks for reading!