I like using Upsert
for my CreateOrUpdate
methods in my application code. It simplifies the application programming and avoids an unnecessary redundant query where I can avoid making 2 separate queries:
- first to check if the row exists or not
- second to insert it
And last week I learned that if the row already exists and your upsert query is updating the row instead, then MySQL will return the result of rows affected as 2
. I found myself being surprised by this finding and felt it important to blog about.
Let’s look at this with a simple example.
Say, we have a table users
with two columns: username
and login_count
:
CREATE TABLE users (
username varchar(32) PRIMARY KEY,
login_count INT DEFAULT 1
);
And if you were to write a query that inserts a new username with login_count = 1, or if the username already exists, then increment the current value of login_count
, you’d write the following query:
mysql> INSERT INTO users (username) VALUES ('johndoe') ON DUPLICATE KEY UPDATE login_count = login_count + 1;
Query OK, 1 row affected (0.01 sec)
Which results in a new row being inserted. We can clearly see that 1 row was affected by the query. Let’s look at the table now:
mysql> select * from users;
+----------+-------------+
| username | login_count |
+----------+-------------+
| johndoe | 1 |
+----------+-------------+
1 row in set (0.00 sec)
Looks good! We have a single row as expected. Now let’s try to run the same query again (the user logged in again):
mysql> INSERT INTO users (username) VALUES ('johndoe') ON DUPLICATE KEY UPDATE login_count = login_count + 1;
Query OK, 2 rows affected (0.00 sec)
The output suggests that 2 rows were affected. Does that mean we have 2 rows in the table now? Or perhaps we have a bug in our schema? Neither, but let’s look at the table again to be sure:
mysql> select * from users;
+----------+-------------+
| username | login_count |
+----------+-------------+
| johndoe | 2 |
+----------+-------------+
1 row in set (0.01 sec)
In fact, our query worked as expected. From the Database’s point of view, 2 rows affected makes sense – 1 row that was attempted to be inserted and then another row that was updated. But I initially found it confusing as I was writing some tests for my DB layer in Go (or else I would never have learned this cool fact!). Here’s a short example of what my test looked like:
func TestUserStore_Upsert(t *testing.T) {
// Sets up a test database and applies the migrations. Out of scope for this blog post.
db, err := setupDB(t)
if err != nil {
t.Fatal("failed to setup test database", err)
}
store := UserStore{db: db}
if err = store.Upsert("johnwick"); err != nil {
t.Fatal("failed to insert", err)
}
err = store.Upsert("johnwick")
if err != nil {
t.Fatal("failed to update", err)
}
}
And it failed with:
$ go test
--- FAIL: TestUserStore_Upsert (0.04s)
main_test.go:31: failed to update unexpected rowsAffected: 2 for upsert of username: "johnwick"
FAIL
exit status 1
FAIL github.com/indradhanush/mysql-rows-affected 0.279s
The failing code looked like this:
if rowsAffected != 1 {
return fmt.Errorf("unexpected rowsAffected: %d for upsert of username: %q", rowsAffected, username)
}
return nil
This was the only check in my code. As a result of this failure, I then updated my code to look somewhat different:
if rowsAffected == 1 || rowsAffected == 2 {
return nil
}
return fmt.Errorf("unexpected rowsAffected: %d for upsert of username: %q", rowsAffected, username)
And then my test passed!
$ go test -v
=== RUN TestUserStore_Upsert
--- PASS: TestUserStore_Upsert (0.04s)
PASS
ok github.com/indradhanush/mysql-rows-affected 0.283s
Morale of the story? Write tests! 😛
If you’d like to check out the code and play around with it yourself you can find it here. And if you liked this blog post, I won’t mind you adding a star on the GitHub repo. 😁
Edit: I went looking for the docs after I published this post today, and found out that this is a well-documented and expected behaviour:
With ON DUPLICATE KEY UPDATE, the affected-rows value per row is 1 if the row is inserted as a new row, 2 if an existing row is updated, and 0 if an existing row is set to its current values.
While, Postgres on the other hand will return 1 rows affected as the result. While I still find it counter-intuitive, my best guess as to why MySQL chose to do this way is that the authors possibly wanted to provide an API to the consumer to distinguish between an INSERT
and UPDATE
operation.
P.S.: If you made it this far - thank you for reading but another little tidbit is that you can also run select row_count()
immediately after a query to get this number. But if you run it again the result will be -1
, since it’s updated after each query. And we’ve just run a new query in itself with select row_count()
which affects no rows at all.
P.P.S: Technically it’s a Last-Week-I-Learned but I didn’t have the time to blog about it then.
Addendum: Bulk inserts
When performing an UPSERT on a single row, the output is still unambiguous. We can tell whether the row was inserted (1), updated (2), or unchanged (0). But this gets very confusing for bulk upserts. If you’ve been following along, our table currently looks like:
mysql> select * from users;
+----------+-------------+
| username | login_count |
+----------+-------------+
| johndoe | 2 |
+----------+-------------+
1 row in set (0.01 sec)
Say we wanted to UPSERT a few more users – spiderman, batman, wonderwoman, catwoman and superman while also updating the login count for existing users. Let’s execute the following query:
mysql> INSERT into
users (username)
VALUES
("johnwick"),
("spiderman"),
("batman"),
("wonderwoman"),
("catwoman"),
("superman") on duplicate key
UPDATE login_count = login_count + 1;
Query OK, 7 rows affected (0.00 sec)
Records: 6 Duplicates: 1 Warnings: 0
The mysql shell is correctly identifying the number of existing rows here. But the rows affected remains more than the total number of rows!
mysql> select row_count();
+-------------+
| row_count() |
+-------------+
| 7 |
+-------------+
1 row in set (0.01 sec)
And one last time, if we run the same query again you’ll now see an even more inflated number of rows affected (everything is a duplicate, so each row is counted twice, once for insertion and once for updating it)!
// Same query as above, skipped here for ease of reading.
Query OK, 12 rows affected (0.00 sec)
Records: 6 Duplicates: 6 Warnings: 0
Running the query to see the affected rows returns 12:
mysql> select row_count();
+-------------+
| row_count() |
+-------------+
| 12 |
+-------------+
1 row in set (0.00 sec)
The MySQL shell reporting the total records, duplicates and warnings might be a mysql specific feature. As I did not see this additional information with SequelAce, which only reports the number of rows affected in the GUI. If you know why this is the case, please do let me know!
Postgres
Postgres remains consistent and only returns the total number of rows inserted + updated
each time independent of how many rows were already in the DB. For example:
(The query to return number of rows affected is slightly more involved)
dhanush@/tmp:localdb> WITH
affected_rows as (
INSERT into
users (username)
VALUES
('johnwick'),
('spiderman'),
('batman'),
('wonderwoman'),
('catwoman'),
('superman') ON CONFLICT (username) do
UPDATE
SET
login_count = users.login_count + 1 RETURNING *
)
SELECT
COUNT(*)
FROM
affected_rows;
+-------+
| count |
|-------|
| 6 |
+-------+
SELECT 1
Time: 0.006s
Conclusion
If you’re using the rows affected to determine how many rows were affected and
need this information in your business logic for whatever reason, please do not
rely on the output of select row_count()
in MySQL. Instead, possibly have
additional application logic to detect the changes.
Acknowledgments
Thank you to Swanand for encouraging me to write about the bulk upsert use case and for reviewing the post and suggesting edits.