Skip to main content

Why do duplicate data records exist?

· 5 min read
Nitesh

I worked on a merchant/shopkeeper application last year that allowed them to manage all aspects of their business—including inventory, orders, payments, and credit tracking through our application. One feature allows a merchant to create an order for a consumer and issue a bill against it, allowing the customer to use that bill to make purchases in a store. The merchant reported an issue involving repeated orders/bills on the order creation page. Let's examine my strategy to solve this problem.

Problem

The issue is that sometimes when a merchant creates an order or bill for a customer, a duplicate order or bill is created.


GIF to show how Duplicate data created

In the GIF below, we can see that the client pressed the submit button to create a bill for order, but since the customer did not receive a response, he again clicked the button, resulting in duplicate data.

idempotent_miss

Brainstorm

How and what data do I gather?

  • first i check frontend part, network tab and found that in normal cases when user clicks on create bill button it returns a bill id, but in abnormal cases there is a timeout or no response from backend api.
  • but actually when i check in abnormal cases data is stored in the database but response did not reach to the client side.
  • hence user again clicks on submit button which leads to data duplicacy.

Diagram shows the normal and abnormal case

current_scenario

Solution

Let's see what we need to do in order to solve this issue

  • so basically we need my API to be idempotent, but wait what is idempotent let's see:

"Idempotence, in programming and mathematics, is a property of some operations such that no matter how many times you execute them, you achieve the same result"

  • so basically idempotent means same operation if we performed twice does't create a duplicate entry we need any mechanism to catch the duplicate operation and handle the side effect.

what are idempotent and non-idempotent operations?

it is simple so POST and PATCH Request is responsible to change the state of your data. so hence these 2 are idempotent operations.

but on the other hand Rest all are non-idempotent operations.because lets take GET Request it will not change the data or lets take DELETE operation where once you delete the object will not deleted twice.

Hence we only need to handle the operations which are idempotent.

so how i make my POST API idempotent?

  • first we need to create a unique id for each unique request and that id will known as request_id
  • so now your payload will look something like this
{
"request_id": "14836563416", // uuid
"order_summary": {
"items": [
{ "name": "strawberry cookie", "count": 4 },
{ "name": "Chocklate cake", "count": 3 }
]
}
}
  • now before storing the data in the database our API first create a hash of the incoming payload using sha-256 hashing algorithm
const idempotent_key = crypto
.createHmac("sha256", Securitykey)
.update(payload)
.digest("hex"); // idempotent key is our hashing key
  • and then check whether this hash-key already exists or not.if not store the hash-key in persistent storage.
  • now as we know that for validation we need persistent storage or database, so now the question is which database to use.
  • and we decide to choose the NOSQL DB because we need these properties:
    • horizontally scalable nature
    • key-value structure or no data schema restriction
    • no relation in tables is there

Below is the simple schema for persistent store

schema has 3 things:

  • idempotent_key: "<hash_key>"
  • expiry_time: "<time_to_live>" // after what time we have to delete this entry
  • response_summary: "<response_summary>" // summary which we have to return to client on success

solution_idempotent_store_img

  • now if we already have that same idempotent_key extract the response from the persistent store and return the successful response to the client.
  • but if idempotent_key is not present in persistent store, then store the key , create a bill for the order and store the successful resposne in persistent store.
{
"success": true,
"bill_id": "1202"
}

Below is the final Sequence diagram

solution_diagram

Reliability Talk

it is important that your system should be fault taulerant and reliable which means what if your system fails

failure situations can be
  • what if we created a idempotent key, stored it into persistent store and system fails before bill creation.
  • so now bill is not created but idempotent_key/hash_key is there which resist us to store the data.
  • hence use Transactions(ACID properties) between 2 operations:
    • first is storing key in persistent store
    • second is creating order
  • if any one fails so complete transaction should rollback

you can read about Transactions here → Transaction on Wiki