How We Built an In-House Price Alerts System
Imagine you are a beginner at investing in the stock markets. A friend has told you to buy Reliance stock when it hits 1800, but you find it difficult to monitor the markets every day because you have a full-time job and other responsibilities. Or you are at an intermediate level, got caught up elsewhere for a couple of days, missed a rally or a massive dip in your stock, and now have to take your money out at a loss.
A limitation of time and resources at hand should not mean that you miss out on important decisions in the markets. You wish for a mechanism that would prompt you when the price of a certain stock or your favorite index hits a certain value.
The price alerts feature on Paytm Money does exactly that. You set a price alert on our platform, and we send you a push and an email notification to help you make informed decisions. We further extended this feature to support GTT (Good Till Triggered) orders: you place an order to buy or sell a stock at a certain price, and we execute that order on your behalf whenever that condition is met.
In this blog post, we will give an overview of the price alerts system at Paytm Money and the intricacies involved while designing and implementing the feature.
The problem statement was simple. A user wants to set an alert and get notified when the price of a stock hits a certain target.
We would like to introduce a couple of definitions that will be used throughout this blog post.
Scrip: An interchangeable word for a financial instrument. In this post, a scrip can refer to a stock, an ETF, an index, or any other financial security.
Tick: A price change, positive or negative, for a scrip. In this post, a tick refers to a price point at which a trade was executed for a scrip.
We had to take a few important decisions which helped shape the architecture of the price alerts system.
- Manage Resources and Latency
- Scalability and Load Distribution
Manage Resources and Latency
The utility of the feature depends on being as real time as possible. However, we also had to be cost-effective. This led us to think in the following directions.
Manage Ticks: There is no upper bound on the number of ticks we can get each day; at present, we get around 18-20 million ticks every day. Consuming each and every market tick, querying our datastore for a price match, and triggering alerts would have been expensive. We decided to significantly reduce the number of ticks we query our data store for, without missing any alert that needs to be triggered. This prompted us to logically compress the ticks at two places in our system. We will discuss how we do that while still not missing any user alerts in the later parts of this blog post.
Manage User Alerts: We did not want to flood the system with too many alerts since that would lead to an unbounded data set that needs to be queried for each tick we receive from the market. Hence, we limited the number of alerts per user to 50. This way the number of alerts would grow linearly with the addition of new users on the platform and we could easily decide on gradually scaling up our database clusters if need be.
In addition to the above, we considered setting start and stop timings for servers (feed, web service and tick listener servers in the architecture diagram) and writing cleanup jobs to clean triggered and expired alerts.
Scalability and Load Distribution: Originally, the feature was supposed to support only stocks. We knew that the types of instruments we’ll need to support will gradually increase. For example, we went from supporting around 6000 stocks to close to 100000 scrips now. This encouraged us to design the system in a way that it is scalable without needing any major modifications.
At a high level, we concluded that we had to build three components to implement the feature.
- A component that would allow the user to perform CRUD operations for price alerts
- A component that would subscribe to live market feeds and push messages to queues
- A component that would consume messages from the queues and trigger alerts
The idea was to make sure that the ticks for a specific scrip always go to a specific queue, and that each alert generator app server always consumes data from its designated queue. We made use of Kubernetes StatefulSets here. This was done to distribute the load properly across the app servers and to avoid the concurrency issues that would arise if multiple servers could consume messages for the same scrip.
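The scrip-to-queue mapping described above can be sketched as a stable hash. This is an illustration, not the production implementation: the queue names, queue count, and choice of hash are assumptions.

```python
import hashlib

NUM_QUEUES = 4  # hypothetical: one queue per alert-generator pod

def queue_for_scrip(scrip_id: str) -> str:
    """Map a scrip to a fixed queue so all its ticks land on one consumer."""
    # A stable hash (not Python's process-randomized hash()) keeps the
    # mapping consistent across processes and restarts.
    digest = hashlib.sha256(scrip_id.encode()).hexdigest()
    index = int(digest, 16) % NUM_QUEUES
    return f"price-alert-ticks-{index}"
```

Because the mapping is deterministic, adding a consumer pod for a new queue is a config change (bumping the queue count) rather than a code change, at the cost of remapping some scrips.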
This also meant that the subscription component could be scaled independently. And if the generator system was not able to handle the load, we would simply introduce a new queue and a new pod to the system. The code was structured to make this a minimal config change.
Web App: This app hosts CRUD APIs which support user functions like creating, reading, updating and deleting price alerts.
Tick Listener: This process establishes websocket connections with the feed servers and subscribes for ticks. Feed servers are servers hosted on AWS which subscribe to live exchange feeds. We subscribe for scrips in a sharded fashion with a limit on the number of scrips per connection, say 500 scrips per websocket connection. We receive each tick in the form of binary packets.
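The sharded subscription can be sketched as simple batching, assuming the 500-scrips-per-connection limit mentioned above (the limit and function name are illustrative).

```python
def shard_subscriptions(scrips, per_connection=500):
    """Split the full scrip list into batches, one per websocket connection."""
    return [scrips[i:i + per_connection]
            for i in range(0, len(scrips), per_connection)]
```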
As and when we receive ticks for a scrip, we parse and segregate them into internal queues based on the scrip type. A logical compression of prices is done for the stocks; say, in a particular time interval, we received price changes for a stock in this order: 101, 104, 103, 99. After logical compression, these packets would be passed on as the low and high values [99, 104]. This helps us reduce the number of packets we process in our application. Thereafter, this low and high value of a stock (in JSON format) is passed on to SQS for the alert/notification process to be initiated.
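The compression step above can be sketched as follows; the JSON field names are assumptions for illustration, not our actual message schema.

```python
import json

def compress_ticks(prices):
    """Collapse all ticks seen in an interval into a single (low, high) pair."""
    return (min(prices), max(prices))

def to_sqs_message(scrip_id, prices):
    """Build the compressed JSON payload pushed to SQS (field names illustrative)."""
    low, high = compress_ticks(prices)
    return json.dumps({"scrip_id": scrip_id, "low": low, "high": high})
```

For the example in the text, `compress_ticks([101, 104, 103, 99])` yields `(99, 104)`: four packets become one, and no alert inside the range is lost.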
Price Alert Generator: This process is responsible for reading messages from the SQS endpoints. After reading the messages, it parses them, finds all alerts that need to be triggered in batches, sends the payload to the communication module to trigger email and push notifications, marks the alerts as triggered in the database, and finally deletes the messages from SQS. For GTT alerts, instead of sending the payload to the communication module, we send downstream messages to SQS which are consumed by our sister team to trigger user orders.
Some intricacies we had to take care of in this component:
Async Flow: The process of alert generation is separated out from the process of reading and parsing the SQS messages. We read messages from SQS and save them in an in-memory queue. We read messages from this queue, make buckets out of them for each unique scrip and submit a different task for each bucket.
Inside each task, we do another level of compression. We find the global high and low values we have received and query only for those values. This saves us from querying our data store for each message consumed from SQS.
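The bucketing and second-level compression can be sketched like this, assuming the message shape from the Tick Listener section (field names are illustrative).

```python
from collections import defaultdict

def bucket_and_compress(messages):
    """Group SQS messages by scrip, then keep one global (low, high) per scrip."""
    buckets = defaultdict(list)
    for msg in messages:
        buckets[msg["scrip_id"]].append(msg)
    # One datastore query per scrip instead of one per SQS message.
    return {
        scrip_id: (min(m["low"] for m in msgs), max(m["high"] for m in msgs))
        for scrip_id, msgs in buckets.items()
    }
```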
Before going into other intricacies, we would like to mention that we use Elasticsearch as the persistence layer for price alerts. A couple of challenges came up in the persistence layer during the initial implementation discussions.
Building the query and optimizing it: Triggering alerts when we receive a tick matching the trigger price for an alert was the easy part. However, it was possible that no tick was received from the exchange for the exact trigger price the user set. But that does not mean we should not trigger the alert if the price is breached.
An example would help clear things up. Suppose the current ltp (last traded price) of a stock is 100 Rs. The user set an alert for 95 Rs. Now, the next morning the market opens at 92 Rs. Although we never received a tick for 95 we should still trigger the alert. A similar case can happen on the higher end of the spectrum as well. Suppose the user set an alert for 102 Rs but it went straight from 101 to 105. We should trigger the alert in this case as well.
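The breach logic from both examples can be condensed into a small check against a compressed (low, high) tick range. This is a sketch; the `direction` field, which we assume records whether the trigger was set below or above the ltp at creation time, is illustrative.

```python
def should_trigger(alert, low, high):
    """Check a compressed (low, high) tick range against one alert.

    'DOWN' means the trigger was set below the ltp when the alert was
    created, so it fires once the price falls to or past the trigger,
    even if no tick printed exactly at the trigger price. 'UP' is the
    mirror case for triggers set above the ltp.
    """
    if alert["direction"] == "DOWN":
        return low <= alert["trigger_price"]
    return high >= alert["trigger_price"]
```

Both examples from the text pass: a DOWN alert at 95 fires when the market opens at 92, and an UP alert at 102 fires when the price jumps from 101 to 105.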
The mapping of the document in Elasticsearch and the query was defined in a way that we are able to support all the cases.
In addition to this, we used the search_after API of Elasticsearch to fetch alerts to be triggered in batches. This is a much better approach for our use case than regular from/size pagination or the scroll API.
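A sketch of the request shape for fetching one batch with search_after is below. The field names (`trigger_price`, `status`, `alert_id`) are assumptions; the real mapping and query also encode the direction logic discussed above. search_after requires a deterministic sort, so a unique field such as an alert id is used as the tiebreaker, and the sort values of the last hit of one batch seed the next request.

```python
def build_search_request(low, high, batch_size, search_after=None):
    """Build one batched Elasticsearch request body (field names illustrative)."""
    body = {
        "size": batch_size,
        "query": {
            "bool": {
                "filter": [
                    {"range": {"trigger_price": {"gte": low, "lte": high}}},
                    {"term": {"status": "ACTIVE"}},
                ]
            }
        },
        # search_after needs a stable total order; alert_id breaks ties.
        "sort": [{"trigger_price": "asc"}, {"alert_id": "asc"}],
    }
    if search_after is not None:
        # Sort values of the last hit from the previous batch.
        body["search_after"] = search_after
    return body
```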
Concurrency: There were potentially two challenges with respect to concurrency in the persistence layer.
- What if one of the app servers that processes the ticks and triggers the alerts is querying Elasticsearch, and in the meantime a new tick of the same price range comes in and another server consumes it? There could be a race condition in Elasticsearch, and the same alert could be triggered twice. This first level of concurrency issues was handled by making sure that each pod reads messages from its designated queue only, and that the messages for a scrip are always written to the same queue, as described under Scalability and Load Distribution.
- Elasticsearch is near real-time, which means it can take up to ~1 second for all shards to reach a consistent state. That means we could potentially trigger the same alert N times if we keep receiving the same range of ticks. We could have used a forced refresh in the query while marking the alerts as triggered, but that would have put additional strain on the infra and added latency. To alleviate this, we maintain an in-memory cache with a TTL, so that we don't trigger the same alert twice from the same app server. And on the Elasticsearch side, we set the "proceed on conflicts" property to true so as to bypass concurrency issues.
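The per-server TTL dedupe can be sketched as a minimal cache keyed by alert id. The class name and TTL are illustrative; a pluggable clock is used here only to make the sketch testable.

```python
import time

class TTLCache:
    """Minimal per-process TTL cache to suppress duplicate alert triggers."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._seen = {}  # alert_id -> expiry timestamp

    def mark_if_new(self, alert_id):
        """Return True only the first time an alert is seen within the TTL window."""
        now = self.clock()
        expiry = self._seen.get(alert_id)
        if expiry is not None and expiry > now:
            return False  # duplicate inside the window: skip re-triggering
        self._seen[alert_id] = now + self.ttl
        return True
```

The TTL only needs to cover Elasticsearch's refresh lag (~1 second), after which the triggered status is visible to queries and the datastore itself filters the alert out.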
At-least-once delivery: SQS by design guarantees at-least-once delivery, which means a client can receive the same message more than once. In this case, the in-memory cache prevents us from re-triggering the same alert. There could also be a case where we sent the payload to the communication module, but before we could mark the alert as triggered in Elasticsearch and delete the messages from SQS, our app server restarted. In that case, we could potentially trigger the alert again. We accepted this as part of the design: in rare cases, a user may receive the same notification multiple times.
We are evaluating opportunities to integrate more types of alerts in the system in the near future and increase its utility.
Thanks for reading this far. We encourage you to try out the feature for yourself. You can find it under the company page of a particular scrip.