The mission of Coinbase is to create an open financial system for the world. The mission of the Coinbase Payments team is to empower customers to move money in and out of the crypto economy with a delightful and flawless experience. Coinbase currently supports 10+ different payment methods in over 30 countries, and we are building more. In this post, we share some of the main challenges and best practices for payment systems from an engineering perspective.
Accuracy
Payments are one of the areas in which there is zero tolerance for errors. Ensuring that product flows and features behave correctly is extremely important. Any payment bug related to accuracy causes an unacceptable customer experience, and when an error occurs it needs to be corrected immediately. Furthermore, remediating such mistakes is time-consuming and usually complicated by various legal and compliance constraints.
In our system, we have built multiple layers to ensure accuracy. These span from unit testing during implementation, to production testing and bug bashes for any feature update or flow change, to monitoring of error rates, authorization rates, and success rates, to anomaly detection and alerting that catch regressions introduced by new changes. A close feedback loop with support and product also helps to surface accuracy-related issues.
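As a rough illustration of the monitoring layer, the sketch below compares the success rate of a recent time window against a baseline and flags a likely regression. The `Window` shape, the `is_regression` helper, and the 5-point threshold are assumptions made for this example, not a description of Coinbase's actual alerting.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Counts of payment attempts seen in one time window (hypothetical shape)."""
    succeeded: int
    attempted: int

    @property
    def success_rate(self) -> float:
        # Treat an empty window as healthy so that zero traffic does not alert.
        return self.succeeded / self.attempted if self.attempted else 1.0


def is_regression(baseline: Window, recent: Window, max_drop: float = 0.05) -> bool:
    """Return True when the recent success rate fell more than max_drop below baseline."""
    return baseline.success_rate - recent.success_rate > max_drop


baseline = Window(succeeded=970, attempted=1000)  # 97% before a release
recent = Window(succeeded=880, attempted=1000)    # 88% after the release
alert = is_regression(baseline, recent)           # the 9-point drop exceeds the 5-point limit
```

The same comparison could be applied to error rates or authorization rates; a production setup would normally also account for traffic volume and seasonality.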
In addition to logical correctness, the accuracy of system behavior can also be extended to how exceptions are handled. We discuss some of these concepts in the following sections.
Resiliency
Another important aspect of accuracy is how resilient the system is to external issues and errors. For example, one of the most important concepts in the payment domain is idempotency. It is necessary because, if an unsuccessful transaction triggers a retry, we must ensure that the recovery does not result in a double charge.
Typically, an end-to-end payment system runs from the client side, to backend services, to external partners, where payment transactions are processed on the back end. All transactions should be kept as atomic as possible. But some client-to-service or internal-to-external requests may be long-running, especially in timeout or failure cases, and we can only confirm the final result (success or failure) minutes or hours later. In some of those cases we initiate a retry from upstream to downstream. If the end-to-end retry is not handled properly, that is, if the system is not idempotent, it will inevitably end up processing the same transaction twice, causing a double charge or double payout.
Once idempotency is ensured, we must also get the design of auto-retry, user messaging, and related behavior right.
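One common way to make a charge safe to retry is an idempotency key: the client generates a key once per logical payment, and the service returns the stored result when it sees the same key again. The sketch below is a minimal in-memory illustration; `PaymentProcessor`, `charge`, and the result fields are hypothetical names, not Coinbase's API.

```python
import uuid


class PaymentProcessor:
    """In-memory stand-in for a payment service (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._results = {}      # idempotency key -> stored result
        self.charges_made = 0   # how many times money actually moved

    def charge(self, idempotency_key: str, amount_cents: int) -> dict:
        # A retried request carries the same key, so return the stored result
        # instead of charging again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges_made += 1
        result = {
            "charge_id": str(uuid.uuid4()),
            "amount_cents": amount_cents,
            "status": "succeeded",
        }
        self._results[idempotency_key] = result
        return result


processor = PaymentProcessor()
key = str(uuid.uuid4())               # generated once by the client per logical payment
first = processor.charge(key, 2500)
retry = processor.charge(key, 2500)   # e.g. the client timed out and retried
```

Here the retry returns the same charge and `charges_made` stays at 1. In a real service the lookup and the insert would need to be atomic, for example through a unique constraint on the key in the database, so that two concurrent retries cannot both pass the check.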
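A minimal sketch of auto-retry with exponential backoff is shown below. It assumes the wrapped operation is idempotent, so repeating it is safe; `TransientError`, `retry_with_backoff`, and the delay values are illustrative assumptions rather than a prescribed design.

```python
import time


class TransientError(Exception):
    """Raised for failures that are safe to retry, such as a timeout (hypothetical)."""


def retry_with_backoff(operation, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up; the caller decides what to tell the user
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, 1s, 2s, ...


calls = []


def flaky_charge():
    # Fails twice, then succeeds. A real operation would reuse one idempotency key.
    calls.append(1)
    if len(calls) < 3:
        raise TransientError("timeout")
    return "succeeded"


delays = []
# Record the delays instead of sleeping so the example runs instantly.
outcome = retry_with_backoff(flaky_charge, sleep=delays.append)
```

When the retries are exhausted the exception propagates, which is the point where user messaging takes over, for instance by telling the customer that the payment is still pending rather than inviting them to submit it again.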
Recovery and traceability
Another important consideration when there are multiple layers from upstream to downstream is record keeping. That is, how do we design the data model, data recording, and data propagation so that, when a problem arises, we can do our best to repair the system state and find out what happened?
Payment systems typically use both cached data for speed and persistent data for recovery. Whenever caching is involved, it is important to have the right strategy for which data layer to write to and when: how we propagate data when transient disagreements occur, how we identify the source of truth, and how we design the overall recovery process to ensure eventual consistency.
Another key to capturing data properly is to keep a reliable record so that we can always find out exactly what happened. This is needed in various contexts, including financial audits, event logging, and issue investigation.
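One possible strategy is a write-through cache in which the persistent store is always the source of truth. The sketch below uses plain dictionaries as stand-ins for a database and a cache; the class and method names are hypothetical, and the write ordering is one reasonable choice among several.

```python
from typing import Optional


class PaymentStateStore:
    """Write-through cache over a persistent store (both are plain dicts here)."""

    def __init__(self) -> None:
        self.persistent = {}  # stand-in for a database: the source of truth
        self.cache = {}       # stand-in for an in-memory cache

    def write(self, payment_id: str, status: str) -> None:
        # Persist first, so a crash between the two writes can only leave the
        # cache stale, never the durable record.
        self.persistent[payment_id] = status
        self.cache[payment_id] = status

    def read(self, payment_id: str) -> Optional[str]:
        if payment_id in self.cache:
            return self.cache[payment_id]
        status = self.persistent.get(payment_id)
        if status is not None:
            self.cache[payment_id] = status  # repopulate on a miss
        return status

    def recover(self) -> None:
        # After an incident, discard the cache and rebuild it from the source of truth.
        self.cache = dict(self.persistent)


store = PaymentStateStore()
store.write("pay_1", "pending")
store.cache["pay_1"] = "succeeded"  # simulate a transient disagreement
store.recover()
status_after_recovery = store.read("pay_1")
```

After `recover()` the cache agrees with the persistent store again, so the read returns the durable value rather than the stale cached one.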
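A simple way to keep such a record is an append-only event log: every state change is recorded as a new entry, and existing entries are never modified. The sketch below is an in-memory illustration; `PaymentEvent`, `AuditLog`, and the event names are assumptions made for the example.

```python
import json
import time
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PaymentEvent:
    payment_id: str
    event: str
    detail: str
    recorded_at: float


class AuditLog:
    """Append-only event log; entries are never updated or deleted."""

    def __init__(self) -> None:
        self._events = []

    def record(self, payment_id: str, event: str, detail: str = "") -> None:
        self._events.append(PaymentEvent(payment_id, event, detail, time.time()))

    def history(self, payment_id: str) -> list:
        # Events come back in the order they were recorded.
        return [e.event for e in self._events if e.payment_id == payment_id]

    def export(self) -> str:
        # One JSON object per line, convenient for audits and investigations.
        return "\n".join(json.dumps(asdict(e)) for e in self._events)


log = AuditLog()
log.record("pay_1", "created")
log.record("pay_1", "authorized", detail="partner accepted the request")
log.record("pay_2", "created")
log.record("pay_1", "captured")
trail = log.history("pay_1")
```

Because the log is only ever appended to, the full sequence of events for a payment can be replayed later to reconstruct how it reached its current state.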
Availability and integration velocity
When it comes to customer experience, the first thing users need to know is whether the service is available for them to use. The technical stack of a payment system consists of several layers, and a failure in any one of them can make the service unavailable. So we add as much redundancy as possible by duplicating critical components to increase the reliability of our system.
Another important aspect of an international payment system is geographic coverage. The speed with which we can add new payment methods in new jurisdictions matters. To increase integration velocity, it is important to have the right abstractions and abstraction layers, which capture the specifics of each payment method while hiding them from callers. For example, a well-designed abstraction can handle both push payments and pull payments, represent both pay-ins and payouts, charges and refunds, and synchronous and asynchronous payments.
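To make the idea of an abstraction layer concrete, the sketch below defines a single interface that callers use regardless of payment method. All names here (`PaymentMethod`, `Transfer`, `Direction`, `Status`, and the two example methods) are hypothetical, and the card and bank-transfer behaviors are simplified assumptions chosen only to contrast a synchronous pull payment with an asynchronous push payment.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    PAY_IN = "pay_in"    # money moves from the customer to the platform
    PAY_OUT = "pay_out"  # money moves from the platform to the customer


class Status(Enum):
    PENDING = "pending"      # asynchronous methods settle later
    SUCCEEDED = "succeeded"  # synchronous methods can confirm immediately


@dataclass
class Transfer:
    amount_cents: int
    currency: str
    direction: Direction


class PaymentMethod(ABC):
    """Common interface that hides method-specific details from callers."""

    @abstractmethod
    def submit(self, transfer: Transfer) -> Status:
        ...


class CardPayment(PaymentMethod):
    def submit(self, transfer: Transfer) -> Status:
        return Status.SUCCEEDED  # pull payment, confirmed synchronously in this sketch


class BankTransfer(PaymentMethod):
    def submit(self, transfer: Transfer) -> Status:
        return Status.PENDING    # push payment, confirmed asynchronously in this sketch


def process(method: PaymentMethod, transfer: Transfer) -> Status:
    # Callers depend only on the interface, so adding a new method needs no changes here.
    return method.submit(transfer)


card_status = process(CardPayment(), Transfer(2500, "USD", Direction.PAY_IN))
bank_status = process(BankTransfer(), Transfer(10000, "EUR", Direction.PAY_OUT))
```

Supporting a new payment method in a new jurisdiction then means adding one more `PaymentMethod` subclass, while `process` and everything upstream of it stay unchanged.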
System maintenance and scalability
Keeping payment systems maintainable and scalable is extremely important. The KISS principle states: "Wherever possible, complexity must be avoided in a system, as simplicity guarantees the greatest level of user acceptance and interaction." This principle is particularly important in payment system design. Any overly complex logic or critical code can cause mysterious bugs in the future.
We also lean towards maintaining high-quality runbooks and documents that capture all design considerations and tradeoffs. In our experience, a single design choice may become a matter of debate in the future, and for this reason documentation is invaluable. Most of the design patterns in our system are interdependent and interact with each other, and each of these components is important to the system as a whole. Complete documentation helps newcomers understand the system, ramp up, and align with the overall design approach.
Above and beyond
Although building accurate and reliable payment systems is important, we should also look beyond that. Empowering customers to transfer money with a delightful experience is more than just making transactions secure and correct. End-to-end payment systems are complex and need to account for compliance, security, fraud, and other factors. This post only touches on some basic, high-level concepts. In the future, we will share more articles discussing components of our payment systems in depth.