Why Did We Need to Get Rid of Legacy Infrastructure?
To sum up:
- The legacy infrastructure was no longer working properly
- We needed to avoid data inconsistency
- We wanted an easily manageable, up-to-date infrastructure
After a detailed analysis of the legacy infrastructure, we started working closely with our DevOps team on a new one. The goal of this work was to solve the problems in the company’s old data infrastructure and bring an up-to-date data infrastructure into use. It would also let us bring data inconsistency within the application under control. The very simple structure built between the transactional sources and a single database (MySQL) could no longer meet the needs of the company. Data inconsistency was high because the data was maintained by an irregular collection of cron jobs written in several programming languages. Even important data such as trips, customers, and payments were occasionally affected by this inconsistency. Another problem was that, because the drivers’ payments were calculated from this database, we could not even run queries at certain times of the day.
It’s Time for a Change!
“It is necessary sometimes to take one step backward to take two steps forward.”
We took a step back while working on the structure. After identifying the problems one by one, we documented them and started planning the solution. We tried improving MySQL first, although we knew this was not a long-term solution. At this point, we discussed the idea of designing a new data warehouse. Initially, we transferred transactional data from the operational databases to PostgreSQL, which we use as the warehouse, with open-source software called ToroDB. Several problems we encountered with ToroDB pushed us to look for a different approach, and our next solution was Foreign Data Wrapper (FDW). Instead of setting it up as a separate database or service on the server where the data warehouse is located, we created a new PostgreSQL instance on a different server and used FDW as the link between the data warehouse and the data sources. With Airflow, we can schedule all data pipelines to run daily and transfer the data to our data warehouse. The structure for streaming data, which is nearly complete, is also shown in the diagram below.
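As a rough illustration of the FDW approach described above, here is a minimal sketch that builds the SQL for exposing a remote schema through a foreign data wrapper and for one daily load. It rests on several assumptions: the post does not say which wrapper was used, so `postgres_fdw` is assumed here, and every name (`fdw_setup_statements`, `daily_load_statement`, the server, schema, table, and `created_at` column names) is hypothetical. In practice, each generated statement would be executed by a scheduled task, for example from an Airflow DAG.

```python
from datetime import date, timedelta


def fdw_setup_statements(server, host, dbname, remote_schema, local_schema, user, password):
    """Build the SQL that exposes a remote schema via postgres_fdw (assumed wrapper).

    All identifiers come from trusted configuration, not user input,
    so plain string formatting is acceptable for this sketch.
    """
    return [
        "CREATE EXTENSION IF NOT EXISTS postgres_fdw;",
        f"CREATE SERVER IF NOT EXISTS {server} FOREIGN DATA WRAPPER postgres_fdw "
        f"OPTIONS (host '{host}', dbname '{dbname}');",
        f"CREATE USER MAPPING IF NOT EXISTS FOR CURRENT_USER SERVER {server} "
        f"OPTIONS (user '{user}', password '{password}');",
        f"CREATE SCHEMA IF NOT EXISTS {local_schema};",
        f"IMPORT FOREIGN SCHEMA {remote_schema} FROM SERVER {server} INTO {local_schema};",
    ]


def daily_load_statement(table, local_schema, warehouse_schema, day, ts_column="created_at"):
    """Build an INSERT that copies one day of rows from the foreign table to the warehouse.

    The half-open range [day, day + 1) avoids loading boundary rows twice.
    """
    next_day = day + timedelta(days=1)
    return (
        f"INSERT INTO {warehouse_schema}.{table} "
        f"SELECT * FROM {local_schema}.{table} "
        f"WHERE {ts_column} >= '{day.isoformat()}' "
        f"AND {ts_column} < '{next_day.isoformat()}';"
    )


if __name__ == "__main__":
    # Hypothetical connection details, for illustration only.
    for stmt in fdw_setup_statements(
        "src_server", "10.0.0.5", "app", "public", "staging", "etl", "secret"
    ):
        print(stmt)
    print(daily_load_statement("trips", "staging", "dwh", date(2020, 1, 31)))
```

Keeping the foreign tables in their own schema (here `staging`) separates the remote view of the data from the warehouse tables, so a daily job only has to copy the rows for its own date range.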