Distributed logger system design question. How would you answer it?
Interview Experience
I was asked to design a distributed logger that would log data for multiple microservices. I was asked what would happen if the logger was pushed beyond its capacity (that is, the microservices sent it more log entries than it could handle). The microservices could be in containers on the same machine, or they could be on different machines.

I was asked whether the microservices should push log entries or whether something should pull log entries from them. I was not sure how to answer that. Which approach is better, and what are the tradeoffs?

I was also asked how to correlate log entries from the same microservice, and how to design a system that would let me search log entries from different microservices when useful data about an issue was spread across those services.

My answer: each microservice would write its log entries to files in its own dedicated directory on disk. Each log entry would be assigned a unique ID, like a randomly generated UUID. Something would pick up those files and send them to the distributed logger, so this would be a pull system. To search log entries from the different microservices, I'd merge all the log entries together somehow.

I didn't do well on the question because I couldn't go into depth on how this would work. The interviewer mentioned that Kafka and ZooKeeper could be useful, but I didn't know anything about them. What's a good way to answer a distributed logging design question like this?
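On the over-capacity part of the question, one standard answer is to put a bounded buffer between the producers and the logger: when it fills, the system either applies backpressure (blocks or slows producers) or sheds load (drops or samples entries). A minimal sketch of a drop-oldest policy, with class and field names I made up for illustration:

```python
from collections import deque

class BoundedLogBuffer:
    """Bounded buffer for log entries. When full, the oldest entry is
    evicted so producers never block (drop-oldest; blocking producers
    instead would be the backpressure alternative)."""

    def __init__(self, capacity: int):
        self.entries = deque(maxlen=capacity)  # deque evicts oldest when full
        self.dropped = 0                       # count of shed entries

    def push(self, entry: str) -> None:
        if len(self.entries) == self.entries.maxlen:
            self.dropped += 1  # about to evict the oldest entry
        self.entries.append(entry)

buf = BoundedLogBuffer(capacity=2)
for i in range(3):
    buf.push(f"entry-{i}")
assert list(buf.entries) == ["entry-1", "entry-2"]
assert buf.dropped == 1
```

Counting drops matters in practice: it lets the system report that logs were lost rather than failing silently.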
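The "something that picks up those files" in a pull design is usually a shipping agent that tails each service's log file and forwards only the lines appended since its last poll; this is roughly how collectors such as Filebeat and Fluentd operate. A toy sketch of the tailing side (the function name and polling flow are my own assumptions):

```python
import tempfile

def read_new_lines(path: str, offset: int):
    """Return lines appended to `path` since byte `offset`, plus the
    new offset. A real agent would persist the offset durably so it
    can resume without duplicating or losing lines after a restart."""
    with open(path, "r") as f:
        f.seek(offset)
        lines = [ln.rstrip("\n") for ln in f.readlines()]
        return lines, f.tell()

# Simulate a service appending to its log file between two agent polls.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as f:
    log_path = f.name
with open(log_path, "a") as f:
    f.write("first entry\n")
lines, offset = read_new_lines(log_path, 0)
with open(log_path, "a") as f:
    f.write("second entry\n")
new_lines, offset = read_new_lines(log_path, offset)
assert lines == ["first entry"]
assert new_lines == ["second entry"]
```

The main tradeoff versus push: a pull agent decouples services from the logger (a slow logger just means the agent falls behind on disk, not that the service blocks), at the cost of extra latency and per-host agent management.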
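For the correlation part, a per-entry UUID identifies an entry but does not link entries together; the common technique is to also stamp every entry with a request-scoped correlation ID that each service propagates downstream, so all entries for one request can be joined later. A minimal sketch (the JSON field names are assumptions, not anything from the interview):

```python
import json
import time
import uuid

def make_log_entry(service: str, message: str, correlation_id: str) -> str:
    """Build one structured (JSON-lines) log entry. `entry_id` is unique
    per entry; `correlation_id` is shared by every entry produced while
    handling a single request, across all services."""
    return json.dumps({
        "entry_id": str(uuid.uuid4()),
        "correlation_id": correlation_id,
        "service": service,
        "timestamp": time.time(),
        "message": message,
    })

# One request flows through two services; both log the same correlation ID.
request_id = str(uuid.uuid4())
line_a = make_log_entry("auth", "user authenticated", request_id)
line_b = make_log_entry("billing", "invoice created", request_id)
assert json.loads(line_a)["correlation_id"] == json.loads(line_b)["correlation_id"]
assert json.loads(line_a)["entry_id"] != json.loads(line_b)["entry_id"]
```

In practice the correlation ID is generated at the edge (e.g. by the first service or a gateway) and passed along in request headers.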
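The "merge all the log entries together somehow" step can be made concrete: once entries carry timestamps, each service's stream is already time-ordered, and a k-way merge produces one global, time-ordered stream that can then be filtered by correlation ID or service. A sketch under that assumption:

```python
import heapq

def merge_streams(*streams):
    """K-way merge of per-service entry lists, each already sorted by
    timestamp, into one time-ordered stream (like merging sorted files)."""
    return list(heapq.merge(*streams, key=lambda e: e["timestamp"]))

auth = [
    {"timestamp": 1.0, "service": "auth", "msg": "login"},
    {"timestamp": 3.0, "service": "auth", "msg": "logout"},
]
billing = [
    {"timestamp": 2.0, "service": "billing", "msg": "charge"},
]
merged = merge_streams(auth, billing)
assert [e["service"] for e in merged] == ["auth", "billing", "auth"]
```

This is essentially what a log search backend does at query time; note that clock skew between machines means timestamps give only approximate global ordering, which is one reason interviewers probe this area.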