Site Reliability Engineering at Starship | by Martin Pihlak | Starship Technologies


Photo by Ben Davis, Instagram slovaceck_

Running autonomous robots on city streets is very much a software engineering challenge. Some of this software runs on the robot itself, but a lot of it runs in the backend: things like remote control, path finding, matching robots to customers, fleet health management, and also interactions with customers and merchants. All of it needs to run 24x7, without interruptions, and scale dynamically to match the workload.

SRE at Starship is responsible for providing the cloud infrastructure and platform services that these backend services run on. We have standardized on Kubernetes for our microservices and run it on top of AWS. MongoDB is the primary database for most backend services, but we also like PostgreSQL, especially where strong typing and transactional guarantees are required. For async messaging, Kafka is the platform of choice, and we use it for pretty much everything aside from shipping video streams from robots. For observability we rely on Prometheus and Grafana, Loki, Linkerd and Jaeger. CI/CD is handled by Jenkins.

A good portion of SRE time is spent maintaining and improving the Kubernetes infrastructure. Kubernetes is our main deployment platform, and there is always something to improve, be it fine-tuning autoscaling settings, adding Pod disruption budgets or optimizing Spot instance usage. Sometimes it is like laying bricks: simply installing a Helm chart to provide a particular piece of functionality. Often, however, the "bricks" must be carefully picked and evaluated (is Loki any good for log management, is a service mesh worth having, and if so, which one), and occasionally the functionality does not exist in the world at all and has to be written from scratch. When that happens we usually turn to Python and Golang, but also Rust and C when needed.
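To make one of those "bricks" concrete: a Pod disruption budget is the kind of small, deliberate addition we keep making. Below is a minimal sketch of creating one programmatically with client-go; the app name, namespace and threshold are hypothetical and for illustration only, not taken from our actual setup.

```go
package main

import (
	"context"
	"fmt"

	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load kubeconfig from the default location (~/.kube/config).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// Keep at least 2 pods of a hypothetical "route-calc" app available
	// during voluntary disruptions such as node drains or Spot reclaims.
	minAvailable := intstr.FromInt(2)
	pdb := &policyv1.PodDisruptionBudget{
		ObjectMeta: metav1.ObjectMeta{Name: "route-calc-pdb", Namespace: "default"},
		Spec: policyv1.PodDisruptionBudgetSpec{
			MinAvailable: &minAvailable,
			Selector:     &metav1.LabelSelector{MatchLabels: map[string]string{"app": "route-calc"}},
		},
	}
	created, err := clientset.PolicyV1().PodDisruptionBudgets("default").
		Create(context.TODO(), pdb, metav1.CreateOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("created PodDisruptionBudget:", created.Name)
}
```

In practice this is usually a few lines of YAML in a Helm chart; the programmatic version is shown only because it makes the API surface explicit.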

Another big piece of infrastructure that SRE is responsible for is data and databases. Starship started out with a single monolithic MongoDB, an approach that has worked well so far. As the business grows, however, we need to revisit that architecture and start thinking about supporting robots by the thousand. Apache Kafka is part of the scaling story, but we also need to figure out sharding, regional clustering and microservice database architecture. On top of that we are constantly developing tools and automation to manage the existing database infrastructure. Examples: adding MongoDB observability with a sidecar proxy to analyze database traffic, enabling PITR support for databases, automating regular failover and recovery tests, collecting metrics for Kafka re-sharding, enabling data retention.
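To illustrate the sidecar-proxy idea, here is a hedged, minimal Go sketch of a TCP proxy that sits between an application and mongod and counts traffic in both directions. The addresses are placeholders, and a real sidecar would decode the MongoDB wire protocol rather than just count bytes.

```go
package main

import (
	"io"
	"log"
	"net"
	"sync/atomic"
)

var bytesToDB, bytesFromDB int64 // running totals across all connections

func main() {
	// Placeholder addresses: clients connect to :27018, the real mongod is on :27017.
	ln, err := net.Listen("tcp", "127.0.0.1:27018")
	if err != nil {
		log.Fatal(err)
	}
	for {
		client, err := ln.Accept()
		if err != nil {
			log.Print(err)
			continue
		}
		go handle(client)
	}
}

func handle(client net.Conn) {
	defer client.Close()
	db, err := net.Dial("tcp", "127.0.0.1:27017")
	if err != nil {
		log.Print(err)
		return
	}
	defer db.Close()

	// Shuttle bytes in both directions; stop when either side closes.
	done := make(chan struct{}, 2)
	go pipe(db, client, &bytesToDB, done)
	go pipe(client, db, &bytesFromDB, done)
	<-done
	log.Printf("to_db=%d from_db=%d",
		atomic.LoadInt64(&bytesToDB), atomic.LoadInt64(&bytesFromDB))
}

func pipe(dst, src net.Conn, counter *int64, done chan struct{}) {
	n, _ := io.Copy(dst, src)
	atomic.AddInt64(counter, n)
	done <- struct{}{}
}
```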

Finally, one of the most important goals of Site Reliability Engineering is to minimize downtime for Starship's production systems. While SRE is occasionally called on to deal with infrastructure outages, the more impactful work is done on preventing the outages and ensuring that we can recover quickly. This can be a very broad topic, ranging from having rock-solid K8s infrastructure all the way to engineering practices and business processes. There are great opportunities to make an impact!

A day in the life of an SRE

Arrive at work, some time between 9 and 10 (sometimes working remotely). Grab a cup of coffee, check Slack messages and emails. Review the alerts that fired during the night and see if there is anything interesting there.

Find that MongoDB connection latencies spiked during the night. Digging into the Prometheus metrics with Grafana, discover that this happens while backups are running. Why is this suddenly a problem, when we have been running these backups for ages? It turns out that we compress the backups very aggressively to save on network and storage costs, and this consumes all available CPU. It looks like the load on the database has grown just enough to make this noticeable. It is happening on a standby node, so production is not affected, but it is still a problem should the primary fail. Add a Jira item to fix this.

In passing, change the MongoDB prober code (Golang) to add more histogram buckets for a better understanding of the latency distribution. Run the Jenkins pipeline to put the new probe into production.
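For illustration, here is a minimal sketch of what registering such a probe histogram with custom buckets can look like in Go with prometheus/client_golang. The metric name, bucket boundaries and probe function are hypothetical, not Starship's actual prober.

```go
package main

import (
	"log"
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// Hypothetical metric: finer-grained buckets in the 1ms-2.5s range give a
// better picture of the latency distribution than the library defaults.
var connectLatency = prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    "mongodb_probe_connect_seconds",
	Help:    "Time to establish a MongoDB connection.",
	Buckets: []float64{.001, .0025, .005, .01, .025, .05, .1, .25, .5, 1, 2.5},
})

func main() {
	prometheus.MustRegister(connectLatency)
	go func() {
		for {
			start := time.Now()
			probeMongo() // hypothetical: dial the database and run a ping
			connectLatency.Observe(time.Since(start).Seconds())
			time.Sleep(15 * time.Second)
		}
	}()
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9090", nil))
}

func probeMongo() {
	// Stand-in for the real probe logic.
	time.Sleep(5 * time.Millisecond)
}
```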

At 10 am there is the standup meeting; share your updates with the team and learn what others have been up to: setting up monitoring for a VPN server, instrumenting a Python app with Prometheus, setting up ServiceMonitors for external services, debugging MongoDB connectivity issues, piloting canary deployments with Flagger.

After the meeting, resume the work planned for the day. One of the things I planned to do today was to set up an additional Kafka cluster in the staging environment. We run Kafka on Kubernetes, so it should be straightforward to take the existing cluster YAML files and adapt them for the new cluster. Or, on second thought, should we use Helm instead, or maybe there is a good Kafka operator available by now? No, don't go there: too much magic, I want explicit control over my statefulsets. Raw YAML it is. An hour and a half later a new cluster is running. The setup was fairly straightforward; only the init containers that register Kafka brokers in DNS needed a configuration change. Generating the credentials for the applications required a small bash script to set up the accounts on Zookeeper. One bit left dangling was setting up Kafka Connect to capture database change log events: it turns out the test databases are not running in ReplicaSet mode, so Debezium cannot get an oplog from them. Backlog this and move on.
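As an aside, Debezium's MongoDB connector tails the oplog, which only exists when mongod runs as a replica set. Here is a hedged sketch of how one might check that from Go with the official mongo-driver; the connection URI is a placeholder.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// Placeholder URI; point this at the test database in question.
	client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27017"))
	if err != nil {
		log.Fatal(err)
	}
	defer client.Disconnect(ctx)

	// The "hello" command (formerly "isMaster") reports replica-set membership.
	var result bson.M
	err = client.Database("admin").RunCommand(ctx, bson.D{{Key: "hello", Value: 1}}).Decode(&result)
	if err != nil {
		log.Fatal(err)
	}
	if setName, ok := result["setName"]; ok {
		fmt.Println("replica set:", setName) // Debezium can tail the oplog
	} else {
		fmt.Println("standalone mongod: no oplog, Debezium will not work")
	}
}
```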

Next, it is time to prepare a scenario for the Wheel of Misfortune exercise. At Starship we run these to improve our understanding of our systems and to share troubleshooting techniques. It works by breaking some part of the system (usually in test) and having some unfortunate person try to troubleshoot and mitigate the problem. In this case I will set up a load test with hey to overload the microservice for route calculations. Deploy this as a Kubernetes job called "haymaker" and hide it well enough that it does not immediately show up in the Linkerd service mesh (yes, evil 😈). Later, run the "Wheel" exercise and take note of any gaps we have in playbooks, metrics, alerts, etc.
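hey itself is a Go tool; to show the shape of such a load test, here is a minimal Go sketch of a concurrent HTTP load generator. The target URL, worker count and request count are made up for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
	"sync/atomic"
	"time"
)

func main() {
	const (
		target   = "http://route-calc.test.svc:8080/healthz" // hypothetical service URL
		workers  = 50
		requests = 10000
	)
	var ok, failed int64

	// Fill a channel with one token per request; workers drain it.
	jobs := make(chan struct{}, requests)
	for i := 0; i < requests; i++ {
		jobs <- struct{}{}
	}
	close(jobs)

	start := time.Now()
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			client := &http.Client{Timeout: 5 * time.Second}
			for range jobs {
				resp, err := client.Get(target)
				if err != nil {
					atomic.AddInt64(&failed, 1)
					continue
				}
				resp.Body.Close()
				atomic.AddInt64(&ok, 1)
			}
		}()
	}
	wg.Wait()
	fmt.Printf("%d ok, %d failed in %v\n", ok, failed, time.Since(start))
}
```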

In the last few hours of the day, block all interruptions and try to get some coding done. I have reimplemented the Mongoproxy BSON parser as a streaming asynchronous one (Rust + Tokio) and want to figure out how well it works with real data. It turns out there is a bug somewhere in the parser guts, and I need to add deeper logging to figure it out. Find a wonderful tracing library for Tokio and get carried away with it…

Disclaimer: the events described here are based on a true story. Not all of it happened on the same day. Some meetings and interactions with coworkers have been fictionalized. We are hiring.


