Employees of the private space company SpaceX held a Q&A session for their community on Reddit. The developers described how their work differs from the usual IT routine and what difficulties they face when creating software for vehicles that must fly beyond the atmosphere with people on board.
A few days ago, SpaceX announced on Twitter an AMA ("ask me anything") session on Reddit with its programmers. In the company's official community on the largest English-language internet forum, the employees answered questions on a wide range of topics, from user-interface development for the Crew Dragon and Starship spacecraft to the operating conditions and software testing of Starlink satellites.
Questions were answered by Jarrett Farnitano, a Dragon software specialist; Kristine Huang, head of the Starlink application software group; Jeanette Miranda, a developer of low-level software for laser communications; Asher Dunn, head of the Starship software group; and Natalie Morris, head of the satellite test systems team.
Besides satisfying the curiosity of geeks and spaceflight fans, the SpaceX employees naturally encouraged everyone with the necessary skills to apply for the company's vacancies. The good news is that experience in the rocket and space industry is not required. A complete collection of all questions and answers is available here, and the discussion thread itself, with questions, answers and clarifications, is located here. Those links lead to a wealth of technical detail; below we highlight the most notable points.
Development of software for astronautics in general
The most obvious question put to SpaceX was how software development for rockets and spacecraft differs from similar work at a "regular" IT company. Jeanette named two surprises. First, there are few differences overall. Second, when writing software for satellites, you very often run into limits dictated by physics and mathematics. When you write code for applications that run on Earth, these problems almost never arise.
The topic of fault tolerance and safety came up almost immediately. When writing code, errors always occur, and even in important commercial programs it takes years to eliminate them (some remain forever). The cause is not only programmer laziness but also customers' desire to save money on testing, and the cost of such mistakes is very rarely high.
Hardware at the limit of what human technology can do is another matter: a single bug can make it explode, destroying millions of dollars of equipment or even costing lives. According to Jarrett, developing software for safety-critical systems is radically different from writing code where failure is acceptable and not catastrophic. The difference lies first of all in the programmer's mindset, which you have to adopt quickly.
To minimize the possibility of failures at the software infrastructure level, a special approach is taken. All software components are kept small and separate, with clear boundaries of use and "responsibility". This allows each piece of code to be tested under a variety of conditions before it is deployed into systems which, in turn, have already been tested as a whole many times. The main goal is to make sure that the software's behavior in every possible situation is predictable, understandable and familiar to the developer.
In critical tasks such as a rocket flight, it is unacceptable for a computer to freeze or reboot. Among the many modular components there are always protective ones that monitor the behavior of the entire system on the fly. If things do not go according to plan, they have clear instructions on how to handle the situation. Sometimes that just means resetting a task and moving on to the next one; sometimes it is a complex strategy that compensates for the problem in several ways. And, of course, there is multiple redundancy: rocket and space technology cannot do without it, and software is no exception.
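The two ideas in this paragraph, a protective component that drops a failed task instead of halting and redundancy with voting, can be illustrated with a minimal sketch. This is not SpaceX's code: the function names `majority_vote` and `supervise` and the recovery policy are hypothetical, chosen only to show the pattern.

```python
from collections import Counter
from typing import Callable, Optional, Sequence


def majority_vote(readings: Sequence[int]) -> Optional[int]:
    """Return the value reported by a strict majority of redundant units, else None."""
    if not readings:
        return None
    value, count = Counter(readings).most_common(1)[0]
    return value if count * 2 > len(readings) else None


def supervise(task: Callable[[], float], fallback: Callable[[], float]) -> float:
    """Run a task; if it raises, abandon it and use a predefined fallback."""
    try:
        return task()
    except Exception:
        # Hypothetical policy: reset the failed task and carry on with a safe value.
        return fallback()
```

With three redundant readings, `majority_vote([5, 5, 7])` returns 5, while three readings that all disagree return `None`, which a real system would treat as a fault of its own.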
In addition, test and monitoring automation is widely used. Because every component of the software system produces a large volume of telemetry, the parameters of all components can be tracked. This helps not only during software development for rockets and spacecraft, but also in operating the huge Starlink satellite "constellation", which grows every month and will be several times larger in a couple of years. Natalie adds that automation eliminates the need for hundreds of operators to control the orbital constellation.
She also explained that development and testing are not separated at SpaceX. Every programmer takes part in finding problems in their own code, reviews other people's work and helps fix the shortcomings that are found. This approach applies at every stage: not only during source code review, but also during testing on models, test stands and prototypes, and together with hardware engineers when the software is tuned to the hardware. Besides the specialists who develop components or entire systems for rockets, satellites and spacecraft, the company has separate working groups that build testing tools.
Simulations, tests, stands, laboratories
According to Natalie, programmers working on Starlink are able to do final testing right in orbit. There are many satellites and they have well-developed failure protection, so some of them can be used as "canaries". New software is flashed onto them, the team watches how everything works and interacts with the rest of the constellation, and then the update is deployed to the remaining satellites. If something goes wrong during testing, the satellite is sent a reboot command and simply restores the previous "firmware".
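The canary procedure described above can be sketched as follows. The `Satellite` class, the `healthy` callback and the `canary_count` parameter are hypothetical stand-ins for illustration; nothing here reflects the actual Starlink tooling.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Satellite:
    name: str
    version: str
    previous: Optional[str] = None

    def install(self, new_version: str) -> None:
        # Remember the old firmware so it can be restored later.
        self.previous = self.version
        self.version = new_version

    def rollback(self) -> None:
        # Models "reboot and restore the previous firmware".
        if self.previous is not None:
            self.version = self.previous


def canary_rollout(
    fleet: List[Satellite],
    new_version: str,
    healthy: Callable[[Satellite], bool],
    canary_count: int = 1,
) -> bool:
    """Update a few canaries first; update the rest only if the canaries stay healthy."""
    canaries, rest = fleet[:canary_count], fleet[canary_count:]
    for sat in canaries:
        sat.install(new_version)
    if not all(healthy(sat) for sat in canaries):
        for sat in canaries:
            sat.rollback()
        return False
    for sat in rest:
        sat.install(new_version)
    return True
```

The design choice worth noting is that a failed canary never touches the rest of the fleet: the function returns before the second loop runs.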
With spacecraft and rockets this approach is, for obvious reasons, not applicable, so every unit and component has to be tested many times on Earth. SpaceX has built several hardware and software systems for this purpose. At the first stage, each programmer runs their code through simple simulations right on their work computer. If no errors are found, the program is sent to a powerful computing cluster. There it is tested first on its own in a more detailed simulation, and then as part of the entire system it belongs to.
Finally, once these checks have passed and all the shortcomings have been fixed, it is time for more serious test stands that include real hardware. These can be individual avionics components (onboard electronics) or complete units and assemblies of a real rocket or spacecraft. At the final stage, the entire flight mission is simulated. Natalie cites Dragon as an example, for which this process takes tens of hours or even days, because you need to make sure that every stage of the mission goes smoothly. Only after all of the above can the work of the programmers (and of the hardware engineers) go onto the SpaceX vehicle standing on the launch pad.
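The staged process described in the last two paragraphs amounts to a gated pipeline: a build moves to the next, more expensive stage only after passing the previous one. A minimal sketch, assuming each stage can be reduced to a pass/fail check; the stage names paraphrase the article, and `run_pipeline` and the lambda checks are purely illustrative.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[str], bool]]


def run_pipeline(stages: List[Stage], build: str) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first failure.

    Returns (all_passed, names_of_stages_that_passed).
    """
    passed: List[str] = []
    for name, check in stages:
        if not check(build):
            return False, passed
        passed.append(name)
    return True, passed


# Hypothetical checks; real ones would launch simulations or hardware tests.
stages: List[Stage] = [
    ("desktop simulation", lambda build: True),
    ("cluster simulation, component alone", lambda build: True),
    ("cluster simulation, full system", lambda build: build != "bad-build"),
    ("test stand with real hardware", lambda build: True),
    ("full mission simulation", lambda build: True),
]
```

Stopping at the first failure keeps the scarce resources (hardware stands, multi-day mission simulations) for builds that have already survived the cheap checks.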
Jeanette added her own experience of developing "space lasers". Because organizing reliable optical communication between satellites is a highly complex task, she took part in many meetings between different departments. These were not just routine gatherings around a conference table: they included brainstorming sessions over video calls and meetings between programmers and engineers right in the laboratory. Together they literally assembled and disassembled the laser communication unit many times, trying to figure out how to make it better.
Ever since the first photographs of the crewed version of the Dragon spacecraft appeared, SpaceX has been criticized for the excessive "glamor" of its interior. Recall that the company's engineers almost completely abandoned "traditional" buttons, instruments and levers, fitting the capsule with three huge touch panels and a small number of the most important buttons beneath them. Several Apple iPads were added to the equipment kit for checklists and additional information (a common practice in aviation). According to many commentators, this approach puts design ahead of reliability and resilience.
Talking about the Crew Dragon user interface, Jarrett explains that, first, the spacecraft is completely autonomous and can perform all in-flight operations on its own, with no action required from the crew. Second, even this "glamorous" system was made as fault-tolerant as possible. If one display has a problem, the remaining ones take over its functions. And if all of them fail at once, the astronauts still have hardware buttons to which the key functions are mapped.
While creating the software interface for the touchscreen displays, SpaceX specialists thought hard about how to make it comfortable and functional for people in all flight modes. For example, the most frequently used navigation touch buttons are located at the bottom of the screen, because under high g-forces astronauts find it difficult to raise their arms. The most critical functions are separated by large gaps and moved out of the frequently used areas of the display, where an accidental press is possible. So if a person did press one of them, they most likely did it deliberately and for a specific purpose.
Special attention was paid to testing the usability of the touchscreen displays under strong vibration. This concerns not only control (tapping on them) but also the perception of information. To test the latter, a series of experiments was run in which test subjects had to play a specially created game using an Xbox controller. Their success depended on how quickly and accurately they could read the shapes, sizes and colors of figures, as well as text, from the screen. The entire test rig stood on a vibrating plate that simulated the conditions of a rocket launch. Here Jarrett joked: "what we have learned for sure is the complete unrealism of any interfaces of space technology in science fiction films."
Another important convenience for astronauts is an approach that SpaceX developers call "quiet-dark". While the spacecraft is operating normally, the interface looks minimalist and shows only the key parameters of all systems. But if the automation detects any deviation from the norm, the problem indicators are highlighted so that the most important information stands out. Finally, the interface is regularly improved based on the real-world experience of astronauts aboard Crew Dragon.
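The "quiet-dark" idea can be illustrated with a small sketch: nominal values are shown plainly, while out-of-range values are marked and moved to the top. The parameter names, limits and the `render_status` function below are invented for illustration and say nothing about the real Crew Dragon displays.

```python
from typing import Dict, List, Tuple

Limits = Dict[str, Tuple[float, float]]


def render_status(telemetry: Dict[str, float], limits: Limits) -> List[str]:
    """Return display lines: highlighted out-of-range values first, quiet ones after."""
    alerts: List[str] = []
    quiet: List[str] = []
    for name, value in telemetry.items():
        low, high = limits[name]
        if low <= value <= high:
            quiet.append(f"   {name}: {value}")
        else:
            alerts.append(f"!! {name}: {value} (limit {low}..{high})")
    return alerts + quiet


# Hypothetical readings and limits, chosen only to trigger one alert.
telemetry = {"cabin_pressure_kpa": 101.0, "co2_ppm": 6000.0}
limits = {"cabin_pressure_kpa": (95.0, 105.0), "co2_ppm": (0.0, 5000.0)}
lines = render_status(telemetry, limits)
```

Here the first line of `lines` is the highlighted `co2_ppm` alert and the second is the unmarked cabin pressure reading.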
Asher, who is responsible for software development for Starship, Elon Musk's most ambitious project, was bombarded with questions. According to him, the approach to writing code for the stainless steel spacecraft is not much different from what SpaceX engineers do in Boca Chica: build it, check it, launch it (blow it up), collect data, analyze it, and return to the start of the cycle. Unlike the Falcon 9 and Dragon teams, Asher and his colleagues have more leeway. For example, they can introduce a software improvement just hours before a Starship test, provided they are sure it will help.
Software and hardware updates for this spacecraft never stop. Although Starship uses a lot of equipment and code that was proven on the Falcon 9, many new systems have to be created for it. The work goes on continuously and jointly with the engineers responsible for the hardware. Most importantly, each new test (not necessarily a flight) yields a lot of information, and the specialists are constantly learning. They can also try out completely new ideas right away and quickly find out whether they work.
Technologies you don't expect in the rocket and space industry
Besides touch screens in a spacecraft, SpaceX uses many equally controversial technologies that are not visible from the outside. One technically intriguing question concerned the decision not to use a real-time operating system (RTOS). Put simply, software running under an RTOS always, under any conditions, responds to events within a strictly defined time. If a task does not finish in time, the system resets it so that it does not tie up resources. In aviation, rocketry and other industries focused on reliability and safety, an RTOS is considered the standard.
SpaceX programmers, however, wrote all the software for their rockets on top of a specialized Linux distribution. There are extensions for this family of operating systems that provide the functionality needed for real-time work, but this is a half-measure and does not amount to a full-fledged RTOS. Why SpaceX chose this route is debatable and could be discussed at length. Most likely, the reasons are the limited functionality of RTOSes and the complexity of developing software for them. Jarrett described how the company's programmers solved the real-time problem.
Most importantly, SpaceX builds almost all of its software in-house. As a result, the components of every system are tested in every possible way, and the developers know exactly how they will behave under different conditions. In other words, the execution time of each operation is not capped from above by the operating system but is simply known in advance. If something goes wrong, the protective components are triggered. The result looks similar to an RTOS, but the difference is in the control mechanisms: in an RTOS they are built into the kernel, while in SpaceX's solution they are implemented in separate programs. As many years of successful flights have shown, this works reliably.
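The idea of checking known execution times from outside the kernel can be sketched as a monitor that compares each task's measured duration with a budget and calls a protective handler on overrun. This is a loose illustration under stated assumptions: the `DeadlineMonitor` class, its budgets and its handler are hypothetical, the check runs in the same process rather than in a separate program, and it detects an overrun only after the task returns instead of preempting it.

```python
import time
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


class DeadlineMonitor:
    """Compare measured task durations against time budgets known in advance."""

    def __init__(
        self,
        budgets_s: Dict[str, float],
        on_overrun: Callable[[str, float], None],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._budgets = budgets_s
        self._on_overrun = on_overrun
        self._clock = clock

    def run(self, name: str, task: Callable[[], T]) -> T:
        start = self._clock()
        result = task()
        elapsed = self._clock() - start
        if elapsed > self._budgets[name]:
            # Protective action is delegated to the caller-supplied handler.
            self._on_overrun(name, elapsed)
        return result
```

Injecting the clock as a parameter is a testing convenience: a fake clock makes the overrun path reproducible without actually sleeping.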
Another surprise is the widespread use of web interfaces. According to Asher, they are used for everything the astronauts see on the screens in the Crew Dragon capsule and for most of the information panels used by mission control operators. This decidedly non-space technology has proven to be incredibly reliable, fast and, importantly, flexible for such purposes. Naturally, Starship will also get control displays showing a web interface, though it will be significantly reworked compared with the one used in Dragon, since the two spacecraft have different tasks.
A little humor and omissions
Of course, the SpaceX employees could not disclose some information. In particular, when asked about the transfer of technical data between Starlink satellites and a ground station, Kristine gave a very evasive answer. Although she was asked about the amount of information and the number of streams, she said only that there is a lot of data and that it is in a proprietary format. Asher, for his part, either forgot or simply chose not to show the current version of the Crew Dragon user interface.
There was some humor too. One curious Reddit user passed on to Asher a question from his eight-year-old daughter: "What's the most difficult thing about building a rocket engine?" Asher replied that a running rocket engine is a constant explosion, and the hardest part is building the engine so that the explosion stays inside. Naturally, when working conditions were discussed, the question of food came up, or rather the availability of snacks in the office. The answer was simple: "omnomnom". Evidently, no problems there.
Entertainment was not forgotten either. When the SpaceX team was asked how much time they spend playing the popular game Kerbal Space Program (a space program simulator with a good physics model), they replied "INT_MAX". Since this is a constant defined in the limits.h header of the C standard library, the answer can be read two ways: either a great deal, since on typical platforms its value exceeds two billion, or that time in the game has to be strictly limited (INT_MAX is an upper limit).