Sven Erik Knop | Lukas Dehling

Developing together with Git

The functional limitations of the version management software 'Git' have long stood in the way of its use in professional environments. That is now a thing of the past: software add-ons make the popular open source tool suitable for corporate use.

© git-scm.com; Fotolia/nd3000 / Computer&AUTOMATION

Digitalization has undoubtedly ushered in a golden age for software developers. Megatrends such as Industry 4.0 are bringing more and more software onto the factory floor: machines need to be networked, communication protocols developed and PLC controls adapted. Demand for competent developers is therefore higher than ever, and it is not uncommon for candidates to set conditions before they sign on, such as being allowed to use the development tool of their choice in their day-to-day work. At the top of the popularity scale is the open source tool Git: according to Gartner, distributed version control systems (DVCS) such as Git have a usage rate of almost 30%, which puts them among the most widely used developer tools.

Git was originally developed by Linus Torvalds for versioning the Linux kernel and owes much of its spread to GitHub, an online hosting service that now hosts a large share of open source projects. Git stores changes as nodes in a directed acyclic graph (DAG) and identifies each node by a cryptographic hash. Each project in Git is stored in its own 'repository', which contains not only all the files of the project but also its entire history. A user copies ('clones') the entire repository from a central copy into a local workspace, commits changes locally and then sends ('pushes') these changes back to the central version.
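The idea of a history made of hash-identified nodes can be sketched in a few lines of Python. This is a simplified illustration, not Git's actual implementation: the `Store` class and its commit payload are assumptions made for this example (real Git commits also record a tree, an author and a timestamp). Only the header layout used in `object_id`, the object type and size followed by a NUL byte, follows Git's object format.

```python
import hashlib


def object_id(kind, payload):
    """SHA-1 over '<kind> <size>' + NUL byte + payload, as Git does for its objects."""
    header = f"{kind} {len(payload)}".encode() + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


class Store:
    """Hypothetical in-memory repository: commit id -> (parent ids, message)."""

    def __init__(self):
        self.commits = {}

    def commit(self, message, parents=()):
        # Simplified payload (assumption): the parent ids plus the message.
        lines = [f"parent {p}" for p in parents] + ["", message]
        oid = object_id("commit", "\n".join(lines).encode())
        self.commits[oid] = (tuple(parents), message)
        return oid

    def history(self, oid):
        """Return the message of every commit reachable from oid, each once."""
        seen = set()
        stack, messages = [oid], []
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            parents, message = self.commits[current]
            messages.append(message)
            stack.extend(parents)
        return messages


store = Store()
root = store.commit("initial import")
feature = store.commit("add PLC driver", (root,))
fix = store.commit("fix timeout", (root,))
merge = store.commit("merge feature", (feature, fix))  # a node with two parents
print(sorted(store.history(merge)))
```

Because every id is derived from the content and from the ids of the parents, changing any earlier node would change the ids of all its descendants, which is what makes the history tamper-evident.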

While users particularly appreciate Git's openness and the ability to work with a local repository, its conceptual and functional limitations can prove problematic in an enterprise context. Above a certain repository size, Git's performance degrades noticeably; large files and non-text assets are supported only to a limited extent or not at all. Nor can the tool fully meet the requirements of modern software development in terms of traceability, bug fixing and security. Who made which changes to the source code, when and why? Which change led to an error? How can intellectual property best be protected against loss or theft from the inside? Git cannot provide satisfactory answers to questions like these, because the tool was never designed for this.

To reconcile the preferences of developers with the requirements of the company, Git's existing limitations have to be overcome through extensions and new functions. Git must become a flexible, powerful and team-oriented solution for efficient software development, a kind of central 'Facebook for developers'.

Scaling instead of repository sprawl

How large a Git repository can grow before performance suffers and administration becomes too complex is a matter of debate. However, most users agree that the limit lies somewhere between 1 and 5 GB. Because a Git repository typically retains its entire change history, an actively used repository keeps growing over time even if the code base itself does not.

This is particularly problematic for large projects, which can exceed the limit quickly depending on the complexity and scope of the software. A common workaround is to split the project into several smaller parts, each with its own repository. An extreme example is the Android source code, which now spans more than 1000 repositories. Distributing code across several repositories is not a bad thing in itself, as long as each repository contains a self-contained component that evolves independently of all other parts of the project.

In reality, however, such a genuine component structure is often difficult to achieve, and splitting an existing monolithic application into autonomous parts is a complicated and costly process. Those who nevertheless decide to do so must take meticulous care not to overlook any couplings between repositories. If the separation remains incomplete, and unfortunately this is often the case, changes in the separate repositories drift out of step and quickly jeopardize the stability of the entire project. Tracking down and correcting the change that caused a failure is then extremely complicated when a large number of independent repositories are involved.

In projects with a particularly extensive or rapidly growing code base, a centralized Git management solution can ensure that any amount of source code and files of any size can be managed and used in Git. It is then no longer necessary to fragment the project, so that the same efficient working methods and seamless traceability of error sources are ensured for large projects as for smaller projects.

More than just versioning source code

Git management solutions break down the limitations of Git and facilitate teamwork. They also enable interaction between the various project participants.

© Perforce

Another limitation of using Git in an enterprise context is its immature handling of large binary files, which can quickly inflate the size of a repository. Git cannot compress most binary content effectively, so central operations such as creating a working copy of the repository ('git clone'), downloading objects from the central repository ('git fetch') or submitting changes to it ('git push') are slowed down by large binary files. In addition, Git recalculates a file's SHA-1 checksum as soon as it detects a local change. For (small) text files such as source code this is quick, but for large binary files it takes considerably longer.
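Why binary assets weigh so heavily on transfers can be illustrated with zlib, the deflate library Git uses when it stores objects. The following is a minimal sketch, not a benchmark: the sample data is invented, and random bytes stand in for an already-compressed asset such as an image or a video.

```python
import os
import zlib


def compression_ratio(data):
    """Compressed size divided by original size (lower means better compression)."""
    return len(zlib.compress(data)) / len(data)


# Repetitive, source-like text (invented sample data).
source_like = b"if (sensor.ready()) { counter += 1; }\n" * 20_000
# Random bytes as a stand-in for an already-compressed binary asset (assumption).
binary_like = os.urandom(len(source_like))

print(f"text:   {compression_ratio(source_like):.3f}")
print(f"binary: {compression_ratio(binary_like):.3f}")
```

The text sample shrinks to a small fraction of its size, while the binary stand-in stays at practically its full size, so every clone, fetch and push has to move it almost byte for byte.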

For this reason, many Git users avoid placing large binary files in their main Git repository and version them in a separate repository instead. However, this leads to the same problem that companies face when splitting their code base across multiple smaller repositories: changes that span different repositories can only be tracked to a limited extent, which makes it much more difficult to track down and rectify errors.

To counteract this, a flexible Git management tool should be able to handle large binary files efficiently. One option is to store the files in a separate repository and keep only a link to them in the main repository. Because simultaneous changes to binary assets by different users, unlike changes to text files, can rarely be merged automatically, there should also be an option to lock a file while one developer is editing it. This prevents two people from editing the same binary file at the same time and, in the worst case, important changes from being lost.
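Both ideas, a small link in the main repository and a lock on the file being edited, can be sketched as follows. The `AssetStore` class, its pointer layout and the file and user names are hypothetical and only serve the illustration; they do not describe the interface of any particular product.

```python
import hashlib


class AssetStore:
    """Hypothetical external store for large binaries, with advisory file locks."""

    def __init__(self):
        self.blobs = {}  # content hash -> file content
        self.locks = {}  # file path -> user holding the lock

    def add(self, data):
        """Store the content externally and return a small text pointer for the main repository."""
        oid = hashlib.sha256(data).hexdigest()
        self.blobs[oid] = data
        return f"oid sha256:{oid}\nsize {len(data)}\n"

    def resolve(self, pointer):
        """Look up the real content that a pointer refers to."""
        fields = dict(line.split(" ", 1) for line in pointer.splitlines())
        return self.blobs[fields["oid"].split(":", 1)[1]]

    def lock(self, path, user):
        """Grant the lock unless another user already holds it."""
        return self.locks.setdefault(path, user) == user

    def unlock(self, path, user):
        """Release the lock; only its holder may do so."""
        if self.locks.get(path) == user:
            del self.locks[path]
            return True
        return False


store = AssetStore()
pointer = store.add(b"\x00" * 5_000_000)  # a 5 MB binary asset
print(len(pointer), "bytes tracked in the main repository")
print(store.lock("hmi/panel.psd", "anna"))  # True: anna may edit
print(store.lock("hmi/panel.psd", "ben"))   # False: ben has to wait
```

The main repository only ever versions the short pointer text, so its size stays independent of the asset size, while the lock keeps a second editor out until the first one has released the file.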

Protecting intellectual property

Especially in times of innovative production scenarios in the smart factory, the security of software-related intellectual property is playing an increasingly central role, as it represents an important competitive asset for the company. While the highest possible level of security must be ensured at all times to provide the best possible protection against external attacks, the risk of internal perpetrators must not be overlooked:

Deviations from normal behavior can reveal attempts at espionage by employees.

© Fotolia / Creativa

For example, the German Federal Office for the Protection of the Constitution estimates that the ratio of attacks from within the company to those from outside is already 70 to 30. In other words, around seven out of ten attacks are carried out by employees from within the company.

Git does not offer its users any special functionalities to detect and prevent such threats. In this area too, companies must therefore ensure the best possible protection for their valuable development assets with the help of special extensions. A version management solution with its comprehensive, detailed logs already offers the best prerequisites for detecting malicious activities within the company. This is because modern behavioral analyses are able to identify espionage attempts at an early stage based on user behavior.

To do this, such solutions first analyze the company's previous activity history and thereby learn what 'normal' user behavior looks like. Mathematical procedures then identify deviations from this baseline: if, for example, a user accesses an important development project that, judged by their historical access behavior, they do not normally work on, this is flagged as an irregularity. As this may be no more than a harmless coincidence, the risk value only rises when several unusual events occur within a short period of time. If, for example, none of the user's team colleagues access the project and the files were accessed at a time of day or night at which the employee has never worked before, the risk score assigned to that user increases step by step.
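The principle of a baseline plus accumulating deviations can be sketched with a deliberately simple model. The article does not name the mathematical procedures actually used, so everything here is an assumption for illustration: the `Access` record, the two signals (unfamiliar project, unusual hour) and the one-hour window are invented, and the team-comparison signal is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Access:
    user: str
    project: str
    when: datetime


def build_baseline(history):
    """Learn 'normal' behavior: the projects and hours of day seen per user."""
    projects, hours = {}, {}
    for event in history:
        projects.setdefault(event.user, set()).add(event.project)
        hours.setdefault(event.user, set()).add(event.when.hour)
    return projects, hours


def anomalies(event, baseline):
    """List the ways in which one access deviates from the user's baseline."""
    projects, hours = baseline
    flags = []
    if event.project not in projects.get(event.user, set()):
        flags.append("unfamiliar project")
    if event.when.hour not in hours.get(event.user, set()):
        flags.append("unusual hour")
    return flags


def risk_score(events, baseline, window=timedelta(hours=1)):
    """Largest number of anomaly flags that fall within any single time window."""
    flagged = sorted(e.when for e in events for _ in anomalies(e, baseline))
    best = start = 0
    for end, moment in enumerate(flagged):
        while moment - flagged[start] > window:
            start += 1
        best = max(best, end - start + 1)
    return best


history = [
    Access("alice", "plc-firmware", datetime(2024, 1, day, hour))
    for day, hour in [(8, 9), (9, 10), (10, 14)]
]
baseline = build_baseline(history)
odd = [Access("alice", "drive-control", datetime(2024, 1, 12, 3, 15))]
print(anomalies(odd[0], baseline), risk_score(odd, baseline))
```

A single deviation yields a low score, whereas several deviations close together in time add up, which mirrors the escalation described above. A production system would use statistical models rather than plain set membership.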

Adding such behavior-based analysis functions to Git gives companies the opportunity to protect their business-critical developments as well as possible: in justified cases of suspicion, the anonymity of the user in question can be lifted in consultation with the works council, preventing any further outflow of intellectual property from the company.

The best of both worlds

With the increasing spread of intelligent machines and networked control systems, the need for specialized programmers in industry is constantly growing. To attract and retain the best talent, concessions are often unavoidable. However, companies in this sector cannot afford to compromise on efficiency, quality and security. With the help of an integrated, centralized Git management tool built on a master or mono repository, Git becomes an open, team-oriented 'Facebook for developers' that meets the requirements of both sides: the ability to keep working in the familiar environment of one of the most popular open source development tools, combined with the flexibility, scalability and security that the complex software projects of our digital age demand.

Author:
Sven Erik Knop is Senior Technical Specialist at Perforce Software.
