TL;DR
Three scenarios allow GitHub repositories to be hijacked. Linking directly to them may result in malicious code injection; don’t do it.
Background
A finding during a recent client engagement caused us to investigate the prevalence of dependency repository hijacking, an obscure vulnerability that allows anyone to hijack a repository if its owner changes their username. This vulnerability is similar to subdomain takeover, trivial to exploit, and results in remote code injection. After analyzing open-source projects for this issue and recursively searching through their dependency graphs, we found over 70,000 impacted open-source projects; this includes popular projects and frameworks from companies like Google, GitHub, Facebook, and many others. To mitigate this issue, ensure that your project doesn’t depend on a direct GitHub URL, or use a dependency lock file and version pinning.
If you are familiar with repo jacking, jump straight to our analysis.

What’s Repo Jacking?
Dependency repository hijacking (aka repo jacking) is an obscure supply chain vulnerability, conceptually similar to subdomain takeover, that impacts over 70,000 open-source projects and affects everything from web frameworks to cryptocurrencies. This vulnerability is trivial to exploit, results in remote code injection, and impacts major projects from organizations like Google, GitHub, Facebook, Kubernetes, NodeJS, Amazon, and many others. After first discovering it in a recent engagement, we wanted to know how prevalent this vulnerability was, so we recursively analyzed all open-source projects and found that it is extremely widespread and probably affects you in some way.
Who is vulnerable?
Every project whose compilation relies on dynamically linked code from GitHub repositories is potentially vulnerable. For a project to be vulnerable, the following two things need to happen:
Your code needs to directly reference a GitHub repository (usually as a dependency).
The owner of that repository then needs to change or delete their username.
When the owner of the linked repository changes their username, the old username immediately becomes available for anyone to re-register. This means that any project that links back to the original repository URL is now vulnerable to remote code injection through dependency hijacking. An attacker can register the old GitHub username, recreate the repository, and use it to serve malicious code to any project that depends on it.
Should you be concerned?
Even if your project that depends on a GitHub repository isn’t vulnerable right now, if the owner of one of its dependencies changes their username, that project and all other projects that depend on the old link become vulnerable to repo jacking. You’d expect there to be some kind of warning when a repository changes locations, maybe a “404 – Repository not found” kind of error, but there isn’t. Moreover, there is one little GitHub feature that makes this vulnerability distinctly more dangerous: Repository Redirects.
‘Repository Redirects’ exacerbate the problem
When a GitHub user changes either the name of a repository or their username, GitHub sets up a redirect from the old URL to the new one; this redirect works for both HTTP and Git requests. A redirect is created any time a user changes their username, transfers a repository, or renames a repository. The problem here is that if the original repository (in the example below, “twitter/bootstrap”) is ever recreated, the redirect will break and send you to the newly created repository.
Example scenario:
The link https://github.com/twitter/bootstrap points to the repository “twitter/bootstrap” but will actually redirect you to the “twbs/bootstrap” repository.
If Twitter ever changed their GitHub username, anyone could then re-register it, recreate a repository named “bootstrap”, and any new request to https://github.com/twitter/bootstrap would go to the newly created repository.
Any project that relied on https://github.com/twitter/bootstrap would now start loading code from this new repository.
Redirection is a convenient feature, as it means your links don’t immediately break when you rename your account. But it also means that your project can unknowingly become vulnerable to repo jacking. From your point of view, nothing has changed: your code still compiles the same, and everything works as it should. However, your project is now vulnerable to remote code injection, and you are none the wiser.
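One way to notice that a dependency is silently riding on a redirect is to compare the URL you asked for with the URL the request finally resolved to (most HTTP clients expose the final URL after following redirects). The following is a minimal sketch of that comparison only; the function names are hypothetical, and it assumes you have already obtained the resolved URL by some other means.

```python
from typing import Tuple
from urllib.parse import urlparse


def owner_and_repo(url: str) -> Tuple[str, str]:
    """Return the lowercase (owner, repo) pair from a github.com URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"not a repository URL: {url}")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    # GitHub treats owner and repository names case-insensitively.
    return owner.lower(), repo.lower()


def relies_on_redirect(requested_url: str, resolved_url: str) -> bool:
    """True when the requested repository resolved to a different owner/repo."""
    return owner_and_repo(requested_url) != owner_and_repo(resolved_url)
```

For the example scenario, relies_on_redirect("https://github.com/twitter/bootstrap", "https://github.com/twbs/bootstrap") is true, which signals that the dependency should be updated to the new location before the old name can be re-registered.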
The Three Hijack Scenarios
To get a little more specific, there are technically three different ways a repository can become hijackable:
A GitHub user renames their account. This is the most common way a repository becomes hijackable, since it is not uncommon for a user to rename their account, and when they do, everything continues working as expected due to repository redirects.
A GitHub user transfers their repository to another user or organization and then deletes their account. When a user transfers a repository, a redirect is set up, and deleting the original account opens the old name up to being hijacked by anyone.
A user deletes their account. This is the least impactful of the three, because the moment the original user deletes their account, any project that references it will start having errors when trying to fetch the repo.
Note: There have been a few cases (one, two) of attackers re-registering the deleted username between the time the user deletes their account and the time projects try to fetch the repo. This scenario has been written about before here.
GitHub’s Response
We contacted GitHub before publishing this article, and they informed us that this is a known issue but that they currently do not have any plans to change the way redirection or username reuse works. They have provided some mitigation for popular repositories by disallowing re-registration of the names of repositories that had more than 100 new clones in the week leading up to their deletion, as outlined here. This does provide some level of protection but is not a foolproof solution, as many smaller repositories don’t meet this criterion but can still be depended upon by popular projects. As such, developers who wanted to use them needed to link directly to GitHub.
The root problem here is not so much that GitHub allows redirects and username reuse, but rather that developers are pulling their code from unsafe locations. GitHub can’t police developers who are using their service for unintended purposes. There are many package managers available (in fact, GitHub itself has one) built to solve the problem of remote code dependencies, and developers have the responsibility of ensuring that they load their code from secure locations.
Analysis
Now the next question that comes to mind is, “How widespread is this really?” It turns out that sifting through all open-source projects, compiling their dependencies, finding all hijackable repositories, and constructing a dependency graph of vulnerable repositories is not easy. So, here is how we did it.

Step 1 – Data Collection
One of the hardest parts of performing large-scale analysis of open-source software is the initial data collection. Finding an up-to-date, accurate, and easily searchable index of all open-source projects is challenging. We primarily used two datasets for this analysis:
GitHub Activity Data: This is provided by GitHub themselves and is a massive dataset that includes over 2.8 million repositories, along with all of their files and contents; the entire dataset is over 3TB worth of content. Conveniently, it is hosted as a public BigQuery dataset on the Google Cloud Platform (GCP), which means that we can use BigQuery to run SQL commands over the entire dataset from within GCP itself, and we don’t need to download the entire 3TB+ dataset. To actually perform the search, we generated a regex that catches any GitHub URL or other common GitHub dependency link formats such as github:username/reponame. Using this regex, we were able to extract the repository, file name, and file contents for each file that contains a reference to a GitHub link. This shrunk our search space down from 3TB+ to a more manageable 4GB. This filtered dataset included 4 million unique GitHub links and over 700 thousand different GitHub users.
libraries.io: Libraries.io is an open-source project that aims to aggregate all the dependencies from multiple different package managers into a graph-like dataset. This is excellent, since not only does it do all the heavy lifting for us in linking what depends on what, but they also make the entire dataset available to download for free. Uncompressed, this dataset is over 100GB but can be loaded directly into a database for easier processing.
It was important that we use both datasets because each is good at different things. The “GitHub Activity Data” dataset allowed us to find every possible GitHub link referenced in a repository, even if it is not being used in an obvious place like a package manager manifest. Some of the most interesting findings were not necessarily direct code dependencies. We often found GitHub URLs used directly in a bash script to clone a repository, or in a Docker image that would pull a repository from GitHub when built.
Example install script; the GitHub links are available to be registered by anyone.
Another common finding was hijackable repositories used as Git submodules, something that would have been missed by standard dependency analysis. Conversely, the libraries.io dataset was an already cleaned, filtered, and formatted dataset that allowed us to build a dependency graph and easily assess the extent of this vulnerability. Together, these datasets gave us a more complete view of the overall impact of this vulnerability on open-source projects.
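To make the link extraction step more concrete, here is a rough sketch of a pattern that catches github.com URLs, SSH-style remotes, and the github:username/reponame shorthand. It is not the regex we ran over the BigQuery dataset; the pattern and function name are illustrative assumptions and will miss less common formats.

```python
import re

# Hypothetical pattern: matches "github.com/owner/repo", "github.com:owner/repo"
# (SSH remotes), and the "github:owner/repo" shorthand used by some manifests.
GITHUB_LINK = re.compile(
    r"(?:github\.com[/:]|github:)"
    r"(?P<owner>[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?)"
    r"/(?P<repo>[A-Za-z0-9._-]+)",
    re.IGNORECASE,
)


def extract_github_links(text):
    """Return the set of lowercase (owner, repo) pairs referenced in text."""
    found = set()
    for match in GITHUB_LINK.finditer(text):
        repo = match.group("repo").rstrip(".")  # drop sentence-ending periods
        if repo.endswith(".git"):
            repo = repo[: -len(".git")]
        found.add((match.group("owner").lower(), repo.lower()))
    return found
```

Run over an install script or a manifest, this yields the owner names that later need to be checked for availability, regardless of whether the link appears in a package manifest, a git clone command, or a submodule definition.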
Step 2 – Clean Up
Now that we had collected all this data, we needed to sanitize and normalize it. This was a sizeable effort, since we needed to account for the different formats of each package manager. Additionally, we wanted to remove any links that were not actually being used as a dependency. Many of these links were used in comments, for example, something like: //code inspired from github.com/username/reponame, or in documentation text files. Since we were primarily concerned with the possibility of code injection, we trimmed off anything that was not going to be used directly by the code. This left us with a little over 2 million unique GitHub links that were referenced by files in meaningful ways.
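As a simplified illustration of this kind of filtering, the sketch below drops references that appear in documentation files or on lines that are entirely comments. The prefixes, extensions, and function name are assumptions chosen for the example; real cleanup needs per-language and per-package-manager handling, and this heuristic ignores trailing comments on code lines.

```python
import re

# Hypothetical heuristics, not an exhaustive list.
COMMENT_PREFIXES = ("//", "#", "*", "/*", "<!--")
DOC_EXTENSIONS = (".md", ".txt", ".rst")
GITHUB_REF = re.compile(
    r"github\.com[/:][\w.-]+/[\w.-]+|github:[\w.-]+/[\w.-]+", re.IGNORECASE
)


def meaningful_reference_lines(filename, contents):
    """Return lines that reference GitHub outside of comments and docs."""
    if filename.lower().endswith(DOC_EXTENSIONS):
        return []  # documentation text is never executed
    kept = []
    for line in contents.splitlines():
        stripped = line.strip()
        if not GITHUB_REF.search(stripped):
            continue
        if stripped.startswith(COMMENT_PREFIXES):
            continue  # whole-line comment, e.g. "//code inspired from ..."
        kept.append(stripped)
    return kept
```

Only the lines this returns would be carried forward as potential code injection paths.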
Step 3 – Hijackable Usernames
Now that we had a clean(er) list of projects that directly depend on a GitHub link, we needed to find which users were currently unregistered. At this point, we had about 650k GitHub usernames that we needed to sort through. Using the GitHub API, we could check whether a username exists, but we were rate limited to 5,000 requests an hour, which means that it would have taken us over 5 days to check all the usernames. With a little bit of clever logic and the GitHub GraphQL API, we were able to bring that down to a little over 2 hours to scan all 650k users.
So, what are the results? We found that about 7% (about 50k) of the usernames we collected were unregistered. We really were not expecting the number to be so high. We thought that less than 1% of the usernames we found were going to be hijackable. Apparently, people get tired of their usernames far more often than expected.
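The arithmetic behind the 5-day figure is simple: 650,000 usernames at 5,000 requests per hour is 130 hours, or about 5.4 days. One plausible way to cut that down, sketched below, is to look up many logins in a single GraphQL request by giving each lookup its own alias. This is an assumption for illustration rather than a description of the exact logic we used; it relies on the repositoryOwner(login:) query field and on the assumption that a missing account comes back as null, and the batch size of 100 is a hypothetical value.

```python
import json
import math


def hours_to_check(usernames, requests_per_hour, per_request=1):
    """Hours needed when each request can check per_request usernames."""
    return math.ceil(usernames / per_request) / requests_per_hour


def build_batch_query(logins):
    """Build one GraphQL query that looks up every login under its own alias."""
    fields = [
        # json.dumps yields a safely quoted string literal for the login
        f"u{i}: repositoryOwner(login: {json.dumps(login)}) {{ login }}"
        for i, login in enumerate(logins)
    ]
    return "query {\n  " + "\n  ".join(fields) + "\n}"


def unregistered(logins, data):
    """Given the response's data object, return logins that resolved to null."""
    return [login for i, login in enumerate(logins) if data.get(f"u{i}") is None]


print(hours_to_check(650_000, 5_000))        # one username per request
print(hours_to_check(650_000, 5_000, 100))   # hypothetical batches of 100
```

The query string would be sent to the GraphQL endpoint with an authenticated client of your choice; the aliases (u0, u1, and so on) let the response be matched back to the original logins.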
Step 4 – Vulnerable Projects
Once we had all the hijackable usernames, it was just a question of doing a reverse search on our dataset for every project dependent on a repository owned by one of those usernames. After some further filtering and removal of false positives, we found a total of 18,000 projects directly vulnerable to repository hijacking. These projects have a combined GitHub star count of over 500,000 and include projects in almost every language from some of the biggest open-source organizations.
This number alone is terrifying, but modern codebases are not giant monoliths living inside single repositories. Instead, they depend on many other projects for functionality. This is great for maintainability and reusability, but it means that a vulnerability in a single popular dependency can greatly impact many projects down the dependency chain. Effectively, any project that is dependent on one of the 18,000 directly vulnerable projects is itself also vulnerable.
Step 5 to ∞ – Dependency Analysis
Now that we had a list of directly vulnerable projects, we used it along with our earlier dataset to perform a dependency graph taint search and find every project that depends on a vulnerable repo in its supply chain. For this analysis, we included normal dependencies and less obvious ones such as development dependencies or dependencies that are not in the main package manifest file. If one of these auxiliary dependencies is vulnerable to repo jacking, it may take a little longer for the impact to propagate up the dependency chain, since it will only happen when the developers publish a new version. With that in mind, we began our taint analysis.
Because of the risk that the list of vulnerable projects would grow exponentially out of hand, we slowly walked the graph one depth layer at a time. Between each pass, we manually went through the results and trimmed off any obvious false positives to minimize error propagation and ensure our results didn’t fill up with false positives.
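The layer-by-layer walk can be pictured as a breadth-first search over the reversed dependency graph, stopping after a fixed number of passes. The sketch below is a simplified illustration under assumed data structures (a plain dictionary mapping each project to its dependencies); the function name is hypothetical and it omits the manual false-positive review between passes.

```python
from collections import defaultdict


def taint_layers(dependencies, directly_vulnerable, max_depth):
    """Return one set of newly tainted projects per depth layer.

    dependencies maps each project to the projects it depends on.
    """
    # Reverse the graph: for each project, who depends on it?
    dependents = defaultdict(set)
    for project, deps in dependencies.items():
        for dep in deps:
            dependents[dep].add(project)

    seen = set(directly_vulnerable)
    frontier = set(directly_vulnerable)
    layers = []
    for _ in range(max_depth):
        next_frontier = set()
        for project in frontier:
            next_frontier |= dependents[project] - seen
        if not next_frontier:
            break  # nothing new was tainted at this depth
        layers.append(next_frontier)
        seen |= next_frontier
        frontier = next_frontier
    return layers
```

Returning the layers separately, rather than one flat set, makes it possible to inspect and prune each pass before walking the next one.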
We had to stop after 5 passes.
Up until 5 passes, the data was growing predictably, and each round of search was taking a reasonable amount of time, but the moment we reached a depth of 6, the data started growing uncontrollably. Looking at the results for the fifth pass, the reason became clear; we had reached several huge frameworks that are foundational and relied on by thousands of other projects.
This was sufficiently deep for us to understand the impact of this vulnerability. Overall, Security Innovation found over 70,000 impacted projects, with a grand total combined GitHub star count of over 1.5 million; that is more stars than the combined total of the top eight biggest GitHub repositories ever. It is hard to measure accurately, but we estimate that these projects have a combined total of at least 2 million daily downloads.
Impacted projects include repositories from huge organizations such as Google, GitHub, Facebook, Kubernetes, NodeJS, Amazon, and many others. Everything from small personal projects to popular web frameworks used by hundreds of thousands of organizations is affected. It is also interesting to note just how many different types of software this impacts. We found vulnerable router firmware, games, crypto wallets, mobile apps, and many other unique projects.
Mitigations
Now that we understand how impactful and widespread this vulnerability is, it is important to know what remediation options are available to protect your own project’s supply chain.
Don’t Link Directly to GitHub Repositories
This is the most obvious one: GitHub repositories are not, and have never claimed to be, a substitute for a package manager. There are no guarantees that GitHub links are static, and they should not be used as direct code dependencies. Using a dedicated package manager has many advantages, both from a usability and a security perspective, and should always be preferred over directly linking to a repository. However, do note that you may still be vulnerable to repo jacking if one of your dependencies itself directly links to a GitHub URL. Even if you check each of your transitive dependencies for direct links, one of those dependencies might still have a hidden dependency on a GitHub repo. We’ve seen this often with build scripts, which fetch code directly from a developer’s repository, or inside testing code. If a dependency is vulnerable to such a hidden repository hijacking, the next time it gets updated, it may contain malicious code that then makes its way into your application.
Version Pinning and Lock Files
Another way to help mitigate this vulnerability is through version pinning and lock files. Version pinning is when a specific version is included with a dependency to ensure that only that version gets downloaded. In the context of GitHub link dependencies, this is usually a SHA-1 git commit hash, which is included to instruct your package manager to only download that specific commit of a git repository. The goal is that even if that repo gets hijacked, an attacker would not be able to modify the code without also modifying the commit hash. You can also version pin a dependency to a specific branch or tag, but there is nothing to stop a malicious user from updating that tag or branch, so it doesn’t provide any protection against repo jacking.
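To illustrate the difference between pinning to a commit and pinning to a branch or tag, the sketch below checks whether a dependency string ends in a full 40-character SHA-1 hash. It assumes the owner/repo#ref style used by some package managers; the function name is hypothetical, and abbreviated hashes are deliberately rejected because they cannot be told apart from tag names.

```python
import re

FULL_SHA1 = re.compile(r"^[0-9a-f]{40}$")


def is_pinned_to_commit(dependency):
    """True only when the ref after '#' is a full 40-character SHA-1 hash."""
    _, _, ref = dependency.partition("#")
    return bool(FULL_SHA1.match(ref.lower()))
```

A dependency such as github:someuser/somerepo#main, or one with no ref at all, fails this check and would follow whatever code the repository serves at install time.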
A lock file is a file generated by your package manager that includes a list of version-pinned dependencies to ensure that the next time someone tries to build that project, they download the exact same package and version specified in the lock file. Lock files will also sometimes include an integrity hash of the downloaded package to further ensure its authenticity.
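As a rough picture of what an integrity check involves, the sketch below hashes a downloaded package and compares the result with a value recorded earlier. The "sha512-" plus base64 layout follows the Subresource Integrity style that some lock files use; the function names are hypothetical, and other package managers record hashes in other formats.

```python
import base64
import hashlib
import hmac


def integrity(data: bytes) -> str:
    """Return an SRI-style string: 'sha512-' followed by the base64 digest."""
    digest = hashlib.sha512(data).digest()
    return "sha512-" + base64.b64encode(digest).decode("ascii")


def matches_lock(data: bytes, expected: str) -> bool:
    """Compare the computed integrity string with the one from the lock file."""
    return hmac.compare_digest(integrity(data), expected)
```

If a hijacked repository serves different bytes than the ones that were hashed when the lock file was written, the comparison fails and the install can be aborted.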
Version pinning and lock file implementations are package manager specific, but most big package managers support these features. That being said, they are far from foolproof. In fact, while we were conducting this research, we managed to bypass most major package managers’ version pinning and lock files. Stay tuned for a future blog post where we detail these issues in depth.
Vendoring
Vendoring is the act of downloading all your dependencies beforehand and including them in your repository. This has the advantage that your repositories are completely self-contained with the code needed to run them, and it also helps protect you against repo jacking. Since all your dependencies are already downloaded, it is like a lock file that also includes the contents of your dependencies. Even if one of those dependencies gets hijacked, you have already downloaded the code you need. The caveat here is that you might still become vulnerable the next time you update your dependencies if one of them has been hijacked. Many developers simply update all their dependencies when their package manager tells them to, without looking at the specific changes that were made. In those cases, vendoring provides little or no protection, as it only works if you keep a close eye on dependency upgrades.
Conclusion
Hopefully, this article helped shed some light on the impacts of dependency repository hijacking and allows projects to better secure their dependency supply chains. The use of COTS, third-party software, and open source will continue to grow, and along with it, so will the number of attacks targeting them. Although the use of third-party dependencies gets features out the door quicker and reduces development time, it is imperative that you scrutinize them the same way you do your own code, perhaps even more so.