What’s going on with (Sem|open)grep?
Introduction
A few weeks back I found myself in the Semgrep slack channel trying to find out whether the extensive effort I had put into developing a set of custom security vulnerability detections as part of my work for one of my main clients was about to go out the window.
The reason was Semgrep’s recent announcement that it would be removing certain functionality from its command line tool.
What I discovered was that some of my rules would no longer work. However, since I was lucky enough to be working with the SARIF output and not the JSON output, the extensive “nosemgrep” suppression I had put in place, which allows me to document rule exceptions inline in the code, would still operate.
Now before anyone starts accusing me of freeloading for not wanting to log in or pay for Semgrep, these detections were my own self-written rules, being integrated into existing Semgrep scans, which themselves were being run using Semgrep’s LGPL licensed engine, through a vendor who has Semgrep’s explicit blessing to do this.
Having discovered that the only casualty of the change was my join mode rules, I continued building my detections but with a continued sense of concern about what the future would hold for them…
It was therefore with mixed emotions that I read about the Opengrep fork of the Semgrep engine which was announced last week.
One of the reasons for the mixed emotions was the volume of misinformation being generated about what Semgrep had done and I tried to clarify things slightly in a social media post.
There still seems to be a lot of confusion though so I thought I would try and summarise where we are so far and what I hope to see from this development.
What is Semgrep?
Semgrep is a fast, open-source, static analysis tool that searches code, finds bugs, and enforces secure guardrails and coding standards. Semgrep supports 30+ languages and can run in an IDE, as a pre-commit check, and as part of CI/CD workflows.
Semgrep is semantic grep for code. While running grep "2" would only match the exact string 2, Semgrep would match x = 1; y = x + 1 when searching for 2. Semgrep rules look like the code you already write; no abstract syntax trees, regex wrestling, or painful DSLs.
https://github.com/semgrep/semgrep?tab=readme-ov-file#–code-scanning-at-ludicrous-speed
So what does that mean? Basically, Semgrep lets you write really simple rules to find particular patterns in a bunch of different coding languages. The tool can be run simply on the command line making it super flexible for where it can be used and it is generally very fast.
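To make the "semantic grep" idea a bit more concrete, here is a toy sketch in Python. To be clear, this is not how Semgrep is implemented and it does not use Semgrep at all. The function name find_constant_matches is made up for illustration, and it only handles simple integer assignments and addition. It just shows why a text search for 2 finds nothing in the quoted example while an approach that tracks values does.

```python
import ast


def find_constant_matches(source: str, target: int) -> list[int]:
    """Return line numbers of assignments whose value evaluates to `target`.

    Toy constant propagation: only integer literals, previously assigned
    names, and `+` are understood. Hypothetical helper, not a Semgrep API.
    """
    tree = ast.parse(source)
    known: dict[str, int] = {}  # variable name -> known constant value
    hits: list[int] = []

    def evaluate(node: ast.AST):
        # Integer literal, e.g. `1`
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return node.value
        # Name whose value we have already worked out, e.g. `x`
        if isinstance(node, ast.Name):
            return known.get(node.id)
        # Addition of two things we can evaluate, e.g. `x + 1`
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
            left, right = evaluate(node.left), evaluate(node.right)
            if left is not None and right is not None:
                return left + right
        return None  # anything else is unknown to this toy

    for stmt in tree.body:
        is_simple_assign = (
            isinstance(stmt, ast.Assign)
            and len(stmt.targets) == 1
            and isinstance(stmt.targets[0], ast.Name)
        )
        if not is_simple_assign:
            continue
        value = evaluate(stmt.value)
        if value == target:
            hits.append(stmt.lineno)
        name = stmt.targets[0].id
        if value is None:
            known.pop(name, None)  # forget stale values
        else:
            known[name] = value
    return hits


source = "x = 1\ny = x + 1\n"

# Plain text search, roughly what grep "2" does: no line contains "2".
text_hits = [n for n, line in enumerate(source.splitlines(), 1) if "2" in line]

# Value-aware search: line 2 assigns something that evaluates to 2.
semantic_hits = find_constant_matches(source, 2)

print(text_hits, semantic_hits)
```

The real engine does this kind of thing across many languages and with far more sophistication, which is exactly why having it freely available matters so much.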
The core engine is considered Open Source Software licensed under the LGPL and until the recent change was actually called “Semgrep OSS”.
Semgrep has a library with a wide set of rules. Some of these are available to use without a subscription; however, they have always been licensed less permissively than the core engine.
Specifically, up until December 2024 they were licenced under the Commons Clause as an extension to LGPL which basically means they are source available and can be used for the most part but there are limitations on how they can be used commercially. The intent seems to have been to prevent people selling Semgrep OSS + the rules as a paid SAST solution but the licence is a little hard to understand, particularly the section which talks about under what circumstances the rules can and cannot be used in a commercial product. This confusion is one of the things that led to the changes in December 2024.
Semgrep also offers a subscription to a more sophisticated commercial version of the tool, as well as a fully featured cloud management platform and a bunch of other tools including SCA and Secrets Management.
What is my experience with Semgrep?
I honestly cannot remember when I started using Semgrep but I have absolutely loved the tool from the beginning. The sheer power, simplicity, and flexibility of the tool are like nothing available in the open source world, with previous tools mostly being either super complicated or limited in features or language support.
I have used it in the courses I have delivered at OWASP and Black Hat about AppSec tools as an example of an open source SAST solution. However, my favorite use for it is as a tool for developing custom rules and detections to solve organization specific problems, like checking for missing authorization controls. I collaborated with Michal Kamensky on a talk about this and helped her develop a full training course on this topic which she delivered at Black Hat last year.
But, and this is a big but, the only reason this all happened was because the engine was licensed LGPL and freely available to use in this way.
Checkmarx will let you write custom rules but you can only use them within their platform. I hear that CodeQL from GitHub is super cool and powerful but I have never used it because it has a restrictive licence.
Semgrep basically became the watchword for simple and powerful SAST but almost certainly only because of the freedom to use the core engine.
What did Semgrep announce in December 2024?
Semgrep publicised two key changes in December 2024. A lot of people (including myself) were a little confused about these changes so I will try and state them as clearly as possible. (Note that the Chief Product Officer of Semgrep, Luke O’Malley, subsequently wrote a clarification.)
The first change was renaming the core engine from OSS to Community Edition. In the process, features were also moved out of the core engine and into the commercial version, including features previously considered “experimental”. The licence on the engine was not changed and remains LGPL.
The second change was changing the licence on the Semgrep rules. The new licence now says very clearly:
“You may use the rules only for your own internal business purposes. This license does not allow you to distribute the rules, or to make them available to others as a service.”
Why did they do this?
The nice thing about an open source project is that lots of people will use it, like I noted above.
The problem with an open source project…is that lots of people will use it. And some of them may use it in a way which you don’t like.
In Semgrep’s case, a whole bunch of companies started integrating the LGPL licensed Semgrep engine into their commercial offerings. It seems that some of them were also using Semgrep’s ruleset as well.
Now using the Semgrep rules in a commercial product was certainly a violation of Semgrep’s intent but the slightly unclear Commons Clause licence may have meant that companies felt like they were following the letter of the law, maybe by adding extra content or enriched information to the rules (I am speculating).
On the other hand, integrating the LGPL Semgrep OSS engine is perfectly allowed. This very much goes against the intent of Open Source but realistically the entire tech ecosystem runs on using Open Source software, in a best case scenario without providing any benefit to the author and in a worst case scenario by actually expecting service or support or disadvantaging the original author in some other way.
In this case, the original author was being disadvantaged because companies were now competing against them using their own engine (and maybe their own rules?), and Semgrep evidently decided it needed to take action.
What has happened now?
A bunch of vendors (most of whom have presumably invested a lot to integrate Semgrep into their products) have gotten together and have effectively declared that Semgrep has carried out a “rug pull”.
The term “rug pull” is usually used when an open source project changes the licence of the project to be more restrictive in a way that causes a problem for some or all users of the project.
This group of vendors has announced an open source fork of Semgrep called Opengrep which is also licensed LGPL. The group has declared its intention to keep the removed features in that fork and continue to maintain and enhance it as a multi-vendor effort under an open source foundation.
“We believe that discovering security issues must remain accessible to all. Opengrep will empower every developer with open and transparent SAST, making secure software development a shared standard.”
https://www.opengrep.dev/
Is this a “rug pull”?
In this case, license statuses have not really changed. The core engine remains LGPL and the rules remain under a more restrictive license.
I don’t think anyone can claim a “rug pull” about the rules, they were clearly under a restrictive licence and whatever reasoning organizations had to use them commercially, it was clearly not how Semgrep intended them to be used.
When it comes to the engine however, I think it is a little more complicated. Whilst the licence of the engine didn’t change, the effect of the functionality changes was that certain features were relicensed under a fully commercial licence. This not only prevented their use in commercial products but arguably prevented their use in any setting.
I think it is fair to say that it is completely reasonable and expected to add certain features to a commercial offering and not to a free/OSS offering.
But to actively remove features from a free/OSS offering and then remove the OSS moniker entirely sends a clear signal of uncertainty in an Open Source project, arguably more so than just simply changing license to prevent commercial abuse.
I think the “rug pull” concept is a little problematic in general, as I’ll explain below, but if you use the definition above then that is basically what happened here.
The problems with the “rug pull” concept
I don’t want to get into a long rant about this. We have seen a bunch of examples where a company that has maintained an open source project for years finally gets tired of being competed against by a bunch of other companies. They change to a different licence in a way that generally only affects those companies who are competing against them with their own software, and suddenly they are the bad guy.
Now, on the one hand, changing an open source project to a more restrictive licence seems to go against the “open source” ethos. But on the other hand, using someone else’s open source product to compete against them also seems to go against the “open source” ethos.
I therefore struggle to sympathise when, such as with Redis and Valkey, suddenly you have a medium sized tech company like Redis being the “bad guy” for changing their licence in a way that disadvantages some small, “hard done by”, tech companies such as (checks notes) Amazon, Google, Tencent, etc.
So what about this case?
As you might guess from the rant that I didn’t really want to get into, I would not naturally side with a fork by a bunch of commercial companies against the original maintainer. But in this case, I am a little more conflicted and I think this might be a fork worth supporting.
Semgrep is now in a very interesting position. Previously, it was the primary maintainer of a very popular open source project with full responsibility. I don’t know how many contributions they were seeing coming back from the wide user base but I imagine it was relatively limited.
Now that opengrep exists with the same licence, Semgrep is free to pick and choose the features it wants to adopt from opengrep.dev. At the same time, it may feel less pressure to directly add features to the Community Edition and can focus on enhancing its commercial features and pushing its other commercial offerings.
Even if Semgrep chooses not to contribute much back to opengrep, no-one is going to forget who originally wrote the engine, who was responsible for getting it to the level of sophistication it is at now, and where to go to get the highest level of support and integration.
Meanwhile, opengrep is licensed as LGPL. If Semgrep wants to adopt its features in Semgrep Community Edition, opengrep can’t exactly change the licence to prevent this as long as Semgrep CE also keeps the LGPL.
Plus, there will now be an expectation that any engine features which the opengrep foundation members build for their platforms will also be in the open source project.
In addition, Semgrep also explicitly said that they anticipated this move and the changes they made could be seen to have been designed to precipitate it.
First impressions of opengrep
Obviously it is very early days for this project. The team behind it seems to be quite determined to put effort into it and seems to have some developers specifically dedicated to work on it.
I do however find it a little puzzling that their first effort seems to be creating a version that runs on Windows and I do hope that isn’t going to cause a bunch of weird compatibility issues (e.g. different path characters in a join mode rule).
I think the biggest challenge is going to be trying to maintain a truly open source ruleset to go with the tool. As it stands at the moment, using Semgrep’s ruleset for anything other than testing and benchmarking in the repo itself is a little hazy from a licensing perspective.
More importantly, if the project wants to demonstrate that it is a good-faith effort to maintain a tool for the benefit of the community, that also means being seen to do what is ethically right and not just legally right. In this case, that is to steer well clear of Semgrep’s ruleset.
I have not loved the initial marketing from the various vendors involved. I understand the will to make a splash but a lot of the messaging has been disrespectful to Semgrep and/or has spread misunderstandings or misinterpretations.
On the other hand, my initial interactions with the team have been positive and it seems like they want to do good.
I called them out over the weekend about the fact that their messaging and actions seemed to imply that the rules were also previously under a less restrictive open source license, when this was not the case.
They seem to have responded pretty positively to that and have made various changes to clarify things.
The bottom line
I think that the opengrep fork is a good thing. It secures the status of the open source project and strengthens it for the whole community, including Semgrep. It also means that many other companies who are strongly invested in the engine will be pooling their resources and contributing their work back to the community.
I will be keeping a close eye on the new project team to see that they deliver on their promises in the hope that I can adopt opengrep for my own uses and evangelise it as a sophisticated way to run custom static code testing.