Excalidraw+ is officially SOC 2 Type I certified! Here's how we got there — the tools, policies, infra changes, and what we learned along the way.
We got tired of endless security questionnaires, so we got SOC 2 certified to make things smoother for everyone.
The process:
- Result: Passed SOC 2 Type I 🎉
- In progress: Type II
- Next: maybe GDPR, maybe ISO 27001 (depends on demand)
Most of the security stuff we were already doing; SOC 2 just forced us to write it down officially.
At some point, every company reaches the phase where "we promise we're doing things securely" just doesn't cut it anymore. We were getting tired of filling out endless security questionnaires, the kind of stuff that can easily live in a proper trust center.
It's one thing to say, "We use MFA, we encrypt stuff, we care about your data," but it hits different when a third-party auditor confirms we're actually doing it by the book.
And since our team is still fairly small, we figured it was the right time to get our practices locked in early.
If you're wondering how SOC 2 works or planning to get certified yourself, this post aims to shed some light.
SOC 2 is a security and compliance framework created by the AICPA. It defines how companies should handle customer data using five criteria: security, availability, processing integrity, confidentiality, and privacy.
There are other frameworks, such as ISO 27001, and depending on your customer base, it can pay off to get that one too in the long run, but starting with SOC 2 is never a bad idea. It's widely recognized by US companies, less complex than ISO 27001, and gives you a solid security foundation to build on.
There are two flavors of SOC 2:
- Type I: a point-in-time audit that checks your security controls are suitably designed.
- Type II: an audit over an observation period (typically 3-12 months) that checks those controls also operate effectively in practice.
We've completed SOC 2 Type I (🎉) and we're already working on Type II.
Before you start the certification audit itself, you need to make sure you're compliant to begin with. If you're young and brave, you can read up on the criteria requirements directly from the AICPA without using any third-party services, but that document doesn't hold your hand or tell you exactly what you need to do. We do not recommend it.
Instead, we looked for services that can get you set up and going easily (well, easier). They help you centralize compliance docs, security workflows, audits, risk monitoring, and all the other things you definitely don't want to be keeping up to date with instead of shipping your product. We narrowed the list down to Drata and Vanta, and in the end, went with Vanta as most of the services and providers we use already had an integration there.
Once you plug all your services in (for us, this meant Vercel, GCP, DigitalOcean, GitHub, among others), Vanta runs a check to see what needs fixing.
Anything without an existing integration, you need to track manually. Sometimes you need to supply evidence in the form of screenshots, so keep your "receipts" at hand! You can do this in Google Drive or elsewhere, but we built this stuff into our internal admin dashboard. It takes a little bit of work upfront, but it pays off long-term and keeps everything nice and organized.
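For illustration, an evidence record in such a dashboard might look something like this (a minimal sketch; all field names are hypothetical, not our actual schema):

```ts
// Hypothetical shape for a compliance-evidence record in an internal
// admin dashboard; fields are illustrative only.
interface EvidenceRecord {
  control: string;          // e.g. "Access reviews performed quarterly"
  kind: 'screenshot' | 'export' | 'document';
  uri: string;              // where the artifact ("receipt") is stored
  collectedAt: Date;
  collectedBy: string;
  expiresAt?: Date;         // auditors generally want evidence that's recent
}

// Surface anything that needs re-collection before the next audit window.
function staleEvidence(records: EvidenceRecord[], now = new Date()): EvidenceRecord[] {
  return records.filter((r) => r.expiresAt !== undefined && r.expiresAt <= now);
}
```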
Vanta also helps you introduce secure workstation policies, such as requiring disk encryption, screen locks, and the use of password managers across your entire team. Not exactly groundbreaking stuff, but useful and necessary, and trying to roll that out on our own would have been a nightmare.
The upside for us was that we got a bird’s eye view of our team: who has access to what, whether MFA was enabled, and so on.
Doing all that was a bit overwhelming at first, but once we got the hang of it, it was surprisingly manageable.
The toughest (or let's say, the most annoying) part by far was the policies. You need a ton of them: "Code of Conduct," "Human Resource Security Policy," "Access Control Policy," "Operational Security Policy," and more. Vanta provided some boilerplate templates to get us started, but we still had to tailor them for our company, especially since we're remote-first. Finding the balance between keeping our startup vibes and introducing more rigid and structured processes was key.
Actually implementing those policies was easier for us, as we had already been doing most of these things without them being written down, but there were still some processes that needed updating and technical changes that needed implementing. This can be time-consuming, especially when you're actively developing a product at the same time. The key is to carve out dedicated time, take on tasks one by one, and aim for a gradual roll-out.
Even though paperwork isn't exactly our favorite thing, some of it actually made a lot of sense. It forced us to properly write down how we handle incidents such as outages, instead of that knowledge existing in bits and pieces or locked in the minds of specific team members. And for our customers, it offers a peek into how we operate behind the scenes.
Since we're a remote team, we implemented a zero-trust production access model with strict role-based controls. Production access is limited to essential personnel (currently the technical co-founders) and operates through our automated deployment pipeline for all routine operations. This approach minimizes the attack surface while ensuring our support team can handle customer requests through our purpose-built internal admin dashboard, which provides controlled access to necessary functions without direct production exposure.
It also helps us automate SOC 2 tasks like tracking and managing resource access, conducting regular access reviews, and keeping a paper trail for the compliance reasons highlighted above. And yes, we log who touches what.
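To illustrate the pattern (with hypothetical names, not our actual code): every privileged action in a dashboard like this goes through a role check, and every attempt gets logged, allowed or not.

```ts
// Illustrative role-based guard with an audit trail; roles, actions,
// and storage are simplified stand-ins for a real implementation.
type Role = 'support' | 'engineer' | 'founder';

const allowedRoles: Record<string, Role[]> = {
  'subscription.refund': ['support', 'founder'],
  'user.export-data': ['support', 'founder'],
  'deployment.trigger': ['founder'],
};

interface AuditEntry {
  actor: string;
  action: string;
  allowed: boolean;
  at: Date;
}

const auditLog: AuditEntry[] = [];

function authorize(actor: string, role: Role, action: string): boolean {
  const allowed = (allowedRoles[action] ?? []).includes(role);
  // Record the attempt whether or not it succeeds: "who touches what".
  auditLog.push({ actor, action, allowed, at: new Date() });
  return allowed;
}
```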
Of course, no compliance journey would be complete without a few moments that feel like they come straight from a company training video. Well, watching those was quite literally part of it (No, Sheila, passwords go into the password manager, not on your monitor's sticky note), but we now also run regular staff training sessions. Our new member onboarding calls have also become more structured.
We went a bit crazy here. We started by splitting our monolith into more services and kept adding new apps to our codebase as we grew. To manage it all, we chose Nx as our build/monorepo framework. Migrating to Nx helped us standardize how our dev team develops, builds, tests, and ships within our GitHub CI/CD pipeline. Nx gives us custom executors, which we use to handle environment variables, accommodate differences between frameworks (classic SPA vs. Next.js), and more. The speedup from caching was a nice bonus (you can cache locally, or in the cloud for added benefits).
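For the curious, an Nx custom executor is just a TypeScript function; a stripped-down sketch (the option names and build commands here are illustrative, not our actual setup) looks something like this:

```ts
// impl.ts — skeleton of a custom Nx executor (names illustrative).
import { ExecutorContext } from '@nx/devkit';
import { execSync } from 'node:child_process';

interface BuildOptions {
  framework?: 'spa' | 'nextjs'; // hypothetical option for framework differences
}

export default async function runExecutor(
  options: BuildOptions,
  context: ExecutorContext
): Promise<{ success: boolean }> {
  // Pick the right build command for the framework this project uses.
  const cmd = options.framework === 'nextjs' ? 'next build' : 'vite build';
  try {
    execSync(cmd, { stdio: 'inherit', cwd: context.root });
    return { success: true };
  } catch {
    return { success: false };
  }
}
```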
For managing environment keys, we picked Infisical, which is end-to-end encrypted, self-deployable, and basically ticks all our boxes. This setup lets developers access only development keys and nothing more; the same goes for CI. No more committing environment variables into the codebase or injecting them manually into CI. Try it once, and you'll never want to go back. The secrets management tool also lets us run checks smoothly in CI, such as testing for missing or leaked environment keys. One of the tougher challenges was making CI work smoothly alongside the VPN and firewalls we have in place.
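To give a flavor of the kind of CI check this enables, here's a minimal sketch (the key names and bundle path are made up for illustration):

```ts
// check-env.ts — fail CI when a required key is missing, or when a
// server-only key's value leaks into the client bundle. All names here
// are hypothetical examples.
import { readFileSync } from 'node:fs';

const requiredKeys = ['DATABASE_URL', 'STRIPE_SECRET_KEY']; // hypothetical
const serverOnlyKeys = ['STRIPE_SECRET_KEY'];               // hypothetical

const missing = requiredKeys.filter((key) => !process.env[key]);
if (missing.length > 0) {
  console.error(`Missing environment keys: ${missing.join(', ')}`);
  process.exit(1);
}

// Naive leak check: a server-only value must never appear in client output.
const bundle = readFileSync('dist/client/main.js', 'utf8'); // hypothetical path
for (const key of serverOnlyKeys) {
  const value = process.env[key];
  if (value && bundle.includes(value)) {
    console.error(`Server-only key ${key} leaked into the client bundle`);
    process.exit(1);
  }
}
console.log('Environment keys OK');
```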
That custom firewall and VPN setup also finally pushed us to upgrade to GitHub Enterprise to get dedicated IPs for our runners.
We also set up monitoring for our services and made a public status page. For logging, we use Vector and Axiom.
As we wrapped things up, we needed to verify that everything was as secure as we planned. So, we conducted penetration testing across the entire Excalidraw+ platform. It found some minor issues, like exposed headers, which we fixed right away. Running penetration tests at least once a year to make sure everything is squared away is a must.
Turns out, your vendors need to have their act together, too. Every service that touches customer data needs to be evaluated and documented.
The good news? Most of the big players we rely on (Vercel, Google Cloud, GitHub) already have their SOC 2 reports ready to go.
We use a combination of Vanta (which handles most of the common vendors) and our internal admin dashboard for the ones Vanta doesn't cover, because managing this in spreadsheets gets old fast. For each vendor, we document what data they access, their certifications, key risks, and how we mitigate them.
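As a sketch, a vendor record in such a dashboard can be as simple as this (fields are illustrative, not our actual schema):

```ts
// Illustrative shape for a vendor-review record; not our real schema.
interface VendorReview {
  vendor: string;           // e.g. "Vercel"
  dataAccessed: string[];   // what customer data the vendor can touch
  certifications: string[]; // e.g. ["SOC 2 Type II", "ISO 27001"]
  keyRisks: string[];
  mitigations: string[];
  lastReviewedAt: Date;     // annual reviews are the common cadence
}
```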
Pro tip: Start this early. Some vendors take weeks to respond, and you might need time to find alternatives.
We don't care for tracking our customers, but we do need to know the essentials. How many users do we have, what features are actually used, things like that.
For Excalidraw, we use a self-hosted version of Umami. For public-facing stuff like our open-source editor and landing pages, we use Simple Analytics.
Both do the job without invading your privacy. So if you're wondering why you don't see a cookie banner (Umami FAQ, Simple Analytics) on our site, that's why.
When it comes to the audit itself, you can either pick from the auditors your provider (e.g., Vanta) works with or find your own. We chose Insight Assurance. What we learned, though, is that we should have contacted them way earlier in the process, as they could have helped us with some of the policies and risk definitions we were overthinking, which would have saved us a lot of time. But hey, lesson learned.
As for the good ending, we passed SOC 2 Type I. While Type II is next on the list, and it's what everyone should aim for, Type I is a good stepping stone along the way, as it already demonstrates to your customers that you're serious and that you're not running your business from the back of a tool shed.
If you want to go through the paperwork with a good glass of wine in the evening, you can find it all in our trust center, below.
This is all good, just a note for anybody reading this to the end: there's basically no way not to pass your Type 1, at least not if you're using a serious auditor. The point of a Type 1 is to document a point-in-time baseline. The Type 2 is the first "real" audit, and basically just checks whether you reliably did all the things you attested to in your Type 1.
All that is to say: you want to minimize the amount of security work you do for your Type 1, down to a small set of best practices you know you're going to comply with forever (single sign-on and protected branches are basically 90% of it). You can always add controls later. Removing them is a giant pain in the ass.
This is always my concern for people going into SOC2 cold: vendors in the space will use the Type 1 as an opportunity for you to upskill your team and get all sorts of stuff deployed. A terrible and easily avoided mistake.
I write this only because the piece ends with Excalidraw psyched to have cleared their Type 1. I hope their auditors told them they were always going to clear that bar.
Thomas is being a good HN citizen so he's not plugging his own blog post, but for anyone else embarking on their SOC2 journey I'll plug his guide for him: https://fly.io/blog/soc2-the-screenshots-will-continue-until...
These two comments on this thread are as good as anything I've read on this subject:
In case it's helpful, I also collate quality blog posts in this genre over at https://rami.wiki/soc2/
I get a 404 currently, fwiw.
Fixed! Pages drops the custom domain whenever I push right now, have been putting off debugging it - apologies
If I understand the issue correctly, you just need a file called CNAME in the root of your repo containing your custom domain, like this: https://github.com/justusthane/justusthane.github.io/blob/ma...
Thanks! Unfortunately, I've somehow fallen off the paved road :) https://github.com/ramimac/wiki/blob/main/CNAME
GH Pages is particular about how your apex and www records are set up. I believe you need apex A records pointing to
185.199.108.153
185.199.109.153
185.199.110.153
185.199.111.153
which you already have. Your CNAME record for www.rami.wiki needs to point to "ramimac.github.io" (a DNS CNAME can't include a path like /wiki), and the CNAME file in the root of your repo needs to contain "www.rami.wiki" (www is necessary).
At this point, https://rami.wiki should automatically redirect to https://www.rami.wiki.
At least, that's more or less how mine is set up and it works for me :) I had the same issue as you until I got that all straightened out.
Am I reading correctly between the lines? That sounds like you're suggesting that vendors in this space will actively work against your interests, and scope creep type 1, to get more business for type 2?
I don’t think it’s malicious. I usually see it happen when the company staff in charge of working with the auditors either aren’t interested in engaging (often due to stigma and baggage about the compliance industry) or don’t realize the dynamic of what they’re responsible for.
The auditors want you to get the Type 1. To do that they need docs and policies. If they say “send us your change management policy” and your team either says “we don’t have one, what would it look like” or sends them a one-line policy that says “The team does change reviews”, the auditors are going to send back recommendations for what you should include. They’re trying to be helpful (within the specific scope of getting you a type 1), but they aren’t engineers and don’t know your system. So a lot of their advice is going to be irrational and scope-creep. As a mundane example: the easiest thing for them to suggest if your change management policy doesn’t exist or looks weak to them is “set up a change control board that meets weekly to review all changes”, but that would be nuts to implement.
Or the vendors you’re paying to help you adopt a bunch of corporate paperwork are helping you adopt a bunch of corporate paperwork. Kinda their job, no?
If I hire a fire safety consultant, I gotta expect he’s going to recommend sprinklers and extinguishers and fire doors.
Such a cat and mouse game. Customer wants security. Vendor may or may not want it but wants to minimise required security to make enterprise sales. Vendor's vendor may want to add security (real or theatre) to type 1 to get more business for type 2 compliance.
Yup, for the most part you define your own controls! Even type 2 is pretty hard to "fail" if you're serious about security. You're more likely to just get minor exceptions in the report for being sloppy about something.
I think we've managed to get an exception in every Type 2 we've done (each time, some dumb paperwork policy thing; I think in one instance we were untimely with a post-facto merge PR signoff, the closest we've come to an actual slip). The first exception we got, I raised hell and wrote a management statement. But nobody cares about trivial exceptions, and so I've learned not to here either.
But, true, I didn't even pay attention in our last Type 2 (I don't run security here) --- passing was a foregone conclusion.
"Nobody cares about trivial exceptions"...except the most persnickety GRC teams of your most persnickety enterprise customers.
Or at least *cough*, that's what I've heard.
> I write this only because the piece ends with Excalidraw psyched to have cleared their Type 1. I hope their auditors told them they were always going to clear that bar.
The signal a Type 1 sends is that you're interested in even trying to pass the next one, which in itself is a good sign to everyone. Maybe being excited and proud of "passing" a Type 1 is a bit of an exaggeration for folks who know the details, but I'm very willing to forgive that. A lot of orgs show a lot more pride about much more dubious things.
I'm not saying it's a bad sign, I'm saying: you really can't fail a Type 1, unless your auditor is messing with you (a good auditor's job is to make sure you end up with a Type 1). My broken-record SOC2 point is: minimize your Type 1 controls, and add new controls over time.
You can do lots of security things. I'm not saying minimize security. I'm saying minimize the security things you talk about in your Type 1.
I'm saying even if you can't fail, I'm still willing to congratulate an org for starting even though the first milestone isn't particularly impressive.
Congratulations, Excalidraw. Also I love your product. Meanwhile, let's get back to talking about the pitfalls of actually getting SOC2.
Agreed. Certifications leave a lot to be desired but are at least better than nothing. I've been through it several times and it's a hard topic between good intentions and bad implementation.
Former Head of Security GRC at Meta FinTech, and ex-CISO at Motorola. Now, Technical Founder at a compliance remediation engineering startup.
Some minor nits. One can't be SOC 2 "certified". You can only receive an attestation that the controls are designed (for the Type 1) and operating effectively (for the Type 2). So, the correct phrase would be that Excalidraw+ has received its "SOC 2 Type 1 attestation" for the x, y, z Trust Services Criteria (usually Security, Availability, and Confidentiality. Companies rarely select the other two - Privacy and Processing Integrity - unless there's overlap with other compliance frameworks like HIPAA, etc.). The reason this is important is that phrasing matters, and the incorrect wording indicates a lack of maturity.
Also, as others have said, no one "fails" a SOC 2 audit. You can only get one of four auditor opinions - Unmodified, Qualified, Adverse, and Disclaimer (you want to shoot for Unmodified).
As an FYI, the technical areas that auditors scrutinize most heavily are access management (human and service accounts), change management (supply chain security and artifact security), and threat and vulnerability management (including patch management, incident response, etc.). Hope this information helps someone as they get ready for their SOC 2 attestation :-)
Similarly, the report areas you want to be very careful about are Section 3: System Description (make sure you don't take on compliance jeopardy by signing up for an overly broad system scope), and Section 4: Testing Matrices (push back on controls that don't apply to you, or the audit test plan doesn't make sense - auditors are still stuck in the early 00's / "client server legacy data center" mode and don't really understand modern cloud environments).
Finally, if you're using Vanta/Drata or something similar - please take time to read the security policy templates and don't accept them blindly for your organization - because once you do, they get set in stone and that's what you are audited against. (Example: most modern operating systems have anti-malware built in; you don't need to waste money purchasing separate software, at least for year one - so make sure your policy doesn't say you have a separate endpoint protection solution running. Another one: if your office is just a WeWork-style co-working space, most physical security controls like cameras, badge systems, etc. either don't apply or are the landlord's responsibility, so they're out of scope for you.)
Hope this comment helps someone! SOC 2 is made out to be way more complicated (and expensive) than it actually needs to be.
Cosign all of this wholeheartedly. Push back!
The ratcheting back system scope thing is super good advice I always forget to give, too. You can get your entire software security program wrapped up in your SOC2 --- but why would you ever want to do that. The security of your software is very relevant to your customers, but it is not and should not be relevant to SOC2.
A point to add here on the scoping. This makes sense in a B2C world but for the B2B contracts, our customers specifically check that our scope clause includes all software systems that they are contracting for plus all the support systems that help make it, including your security program etc.
All our contracts are B2B, and B2B is where all my prior consulting experience was.
I am very fond of telling the story about the very significant security product company a colleague works at where they had a vendor that gave them a series of repeated Type 1s. I don't believe any of this matters.
I have also felt the need to claim to be "SOC 2 Certified". It's made hard by so many vendors using that language, that it's come to be expected. Do I want to start the sales call by explaining that the purchaser is wrong… or just say yes, and if you sign this NDA you can have our auditor's report.
From the article:
> SOC 2 is a security and compliance framework created by the AICPA
How is it that a group of accountants (the American Institute of Certified Public Accountants) was able to create a security framework for software, and position themselves as the sole gatekeeper who decides which auditors are allowed to certify SaaS vendors?
I’m surprised that companies would look to accountants, rather than people from the tech industry, to tell them whether a vendor has good IT security practices.
Yet the whole tech industry seems to be on board with this, even Google, Microsoft, etc. How did this come to be?
It's an audit standard about security. It's not a security standard. It defines a small number of extremely broad goals, like "you do risk management" and "you have access control mechanisms", which might be IT tools or might be a tabletop RPG.
You're irritated that people keep describing it as a security standard, which is understandable, but it isn't. AICPA auditors run SOC2 audits because SOC2 is an audit; it's about reconciling paperwork and evidence, about digesting policies and then checking that you actually do the things in those policies.
If you want to know about a firm's actual security program, you'll need to ask deeper questions than SOC2 can answer.
When I worked someplace undergoing a SOC2 audit I had to periodically jump into calls with our auditor and security architect to answer all sorts of highly-specific questions about how we deployed our software and the infrastructure that it ran on. At one point, for instance, the auditor told me that they needed me to demonstrate that our servers were all configured to synchronize their clocks to an NTP server. Kubernetes was a foreign concept to them and pointing to GKE docs wasn't sufficient - if memory serves I had to MacGyver some evidence together by hacking a worker node to be able to get a terminal on it and demonstrate that, yes, Google's managed VMs indeed run chronyd.
This seems to be the opposite of
> It's not a security standard. It defines a small number of extremely broad goals
Is this because of the specific auditors we were using? Are some more sympathetic than others to contemporary engineering practices?
Yes, and yes. No matter how good your auditors are, unless you're accepting a shrink-wrapped set of controls from a tool provider like Vanta, you need to be pushing back on things they demand; you just have to have a clear idea of what the Common Criteria control they're looking for is (you'll see this clearly from the DRL they give you at the start of the engagement), and then when they ask for stuff that doesn't matter or isn't relevant for your org, you explain how what they're asking for has nothing to do with the actual control you're working on.
So far as I can tell there is almost nothing that is a firm requirement in a standard SOC2 Security TSC audit. We even got "background checks" rolled back.
Our audit firm is a SOC2 practice that informally spun out of a Big 4 firm. When people get audits after using GRC tools like Drata, they often get matchmade to auditors who bid down the cost of the audit. It's possible that one of the things you get when you pay low-mid 5 figures for an audit instead of low-mid 4 figures for an audit is a lot more flexibility and back/forth with the auditors; I don't know. If that's the case: pay for the better auditors. These are rounding error expenses compared to doing extra engineering work just for SOC2.
In my experience, it's more likely the approach of the folks at your company who wrote your controls.
SOC2 (and a bunch of similar regimes) basically boil down to "have you documented enough of your company's approach to things that would be damaging to business continuity, and can you demonstrate with evidence to auditors with low-to-medium technical expertise that you are doing what you've said you'd do". Some compliance regimes and some auditors care to differing degrees about whether you can demonstrate that what you've said you'd do is actually a viable and complete way to accomplish the goal you're addressing.
So the good path is that the compliance regime has some baseline expectation like "Audit logs exist for privileged access", and whoever at your company is writing the controls writes "All the logs get sent to our SIEM, and the SIEM tracks what time it received the logs, and the SIEM is only administered by the SIEM administration team" and makes a nice diagram and once a year they show somebody that logs make it to the SIEM.
One of the bad paths is that whoever is writing the controls writes "We have a custom set of k8s helm charts which coordinate using Raft consensus to capture and replicate log data". This gets you to the bad path where now you've got to prove to several non-technical people how all that works.
Another bad path is that whoever writes the control says "well shit, I guess technically if Jimbo on the IT team went nuts, he could push a malicious update to the SIEM and then log in and delete all the data", and so they invent some Rube Goldberg machine to make that not possible, making the infrastructure insanely more complex when they could have just said "Only the SIEM admins can admin the SIEM" and leaned on the fact that auditors expect management to make risk assessments.
The other bad path is that whoever is writing the controls doesn't realize they have agency in the matter, and so they just ask the auditors what the controls should be, and the auditors hand them some boilerplate about how all the servers in the server farm should run NTP and they should uninstall telnet and make sure that their LAMP stack is patched and whatever else, because the auditors are not generally highly technical. And the control author just runs with that and you end up with a control that was just "whatever junk the auditors have amalgamated from past audits" instead of being driven by your company's stack or needs.
Similarly, I've had many instances where an auditor would ask for X and instead of trying to show them X I would instead ask them what control / Common Criteria item they were trying to get assurance on. So much of the process is about educating the auditors about how your systems operate and how you manage risks, rather than just trying to provide or build anything and everything they ask for.
*X = password expiry configuration, server antivirus, approval emails, etc.
> "whatever junk the auditors have amalgamated from past audits"
At a large financial company, I was tasked with gathering some audit data to evidence that only certain people could access certain things. To do that, we had to get the list of users with access.
The access control tool at the time used plain text files. I sent the plaintext file with the list of names to the auditor. The auditor said that won't do, because it could have been forged. That's fair.
After lots of back and forth, the solution was that I needed to send over a screenshot of a terminal window with a list of names, because that's what the auditor expected, and that's what had previously been submitted.
Not a screenshot of the actual document. Not a terminal showing the hostname or similar on the server. I had to get the textfile I'd sent, open it in vim, take a screenshot of vim, and submit that.
This is gold. The good-path bad-path thing is exactly the right way to think about it.
Most of the bad paths are taken by engineers with little or no experience being audited. After going through the wringer a few times (they learn not to answer questions that aren't asked, and that they have a say in what a control should be), the pendulum swings in the other direction, where the answers are always good-path, not necessarily the real-path. At least until the practical part of the audit starts to look at what they really do, not what they say they do.
There's another giant pothole to navigate in many organizations, related to this:
> when they could have just said (...) and leaned on the fact that auditors expect management to make risk assessments
When management has decision paralysis and fear of accountability the engineers feel the need to compensate for the tight spot and solve problems the way they know how to solve them. With technical measures. And a technical measure that fixes the organizational problem tends to be complex and fidgety. Doubly hard for the auditors to properly take in.
“Management” here is a term of art. For many compliance regimes and controls, the engineer responsible for a system can make a statement as “management”.
> Kubernetes was a foreign concept to them and pointing to GKE docs wasn't sufficient
This doesn’t surprise me one bit, in my case our auditors didn’t have a clue what GitHub was and we had to explain how code reviews and deployment pipelines worked. And these are the people who are tasked with certifying whether we’re doing our job correctly.
Sure, maybe it’s because we didn’t pick good auditors. But the accountants certified those auditors, and the whole point of certification is that we can rely on it to establish basic knowledge.
You're relying on their ability to review documents and the meaningfulness of the reputation they stake on a signature saying they actually reviewed those documents. Nobody who has been through a SOC2 audit would ever reasonably think you're relying on your auditor's technology skills.
I've always viewed SOC-2 as a certification for business continuity, not security. Once you view it as making sure that the service can continue running, even with disaster or heavy turnover, it makes more sense.
Because CS refuses to formalize/unionize/license itself, to its own detriment. There is no standard software developer. Accountants have some minimum bar to maintain their license. Who would you choose?