Reveal your mysteries without actually disclosing them.

The title of this article might seem like a meme caption. However, it refers to a real challenge that the engineers at GitGuardian faced while developing the mechanisms for their new service, HasMySecretLeaked.

Nov 24, 2023 - 13:00

The concept might seem like something out of a meme, but GitGuardian's engineers faced a real challenge when building their HasMySecretLeaked service: helping developers find out whether their confidential data, such as passwords and API keys, has ended up in public GitHub repositories, without requiring them to expose those secrets in the process. This article delves into their inventive solution.

To put things in perspective, imagine a bit's weight equalling an electron's mass. In such a scenario, a ton of data would be approximately 121.9 quadrillion petabytes, or about $39.2 trillion in MacBook Pro storage upgrades - far more than all the world's money. So, when we say GitGuardian scanned a "ton" of GitHub public data, it's a metaphor for a vast amount, not a literal ton.
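The arithmetic behind that figure can be checked with a short calculation. This is only a sketch under stated assumptions: the article does not say which "ton" or which "petabyte" it means, so the metric ton (1,000 kg) and the binary petabyte (2^50 bytes) are assumed here because together they reproduce the quoted number. The constant and function names are illustrative.

```python
# Rough check of the "ton of data" figure.
# Assumptions (not stated in the article): metric ton, binary petabyte.

ELECTRON_MASS_KG = 9.1093837015e-31   # CODATA 2018 electron rest mass
TON_KG = 1000.0                       # assumed: metric ton
BYTES_PER_PETABYTE = 2 ** 50          # assumed: binary petabyte (pebibyte)


def petabytes_in_a_ton_of_bits() -> float:
    """Petabytes of data in one ton, if every bit weighed one electron."""
    bits = TON_KG / ELECTRON_MASS_KG
    data_bytes = bits / 8
    return data_bytes / BYTES_PER_PETABYTE


if __name__ == "__main__":
    quadrillions = petabytes_in_a_ton_of_bits() / 1e15
    print(f"about {quadrillions:.1f} quadrillion petabytes")
```

With decimal petabytes (10^15 bytes) the same calculation gives a somewhat larger number, so the quoted figure appears to depend on the binary definition.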

Their extensive scan of GitHub's public commits and gists unearthed a staggering number of secrets, millions of them, including passwords and cryptographic certificates. To help users verify whether their confidential data was part of this vast find without publicly releasing these secrets, GitGuardian turned to a technique called 'fingerprinting'.

This method involves encrypting and hashing the secret, then sharing only a portion of the resulting hash with GitGuardian. The partial hash narrows down potential matches significantly without revealing enough information to recover the original secret. To bolster security, the encryption and hashing are performed client-side.
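To make the idea concrete, here is a minimal sketch of partial-hash fingerprinting. It is not GitGuardian's implementation: the article does not name the algorithms or the prefix length, so plain SHA-256 and a five-character prefix are assumptions, the encryption step is left out, and every function name is hypothetical. The final local comparison against returned candidates is likewise an assumed way of completing the check.

```python
import hashlib

PREFIX_LENGTH = 5  # hypothetical: hex characters of the hash that get shared


def fingerprint(secret: str) -> str:
    """Hash the secret locally; the full digest never leaves the machine."""
    return hashlib.sha256(secret.encode("utf-8")).hexdigest()


def shareable_prefix(secret: str, length: int = PREFIX_LENGTH) -> str:
    """Return only the leading part of the hash, the piece sent to the service."""
    return fingerprint(secret)[:length]


def has_leaked(secret: str, candidate_hashes: set[str]) -> bool:
    """Compare the full local hash with candidates that share the prefix.

    `candidate_hashes` stands in for whatever the service would return
    for the prefix (an assumption about the protocol, for illustration).
    """
    return fingerprint(secret) in candidate_hashes
```

In this sketch the service only ever sees the short prefix, which many different secrets share, so it can narrow the search without learning which secret the user holds.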

For users of the HasMySecretLeaked web interface, a Python script is provided to generate the hash locally. This process ensures the actual secret is never exposed outside of the user's terminal session. Users can even verify the data sent via the browser's developer tools to ensure no sensitive information is transmitted.

Similarly, users of the open-source ggshield CLI can inspect the code to understand the workings of the hmsl command. For further assurance, tools like Fiddler or Wireshark can be used to monitor data transmission.

Understanding the hesitancy users might have about pasting a secret into a web page, GitGuardian emphasized transparency and user control in their process. This approach is evident not just in their marketing but also in the ggshield documentation for the hmsl command.

GitGuardian's efforts to give users a secure way to check whether their secrets were leaked have been fruitful. In the first weeks after launch, over 9,000 secrets were checked. The service allows users to check up to five secrets daily for free, which makes it a valuable tool given the potential risks of exposed secrets. This approach not only serves its immediate purpose but also provides a blueprint for others handling sensitive customer information securely.