How to Safeguard Your API Keys and Prevent Data Disclosure

July 1, 2023

Sensitive information is always at risk of exposure, which can lead to severe consequences. This article explores the risks of exposing API keys and other sensitive data, using the Alpha Vantage stock market API as a running example.

Understanding API Data Exposure

Imagine you're developing a script that utilizes an API to fetch valuable data. For instance, let's consider a script that retrieves stock market data from the Alpha Vantage API:

GET "https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=MSFT&apikey={YOUR API KEY}"

It's not uncommon for developers to store their API keys (a secret) directly in the script file for quick and convenient testing. Here's a script example using Python and a fictional API key:

# app.py

api_key = "12abc3d45ef6789012345g6789h0ij12"

function = "TIME_SERIES_DAILY"

symbol = "MSFT"

base_url = "https://www.alphavantage.co/query?"

final_url = base_url + "function=" + function + "&symbol=" + symbol + "&apikey=" + api_key

While this approach may seem practical, it poses a significant security risk. When code like this is pushed to a public GitHub repository, the secret API key is exposed to anyone who can view the repository. A private access token like this should never be visible to anyone outside the privileged users within your organization.

Additionally, several other issues arise from this exposure:

1. License Agreement Breach: Exposing or sharing the key may breach your license agreement with the API vendor.

2. API Request Throttling: Adversaries can exploit the stolen key to make excessive API calls beyond the permitted scope of your license, causing the API vendor to throttle your requests.

3. Potential Security Breach: If the exposed private API key grants access to sensitive data, such as a cloud storage account, it can invite a severe security breach.
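One way to catch assignments like the one in `app.py` before they are pushed is to scan source text for suspicious patterns. The following is a minimal sketch, not a complete secret scanner: the function name `find_hardcoded_secrets` and the pattern (a variable whose name contains "key", "token", or "secret", assigned a quoted literal of 16 or more characters) are assumptions made for illustration, and the pattern will miss many real-world cases.

```python
import re

# Hypothetical heuristic: a name containing "key", "token", or "secret"
# that is assigned a quoted string literal of at least 16 characters.
SECRET_ASSIGNMENT = re.compile(
    r"""\b(\w*(?:key|token|secret)\w*)\s*=\s*["']([^"']{16,})["']""",
    re.IGNORECASE,
)


def find_hardcoded_secrets(source):
    """Return (line_number, variable_name) pairs for suspicious assignments."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        match = SECRET_ASSIGNMENT.search(line)
        if match:
            hits.append((number, match.group(1)))
    return hits
```

Run against the contents of the `app.py` shown earlier, this sketch would flag the `api_key` assignment on its line while ignoring short literals such as `symbol = "MSFT"`.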

Securing Sensitive Data

So, how can we effectively shield API keys and other sensitive data in languages like Python when collaborating on GitHub? The solution is straightforward: create a configuration file that holds our API keys (and other sensitive information). The application imports this file wherever the credentials are needed, but the file itself is excluded from version control using a `.gitignore` file. This way, the application can function as expected while the credentials stay out of the repository and are never exposed on GitHub.

Continuing our previous example, we'll create a separate code file named `config.py` to store our API key. Then, we modify the relevant lines in `app.py` to fetch the API key from `config.py`, as indicated in the modified code below:

# config.py

api_key = "12abc3d45ef6789012345g6789h0ij12"

# app.py

import config

function = "TIME_SERIES_DAILY"

symbol = "MSFT"

base_url = "https://www.alphavantage.co/query?"

final_url = base_url + "function=" + function + "&symbol=" + symbol + "&apikey=" + config.api_key
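The string concatenation above works for simple values, but it does not escape special characters in the parameters. As an optional variation, the sketch below builds the same URL with the standard library's `urllib.parse.urlencode`. The helper name `build_query_url` and the `BASE_URL` constant are assumptions introduced for this example; the key is passed in as an argument so the function itself never contains a secret.

```python
from urllib.parse import urlencode

# Same endpoint as in app.py, without the trailing "?".
BASE_URL = "https://www.alphavantage.co/query"


def build_query_url(symbol, api_key, function="TIME_SERIES_DAILY"):
    """Build the query URL, percent-encoding each parameter value."""
    params = {"function": function, "symbol": symbol, "apikey": api_key}
    return BASE_URL + "?" + urlencode(params)
```

In `app.py`, this would be called as `build_query_url("MSFT", config.api_key)`, keeping the key itself in `config.py`.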

To ensure `config.py` doesn't end up on GitHub, we need to add it to the `.gitignore` file:

# .gitignore

config.py

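Only `app.py` and `.gitignore` will be uploaded to the public repository when pushing to GitHub. Consequently, the sensitive information in `config.py` will remain confidential and secure. Note that `.gitignore` only affects files Git is not already tracking: if `config.py` was committed earlier, stop tracking it first with `git rm --cached config.py`.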
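Because `config.py` is no longer in the repository, anyone who clones the project has to create their own copy, and a bare `import config` fails with a traceback that does not explain why. A minimal sketch of a friendlier check follows; the helper name `load_api_key` and the error messages are assumptions for illustration, not part of the original example.

```python
import importlib


def load_api_key(module_name="config"):
    """Import the untracked config module and return its api_key.

    Raises RuntimeError with a hint when the module or the key is missing.
    """
    try:
        module = importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        raise RuntimeError(
            f"{module_name}.py not found. Create it locally with a line such as "
            'api_key = "<your key>"; it is excluded from version control.'
        ) from exc

    key = getattr(module, "api_key", "")
    if not isinstance(key, str) or not key.strip():
        raise RuntimeError(f"{module_name}.py does not define a non-empty api_key.")
    return key
```

With this helper, `app.py` could call `load_api_key()` once at startup instead of reading `config.api_key` directly.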

Removing Previously Pushed Sensitive Data

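If you have unintentionally pushed sensitive data to a GitHub repository in the past, it's crucial to remove it immediately. GitHub provides a guide on how to remove such data effectively. Please refer to that guide for instructions on rectifying the issue. Keep in mind that a key that has been public, even briefly, should be treated as compromised: revoke it and generate a new one with the API vendor.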

Conclusion

Although it may appear simple, this practice is a powerful approach when handling credentials. Best practices dictate that credentials should not be assigned directly to variables in any file that is tracked by version control. Depending on your organization's standards, applying this methodology to self-hosted version control may also be beneficial. Regardless of where version control is hosted (local or cloud-based), it's essential to know precisely where sensitive credentials are stored and avoid pushing them to version control. By following these steps, you can safeguard your API keys and prevent the inadvertent disclosure of sensitive data.