Protecting Sensitive YAML with Piiano Vault

Stefanie Lai
Published in Level Up Coding
7 min read · Sep 14, 2023


It is increasingly common to define settings for applications, microservices, or Kubernetes resources in configuration files such as YAML (originally "Yet Another Markup Language", now "YAML Ain't Markup Language"). This is convenient, but it also puts companies at serious risk if the sensitive information these files contain, such as database passwords and API keys, is leaked.

Although RBAC (role-based access control) offers some protection, it is error-prone and has the following limitations when it comes to protecting YAML:

  • No field-level encryption: RBAC can control access to Kubernetes resources, but it cannot encrypt specific fields inside a resource. For example, a user who is allowed to read a ConfigMap or Secret can read every field in it.
  • Limited for CRDs: RBAC cannot flexibly protect individual fields of the custom resources you define through CRDs, and manually extending RBAC to do so is complex and error-prone.
  • Complex permission management: As applications or services grow, RBAC rules and roles become complex and hard to manage, and misconfigured rules can cause data leakage.
  • Configuration drift: Keeping RBAC configurations consistent across multiple environments is a challenge, especially across teams or in multi-tenant environments.
  • No dynamic field encryption: RBAC does not help when an application needs to encrypt or decrypt configuration data dynamically at runtime.

There are approaches to address the above issues. The most common solution is to directly process the corresponding sensitive data fields in the YAML file, which is simple, effective, and flexible in field-level protection of custom CRDs.

YAML Programmatic Processing

Processing YAML content and extracting some fields is no big challenge. Identifying which fields are sensitive and protecting each one appropriately is the real test for developers.
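As a rough illustration of the "identifying" step, here is a minimal sketch that walks an already-parsed YAML document and collects the paths of fields whose key names look sensitive. The key-name pattern and the `findSensitivePaths` helper are assumptions made for this example, not part of any library, and a real project would need its own rules.

```typescript
// Hypothetical key-name pattern; tune this list for your own configuration files.
const SENSITIVE_KEY = /(password|passwd|secret|token|api[_-]?key|ssn|bank|email|phone)/i;

// The value types a parsed YAML document can contain.
type YamlValue = string | number | boolean | null | YamlValue[] | { [key: string]: YamlValue };

// Recursively walk the document and return the paths of scalar fields
// whose key matches SENSITIVE_KEY.
function findSensitivePaths(node: YamlValue, path: string = ""): string[] {
  const found: string[] = [];
  if (Array.isArray(node)) {
    node.forEach((item, i) => {
      found.push(...findSensitivePaths(item, `${path}[${i}]`));
    });
  } else if (node !== null && typeof node === "object") {
    for (const key of Object.keys(node)) {
      const value = node[key];
      const childPath = path ? `${path}.${key}` : key;
      if (value !== null && typeof value === "object") {
        // Nested mapping or sequence: keep descending.
        found.push(...findSensitivePaths(value, childPath));
      } else if (SENSITIVE_KEY.test(key)) {
        found.push(childPath);
      }
    }
  }
  return found;
}

// Stand-in for the result of parsing a YAML file with a YAML library.
const parsed: YamlValue = {
  users: [
    { username: "john_doe", email: "john.doe@example.com", password: "supersecret123" },
  ],
};

console.log(findSensitivePaths(parsed));
```

Matching on key names is only a heuristic; it misses sensitive values stored under neutral keys, which is one reason a typed schema can be a better fit.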

Different protection levels apply to different sensitive data, for example:

  • Data like ID and bank account number needs to be fully encrypted and inaccessible to most users.
  • Data such as passwords needs only one-way protection, which can be implemented directly by hashing.
  • Data like email or phone number only needs to be partially masked to let users know the data type, but not all the information.

Based on different protection requirements, corresponding encoding methods can be applied.
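One way to express this mapping in code is a small policy table that assigns a protection level to each field and dispatches to a matching handler. The sketch below is only an illustration: the `policy` table, the `protectRecord` function, and the placeholder handlers are all hypothetical names, and the handlers merely tag the value instead of doing real cryptography.

```typescript
type Protection = "encrypt" | "hash" | "mask";

// Hypothetical policy for the kinds of fields discussed above.
const policy: Record<string, Protection> = {
  ssn: "encrypt",
  bankId: "encrypt",
  password: "hash",
  email: "mask",
  phoneNumber: "mask",
};

type Handlers = Record<Protection, (value: string) => string>;

// Apply the configured protection to each field.
// Fields without a policy entry are copied unchanged.
function protectRecord(
  record: Record<string, string>,
  handlers: Handlers
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const key of Object.keys(record)) {
    const hasPolicy = Object.prototype.hasOwnProperty.call(policy, key);
    out[key] = hasPolicy ? handlers[policy[key]](record[key]) : record[key];
  }
  return out;
}

// Placeholder handlers that only label the value, to show the dispatch.
const demoHandlers: Handlers = {
  encrypt: (v) => `enc(${v.length} chars)`,
  hash: (v) => `hash(${v.length} chars)`,
  mask: (v) => `${v.slice(0, 1)}***`,
};

console.log(
  protectRecord(
    { username: "john_doe", password: "supersecret123", email: "john.doe@example.com" },
    demoHandlers
  )
);
```

Keeping the policy separate from the handlers means the same table can drive a hand-written implementation or a dedicated tool.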

Encryption and Decryption

We choose an appropriate encryption algorithm for each data type, such as the symmetric Advanced Encryption Standard (AES) or the asymmetric Rivest–Shamir–Adleman (RSA). These differ in encryption and decryption speed and require different key management methods.

Besides, some databases also provide relevant encryption methods, which can be used for secondary data encryption. Of course, we need to back up the data simultaneously when using these methods for security.
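For the application-level encryption described in this section, a minimal sketch with Node.js's built-in crypto module might look like the following. It assumes AES-256-GCM and a key generated on the spot purely for the demo; the `encryptField` and `decryptField` helpers and the dot-separated output format are choices made for this example. In practice the key would come from a key management system.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Encrypt one field value with AES-256-GCM.
// Returns "iv.tag.ciphertext", each part base64-encoded.
function encryptField(plaintext: string, key: Buffer): string {
  const iv = randomBytes(12); // fresh 96-bit nonce for every encryption
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  return [iv, tag, ciphertext].map((part) => part.toString("base64")).join(".");
}

// Reverse of encryptField. Throws if the key is wrong or the data was modified.
function decryptField(encoded: string, key: Buffer): string {
  const [iv, tag, ciphertext] = encoded.split(".").map((part) => Buffer.from(part, "base64"));
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}

const key = randomBytes(32); // demo only: a real key must be stored and managed securely
const sealed = encryptField("BANK123456", key);
console.log(sealed);
console.log(decryptField(sealed, key));
```

GCM is used here because it also authenticates the ciphertext, so a tampered value fails to decrypt instead of yielding garbage.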

Hashing

Hashing is simpler because most languages provide built-in APIs for quick implementation, such as the java.security package in Java or hashlib in Python. For stronger protection, however, we salt the hash or hash multiple times, and this often means introducing third-party libraries.
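In Node.js, a salted, deliberately slow hash is available without extra dependencies through the built-in scrypt function. The sketch below is one possible approach, not the method used later in this article; the `hashPassword` and `verifyPassword` helpers and the "salt:digest" storage format are assumptions for the example.

```typescript
import { randomBytes, scryptSync, timingSafeEqual } from "node:crypto";

// Hash a password with a random per-password salt.
// Returns "salt:digest", both hex-encoded.
function hashPassword(password: string): string {
  const salt = randomBytes(16);
  const digest = scryptSync(password, salt, 32);
  return `${salt.toString("hex")}:${digest.toString("hex")}`;
}

// Recompute the digest with the stored salt and compare in constant time.
function verifyPassword(password: string, stored: string): boolean {
  const [saltHex, digestHex] = stored.split(":");
  const expected = Buffer.from(digestHex, "hex");
  const actual = scryptSync(password, Buffer.from(saltHex, "hex"), expected.length);
  return timingSafeEqual(actual, expected);
}

const stored = hashPassword("supersecret123");
console.log(verifyPassword("supersecret123", stored));
console.log(verifyPassword("wrong-password", stored));
```

Because the salt is random, hashing the same password twice produces different stored strings, which prevents precomputed lookup attacks.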

Masking

Masking is often implemented with different string operations for different protection needs. For example, part of an email address is replaced with placeholder characters, while a phone number displays only its last 4 digits. The coding is not difficult, but it is often cumbersome and prone to bugs.
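A minimal sketch of these two string operations is shown below. The `maskEmail` and `maskPhone` helpers and their exact masking rules (keep the first character and the domain of an email, keep the last four digits of a phone number) are assumptions chosen for this example.

```typescript
// Keep the first character of the local part and the whole domain.
function maskEmail(email: string): string {
  const at = email.lastIndexOf("@");
  if (at < 1) {
    // Not a recognizable address: mask every character.
    return email.replace(/./g, "*");
  }
  const local = email.slice(0, at);
  const domain = email.slice(at);
  return local[0] + local.slice(1).replace(/./g, "*") + domain;
}

// Keep the last four digits; replace earlier digits and keep separators.
function maskPhone(phone: string): string {
  const totalDigits = phone.replace(/\D/g, "").length;
  let seen = 0;
  return phone.replace(/\d/g, (digit) => {
    seen += 1;
    return seen > totalDigits - 4 ? digit : "*";
  });
}

console.log(maskEmail("john.doe@example.com")); // j*******@example.com
console.log(maskPhone("123-456-7890"));         // ***-***-7890
```

Even in this small example there are edge cases to decide on, such as malformed addresses or numbers with fewer than four digits, which illustrates why hand-written masking tends to accumulate bugs.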

The above data protection methods not only require knowledge but also a lot of experience, and they all have their pros and cons.

Protecting Sensitive YAML

YAML files are integral to almost all aspects of Kubernetes usage, and safeguarding sensitive data within these files has become increasingly critical as Kubernetes gains popularity. In cases where the community has not yet established specific guidelines for securing fields in YAML, crafting your own solutions can be exceedingly complex, demanding extensive security expertise, and prone to errors. In such situations, relying on dedicated security tools becomes imperative to achieve our objectives.

I opted for Piiano Vault for safeguarding sensitive YAML files due to its comprehensive coverage. It comes with all the necessary functionalities and has a detailed API and HTTP interface, making integration with various modules seamless. Despite being a relatively new tool, its documentation is quite extensive, and it benefits from robust support within the Slack community group. Finally, it’s free!

from https://docs.piiano.com/introduction

Piiano Vault Installation

Installing Piiano Vault requires very simple steps, including installing vault-server, vault-cli, and verification. Make sure you have the latest Docker installed locally.

# Register and get a license key
# Start a Vault server
docker run --rm --init -d \
  --name pvault-dev \
  -p 8123:8123 \
  -e PVAULT_SERVICE_LICENSE=${key} \
  piiano/pvault-dev:1.8.2
# Install CLI
brew install piiano/tap/pvault-cli
# Set up CLI command
alias pvault="docker run --rm -i --add-host='host.docker.internal:host-gateway' -v $(pwd):/pwd -w /pwd piiano/pvault-cli:1.8.2"
# Test
pvault status

Now we can see that the Vault server reports a healthy status.

Code in TypeScript

TypeScript is one of the four languages supported by Piiano Vault. It is commonly used for handling YAML, and many developers find it more lightweight for this kind of task than languages like Java or Python. Below I will demonstrate how to protect sensitive user information with the TypeScript API of Piiano Vault.

Suppose the user information in our YAML is as follows.

users:
  - username: "john_doe"
    ssn: "123-45-6789" # Social Security Number
    email: "john.doe@example.com"
    bankId: "BANK123456" # Bank Identifier
    phoneNumber: "123-456-7890"
    password: "supersecret123"

This user data could also be contained directly in a CRD and stored in the cloud, so it needs to be secured. Here, password can be hashed directly, bankId needs to be encrypted, and email needs masking. Keep in mind that this is a demo; passwords should never be saved in plain text in files.

The first step is to install the Piiano Vault library.

npm install @piiano/vault-client

This library contains all the APIs we need to protect sensitive data.

# token ops
VaultClient.tokens.tokenize()
VaultClient.tokens.detokenize()
# objects ops
VaultClient.objects.listObjects()
VaultClient.objects.addObject()
VaultClient.objects.getObjectById()
VaultClient.objects.updateObjectById()
VaultClient.objects.deleteObjectById()
VaultClient.objects.addObjects()
VaultClient.objects.updateObjects()
VaultClient.objects.deleteObjects()
VaultClient.objects.searchObjects()
# encrypt and decrypt
VaultClient.crypto.encrypt()
VaultClient.crypto.updateEncrypted()
VaultClient.crypto.decrypt()
# hash
VaultClient.crypto.hashObjects()

In Vault, each data set is a collection, so we need to create the corresponding collection first.

const collection = await createCollection(vaultClient.collections, {
  name: collectionName,
  type: 'PERSONS',
  properties: [
    { name: "ssn", data_type_name: "SSN", is_nullable: true, description: "Social Security Number" },
    { name: "bankid", data_type_name: "BAN", is_nullable: true },
    { name: "email", data_type_name: "EMAIL" },
    { name: "phoneNumber", data_type_name: "PHONE_NUMBER" },
    { name: "password", data_type_name: "STRING", is_nullable: true }
  ],
});

The data_type_name here is very important: many sensitive field types are built in, and each comes with its own transformation functions at no extra effort. The EMAIL type is masked directly and displayed in a form like j*********@gmail.com; BAN is the bank account number type and displays only the last 4 digits by default. As you can see, no additional encryption or hashing code is required if the appropriate type is chosen.

All built-in data types and the default built-in transformations are listed in the Piiano Vault documentation. We can also define custom types.

Then we can add the test data to the Vault directly.

const objects: User[] = [
  {
    ssn: "123-12-1234",
    bankid: "1234455",
    email: "john@somemail.com",
    phoneNumber: "+461234564444",
    password: "123445"
  },
  {
    ssn: "123-12-1235",
    bankid: "12343556",
    email: "mary@somemail.com",
    phoneNumber: "+462345675555",
    password: "234464"
  },
];

const users = await addUsersToCollection(vaultClient.objects, collectionName, objects);
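The `User` type used above is not shown in this article. As a sketch, it could mirror the collection properties, with a small helper that converts an entry from the YAML file into that shape. The `YamlUser` interface, the `User` fields, and the `toVaultUser` helper below are all assumptions for illustration; note that the YAML key `bankId` is renamed to the collection property `bankid`, and `username` is dropped because the collection defines no such property.

```typescript
// Hypothetical shape of one entry under `users:` in the YAML file.
interface YamlUser {
  username: string;
  ssn?: string;
  email: string;
  bankId?: string;
  phoneNumber: string;
  password?: string;
}

// Hypothetical `User` type matching the collection properties defined earlier.
interface User {
  ssn?: string;
  bankid?: string;
  email: string;
  phoneNumber: string;
  password?: string;
}

// Convert a YAML entry into the shape expected by the collection.
function toVaultUser(entry: YamlUser): User {
  return {
    ssn: entry.ssn,
    bankid: entry.bankId, // YAML uses `bankId`, the collection uses `bankid`
    email: entry.email,
    phoneNumber: entry.phoneNumber,
    password: entry.password,
  };
}

const fromYaml: YamlUser = {
  username: "john_doe",
  ssn: "123-45-6789",
  email: "john.doe@example.com",
  bankId: "BANK123456",
  phoneNumber: "123-456-7890",
  password: "supersecret123",
};

console.log(toVaultUser(fromYaml));
```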

Then we tokenize the password directly.

// Note: `reason` is assumed to be a constant defined elsewhere in the module;
// it is passed with every Vault request below.
async function tokenizeObject(
  tokensClient: TokensClient,
  collection: string,
  object: UserResult,
  tokenRequest: TokenizeRequest,
  searchTokenRequest: QueryToken
) {
  console.log(`Tokenizing object with ID ${object.id} in collection ${collection}...`);
  // Tokenize object
  const tokens = await tokensClient.tokenize({
    collection, reason,
    requestBody: [tokenRequest]
  });

  // Search for the token to confirm that it was created
  const searchTokens = await tokensClient.searchTokens({
    collection, reason,
    requestBody: searchTokenRequest
  });

  if (!searchTokens[0]) {
    throw new Error(`Failed to find a token for the object with ID ${object.id}.`);
  }
  console.log(
    `Found token with ID ${searchTokens[0].token_id} in collection ${collection}.`
  );

  return tokens[0].token_id;
}

const tokenRequest: TokenizeRequest = {
  type: 'pointer',
  props: ["password"],
  object: {
    id: users[0].id!,
  },
};

const searchTokenRequest: QueryToken = {
  object_ids: [users[0].id!],
};

const tokenId = await tokenizeObject(
  vaultClient.tokens,
  collectionName,
  users[0] as UserResult,
  tokenRequest,
  searchTokenRequest
);

console.log(tokenId);

Tokenizing only requires passing a request object that names the relevant fields. Running the code prints the ID of the generated token.

The last step is to read the masked data that Vault provides by default for fields such as email and bankid.

const objectsWithMasks = await objectsClient.listObjects({
  collection: "users",
  reason,
  ids: [object.id!],
  // Request the `.mask` transformation of every property except password
  props: collection.properties
    .filter(item => item.name !== "password")
    .map(item => `${item.name}.mask`)
});
if (objectsWithMasks.results.length === 0) {
  throw new Error(`Failed to query object-${object.id} in the collection.`);
}

console.log(
  `Found the following objects in collection ${collection.name}:${objectsWithMasks.results.map(
    item => `\n\t${JSON.stringify(item)}`
  )}`
);

As expected, the four fields email, phoneNumber, ssn, and bankid are under protection now.

Conclusion

While RBAC offers some level of protection, it falls short of the nuanced needs of YAML-based configurations, particularly in terms of field-level encryption, hashing, and masking. Implementing such a feature-rich, custom solution demands a deep understanding of various encryption algorithms and techniques. With the TypeScript examples above, we saw that Piiano Vault makes YAML security readily attainable.

Thanks for reading!
