How privacy-by-construction is built into Salary Confidential's surveys and results reports

Salary Confidential is unique in its approach to micro-data release. Yes, our surveys are very small, and yes, the owner of a survey (the Requester, as we call them) controls the gate with their invitation tokens, so they know exactly who could be behind every possible data point. These constraints are what make the surveys highly relevant.

So you'd be right to assume that it's easier to re-identify who is behind a compensation record, given our constraints. But in this article, we explain the steps we take to keep your respondents anonymous.

We don’t rely on post-hoc anonymization of a completed dataset. Instead, privacy protections are built directly into how data is collected, structured, and released, so that identification risk is reduced before results ever exist.

Only a small number of fields are required, and none have high re-identification risks

The mandatory fields are limited to those necessary to form a minimum viable compensation record and enable basic statistical grouping. Fields with higher re-identification risk are optional.

Certain high-risk attributes are not collected at all

We do not collect job titles. In our testing, attempts to generalize or “lattice” titles into broader categories consistently failed to produce results that were both privacy-safe and practically useful in small peer groups.

Read about how k-anonymity research influenced the design of the Salary Confidential platform

Certain high-risk attributes are presented with a built-in blur, and we sometimes don't show them at all if we detect an unsafe data scenario

We collect organization size, which we use in several different ways in our analysis. We also know that it's very relevant for a requester to be able to distinguish small organizations from large ones.

But a very precise number can become a quasi-identifier. So we coarsen it by turning the raw number we receive into a size band, and we treat the boundaries of each band as probabilistic (meaning that company sizes near the band edges may actually be banded higher or lower). Put another way: we represent an order of magnitude, because this is meaningful, but are deliberately not precise. We also randomize the presented order of organizations within a given band. Finally, in certain cases, we withhold organization size from a report when we detect an unsafe scenario in which one result stands too isolated from the other participants. We explain in more detail how we handle company size in this FAQ item.
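As a rough illustration of banding with probabilistic edges, here is a minimal Python sketch. The band boundaries, the labels, the jitter value, and the function names are all assumptions made for this example; the article does not publish the real bands or the real method.

```python
import bisect
import random

# Hypothetical band lower bounds and labels (not the platform's real bands).
BAND_LOWER_BOUNDS = [1, 10, 50, 250, 1000, 5000]
BAND_LABELS = ["1-9", "10-49", "50-249", "250-999", "1,000-4,999", "5,000+"]


def band_index(raw_size, rng, jitter=0.15):
    """Map a raw headcount to a band index, with fuzzy band boundaries.

    The headcount is scaled by a random factor in [1 - jitter, 1 + jitter]
    before banding, so a size close to an edge may land in either neighbor.
    The jitter value is an assumption made for this sketch.
    """
    noisy = raw_size * rng.uniform(1 - jitter, 1 + jitter)
    # bisect_right counts how many lower bounds are <= noisy.
    idx = bisect.bisect_right(BAND_LOWER_BOUNDS, noisy) - 1
    return max(idx, 0)  # clamp in case the noise pushed a tiny size below 1


def band_label(raw_size, rng, jitter=0.15):
    """Return the display label for a raw headcount."""
    return BAND_LABELS[band_index(raw_size, rng, jitter)]
```

Under these assumed values, a headcount of 500 always lands in the "250-999" band, while a headcount of 245 can be shown as either "50-249" or "250-999". A viewer therefore learns the order of magnitude but cannot read an exact threshold back out of the label.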

Response data is kept structurally separate from access and identity

The redemption of a single-use cryptographic invitation token is not stored as an event connected to the submission of compensation data. The system is designed so that access control can function without creating any linkage between who was invited and what was submitted.

We treat Salary Confidential itself as a potential source of accidental tracing data — and design the system so that such data never exists in the first place.

In technical terms, we are intentionally “stateless” and “blind” in the minting and redemption of invitation tokens. Even in the event of a compromise by a determined requester, information about whether a specific token was redeemed or which response it produced cannot be obtained, because it was never recorded.

Read about our privacy-preserving invitation tokens

Operational metadata is not exposed

Submission timestamps, respondent IP addresses, and similar operational metadata never appear in released results, as these signals can enable inference even when response content itself is anonymous. We collect this metadata for fraud detection and for no other purpose.

Read about what anti-triangulation means and why it matters

Result release is hardened to limit membership inference

We also address the risk that submission timing itself can be used to infer whether a particular person responded. This is one reason why using tools like anonymous Google Forms for sensitive, small-n surveys is inherently weak: simple timing or traffic analysis can create near-certainty about who is behind a submission.

To mitigate this, Salary Confidential:

Releases survey results in batched groups, creating temporal and volumetric separation between submission and visibility.
Enforces a minimum survey size of four responses, which is why four is also our minimum purchase size.
Allows early close only after this minimum threshold has been reached, preventing “trap” surveys designed to isolate a single respondent.

Read about our “safe batches of three” release method
Read about how we prevent early-closed surveys from creating a privacy failure
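The release rules listed above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the platform's actual logic: the batch size of three is inferred only from the name of the "safe batches of three" method, the function names are hypothetical, and the handling of a partial final batch when a survey closes is not described in this article, so it is deliberately left out.

```python
MIN_RESPONSES = 4  # minimum survey size stated in the article
BATCH_SIZE = 3     # assumption, inferred from the "safe batches of three" name


def visible_count(submitted):
    """Number of responses a requester can see while the survey is open.

    Nothing is visible below the minimum survey size. Above it, results
    appear only in whole batches, so a new submission does not immediately
    change what the requester sees.
    """
    if submitted < MIN_RESPONSES:
        return 0
    return (submitted // BATCH_SIZE) * BATCH_SIZE


def can_close_early(submitted):
    """Early close is allowed only once the minimum threshold is reached."""
    return submitted >= MIN_RESPONSES
```

In this model, the fifth submission changes nothing the requester can observe, which is the temporal and volumetric separation the list describes: watching the report update does not reveal which invitee just responded.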

Freeform input is deliberately constrained

Uncontrolled text is a significant privacy risk. While respondents may add limited additional details, we do not allow arbitrary freeform questions, and we periodically review anonymized entries to identify structured questions that could be safely supported in the future.

We also present information from freeform fields differently than the rest of the compensation data (see below).

Higher-risk optional fields are gated with clear warnings

Fields such as location are optional, unselected by default in the base configurations that requesters use to build their polls, and presented to respondents with clear guidance explaining their identification risks. We add deliberate interface friction before respondents can answer these questions, to ensure the guidance is actually seen.

We also guide requesters not to collect these fields unless they are genuinely meaningful to them. Don’t collect what you can live without.

Our general approach is to rely on informed consent rather than automatic generalization or suppression, while designing the system to account for privacy risks most respondents would not reasonably anticipate.

Higher-risk fields are presented differently in final reports

Our final reports show each compensation submission as a cohesive record, because this view tells a more useful story than aggregating individual variables in isolation.

However, when combined with higher-risk fields that contain freeform or contextual data (such as location or 'additional details'), this presentation can enable triangulation. To mitigate this, we present higher-risk fields separately from the compensation records they came from.

For example, instead of joining location to a specific compensation entry, reports show: “Your respondents who shared location came from: [Location], [Location], [Location].”

We also randomize the order in which these higher-risk values are displayed relative to the order of compensation records. Even if all respondents answered a given optional question, the first location shown is unlikely to correspond to the first compensation record displayed.

In information systems terms, we deliberately create ambiguity in how the data is presented, even though the underlying data itself is precise.
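The separation and reordering described in this section can be sketched in a few lines of Python. The record layout, the field names, and the function names here are hypothetical and chosen only for illustration.

```python
import random


def build_report(records, risky_fields, rng):
    """Detach higher-risk fields from records and shuffle each list independently.

    Returns (core_records, detached), where core_records no longer carry the
    risky fields and detached maps each risky field to its shuffled values.
    """
    core = [{k: v for k, v in r.items() if k not in risky_fields} for r in records]
    rng.shuffle(core)
    detached = {}
    for field in risky_fields:
        # Keep only the values respondents actually provided.
        values = [r[field] for r in records if r.get(field) is not None]
        rng.shuffle(values)  # independent of the order of core records
        detached[field] = values
    return core, detached


def summary_line(field, values):
    """Render detached values in the style of the report sentence quoted above."""
    return f"Your respondents who shared {field} came from: {', '.join(values)}."
```

Because each list is shuffled on its own, the position of a location in the summary line carries no information about the position of a compensation record. A real system would likely pass `random.SystemRandom()` as `rng` rather than a seeded generator, so that the ordering cannot be reproduced.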

Read about the content and shape of our final results report

Together, these constraints shift anonymity protection from a cleanup problem to a design problem — reducing the chance that identities can be reconstructed from small-n context, timing, or overly specific detail.

We may still have blind spots. If you think of an edge case we haven’t considered, we genuinely encourage you to reach out and discuss it with us.