HIPAA-compliant PII/PHI anonymization library for healthcare data
HIPAApotamus is a Node.js library that implements HIPAA Safe Harbor de-identification requirements (§164.514(b)(2)) to anonymize personally identifiable information (PII) and protected health information (PHI) in healthcare data.
- ✅ HIPAA Safe Harbor Compliant - Implements all 18 HIPAA identifier requirements
- 🔒 Deterministic Aliases - Same input always produces same output (maintains referential integrity)
- 🎯 Field-based Detection - Automatically detects and anonymizes PII/PHI field patterns
- 📝 Content Scanning - Optional scanning of string content for embedded PII patterns
- ⚙️ Highly Configurable - Custom fields, salt support, hide/show redacted fields
- 🚀 Zero Dependencies - Only uses Node.js crypto (built-in)
- 📦 TypeScript Support - Full type definitions included
npm install hipaapotamus

import { anonymizeData, createConfig } from 'hipaapotamus';

// Create configuration
const config = createConfig({
  enabled: true,
  scanContent: true, // Enable content scanning for embedded PII
});

// Your healthcare data
const patientData = {
  first_name: 'John',
  last_name: 'Doe',
  dob: '1980-05-15',
  ssn: '123-45-6789',
  email: 'john.doe@example.com',
  phone: '555-123-4567',
  address: '123 Main St',
  city: 'Springfield',
  zip_code: '62701',
  diagnosis: 'Type 2 Diabetes',
  notes: 'Patient contacted at john.doe@example.com',
};

// Anonymize the data
const anonymized = anonymizeData(patientData, config);
console.log(anonymized);
// {
//   first_name: 'Swift',
//   last_name: 'Falcon',
//   dob: '1980',
//   ssn: '[REDACTED-SSN]',
//   email: '[REDACTED-EMAIL]',
//   phone: '[REDACTED-PHONE]',
//   address: '[REDACTED-ADDRESS]',
//   city: '[REDACTED-ADDRESS]',
//   zip_code: '627',
//   diagnosis: 'Type 2 Diabetes',
//   notes: 'Patient contacted at [EMAIL-REDACTED]'
// }

HIPAApotamus implements the HIPAA Safe Harbor de-identification method as defined in 45 CFR §164.514(b)(2), which requires the removal of 18 specific types of identifiers. This is the most commonly used de-identification method because it provides a clear, prescriptive approach to protecting patient privacy.
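The Safe Harbor method is one of two approaches for de-identifying PHI under HIPAA; the other is Expert Determination. Under Safe Harbor, data is treated as de-identified when all 18 identifier categories are removed and the covered entity has no actual knowledge that the remaining information could identify an individual. This method is preferred by many healthcare organizations because: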
- ✅ Clear requirements - Specific list of identifiers to remove
- ✅ No statistical analysis required - Unlike the Expert Determination method
- ✅ Legal safe harbor - Provides regulatory protection when properly implemented
- ✅ Industry standard - Widely accepted approach for data sharing
Implementation: Names are replaced with deterministic aliases derived from SHA-256 hashing
// Example: "John Doe" → "Swift Falcon"
// Same input always produces same output for referential integrity

Field patterns detected: first_name, last_name, middle_name, full_name, patient_name, maiden_name, etc.
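The README does not spell out how a hash becomes an alias, so the following is only a minimal sketch of one plausible approach: hash the normalized name with SHA-256 and use bytes of the digest to pick words from fixed lists. The function name `aliasFor`, the word lists, and the normalization step are all assumptions made for illustration, not the library's internals.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical word lists; the library's real vocabulary is not documented here.
const ADJECTIVES = ['Swift', 'Calm', 'Bright', 'Bold', 'Quiet', 'Keen', 'Brave', 'Merry'] as const;
const ANIMALS = ['Falcon', 'Otter', 'Heron', 'Lynx', 'Badger', 'Finch', 'Moose', 'Wren'] as const;

function aliasFor(name: string, salt = ''): string {
  // Normalize so that "John Doe" and "  john doe " map to the same alias (an assumption).
  const normalized = name.trim().toLowerCase();
  const digest = createHash('sha256').update(salt + normalized, 'utf8').digest();
  // Use the first two digest bytes as indices into the word lists.
  const adjective = ADJECTIVES[digest.readUInt8(0) % ADJECTIVES.length];
  const animal = ANIMALS[digest.readUInt8(1) % ANIMALS.length];
  return `${adjective} ${animal}`;
}

console.log(aliasFor('John Doe') === aliasFor('John Doe')); // true: same input, same alias
```

Because the alias depends only on the input (and the salt, if any), the same patient name maps to the same alias across records, which is what preserves referential integrity.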
Implementation: HIPAA-compliant ZIP code processing with restricted prefix handling
- 3-digit ZIP prefixes covering more than 20,000 people: Truncated to the first 3 digits (e.g., 62701 → 627)
- Restricted ZIP prefixes (20,000 or fewer people): Changed to 000 per §164.514(b)(2)(i)(B)
- 17 restricted ZIP prefixes (based on 2000 Census): 036, 059, 063, 102, 203, 556, 692, 790, 821, 823, 830, 831, 878, 879, 884, 890, 893
// Examples:
'62701' → '627' // Normal ZIP prefix (more than 20,000 people)
'03601' → '000' // Restricted ZIP prefix (20,000 or fewer people)
'123 Main St' → '[REDACTED-ADDRESS]' // Street addresses
'Springfield' → '[REDACTED-ADDRESS]' // Cities

Field patterns detected: zip_code, postal_code, address, street, city, latitude, longitude
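The ZIP rule above reduces to a lookup on the first three digits. The sketch below illustrates that rule using the 17 restricted prefixes listed in this section; the function name `generalizeZip` and the fallback token for unparseable input are assumptions, not the library's documented behavior.

```typescript
// The 17 restricted 3-digit prefixes listed in this section (2000 Census).
const RESTRICTED_ZIP_PREFIXES: ReadonlySet<string> = new Set([
  '036', '059', '063', '102', '203', '556', '692', '790', '821',
  '823', '830', '831', '878', '879', '884', '890', '893',
]);

function generalizeZip(zip: string): string {
  const trimmed = zip.trim();
  // Accept 5-digit and ZIP+4 forms; anything else is redacted outright (assumed fallback).
  if (!/^\d{5}(-\d{4})?$/.test(trimmed)) return '[REDACTED-ADDRESS]';
  const prefix = trimmed.slice(0, 3);
  // Sparsely populated prefixes collapse to 000; all others keep only 3 digits.
  return RESTRICTED_ZIP_PREFIXES.has(prefix) ? '000' : prefix;
}

console.log(generalizeZip('62701')); // 627
console.log(generalizeZip('82101')); // 000
```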
Implementation: Intelligent date handling with age-based adjustments
- Most dates: Only the year is retained (e.g., 1980-05-15 → 1980)
- Ages > 89: Aggregated per §164.514(b)(2)(i)(C) (e.g., DOB: 1925-03-10 → ≤1935)
- Test/lab dates: Completely removed → [REDACTED-TEST-DATE]
// Examples:
dob: '1980-05-15' → '1980'
dob: '1920-01-01' → '≤1935' // Patient would be >89 years old
test_date: '2024-01-15' → '[REDACTED-TEST-DATE]'

Field patterns detected: dob, date_of_birth, admission_date, discharge_date, death_date, test_date, lab_date
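The birth-date rule can be sketched as an age check against a reference date. This is an illustration under stated assumptions: the function name `generalizeBirthDate`, the explicit `asOf` parameter, the `[REDACTED]` fallback, and the choice of "reference year minus 90" as the bucket label are all hypothetical. That label does match the ≤1935 examples above if the reference year is 2025.

```typescript
function generalizeBirthDate(isoDate: string, asOf: Date): string {
  // Format check only (YYYY-MM-DD); anything else is redacted (assumed fallback).
  if (!/^\d{4}-\d{2}-\d{2}$/.test(isoDate)) return '[REDACTED]';
  const year = Number(isoDate.slice(0, 4));
  const month = Number(isoDate.slice(5, 7));
  const day = Number(isoDate.slice(8, 10));

  // Age in whole years as of the reference date (UTC to avoid timezone drift).
  let age = asOf.getUTCFullYear() - year;
  const asOfMonth = asOf.getUTCMonth() + 1;
  const birthdayPassed =
    asOfMonth > month || (asOfMonth === month && asOf.getUTCDate() >= day);
  if (!birthdayPassed) age -= 1;

  // Ages 90 and over collapse into one bucket; the real birth year is withheld.
  if (age >= 90) return `≤${asOf.getUTCFullYear() - 90}`;
  return String(year);
}

const reference = new Date(Date.UTC(2025, 5, 1)); // 1 June 2025
console.log(generalizeBirthDate('1980-05-15', reference)); // 1980
console.log(generalizeBirthDate('1925-03-10', reference)); // ≤1935
```

Passing the reference date explicitly keeps the function deterministic and testable; a version that reads the system clock would produce different buckets from one year to the next.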
Implementation: All phone/fax numbers redacted
'555-123-4567' → '[REDACTED-PHONE]'

Field patterns detected: phone, telephone, mobile, cell, fax

Implementation: Fax numbers are handled together with telephone redaction
'555-987-6543' → '[REDACTED-PHONE]'

Implementation: Complete email redaction
'john.doe@example.com' → '[REDACTED-EMAIL]'

Field patterns detected: email, email_address
Implementation: Complete SSN redaction
'123-45-6789' → '[REDACTED-SSN]'

Field patterns detected: ssn, social_security, social_security_number

Implementation: Complete MRN redaction
'MRN-12345' → '[REDACTED]'

Field patterns detected: medical_record_number, mrn, patient_number

Implementation: Complete ID redaction
'MB-987654' → '[REDACTED]'

Field patterns detected: member_id, subscriber_id, health_plan_beneficiary_number

Implementation: Complete account number redaction
'ACCT-54321' → '[REDACTED]'

Field patterns detected: account_number, account_id

Implementation: Complete license redaction
'DL-A12345678' → '[REDACTED]'

Field patterns detected: license, license_number, drivers_license, certificate_number, passport, passport_number

Implementation: Complete vehicle/serial redaction
'VIN-1HGCM82633A123456' → '[REDACTED]'

Field patterns detected: vehicle_identifier, serial_number, license_plate

Implementation: Complete device ID redaction
'DEVICE-ABC123XYZ' → '[REDACTED]'

Field patterns detected: device_identifier, device_serial

Implementation: Complete URL redaction
'https://example.com/patient' → '[REDACTED]'

Field patterns detected: url, website

Implementation: Complete IP address redaction
'192.168.1.100' → '[REDACTED]'

Field patterns detected: ip_address, ip

Implementation: Complete biometric redaction
fingerprint: 'base64data...' → '[REDACTED]'

Field patterns detected: biometric, fingerprint, face, retina, iris

Implementation: Complete image redaction
photo: 'image_data' → '[REDACTED]'

Field patterns detected: photo, image, picture
Implementation: Extensible custom field support
const config = createConfig({
  customFields: ['internal_id', 'legacy_*'] // Add your own identifiers
});

When scanContent: true is enabled, HIPAApotamus scans string content for embedded PII patterns:
const notes = 'Patient John D. contacted at john@example.com from 555-123-4567';
// With scanContent enabled:
// → 'Patient [NAME-REDACTED] contacted at [EMAIL-REDACTED] from [PHONE-REDACTED]'

Patterns detected in content:
- Name + Initial (e.g., "John D.")
- SSN (all formats: 123-45-6789, 123.45.6789, 123 45 6789)
- Phone numbers (US formats with optional country code)
- Email addresses
- URLs (http/https)
- IP addresses (IPv4)
- ZIP codes (HIPAA-compliant processing)
- Medical record numbers (MRN formats)
- Account/Member IDs
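Content scanning of this kind is typically a series of regular-expression replacements. The sketch below covers three of the listed patterns (email, SSN, phone) to show the general shape; the regular expressions, the function name `redactContent`, and the `[SSN-REDACTED]` token are assumptions, and the library's real patterns are likely broader.

```typescript
// Each entry pairs a pattern with its replacement token. Emails run first so that
// digits inside an address are not later mistaken for a phone number.
const CONTENT_PATTERNS: ReadonlyArray<[RegExp, string]> = [
  [/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, '[EMAIL-REDACTED]'],
  // SSN as 123-45-6789, 123.45.6789, or 123 45 6789 (same separator both times).
  [/\b\d{3}([-. ])\d{2}\1\d{4}\b/g, '[SSN-REDACTED]'],
  // US phone numbers with separators and an optional leading 1 or +1.
  [/(?:\+?1[-. ])?(?:\(\d{3}\) ?|\b\d{3}[-. ])\d{3}[-. ]\d{4}\b/g, '[PHONE-REDACTED]'],
];

function redactContent(text: string): string {
  // Apply every pattern in order, feeding each result into the next replacement.
  return CONTENT_PATTERNS.reduce(
    (current, [pattern, token]) => current.replace(pattern, token),
    text,
  );
}

console.log(redactContent('Reach john@example.com or 555-123-4567'));
// Reach [EMAIL-REDACTED] or [PHONE-REDACTED]
```

Pattern order matters in a pipeline like this: a looser pattern applied too early can consume text that a more specific pattern should have matched.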
HIPAApotamus uses SHA-256 hashing for deterministic alias generation, with optional salt support for enhanced security:
const config = createConfig({
  hashSalt: 'your-secret-salt-value'
});

Benefits:
- Deterministic: Same input always produces same output
- Referential integrity: Relationships maintained across datasets
- One-way: Aliases cannot be inverted to recover original names; set a secret salt so that common names cannot be matched by hashing guesses
- Cryptographically secure: SHA-256 is NIST-approved
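To illustrate why a salt matters, the sketch below keys the hash with the salt using HMAC-SHA-256. The README does not say how the library combines `hashSalt` with the input, so HMAC is swapped in here as one standard construction, and `saltedDigest` is a hypothetical helper name.

```typescript
import { createHmac } from 'node:crypto';

// Keyed digest: without the salt, an attacker cannot precompute hashes of common names.
function saltedDigest(value: string, salt: string): string {
  return createHmac('sha256', salt).update(value, 'utf8').digest('hex');
}

const a = saltedDigest('John Doe', 'salt-a');
const b = saltedDigest('John Doe', 'salt-b');
console.log(a === saltedDigest('John Doe', 'salt-a')); // true: deterministic per salt
console.log(a === b); // false: a different salt gives a different digest
```

The practical consequence is that datasets anonymized with different salts cannot be joined on their aliases, while datasets sharing a salt keep their referential integrity.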
To verify HIPAA Safe Harbor compliance in your implementation:
- ✅ All 18 identifier categories removed
- ✅ No residual information that could identify individuals
- ✅ No actual knowledge that remaining data could re-identify individuals
- ✅ Documented de-identification process (this README serves as documentation)
interface AnonymizationConfig {
  enabled: boolean; // Enable/disable anonymization
  customFields: string[]; // Additional custom fields to anonymize
  scanContent?: boolean; // Scan string content for PII patterns
  hideRedacted?: boolean; // Completely omit redacted fields from output
  hashSalt?: string; // Salt for enhanced cryptographic security
}

const config = createConfig({
  enabled: true,
  customFields: ['internal_id', 'legacy_*'], // Wildcards supported
});
const data = {
  internal_id: 'SECRET123',
  legacy_field_1: 'old-data',
  public_field: 'safe-data',
};
const result = anonymizeData(data, config);
// internal_id and legacy_field_1 will be anonymized
// public_field remains unchanged

const config = createConfig({
  enabled: true,
  hideRedacted: true, // Completely remove PII fields
});
const data = {
  first_name: 'John',
  diagnosis: 'Diabetes',
};
const result = anonymizeData(data, config);
// { diagnosis: 'Diabetes' }
// first_name is completely removed

const config = createConfig({
  enabled: true,
  scanContent: true, // Enable content scanning
});
const data = {
  notes: 'Patient John D. called from 555-123-4567 about test results.',
  diagnosis: 'Type 2 Diabetes',
};
const result = anonymizeData(data, config);
// {
//   notes: 'Patient [NAME-REDACTED] called from [PHONE-REDACTED] about test results.',
//   diagnosis: 'Type 2 Diabetes'
// }

const config = createConfig({
  enabled: true,
  hashSalt: 'your-secret-salt-value', // Add salt for enhanced security
});
// Same names will produce different aliases with different salts

import { scanAndRedactContent } from 'hipaapotamus';
const text = 'Contact John D. at john@example.com or 555-123-4567';
const redacted = scanAndRedactContent(text);
// 'Contact [NAME-REDACTED] at [EMAIL-REDACTED] or [PHONE-REDACTED]'

const data = {
  patients: [
    { first_name: 'John', diagnosis: 'Diabetes' },
    { first_name: 'Jane', diagnosis: 'Hypertension' },
  ],
  metadata: {
    total: 2,
    timestamp: '2024-01-01',
  },
};
const result = anonymizeData(data, config);
// Recursively processes all nested structures

HIPAApotamus avoids anonymizing non-PII fields whose names happen to contain PII keywords:
const data = {
  state: 'FL', // NOT anonymized
  status: 'active', // NOT anonymized
  payment_method: 'Insurance', // NOT anonymized
  diagnosis_code: 'E11.9', // NOT anonymized
  patient_name: 'John Doe', // ANONYMIZED
};

HIPAApotamus preserves clinical and analytical data:
- ✅ Diagnoses and conditions
- ✅ Treatment information
- ✅ Medications
- ✅ Lab values and results
- ✅ Visit counts and statistics
- ✅ Metadata (pagination, timestamps, etc.)
- ✅ Clinical codes (ICD, CPT, etc.)
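One way to get the field-name behavior described earlier (anonymize patient_name and legacy_* custom fields while leaving status or diagnosis_code alone) is to compare whole field names against a pattern list instead of searching for substrings. The sketch below assumes that approach; `isSensitiveField` and the short pattern list are hypothetical and much smaller than the patterns the library documents.

```typescript
// Hypothetical pattern list; a trailing * matches any suffix.
const SENSITIVE_FIELD_PATTERNS: readonly string[] = [
  'first_name', 'last_name', 'patient_name', 'ssn', 'email', 'legacy_*',
];

function isSensitiveField(
  fieldName: string,
  patterns: readonly string[] = SENSITIVE_FIELD_PATTERNS,
): boolean {
  const name = fieldName.toLowerCase();
  return patterns.some((pattern) =>
    pattern.endsWith('*')
      ? name.startsWith(pattern.slice(0, -1)) // wildcard: prefix match
      : name === pattern, // otherwise the whole name must match
  );
}

console.log(isSensitiveField('patient_name')); // true
console.log(isSensitiveField('diagnosis_code')); // false
console.log(isSensitiveField('legacy_field_1')); // true
```

Whole-name matching trades recall for precision: it will not flag an unexpected field such as pt_email unless a pattern covers it, which is where custom fields and wildcards come in.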
anonymizeData(data, config)
Recursively anonymizes PII/PHI fields in an object or array.
Parameters:
- data - The data to anonymize (object, array, or primitive)
- config - Anonymization configuration
Returns: Anonymized data with same structure as input
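The recursive traversal described above can be sketched as a generic walker that applies a per-field transform and then descends into nested arrays and objects. This is an illustration only: `mapFields`, the `Json` type, and the transform callback are hypothetical names, and the library's own traversal may differ.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

function mapFields(
  data: Json,
  transform: (key: string, value: Json) => Json,
): Json {
  if (Array.isArray(data)) {
    return data.map((item) => mapFields(item, transform));
  }
  if (data !== null && typeof data === 'object') {
    const out: { [key: string]: Json } = {};
    for (const [key, value] of Object.entries(data)) {
      // Transform this field first, then descend into whatever it returned.
      out[key] = mapFields(transform(key, value), transform);
    }
    return out;
  }
  return data; // Primitives pass through unchanged.
}

const walked = mapFields(
  { patients: [{ first_name: 'John', diagnosis: 'Diabetes' }], metadata: { total: 1 } },
  (key, value) => (key === 'first_name' ? '[REDACTED]' : value),
);
console.log(JSON.stringify(walked));
// {"patients":[{"first_name":"[REDACTED]","diagnosis":"Diabetes"}],"metadata":{"total":1}}
```

The walker builds a new structure rather than mutating its input, so the original record remains available to the caller.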
scanAndRedactContent(content)
Scans and redacts PII/PHI patterns within string content.
Parameters:
- content - String content to scan
Returns: Content with PII/PHI patterns redacted
createConfig(overrides)
Creates an anonymization configuration with optional overrides.
Parameters:
- overrides - Optional partial configuration to override defaults
Returns: Complete AnonymizationConfig object
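A configuration factory of this shape usually merges caller overrides onto a set of defaults. The sketch below shows that pattern under stated assumptions: the default values are guesses made for illustration, and `buildConfig` is a hypothetical stand-in rather than the library's `createConfig`.

```typescript
interface AnonymizationConfig {
  enabled: boolean;
  customFields: string[];
  scanContent?: boolean;
  hideRedacted?: boolean;
  hashSalt?: string;
}

// Assumed defaults for illustration; the library's actual defaults may differ.
const DEFAULT_CONFIG: AnonymizationConfig = {
  enabled: true,
  customFields: [],
  scanContent: false,
  hideRedacted: false,
};

function buildConfig(overrides: Partial<AnonymizationConfig> = {}): AnonymizationConfig {
  return {
    ...DEFAULT_CONFIG,
    ...overrides,
    // Copy the array so callers cannot mutate the shared defaults.
    customFields: [...(overrides.customFields ?? DEFAULT_CONFIG.customFields)],
  };
}

const scanning = buildConfig({ scanContent: true });
console.log(scanning.enabled, scanning.scanContent); // true true
```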
Full TypeScript definitions are included:
import type { AnonymizationConfig } from 'hipaapotamus';
const config: AnonymizationConfig = {
  enabled: true,
  customFields: [],
  scanContent: true,
  hideRedacted: false,
  hashSalt: 'optional-salt',
};

npm test # Run tests
npm run test:watch # Watch mode
npm run test:coverage # Coverage report

License: MIT
Contributions are welcome! Please feel free to submit a Pull Request.
If you discover a security vulnerability, please email security@example.com.
This library implements HIPAA Safe Harbor requirements to the best of our knowledge. However, HIPAA compliance involves many factors beyond technical de-identification. Consult with legal and compliance experts to ensure your complete HIPAA compliance program.
Made with 🦛 by the HIPAApotamus team