Workflows

Expunct supports several workflow patterns depending on your use case. This page covers the five most common patterns with code examples.

Text redaction

The simplest workflow. Send text, get redacted text back synchronously. Best for inline or real-time redaction.

    from expunct import Expunct

    client = Expunct(api_key="pk_live_...")

    result = client.redact.text(
        text="Contact Jane Doe at jane.doe@acme.com or 555-0123",
    )

    print(result.redacted_text)
    # "Contact [PERSON] at [EMAIL_ADDRESS] or [PHONE_NUMBER]"

    for finding in result.findings:
        print(f"{finding.entity_type}: {finding.text} (score: {finding.score})")
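Each finding carries a score, so a caller may want to act only on high-confidence detections. The sketch below is one way to do that, not part of the Expunct SDK: the `Finding` dataclass is a hypothetical stand-in for the items in `result.findings`, the sample scores are invented, and the 0.8 threshold assumes scores fall between 0 and 1.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """Hypothetical stand-in for one item of result.findings."""
    entity_type: str
    text: str
    score: float


def high_confidence(findings, min_score=0.8):
    """Return the findings whose score is at least min_score."""
    return [f for f in findings if f.score >= min_score]


# Invented sample data; real findings would come from client.redact.text(...).
findings = [
    Finding("PERSON", "Jane Doe", 0.95),
    Finding("EMAIL_ADDRESS", "jane.doe@acme.com", 0.99),
    Finding("PHONE_NUMBER", "555-0123", 0.60),
]

for f in high_confidence(findings):
    print(f"{f.entity_type}: {f.text} (score: {f.score})")
```

With a real response, the same helper could be called as `high_confidence(result.findings)`, provided the findings expose a numeric `score` attribute as in the example above.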

File redaction

Use file redaction for documents, images, video, and audio. File redaction is asynchronous: you submit a file, then poll for the result.

Supported file types

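| Format    | Extensions | Typical completion time |
|-----------|------------|-------------------------|
| Documents | PDF, DOCX  | 5-30 seconds            |
| Images    | PNG, JPG   | 3-15 seconds            |
| Video     | MP4        | 30-300 seconds          |
| Audio     | WAV, MP3   | 15-120 seconds          |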

Submit and poll

    import time

    from expunct import Expunct

    client = Expunct(api_key="pk_live_...")

    # Submit a file
    job = client.redact.file(file_path="/path/to/document.pdf")
    print(f"Job submitted: {job.job_id}")

    # Poll for completion
    while True:
        status = client.jobs.get(job.job_id)
        print(f"Status: {status.status}")

        if status.status == "completed":
            print(f"Redacted file: {status.output_uri}")
            break
        elif status.status in ("failed", "error"):
            print(f"Job failed: {status.error}")
            break

        time.sleep(2)
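The loop above polls every two seconds with no upper bound. For long-running jobs such as video, a timeout and a growing wait between polls may be preferable. The helper below is a sketch under assumptions: `wait_for_job` is not an SDK function, the `"processing"` status in the demo is an invented placeholder, and the only status values taken from this page are `completed`, `failed`, and `error`.

```python
import time
from types import SimpleNamespace


def wait_for_job(get_status, job_id, timeout=600.0, interval=2.0,
                 max_interval=30.0, sleep=time.sleep, clock=time.monotonic):
    """Poll get_status(job_id) until the job reaches a terminal state.

    Returns the final status object. Raises TimeoutError if the job is
    still running once `timeout` seconds have elapsed.
    """
    deadline = clock() + timeout
    while True:
        status = get_status(job_id)
        if status.status in ("completed", "failed", "error"):
            return status
        if clock() >= deadline:
            raise TimeoutError(
                f"Job {job_id} still {status.status!r} after {timeout}s"
            )
        sleep(interval)
        # Exponential backoff: double the wait, capped at max_interval.
        interval = min(interval * 2, max_interval)


# Demo with a fake status source standing in for client.jobs.get.
_states = iter(["processing", "processing", "completed"])
waits = []
final = wait_for_job(
    lambda job_id: SimpleNamespace(status=next(_states)),
    "job_123",
    sleep=waits.append,  # record the waits instead of actually sleeping
)
print(final.status, waits)
```

Against the real client this would be called as `wait_for_job(client.jobs.get, job.job_id)`. The timeout is worth choosing with the completion times in the table above in mind.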

Batch redaction

Process multiple files at once using cloud URIs. A batch can contain between 1 and 100 URIs. Each URI is processed as an individual job.

    from expunct import Expunct

    client = Expunct(api_key="pk_live_...")

    batch = client.batch.create(
        uris=[
            "s3://my-bucket/reports/q1-report.pdf",
            "s3://my-bucket/reports/q2-report.pdf",
            "s3://my-bucket/recordings/meeting-2024-03.mp4",
        ],
        output_prefix="s3://my-bucket/redacted/",
    )

    print(f"Batch ID: {batch.batch_id}")
    print(f"Jobs created: {batch.total_jobs}")

    # Check batch progress
    status = client.batch.get(batch.batch_id)
    print(f"Completed: {status.completed_jobs}/{status.total_jobs}")
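Because a batch holds at most 100 URIs, a longer list has to be split before submission. The helper below is a minimal sketch of that split. `chunk_uris` is a hypothetical name, not an SDK function, and the generated bucket paths are illustrative only.

```python
def chunk_uris(uris, max_per_batch=100):
    """Split a list of URIs into consecutive chunks of at most max_per_batch."""
    if max_per_batch < 1:
        raise ValueError("max_per_batch must be at least 1")
    return [
        uris[start:start + max_per_batch]
        for start in range(0, len(uris), max_per_batch)
    ]


# Illustrative input: 250 made-up URIs.
uris = [f"s3://my-bucket/reports/report-{n}.pdf" for n in range(250)]
chunks = chunk_uris(uris)
print([len(chunk) for chunk in chunks])  # [100, 100, 50]
```

Each chunk could then be passed to `client.batch.create(uris=chunk, output_prefix=...)` in turn, keeping the returned batch IDs for later progress checks.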

Policy-based redaction

Policies let you save reusable redaction configurations. A policy defines which entity types to detect and what action to take for each one (redact, mask, pseudonymize, or allow).

Create a policy

    from expunct import Expunct

    client = Expunct(api_key="pk_live_...")

    policy = client.policies.create(
        name="customer-support",
        description="Redact PII in customer support transcripts",
        entity_actions={
            "PERSON": "pseudonymize",
            "EMAIL_ADDRESS": "redact",
            "PHONE_NUMBER": "mask",
            "CREDIT_CARD": "redact",
            "LOCATION": "allow",
        },
    )

    print(f"Policy ID: {policy.policy_id}")
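A typo in an action name is easy to make, so it can help to check the mapping locally before creating the policy. The sketch below validates against the four actions listed above (redact, mask, pseudonymize, allow). `validate_entity_actions` is a hypothetical helper, and how the API itself responds to an unknown action is not covered here.

```python
ALLOWED_ACTIONS = {"redact", "mask", "pseudonymize", "allow"}


def validate_entity_actions(entity_actions):
    """Return entity_actions unchanged, or raise ValueError listing bad entries."""
    invalid = {
        entity: action
        for entity, action in entity_actions.items()
        if action not in ALLOWED_ACTIONS
    }
    if invalid:
        raise ValueError(f"Unsupported actions: {invalid}")
    return entity_actions


entity_actions = validate_entity_actions({
    "PERSON": "pseudonymize",
    "EMAIL_ADDRESS": "redact",
    "PHONE_NUMBER": "mask",
})
print(sorted(entity_actions))
```

The validated dictionary can be passed straight to `client.policies.create(entity_actions=...)`.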

Use a policy for redaction

    result = client.redact.text(
        text="Jane Doe called from 555-0123 about order #789",
        policy_id=policy.policy_id,
    )

    print(result.redacted_text)
    # "Alice Johnson called from ***-**** about order #789"

Multi-language redaction

Expunct supports detection in multiple languages. Specify the language parameter to optimize detection for a particular language.

Currently supported languages:

  • en — English (default)
  • es — Spanish

    from expunct import Expunct

    client = Expunct(api_key="pk_live_...")

    # Spanish text
    result = client.redact.text(
        text="El paciente Juan Garcia, DNI 12345678A, vive en Madrid",
        language="es",
    )

    print(result.redacted_text)
    # "El paciente [PERSON], DNI [US_SSN], vive en [LOCATION]"

The language parameter can also be used with file and batch redaction. If omitted, the default language is English.
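When the language comes from user input or file metadata, a small helper can apply the English default and reject unsupported codes before any request is made. This is a sketch, not SDK behavior: `resolve_language` is a hypothetical name, and the trimming and lowercasing are assumptions about how a caller might want to normalize input.

```python
SUPPORTED_LANGUAGES = {"en", "es"}
DEFAULT_LANGUAGE = "en"


def resolve_language(language=None):
    """Return a supported language code, defaulting to English when omitted."""
    if language is None:
        return DEFAULT_LANGUAGE
    code = language.strip().lower()
    if code not in SUPPORTED_LANGUAGES:
        raise ValueError(
            f"Unsupported language {language!r}; "
            f"expected one of {sorted(SUPPORTED_LANGUAGES)}"
        )
    return code


print(resolve_language())      # en
print(resolve_language("ES"))  # es
```

The result can then be passed as the `language` argument, for example `client.redact.text(text=..., language=resolve_language(user_choice))`, where `user_choice` is whatever value the application collected.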