Tests whether free LLMs can accurately detect and mask personally identifiable information (PII) in text, limited to four entity types: first name, last
name, email address, and IPv4 address.
Example:
Input: "Contact john.doe@example.com or reach out to John Doe at 192.168.1.1"
Output: "Contact [EMAIL] or reach out to [FIRSTNAME] [LASTNAME] at [IPV4]"
A practical, privacy-focused benchmark: models must mask every sensitive entity with the correct tag while leaving the surrounding context unchanged.
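
The benchmark's actual scoring method is not stated here. The sketch below shows one way the two failure modes, missed entities and over-masking, could be checked against a reference output. The function name `score_masking`, the returned keys, and the whitespace-token comparison are assumptions made for illustration; only the four placeholder tags are taken from the example.

```python
import re
from collections import Counter

# The four tags come from the example; everything else here is an assumed scoring scheme.
PLACEHOLDER = re.compile(r"\[(FIRSTNAME|LASTNAME|EMAIL|IPV4)\]")


def score_masking(expected: str, predicted: str) -> dict:
    """Compare a model's masked output with the reference masked output."""
    expected_tags = Counter(PLACEHOLDER.findall(expected))
    predicted_tags = Counter(PLACEHOLDER.findall(predicted))

    # Drop the placeholders and compare what is left, token by token.
    # A difference means context was masked away or an entity leaked through.
    expected_context = PLACEHOLDER.sub("", expected).split()
    predicted_context = PLACEHOLDER.sub("", predicted).split()

    return {
        "exact_match": expected == predicted,
        # Tags in the reference that the model did not produce.
        "missed": sum((expected_tags - predicted_tags).values()),
        # Tags the model produced beyond those in the reference (over-masking).
        "extra": sum((predicted_tags - expected_tags).values()),
        "context_matches": expected_context == predicted_context,
    }
```

Applied to the example, a model output that also replaces "Contact" with [FIRSTNAME] would be reported with `extra` equal to 1 and `context_matches` set to False, while an output that leaves "John Doe" unmasked would be reported with `missed` equal to 2.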