Hacker News | welcome_dragon's comments

My brain immediately went to thinking this was about adding the Nashville Predators hockey team to an old NHL game.

I think if you take smokers and nicotine vapers combined, it has probably gone way up

Lemme know if you need any Altoids tins lol

Maybe we can form the Silicon Valley Un-American Activities Committee?

ADL would like to have a word with you….

NGL this sounds like pretty basic technology

Eh, joining these datasets can be challenging. Names can be spelled differently or changed, dates of birth can be off, people can share names and dates of birth, addresses change and can be expressed in multiple ways, databases may store names as a single string or as separate fields, middle names may be missing or reduced to initials, databases might not share IDs, etc. So it's kinda hard to do well, although nothing really exciting technology-wise.

This, incidentally, is why the "confidence score" is needed. And why the app frequently gets data (including citizenship) wrong.


When we stop tying health insurance to employment, we'll see a drastic uptick in people starting their own businesses. Working at company Z, where the employer fully pays for health insurance, vs. working at company Y, where an HDHP costs you $1,400 a month for the same salary, shouldn't be a thing.

At least that's my theory.


Hey-o!


Reversible as in you can re-identify? That doesn't sound secure.


The post discusses that:

Security First

Because the “PII Map” (the link between ID:1 and John Smith) effectively is the PII, we treat it as sensitive material.

The library includes a crypto module that forces AES-256-GCM encryption for the mapping table. The raw PII never leaves the local memory space, and the state object that persists between the masking and rehydration steps is encrypted at rest.

I've bookmarked this for inspiration for a medium/long term project I am considering building. I'd like to be able to take dumps of our production database and automatically (one way) anonymize it. Replacing all names with meaningless but semantically representative placeholders (gender matching where obvious - Alice, Bob, Mallory, Eve, Trent perhaps, and gender neutral like Jamie or Alex when suitable). Use similar techniques to rewrite email addresses ([email protected], [email protected], [email protected]) and addresses/placenames/whatever else can be pulled out with Named Entity Recognition. I suspect I'll in general be able to do a higher accuracy version of this, since I'll have an understanding of the database structure and we're already in the process of adding metadata about table and column data sensitivity. I will definitely be checking out the regexes and NER models used here.
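A minimal sketch of the one-way part of that idea, assuming a keyed hash (HMAC-SHA256) is used to pick placeholders deterministically so the same input always maps to the same output across tables. The function names, the placeholder list, and the `example.com` domain are all hypothetical, and the gender matching and NER steps are left out entirely.

```python
import hashlib
import hmac

# Hypothetical placeholder pool; a real one would be much larger.
PLACEHOLDER_NAMES = ["Alice", "Bob", "Mallory", "Eve", "Trent", "Jamie", "Alex"]


def _digest(secret: bytes, value: str) -> bytes:
    """Keyed hash of the case-folded, trimmed value."""
    data = value.strip().lower().encode("utf-8")
    return hmac.new(secret, data, hashlib.sha256).digest()


def pseudonymize_name(secret: bytes, name: str) -> str:
    """Map a real name to a stable placeholder such as 'Bob-1a2b3c'."""
    d = _digest(secret, name)
    index = int.from_bytes(d[:4], "big") % len(PLACEHOLDER_NAMES)
    # The short hex suffix keeps distinct people apart in most cases;
    # collisions are still possible with a suffix this short.
    return f"{PLACEHOLDER_NAMES[index]}-{d[4:7].hex()}"


def pseudonymize_email(secret: bytes, email: str) -> str:
    """Map a real address to a stable address on a reserved example domain."""
    d = _digest(secret, email)
    return f"user-{d[:6].hex()}@example.com"


if __name__ == "__main__":
    key = b"per-dump secret, discarded afterwards"
    print(pseudonymize_name(key, "John Smith"))
    print(pseudonymize_email(key, "john.smith@corp.test"))
```

If the secret is generated per dump and thrown away afterwards, there is no mapping table to protect, which is the main difference from the reversible approach in the post.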


That sounds interesting! I've been thinking about using representative placeholders as well, but while they have their strengths, there are also some downsides. We decided to go with an XML tag partly because it clearly marks the anonymized text as anonymized (for humans), so mixups don't happen. After reading your comment, I think it would also be really interesting to be able to add custom metadata to the tags. For example, if you have a username that you want to anonymize, but your database has additional (deterministic) information like the gender, we should add a callback so that you as the user can add this information to the tag.


My hope is it means it assigns coded identifiers and the key remains local. When the document returns, the identifiers can be restored. So the PII itself never leaves the premises.


That's exactly right. PII stays local (and the PII-Tag-Map is encrypted).
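For anyone trying to picture the flow, here's a toy sketch of mask-then-rehydrate. The tag format, the function names, and the email-only regex are assumptions made up for this example, not the library's actual API, and the encryption of the map described in the post is left out.

```python
import re

# Deliberately simple email pattern, for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Hypothetical tag format for the masked placeholders.
TAG_RE = re.compile(r'<PII type="EMAIL" id="(\d+)"/>')


def mask(text: str) -> tuple[str, dict[int, str]]:
    """Replace each email with a tag; return the masked text and the local map."""
    ids: dict[str, int] = {}
    mapping: dict[int, str] = {}

    def replace(match: re.Match) -> str:
        value = match.group(0)
        if value not in ids:  # reuse the same id for repeated values
            ids[value] = len(ids) + 1
            mapping[ids[value]] = value
        return f'<PII type="EMAIL" id="{ids[value]}"/>'

    return EMAIL_RE.sub(replace, text), mapping


def rehydrate(text: str, mapping: dict[int, str]) -> str:
    """Restore original values; unknown tags are left untouched."""
    return TAG_RE.sub(lambda m: mapping.get(int(m.group(1)), m.group(0)), text)


if __name__ == "__main__":
    masked, pii_map = mask("Mail jo@example.com, then cc al@example.org")
    print(masked)                      # only this leaves the machine
    print(rehydrate(masked, pii_map))  # the map never does
```

Only the masked text goes out; the map stays in local memory, so whoever receives the document sees identifiers and nothing else.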


I dunno. Just use a different license then? You can't have free software and then limit what people can do with it


Why not? The containment strategy worked on imperial systems before?


This is finally a terminal that gets panels and tabs right. Gonna try it without tmux

