Hacker News | Wouter33's comments

Nice implementation! Just a heads-up: hashing the IP like that is still considered tracking under GDPR and requires a privacy banner in the EU.


Probably should be "salted hashes might be considered PII". It has not been tested in the EU courts and the law is not 100% clear. It might be. It might not be.


Correct. This is a flawed hashing implementation as it allows for re-identification.

Given the IP and the user's timezone, you can generate the same hash again and trace it back to the user. This is hardly anonymous hashing.

Wide Angle Analytics adds a daily, transient salt to each IP hash. The salt is never logged, so the resulting hash is truly anonymous and prevents re-identification.
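
For anyone wondering what a daily, transient salt could look like in practice, here is a minimal sketch. It is not Wide Angle Analytics' actual code; the class name `DailySaltedHasher` and its methods are made up for illustration, and it assumes a single process that keeps the salt only in memory.

```python
import hashlib
import hmac
import secrets
from datetime import date, datetime, timezone


class DailySaltedHasher:
    """Hypothetical helper: hashes IPs with a random salt that is kept
    only in memory and replaced whenever the (UTC) day changes."""

    def __init__(self):
        self._salt_day = None
        self._salt = b""

    def _current_salt(self, today):
        # Draw a fresh random salt on the first call of each day.
        # The previous salt is overwritten and never written to disk or logs.
        if today != self._salt_day:
            self._salt = secrets.token_bytes(32)
            self._salt_day = today
        return self._salt

    def visitor_id(self, ip, today=None):
        # `today` can be passed explicitly to make the rotation testable.
        if today is None:
            today = datetime.now(timezone.utc).date()
        salt = self._current_salt(today)
        return hmac.new(salt, ip.encode("utf-8"), hashlib.sha256).hexdigest()


hasher = DailySaltedHasher()
monday = date(2023, 11, 6)
tuesday = date(2023, 11, 7)

same_day_a = hasher.visitor_id("203.0.113.7", monday)
same_day_b = hasher.visitor_id("203.0.113.7", monday)
next_day = hasher.visitor_id("203.0.113.7", tuesday)

print(same_day_a == same_day_b)  # same visitor, same day: ids match
print(same_day_a == next_day)    # salt rotated: ids no longer match
```

Within a day the same IP maps to the same id, so unique visitors can still be counted. Once the salt is discarded, nobody (including the operator) can recompute yesterday's ids from a known IP.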


What if my hashing function is really destructive and has high likelihood of collisions?


> What if my hashing function is really destructive and has high likelihood of collisions?

If it’s so destructive that it’s impossible to track users, it’s useless for you. If not, you need a privacy banner.


A high collision hash would be useful for me on my low traffic page and I’d enjoy not having to display a cookie banner.

Also: https://news.ycombinator.com/item?id=38096235


Can you explain why or link a source? I’d like to learn the details.


Likely because the hash of an IP can easily be reversed as there are only ~2^32 IPv4 addresses.
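
To make the reversal concrete, here is a rough sketch that assumes the tracker stores a plain, unsalted SHA-256 of the dotted-quad IP string (an assumption; the function names are mine). It only searches a /16 so it finishes quickly in pure Python. The full IPv4 space is the same loop over roughly 2^32 candidates.

```python
import hashlib
import ipaddress


def hash_ip(ip):
    # Assumed scheme: unsalted SHA-256 over the textual IP address.
    return hashlib.sha256(ip.encode("ascii")).hexdigest()


def reverse_hash(target_hash, network):
    """Try every address in `network` until one hashes to `target_hash`."""
    for addr in ipaddress.ip_network(network):
        candidate = str(addr)
        if hash_ip(candidate) == target_hash:
            return candidate
    return None


# Pretend this hash was found in an analytics table.
stored = hash_ip("198.51.100.23")

# 65,536 candidates are enough to recover the original address.
print(reverse_hash(stored, "198.51.0.0/16"))
```

The hash function being cryptographically strong does not help here, because the input space is small enough to enumerate.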


It is not just that. With the user's IP and a hashing approach like this, you can re-identify past sessions.


What if my hashing function has high likelihood of collisions?


Then you cannot trust the analytics.


You can estimate the actual numbers based on the collision rate.

Analytics is not about absolute accuracy, it's about measuring differences; things like which pages are most popular, did traffic grow when you ran a PR campaign etc.
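
One way to do that estimate, as a sketch: if the hash spreads visitors uniformly over `buckets` possible values (an assumption, as is the function name), then n visitors are expected to occupy about buckets * (1 - exp(-n / buckets)) distinct values. Inverting that gives an estimate of n from the number of distinct hashes actually seen.

```python
import math


def estimate_visitors(distinct_hashes, buckets):
    """Estimate the true visitor count from the number of distinct hash
    values observed, assuming a uniform hash over `buckets` values."""
    if not 0 <= distinct_hashes < buckets:
        raise ValueError("distinct_hashes must be in [0, buckets)")
    # Invert k = m * (1 - exp(-n / m))  =>  n = -m * ln(1 - k / m)
    return -buckets * math.log(1 - distinct_hashes / buckets)


# 100 distinct values seen out of 1024 possible ones:
# the estimate comes out slightly above 100 because some visitors collided.
print(round(estimate_visitors(100, 1024), 1))
```

The correction stays small while the table is mostly empty and grows quickly as it fills up, so the bucket count needs to be well above the expected daily traffic for the numbers to stay useful.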


Do you trust analytics that doesn’t use JS? Or relies on mobile users to scroll the page before counting a hit?

It’s all a heuristic and even with high collision hashing, analytics would provide some additional insight.


https://gdpr-info.eu/art-4-gdpr/ paragraph 1:

> ‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;


This does not reference hashing, which can be an irreversible and destructive operation. As such, it can remove the “relating” part - i.e. you’ll no longer be able to use the information to relate it to an identifiable natural person.

In this context, if I define a hashing function that e.g. sums all ip address octets, what then?


A hash (whether MD5 or some SHA) of an IPv4 address is easily reversed.

Summing octets is non-reversible, so it seems like a good 'hash' to me (but note: you'll get a lot of collisions). And of course, IANAL.
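
A quick sketch of the octet-sum idea, to show how lossy it is. The function names are mine. The sum can only take 1021 values (0 through 1020), and the second function counts how many of the 2^32 addresses land on each value.

```python
import ipaddress


def octet_sum(ip):
    # .packed is the 4-byte form of the address; summing bytes sums the octets.
    return sum(ipaddress.IPv4Address(ip).packed)


def preimage_counts():
    """counts[s] = number of IPv4 addresses whose octets sum to s."""
    counts = [1]  # with zero octets there is exactly one way to reach sum 0
    for _ in range(4):
        nxt = [0] * (len(counts) + 255)
        for total, ways in enumerate(counts):
            for octet in range(256):
                nxt[total + octet] += ways
        counts = nxt
    return counts


# Any permutation of the octets collides.
print(octet_sum("192.168.1.10") == octet_sum("10.1.168.192"))

counts = preimage_counts()
print(len(counts))               # number of possible hash values
print(counts[0], counts[1020])   # the extremes each map to a single address
print(max(counts))               # the most crowded value hides the most addresses
```

So the anonymity is uneven: sums near the middle are shared by millions of addresses, while sums near 0 or 1020 point to only a handful.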


I was answering your request for a source.

The linked article talks about identification numbers that can be used to link a person. I am not a lawyer but the article specifically refers to one person.

By that logic, if the hash you generate cannot be linked to exactly one, specific person/request - you’re in the clear. I think ;)


If the data gets stored in this way (hash of IP[0]) for a long time I'm with you. But if you only store the data for 24 hours it might still count as temporary storage and should be "anonymized" enough.

IMO (and I'm not a lawyer): if you store ip+site for 24 hours and after that only store "region" (maybe country or state) and site this should be GDPR compliant.

[0] it should use sha256 or similar and not md5


Actually no. It’s very likely this is fine. Context is important.

Not a lawyer, but I discussed this with lawyers when building a GDPR framework a while back.


Context is irrelevant. What is relevant is whether a value, for example a hash, can be identified to a specific person in some way.


I'm really not going to argue here.

I've been told this directly by lawyers who specialize in GDPR and CCPA etc. I will take their word over yours.

If you are a lawyer with direct expertise in this area then I'm willing to listen.


The GDPR is very clear here (https://gdpr-info.eu/art-4-gdpr/). So you must have misunderstood the lawyers you talked to, or you are referring to a hash that cannot identify a person. If information can be linked to a person, it is considered PII of that person.


No bad intentions with the title, was just using "banned" since Valve used it in their response to the Reddit poster. Change it to whatever you think is better! :)


I haven't followed any of the details so you could well be right (in which case, sorry!)


Valve is quite clear about their reasoning. Since AI models use all kinds of sources for their training, they don't want those assets on their platform because they are afraid of copyright claims.


For all we know, the game in question had images clearly aping some licensed characters. We don't know how stringent the policies are without clarification or examples of art found infringing. How did Valve know that the art was AI-generated? Did the developer tell them or include it in their marketing materials? It's basically just reading tea leaves without that information.


It actually sounds like if you claim to have ownership of the training data you can still use AI generated assets. For most people this is a distinction without a difference however.


That seems very sensible to me, I hope other platforms follow this example



PHP is just very simple and powerful. Check out levels making 100k per month with a single php file: https://twitter.com/levelsio/status/1381709793769979906?s=21


Here is a video explaining his setup: https://www.youtube.com/watch?v=rE8rgFAOXgY


I think the fan is for the G-Sync module!


Lots of people are using restream.io. It's a good solution for streaming to multiple places at the same time.

Running your own single-tenant stream is hard to manage and scale in general.


Same experience here. He didn't listen to my feedback and didn't answer any of the Workers-related questions I emailed him at his request.


They charge 5% on top of any Stripe or PayPal fees. Users connect their own Stripe and PayPal accounts, and Buy Me a Coffee charges a platform fee.

