zjaffee's comments | Hacker News

If there's a single section of the entire world where daylight saving time makes the least sense, it's above and below the 45th parallel. There it means the earliest sunrise is 9am in the winter, which is a horrible idea just to give people a little more sunlight when they'd still be out at work anyway.


I mean, it's objectively true that they can do this, especially when even mildly filtered down by incoming external data.

It's why you no longer need to speak with a person when re-entering your home country in a lot of different places (Israel being one of them, but also the EU, trusted travelers in the US through Global Entry, etc.).


There's no need to counter it, the whole point is to hit the social aspect of being on these platforms. If even half the kids can't figure out how to make it work, then a massive part of the problem is solved because a much larger percentage are only using it due to network effects.


I can't speak to this being a current law, but there were laws in multiple US states at various times that prevented you from storing facial data on a server. In turn, features like Snapchat's face filters did all the relevant computation locally on the device (which back then was certainly a complicated achievement).

US tech companies are constantly under FTC audit relating to how they use user data. This is certainly not something that needs to be seriously worried about, certainly less so than, say, the way cameras placed all over cities are used to track all sorts of people, or the storing of GPS locations attached to a specific device's UUID.


Isn't this essentially just trying to reinvent ERP (i.e. what SAP has built a $207 billion company on at the time of writing, and what 90% of Fortune 500 companies, along with endless other large organizations, use)?

One can argue that ERP as code is higher value than whatever it is right now, but to act like this is a totally new idea is insane.


I worked in a place where basically everything that happened in the company was implemented as actions within Lotus Notes.

While the choice of implementation and its performance were abysmal (Notes was a great, even the only, choice when the decision was made, but 25 years later not so much), the actual idea was amazing and it worked extremely well.


> the actual idea was amazing and it worked extremely well.

What do you think are the reasons it worked so well? Any anecdotes of why it was so effective?


The iOS version of most social media apps is better. iOS simply has better API integration with its hardware, whereas on Android many OEMs (hell, this was even the case to a certain extent with older Pixel phones) do a number of things that make the hardware less quickly accessible through the OS APIs for a given feature.

This is especially relevant for the camera, but also various other sensors and hardware modules that exist inside these phones.

That said, in recent years there are a number of other areas where Android is much better, such as deeper AI integration, which goes back even prior to the current LLM craze.


What are those things?


I'm originally from the US, but where I live now, WhatsApp has functionally replaced email for a lot of different types of communication (things that would be an email in the US). Recruiters text me on WhatsApp about jobs, I can ask for a prescription renewal through it, and I get support through it for everything from government agencies to customer support for businesses, etc.


One thing that is repeatedly underdiscussed about open source is that every time a major open source project becomes successful, be it anything from Linux to Apache Spark, private companies come in and build something that can very reasonably still be called Linux or Apache Spark, but that underneath has tons and tons of extra stuff they never feed back into the open source community.

Hell, I think with the latter (since all major cloud providers deploy their own version of Spark on their respective data processing cluster services), people don't even know that they aren't in fact using open source software. Eventually you get to a point where companies that choose not to use these third-party services just open source their own improvements or abstractions as, again, separate open source projects that never make it into the upstream project (and which are oftentimes heavily influenced by profit-making entities).

This has been the model for a very long time, going back to at least the likes of Red Hat, and it certainly will continue with countless future projects. Maybe there needs to be a new model of open source governance, but I have no clue how successful such a thing would even be.


> but underneath has tons and tons of extra stuff that they never feed back into the open source community.

Very unlikely for GPL2 projects


See the cloud-provider-specific distros, or the Android Linux kernel.

Thing is, when they misbehave, someone has to have the money to bring them to court.


It depends on what you were trying to do with the data. Hadoop would never win, but Spark can hold all that data in memory across multiple machines and perform various operations on it.

If all you wanted to do was filter the dataset for certain fields, you could likely do something faster programmatically on a single machine.
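A minimal sketch of that tradeoff, assuming PySpark is available; the paths and column names here are made up for illustration:

    # Hedged sketch: Spark holding a dataset in memory across machines
    # versus a plain single-machine filter. Paths/columns hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("demo").getOrCreate()

    # Spark: cache the dataset in cluster memory, then run several
    # different operations against the cached copy.
    df = spark.read.parquet("s3://bucket/events.parquet").cache()
    by_user = df.groupBy("user_id").agg(F.count("*").alias("events"))
    joined = by_user.join(df, "user_id")

    # Single machine: if all you need is a filter, streaming the file
    # line by line is often faster than spinning up a cluster.
    with open("events.tsv") as f:
        matches = [line for line in f if "\tpurchase\t" in line]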


It's not just about how much data you have, but also the sorts of things you are running on it. The cost of joins and group-bys grows much faster than that of any plain aggregation. Additionally, you get a unified platform where large teams can share code in a structured way across all data processing jobs. It's similar to how companies use Kubernetes as a way to manage the human side of software development.
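A hedged PySpark sketch of that scaling difference (paths and column names hypothetical): a plain aggregation reduces each partition locally and merges small partial results, while a join plus group-by first shuffles both inputs by key across the network.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()
    events = spark.read.parquet("s3://bucket/events")  # hypothetical path
    users = spark.read.parquet("s3://bucket/users")    # hypothetical path

    # Plain aggregation: partial sums are computed per partition and
    # then merged, so very little data crosses the network.
    totals = events.agg(F.sum("amount"))

    # Join + group-by: both inputs get shuffled by user_id before any
    # matching happens, which is the part that scales badly.
    per_country = (events.join(users, "user_id")
                         .groupBy("country")
                         .agg(F.sum("amount")))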

I can, however, say that when I had a job at a major cloud provider optimizing Spark core for our customers, one of the key findings was simply that fewer machines with vertically scaled hardware almost always outperformed any sort of distributed system (albeit not always from a price-performance perspective).

The real value often comes from the ability to do retries, leverage leftover underutilized hardware (i.e. spot instances, or your own data center at times when load is lower), handle hardware failures, etc., all while the full suite of tools above keeps working.


Other way around. Aggregation is usually faster than a join.


Disagree, though in practice it depends on the query, the cardinality of the various columns across tables, the indices, and the RDBMS implementation (so, everything).

A simple equijoin on high-cardinality, indexed columns will usually be extremely fast. The same join in a 1:M relationship might be fast, or it might result in a massive fanout. In the latter case, if your RDBMS uses a clustering index, and if you've designed your schemata to exploit this fact (e.g. a table called UserPurchase with a PK of (user_id, purchase_id)), the join can still be quite fast.
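As a hedged illustration of the clustering point, here is a sketch using Python's sqlite3 (the UserPurchase table and its columns are the hypothetical example above; in SQLite, WITHOUT ROWID makes the composite PK the physical storage order):

    import sqlite3

    con = sqlite3.connect(":memory:")
    # The composite primary key becomes the physical ordering, so all of
    # one user's purchases are stored contiguously; the 1:M lookup below
    # then reads a single contiguous range instead of fanning out.
    con.execute("""
        CREATE TABLE UserPurchase (
            user_id     INTEGER NOT NULL,
            purchase_id INTEGER NOT NULL,
            amount      REAL,
            PRIMARY KEY (user_id, purchase_id)
        ) WITHOUT ROWID
    """)
    rows = con.execute(
        "SELECT purchase_id, amount FROM UserPurchase WHERE user_id = ?",
        (42,),
    ).fetchall()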

Aggregations often imply large amounts of data being retrieved, though this is not necessarily true.


That level of database optimization is rare in practice. As soon as a non-database person gets decision-making authority, there goes your data model and disk layout.

And many important datasets never make it into any kind of database like that. Very few people provide "index columns" in their CSV files. Or they use long variable length strings as their primary key.

OP pertains to that kind of data. Some stuff in text files.


How is a proper PK choice a high level of optimization?


Unconvinced. Any join needs some kind of seek on the secondary relation's index, or, if you're stream-joining, a bunch of state to build a temporary index of size O(n) until the end of the batch. On the other hand, summing N numbers needs O(1) memory, and if your data is column-shaped it's like one CPU instruction to process 8 rows. In a "big data" context there's usually no traditional B-tree index to join against either. For jobs that process every row in the input set, joins are horrible for perf, to the point that people end up with a dedicated join job/materialized view so downstream jobs don't have to redo the work.
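A rough Python sketch of that asymmetry, with numpy's vectorized sum standing in for the SIMD path and a dict standing in for the join-side state (all data made up):

    import numpy as np

    values = np.random.rand(10_000_000)

    # Aggregation: a running sum needs O(1) extra memory, and numpy's
    # vectorized sum is effectively the "many rows per instruction" path.
    total = values.sum()

    # Join: even the cheap hash-join variant has to materialize O(n)
    # state (an index over the build side) before probing can begin.
    keys = np.random.randint(0, 1_000_000, size=values.shape)
    index = {}
    for i, k in enumerate(keys):
        index.setdefault(int(k), []).append(i)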


An aggregation is less work than a join. In ideal conditions you are segmenting the data in basically the same way for a join as for an aggregation. Think of an aggregation as an inner join against a table of buckets, plus updating a single value instead of keeping a number of copies around. In practice this holds, with the aggregation being faster than a join over the same data by a linear factor. That delta is the extra work the join needs to do to keep around a list of rows rather than a single value that is updated (and stays in cache) repeatedly. Depending on the data this delta might be quite small. But short of a very obtuse aggregation function (kurtosis, perhaps), the aggregation will be faster: it's updating a single value versus appending to a list, with the extra memory overhead that introduces.
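A toy Python sketch of that mental model, with made-up data:

    from collections import defaultdict

    rows = [("a", 1), ("b", 2), ("a", 3), ("b", 4)]

    # Aggregation: one running value per bucket, updated in place.
    sums = defaultdict(int)
    for key, value in rows:
        sums[key] += value          # single value, stays hot in cache

    # Join-style bookkeeping: the same segmentation by key, but every
    # matching row is kept around, so memory grows with the data.
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)   # list per key, extra memory overhead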


I'm saying that a smaller amount of data means more compute is required for a join. Sorry if that wasn't clear.

