
As someone who's been quite heavily involved with web-platform-tests, I'd caution against any use of the test pass rate as a metric for anything.

That's not to belittle the considerable achievements of Ladybird; their progress is really impressive, and if web-platform-tests are helping their engineering efforts I consider that a win. New implementations of the web platform, including Ladybird, Servo, and Flow, are exciting to see.

However, web-platform-tests specifically decided to optimise for being a useful engineering tool rather than being a good metric. That means there's no real attempt to balance the test suite across the platform; for example, a surprising fraction of the overall test count is encoding tests because they're easy to generate, not because encoding is an especially hard problem in browser development.

We've also consciously wanted to ensure that contributing tests is low friction, both technically and socially, in order that people don't feel inclined to withhold useful tests. Again that's not the tradeoff you make for a good metric, but is the right one for a good engineering resource.

The Interop Project is designed with different tradeoffs in mind, and overcomes some of these problems by selecting subsets of tests which are broadly agreed to represent a useful level of coverage of an important feature. But unfortunately the current setup is designed for engines that already implement enough features to be usable as general-purpose web browsers.



The tweet mentions that this is an arbitrary metric thrust upon them by Apple, so I don’t think they would necessarily disagree with you. During the monthly updates they do also show the passing number of tests without including the encoding tests because of how much they skew things.


The problem is, there's no other good metric. We used to have Acid tests for CSS, but in the absence of that, it's as good a metric as any.


Some modern ACID-style tests are a nice idea actually.


Are Acid tests no longer available?


Acid 2 bakes in the assumption that you will be displaying it on a desktop/laptop monitor with 100% scaling; it depends on pixel accuracy.

This was a reasonably universal assumption in 2005, but became less and less valid over time. We now have high-DPI screens, and the whole idea of pixel accuracy has fallen out of favour (it was never a good idea, but it was 2005) as phone browsers are expected to rescale websites for better readability/usability.

The result is that Acid 2 fails on my phone, and on my laptop it will pass/fail depending on which screen the window is on.

Acid 3 was too forward-looking and rigid. While Acid 2 was (mostly) testing accepted standards (which IE6 implemented very poorly), Acid 3 tested a bunch of draft standards. It was very strict on many things that weren't well defined, and later versions of the standards took the opposite approach.

Basically, Acid 2 was very good at shaming Microsoft into fixing Internet Explorer, but in the long run the whole concept of popular cherry-picked torture tests proved to be of limited usefulness (and actually counterproductive) for promoting standards-compliant browsers.


They no longer reflect what the average user expects their browser to support. You can pass it and miss on several important things that are considered widespread features nowadays.


They are, but they aren't great tests of what a browser is capable of. For example, Firefox does not pass Acid2 or Acid3.


Ladybird will be faster than anything with an arbitrary metric thrust


mmm yes and lift


Could a hand-picked subset be selected to make that metric?


Everything you said sounds very reasonable, yet the "Browser-Specific Failures" graph on the main page of the wpt.fyi website explicitly misleads us into thinking

PS I'm a big fan of the work and appreciate what you do. I check the interop page about once a week!


As someone who's been quite heavily involved with having a brain, I'd advocate for using the test pass rate as a metric for how many tests are passed.


Why are you bringing this up? It’s not been implemented as a metric here by choice, but because Apple requires it for iOS.


This is a headline that is very easy to misread and/or misunderstand. I don’t find their comment to be that out of place at all.


Root comment is lecturing the Ladybird team about not using this suite as a metric, which is totally uncalled for. That’s what I’m trying to convey.


"lecturing" is carrying a lot of needless weight here. Their comment doesn't read like that, they're just pointing out that the metric itself isn't what it seems to be.


> but because Apple requires it for iOS

Therefore it is a metric used by Apple.


In the spirit of malicious compliance, this being a bad metric would probably be a feature in their book.


Malicious compliance?

The EU DMA says they have to allow third-party browser engines access to the same resources (the JIT) that Safari has. It specifically allows them to place reasonable requirements on those third-party alternatives:

> The gatekeeper shall not be prevented from taking, to the extent that they are strictly necessary and proportionate, measures to ensure that third-party software applications or software application stores do not endanger the integrity of the hardware or operating system provided by the gatekeeper, provided that such measures are duly justified by the gatekeeper.

Access to rwx memory is inherently dangerous, and it's completely reasonable to expect third parties to have proven that they are serious about producing a usable browser engine before putting such a risky product on the market for consumers to download. The law does not require them to allow any third-party application to access the JIT, only a third-party application that competes with Safari (a usable web browser).


Yes, but that justifies requiring the absence of security problems, not rendering performance or anything like that.

You can't justify a requirement for a minimum level of performance or some capability. You can justify a requirement of a guaranteed absence of security bugs, provided that that's a standard you impose on yourself throughout the system.


There are literally no other metrics.

Web Platform Tests were literally a project to align browsers on compatible implementations of a bunch of web APIs. It was started by Opera and the W3C and is maintained by the W3C: https://www.bocoup.com/blog/wpt-an-overview-and-history


Then talk to Apple. They are the ones who put this bar in place.



