
Please do a Google search for

    building plan filetype:pdf 
(Edit: I actually don't know the search terms that would return real architectural plans rather than mostly-text documents. I guess Google ranks PDFs with a lot of text much higher than documents full of drawings -- but the latter are what I actually want to point to! If it's true that Google prefers text-heavy documents, then googling for examples is a good way to miss most of the documents with the real problems.)

Then try to actually look at every page of such documents with pdf.js. The first one I get is

http://www.nist.gov/el/nzertf/upload/NZERTF-Architectural-Pl...

Try it. You will want to throw your fast computer through the window. Then, when you manage to calm down, you'll try to configure your browser to never invoke pdf.js again, if you know you need to work with such documents.

If you want a realistic benchmark, compare the speed of rendering these documents with Adobe's or some other native renderer.

I'm not a building architect, but at least I don't live in a "we don't draw anything significant in our documents" bubble. I know they worry about their potential customers. Forced pdf.js is a huge setback for them. If they were able to tell the browser in the HTML "please don't use pdf.js for this one," they would be much happier.



I just profiled your example: http://people.mozilla.org/~bgirard/cleopatra/#report=4b995cc...

Looks like it could be running much faster:

* 20% of the time is spent copying the canvas because someone is, likely erroneously, holding a reference to the canvas. Looking into it: https://bugzilla.mozilla.org/show_bug.cgi?id=1007897

* 10% of the time is spent waiting on display transaction swaps because canvas isn't triple buffered.

* PDF.js is not getting empty transactions (canvas draw optimizations).

That's just from a quick profile. I'm sure there are a ton more things that could be improved.


Thanks! Good work! It's of much more benefit to analyze slow PDFs than to construct "proofs" that short and simple PDFs are displayed "fast enough" (though I understand that the latter feels good). Especially since it seems that not only pdf.js will benefit from analyzing the slow ones -- if I understand correctly, you traced the problems to the C++ code of the browser? That would mean that even non-pdf.js stuff will be faster once it's fixed, is that correct?


Yes, the fix to bug 1007897 will help all web content.

That profile shows calls from JS and how they call into native C++ code. This is very useful for profiling things like canvas.

PDF.js, like many well-tuned apps out there, isn't bottlenecked by JavaScript performance, so there are a lot of improvements that can be made by tweaking the web platform.
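
For illustration, JS-side timing only gets you this far with canvas (a minimal sketch; the element and sizes are made up):

    var canvas = document.createElement('canvas');
    canvas.width = canvas.height = 2048;
    var ctx = canvas.getContext('2d');

    var t0 = performance.now();
    ctx.fillRect(0, 0, canvas.width, canvas.height);
    // This mostly measures the JS call itself; much of the real rasterization
    // and compositing happens later in native code, which only a platform
    // profiler (with C++ frames) can attribute.
    console.log('fillRect call took ' + (performance.now() - t0) + ' ms');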


Does your profile explain why large PDFs are so laggy when scrolling in PDF.js? For example, open http://www.math.mtu.edu/~msgocken/pdebook2/mapletut2.pdf in FF and hold Page Down or Page Up. On my laptop, that will lock up FF. Yet, if I open it in Acrobat Reader or Chrome, I can scroll up and down much faster without the jerky behavior.

Is it even possible for a JavaScript app in FF to get the kind of performance Google gets with their PDF plugin? With the power of today's PCs, it seems like something is seriously wrong with "web technology" if my machine struggles to render PDF documents.


Wonderful! You diagnose so fast it looks like magic! Do you see anything new when profiling these two:

http://www.engageny.org/sites/default/files/resource/attachm...

and

http://www.math.mtu.edu/~msgocken/pdebook2/mapletut2.pdf


http://people.mozilla.org/~bgirard/cleopatra/#report=a848ab9...

The first document uses the DOM instead of canvas (which the previous one used). This is done to support text selection. Most of the time is spent in the style system. I don't know that area very well, but in the past I've seen simplifying CSS selectors make all the difference. I know a fairly important problem for B2G is speeding up expensive style flushes (https://bugzilla.mozilla.org/show_bug.cgi?id=931668), but I don't know enough about CSS to know if that fix will solve the problem here.


A little OT, but since you're so good at profiling Firefox, I have one more interesting "a lot of real work" page that might inspire you or somebody you know:

http://bellard.org/jslinux/

This emulates in JavaScript an x86 CPU and enough hardware to really boot Linux 2.6.20(!). On my computer, Opera 12.17 shows "booted in 2.8 seconds," whereas Firefox 29 shows "booted in 7.9 seconds." That's 2.8 times slower.


Here's a profile: http://people.mozilla.org/~bgirard/cleopatra/#report=0705c03...

I get between 5-7 seconds here.

Looks like getaliasedvar is causing excessive bailouts from Ion (3rd tier JIT). On top of that, the platform is trying to synchronously cancel background Ion compilation, and that is taking an excessive amount of time. Reported as: https://bugzilla.mozilla.org/show_bug.cgi?id=1007927

Tweaking the functions listed in the profile to avoid them bailing out should drastically improve this test case.
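
As a rough illustration (not the actual jslinux code), one common tweak of this sort is to avoid touching closed-over ("aliased") variables inside a hot loop by copying them into locals:

    function makeCounter() {
        var total = 0;                 // closed-over ("aliased") variable

        return function step(n) {
            var t = total;             // copy it into a local once
            for (var i = 0; i < n; i++) {
                t += 1;                // the hot loop only touches the local
            }
            total = t;                 // write back after the loop
            return total;
        };
    }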


And at the opposite end of the spectrum from Bellard's useful code, I've also observed that a simple loop which just sums doubles like this

    var s = 0.01
    for ( var i = 0; i < 100000000; i++ )
        s += 0.1
    print( s )
has become around twice as slow since some version of Firefox (of course, before that point there were a lot of speedups; very old FF can't be compared with the present state).

Still, really the biggest problems I know of at the moment are those PDFs that architects produce.


That loop doesn't actually do anything; benchmarking it is pretty much meaningless.

It's important to have benchmarks that aren't trivially converted to no-ops or constant loads by the compiler. (In practice the JIT might not be optimizing that one out, but an aggressive C++ compiler certainly would as long as fast math is enabled - so at some point, a typical JS JIT will too).

Also ensure that you're benchmarking warmed code that has been fully jitted. JS code (other than asm.js in Firefox) has multiple stages of compilation in modern runtimes, typically triggered based on how often it is called.
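
For example, a sketch using the loop from the comment above: run it several times in the same page and only trust the later, warmed-up timings.

    function sumLoop() {
        var s = 0.01;
        for (var i = 0; i < 100000000; i++)
            s += 0.1;
        return s;
    }

    // The first run or two includes interpreter/baseline time and JIT warm-up;
    // the later runs measure fully jitted code.
    for (var run = 0; run < 5; run++) {
        var t0 = Date.now();
        var s = sumLoop();
        console.log('run ' + run + ': ' + (Date.now() - t0) + ' ms (s = ' + s + ')');
    }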


You are wrong. The last line has the meaning of displaying the result to the user (you are supposed to implement it there; I'm lazy). The same goes for the prior warm-up: I don't have to specify it here, I just show the loop. Because the result needs to be shown, the browser is certainly not allowed to optimize away the calculation.

Second, it's not allowed to replace the loop with a multiplication, as this is floating-point arithmetic and the binary representation of the constants involved is not "nice," and the same holds for the partial results. Compare the result with the multiplication to get the idea (10000000.01 vs 9999999.99112945). All the additions have to be performed one way or another between the loading of the JS and the displaying of the result. So it is a good measure of the quality of the translation from JS to the machine code which does the actual calculation, and, being very simple, it can also easily point to unnecessary overheads.

The regression I observed is therefore a real one, probably observable in other scenarios but harder to pinpoint, and probably avoidable, as the better results did exist once. (Of course, if this were part of some widely popular benchmark, cheats would probably be developed, but at the moment there aren't any. Once anybody implements a "we don't care about numerics" optimization, it of course should not be used anymore to assess the quality of JS.)
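
To make that concrete (same loop, plus the multiplication it must not be folded into):

    var s = 0.01;
    for (var i = 0; i < 100000000; i++)
        s += 0.1;
    // The two displayed values differ: the rounding error accumulated over 10^8
    // additions is not the same as that of a single multiplication, so a correct
    // JIT cannot replace the loop with the closed form.
    print(s);                        // print() stands in for showing the result
    print(0.01 + 100000000 * 0.1);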


> A little OT, but since you're so good at profiling Firefox

Heh, I believe bgirard wrote the Firefox profiler. He's the best there is :)


I'd say he deserves a raise! Anyway, maybe you should really add

http://www.nist.gov/el/nzertf/upload/NZERTF-Architectural-Pl...

to your test suites and consider it a worthy goal as it really represents a lot of documents typical for the users who produce complex plans. Have you tried it?

It's immediate in Adobe Reader (as in one second) and takes several minutes in pdf.js in Firefox 29.


Yeah, that one is kind of sluggish for me too. Usable, but not smooth.

I filed an issue on pdf.js here: https://github.com/mozilla/pdf.js/issues/4761


Thanks! Finally, a Google search term that will give you a lot of slow PDFs:

     controller site:automationdirect.com filetype:pdf
based on a hint from Gracana's comment.

You have to actually try to look at every page in any of the PDFs to get an idea of the pain.


I just tried the mentioned document in the viewer on http://mozilla.github.io/pdf.js/web/viewer.html

I agree that the speed is a lot worse than in Preview.app, for example, but it's also not unusable. I will look at it in more detail tomorrow.


Try to move through the pages; don't stay on the first one! I have a 22nm i7 CPU here, and compared to working with the Adobe PDF reader it's just horrible having to use pdf.js.

I must admit, however, that I'm not able to easily construct a Google search for more such documents, but I know a lot of people who work only with documents like these -- they just can't work with pdf.js.

Does your benchmark measure the time to actually display everything on every page (what a human looking at all the pages must do), or just the time until the browser is responsive?

Edit: inspired by other comments, it seems that at least a search for "math plots filetype:pdf" returns more guaranteed problems, like

http://www.engageny.org/sites/default/files/resource/attachm...

Still it's hard to find slow PDFs with certainty just by using Google.


> Does your benchmark measure the time to actually display everything on every page (what a human looking at all the pages must do), or just the time until the browser is responsive?

It "only" benchmarks the rendering, all the overhead the viewer produces is not shown. That is intentional, as we will create our own viewer anyway

BTW: PDF.js in FF is typically slower than in Opera/Chrome; all these benchmarks used Opera.


I tried loading that and it worked pretty well for me. Not fully native speed, but just a couple seconds to draw complex pages. Certainly not a "throw the computer out the window" experience.


It really takes minutes to see all the drawings: i7, Win 8.1, FF 29 (to compare CPU speeds, jslinux boots in 7.9 seconds on my computer). Adobe Reader shows all 20 pages immediately.

Have you tried to look at all the pages, all the drawings? How many minutes did you need, and on which setup? Are you using Firefox? Does it use pdf.js? Is it something OS-dependent, or did you just not look at all the pages?

Now I see that Gracana mentions manuals on AutomationDirect.com. Look at this one, for example:

http://www.automationdirect.com/static/specs/dl0506select.pd...


I'm on Firefox Nightly, using pdf.js, on Windows. The first PDF takes about 4-5 seconds to render a page, and it always renders the page I have onscreen, so it doesn't matter how many pages there are.


When you measure only the first page, you will completely miss a lot of problematic PDFs.


I looked at all the pages, picking them at random. It always did the one I had onscreen in seconds.


So about two or three orders of magnitude slower than native.


Apples and oranges. This is a young implementation, and nothing considers its performance when generating a PDF. Until it's had significant optimization, you can't draw very good conclusions about html+js vs. native.


This is wrong, because PDF.js is Firefox's default way of presenting PDFs. If it's a young, unoptimized code base, then maybe Firefox shouldn't make it the default. It is an apples-to-apples comparison here because it's the default, and how young it may be is irrelevant to the user experience.


It's a very dangerous attitude: "I don't read PDFs and I don't care, but the customers who read them should tolerate our young, poor implementation that needs more than two minutes for just 15 pages."

(Documents like this one: http://www.automationdirect.com/static/specs/dl0506select.pd... )

They won't. As soon as you prevent them from doing their job (and that's the case if they used their PDFs normally before you changed the defaults), they will have to search for a solution. The solution is either switching the handler (still a little better for the browser writers) or switching the browser.

By making the opening and viewing of a 15-page PDF, which used to be instantaneous, take two minutes, you prevent them from doing their job (the slowdown from a subjective 0 seconds to minutes is also a subjectively infinitely worse experience!), and they must respond. They can't open just the first page. They actually care. They need all the pages.


Why do you keep acting like a document cannot be interacted with until every single page is fully rendered? Especially when it goes out of order to get the onscreen pages ready first.


When you need some information from a 15-page document, you don't think "I know I need the 9th page." You look at one page after another. You need 0 seconds for that with a native renderer (you can't observe that you're waiting), and you need several minutes with pdf.js -- infinitely longer, enough to not use it.


>infinitely

It takes time to get to the PDF. It takes time to transfer the file if it's of any size. It takes time to look at the pages.

Also you can skim pages that don't have 100% of the drawings on them yet.


PDF.js vs. Adobe Reader is a reasonable comparison to make right now.

PDF.js as a representative of html+js viewers vs. 'native' is a completely unfair comparison unless you compare it to a similarly unoptimized native viewer.



