> Both these data structures are different from the DOM tree.

In the case of S-expressions that is true. In the case of HTML it may or may not be true. It depends on how the HTML parser is implemented. There is a "natural" mapping of HTML onto a parse tree that is different from the DOM, but that is not part of the standard (AFAIK).

> Can CL-WHO generate HTML that matches that?

Yes, though native Common Lisp does not provide C-like string escapes, so putting in newlines is a little awkward. You could, of course, bring in a string interpolation library, but here's how you can do it without one:

    ? (defun nl () (who (fmt "~%")))     ; NL = NewLine
    NL
    ? (defun nli () (who (fmt "~%  ")))  ; NLI = NewLine + Indent
    NLI
    ? (princ (html (:pre (nli) (:span "one" (nli)) (nli) (:br (nli) (:span "two") (nl)))))
    
     <pre>
       <span>one
       </span>
       <br>
       <span>two</span>
     </br></pre>
Or you could do this:

    (html (:pre "
      <span>one
      </span>
      <br>
      <span>two</span>
      <br />
    "))
which looks like cheating but is actually closer to the spirit of the original.

The PRE tag is really weird because it actually changes the way things inside it are parsed. You can actually implement that in Lisp too via reader macros. CL-WHO doesn't support that out of the box, but it's not hard.

I can't imagine anyone actually wanting to do that, though. The PRE tag is for presenting pre-formatted text without changing its appearance, so embedding other tags inside it is kinda perverse. [EDIT: I was wrong about this. See below.]



There are uses for pre with tags embedded.

pre gives you simplified line breaking and, usually, a monospaced font. Tags inside it are still available for everything else.

A major example is that the Vim editor uses pre for formatting syntax colored code to HTML (when you do that with :TOhtml).

The output is a pre block containing various span elements which are styled with CSS.

BTW where in the HTML spec does it say that the interior of pre is parsed differently?

If we are parsing HTML (to Lisp objects or whatever), we should preserve the exact whitespace. The reverse generation should regurgitate the original whitespace.

If we take the license to eliminate newlines, then we ruin pre. The fix is simply not to do that.


> where in the HTML spec does it say that the interior of pre is parsed differently?

I was wrong about that. I had a vague memory of putting HTML inside a PRE tag once and having it come out as if it were escaped, but apparently I hallucinated that.

> A major example is that the Vim editor uses pre for formatting syntax colored code to HTML (when you do that with :TOhtml).

OK, I stand corrected on that too.

> If we are parsing HTML (to Lisp objects or whatever), we should preserve the exact whitespace. The reverse generation should regurgitate the original whitespace.

> If we take the license to eliminate newlines, then we ruin pre. The fix is simply not to do that.

Right.

Actually, I just realized that I misread the example. I saw <br /> and thought it was </br>. (Maybe the OP edited it?) In any case, the example now reads:

    <pre>
      <span>one
      </span>
      <br>
      <span>two</span>
      <br />
    </pre>
And you can render that in sexpr syntax as:

    (:pre "
      " (:span "one
      ") "
      " (:br) "
      " (:span "two") "
      " (:br) "
    ")
This is a particularly bad example to demonstrate here because the whitespace in the code plays badly with the whitespace in the HN markup. But I tried running this code and it does work. Here is the output copied-and-pasted verbatim from my listener:

    <pre>
      <span>one
      </span>
      <br />
      <span>two</span>
      <br />
    </pre>
Note that both BR tags are rendered as <br />.


It was <br> and <br /> in my example (</br> isn't a valid tag). My point was that <br> and the self-closing <br /> are represented differently in a parsed SGML data structure (<tag>, <tag />, and <tag></tag> are all different), even though they are equivalent in the browser's HTML DOM tree.

This is why an S-expr syntax needs separate forms to emit them properly: (tag), (tag/), and (tag)(/tag) in my example.


You can do this:

    <tag> ==> (:tag)
    <tag/> ==> (:tag nil)
    <tag></tag> ==> (:tag "")
Using (:tag/) is a bad idea because that would screw up attributes.

CL-WHO doesn't support this, but that would be easy to change if it ever actually mattered to anyone.
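
A minimal sketch of that convention in plain Common Lisp, to show how little is needed. This is not CL-WHO code: `render-3way` and `render-3way-to-string` are hypothetical names, attributes are left out, and text is copied without any escaping.

```lisp
;; Hypothetical renderer for the three-way convention above; not part of CL-WHO.
;; Attributes are omitted and text is not escaped, to keep the sketch short.
(defun render-3way (node &optional (out *standard-output*))
  "Write NODE, a string or a list (:tag child ...), to OUT as HTML."
  (if (stringp node)
      (write-string node out)                 ; text is copied verbatim
      (let ((name (string-downcase (symbol-name (first node))))
            (children (rest node)))
        (cond ((null children)                ; (:tag)     => <tag>
               (format out "<~a>" name))
              ((equal children '(nil))        ; (:tag nil) => <tag />
               (format out "<~a />" name))
              (t                              ; (:tag "")  => <tag></tag>
               (format out "<~a>" name)
               (dolist (child children)
                 (render-3way child out))
               (format out "</~a>" name))))))

(defun render-3way-to-string (node)
  "Return the HTML for NODE as a string."
  (with-output-to-string (s)
    (render-3way node s)))
```

With these definitions, (render-3way-to-string '(:pre (:span "one") (:br) (:br nil))) returns "<pre><span>one</span><br><br /></pre>".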


> In HTML, <tag/> and <tag></tag> are equivalent

In HTML, <script></script> is valid. <script /> is invalid. <br /> is valid. <br></br> is invalid. So they are represented differently.

> Using (:tag/) is a bad idea because that would screw up attributes.

For my example?

    ((:tag/ :attr "value"))               => <tag attr="value" />
    ((:tag  :attr "value") "..." (:/tag)) => <tag attr="value">...</tag>
> You actually can distinguish between those if you really want to. It's just a matter of picking a convention.

That sounds like it could work. So a leading `nil' would be treated as a special case (not a child node):

    (:pre "
      " (:span "one
      ") "
      " (:br) "
      " (:span "two") "
      " (:br nil) "
    ")


> <script /> is invalid. <br /> is valid. <br></br> is invalid.

OK, then the best way to handle that is to let the HTML-renderer know that different tags need to be rendered differently if they're empty. Are there any cases where you would ever want to distinguish between the various kinds of empty tags?

    ((:tag/ :attr "value"))               => <tag attr="value" />
    ((:tag  :attr "value") "..." (:/tag)) => <tag attr="value">...</tag>
No, that's not what you want. Let's start with this general form:

    ((:tag attr value ...) content ...) => <tag attr=value ...> content ... </tag>

Let's assume we have no attributes so I don't have to keep typing those. Then we have:

    ((:tag) content ...) => <tag> content ... </tag>

In this case (no attributes) we can unambiguously remove the parens around (:tag) and get:

    (:tag content ...) => <tag> content ... </tag>

Now if we have no content we get:

    (:tag) => <tag></tag>

All this is still completely regular, no special cases. But now if we write (:br) we get <br></br> which is not what we want. So we need to tell the renderer that some empty tags get rendered one way, and other empty tags get rendered another way. CL-WHO does this.

Notice that we have not actually typed any / characters. This is important. The role played by / in HTML is played by the close-paren in sexpr syntax. If we re-introduce the / into our new syntax we will have a hopeless mess.

> So a leading `nil' would be treated as a special case

That is exactly right. If (and this is a big if) we want to be able to write something equivalent to both <tag/> and <tag></tag> in the same document, we have to be able to distinguish between those two things in the markup somehow. I just looked this up, and the distinction that HTML makes between <tag /> and <tag></tag> is that the former's content is EMPTY while the latter's content is "" (i.e. the empty string). So really the Right Thing would be:

    (:tag)    => <tag />
    (:tag "") => <tag></tag>

That will work, but now we have to remember to add an empty string in some situations, e.g.:

    ((:script :src "...") "")

Personally I would find this annoying, so I would choose to go with the lookup table.
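
For what it's worth, the lookup-table approach can be sketched in plain Common Lisp. This is an illustration only, not CL-WHO's implementation: `*void-tags*`, `render-with-table` and `render-with-table-to-string` are hypothetical names, the tag list is a partial assumption, and text and attribute values are written without escaping.

```lisp
;; Hypothetical lookup-table renderer; an illustration, not CL-WHO's code.
(defparameter *void-tags* '(:area :base :br :col :hr :img :input :link :meta)
  "Tags rendered as <tag /> when empty. Assumed, partial list.")

(defun render-with-table (node &optional (out *standard-output*))
  "Write NODE to OUT. NODE is a string, (:tag child ...),
or ((:tag attr value ...) child ...)."
  (if (stringp node)
      (write-string node out)               ; text is copied verbatim, unescaped
      (let* ((head (first node))
             (tag (if (consp head) (first head) head))
             (attrs (if (consp head) (rest head) '()))
             (name (string-downcase (symbol-name tag))))
        (format out "<~a" name)
        ;; Attributes are a plist: :attr "value" :attr "value" ...
        (loop for (key value) on attrs by #'cddr
              do (format out " ~a=\"~a\""
                         (string-downcase (symbol-name key)) value))
        (cond ((member tag *void-tags*)     ; table says: no closing tag
               (write-string " />" out))
              (t                            ; everything else gets <tag>...</tag>
               (write-string ">" out)
               (dolist (child (rest node))
                 (render-with-table child out))
               (format out "</~a>" name))))))

(defun render-with-table-to-string (node)
  "Return the HTML for NODE as a string."
  (with-output-to-string (s)
    (render-with-table node s)))
```

Here (:br) comes out as <br /> and ((:script :src "a.js")) comes out as <script src="a.js"></script>, with no empty string needed in the markup.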



