More

swhipple · on July 21, 2017

This has been available for contributors in the United States for some years now, and I've used it to sign the assignment documents in the past. It might not be available in all countries though.

> If you use GPG, you can sign your assignment using a detached signature in the following manner:

> gpg -a --detach-sig ASSIGNMENT

> Where ASSIGNMENT is the PDF file(s) as you have received it from us.

> Then simply email the assignment, key ID, and signature file back to us at [email protected]. Please make sure that your key is listed on a public keyserver.

u801e · on July 21, 2017

I'm actually surprised that no one mentioned this sooner. It's too bad that this isn't an option for most other legal documents that one has to submit in a lot of situations.

swhipple · on July 17, 2017

The mkstemp you are seeing is probably the one from gnulib. A lot of GNU software uses their own versions if they detect the platform one is broken; it may be incorrectly replacing it in this case.

https://www.gnu.org/software/gnulib/manual/html_node/mkstemp...

https://github.com/emacs-mirror/emacs/blob/65eee8392ff95f58f...

swhipple · on March 30, 2017

The EPL is roughly in the middle of MPL and LGPL in terms of copyleft scope. MPL requires per-'file' source sharing. EPL requires per-'module' source sharing. LGPL requires per-'module' source sharing and separating the module to permit the user to use/replace it without additional restrictions.

The one downside that EPL has is that, unlike MPL (without "Incompatible With Secondary Licenses" notice) or LGPL, it is incompatible with the GPL-family of licenses. It specifies different conditions for providing the source code and has a governing law clause (New York).

It is popular in the Java ecosystem (Eclipse, Clojure, etc).

swhipple · on March 25, 2017

The main problem with pushing a typechecked live-upgrade in one shot is that you'll need to put a big lock around the distributed system (A non-upgraded node messaging an upgraded one would be fine, because the upgraded one knows the conversion function, but what happens in reverse scenario?)

It could be done without a big lock by splitting into three steps:

1) Push an upgrade that changes the types and adds the conversion functions. The valid type is the union of the old type and the new type. Wait until all nodes complete the upgrade.

2) Push an upgrade that instructs the nodes to convert their data and start using the new types by default. Wait until all nodes complete the upgrade.

3) Push an upgrade that removes the old types and conversion functions.

msangi · on March 25, 2017

Why is this a limitation only of type-checked functions?

The problem is whether the function can or cannot handle the content of the new message and that's completely orthogonal

swhipple · on March 25, 2017

The difference is that attempting to add code to a live system that mishandled an input (either from a non-upgraded node or an upgraded one) would result in a type-error and reject the upgrade.

You could still use Erlang's crashing and supervisor technique, but you would have the additional benefit of using static typing across a distributed system (where each node may or may not have received the upgrade yet).

swhipple · on March 3, 2017

> As I said, name me a format that I can't parse correctly as a Python one-liner.

mboxo? [1] It is a popular text format that cannot be unambiguously parsed.

More generally, most Unix tools' output is also not able to be unambiguously parsed. For example, use gcc to compile a file, then collect the warnings? The regex "^.+:\d+:\d+: warning.*" will be right most of the time, but there's no 'correct' way to parse gcc output (there is not a surjective mapping of output to input).

There are various ways to work around the problem: mboxrd format uses an escape sequence to work around the earlier problem mentioned with mboxo. `ls -l --dired' (GNU) will allow you to parse ls by appending filename byte offsets to the output. `wc --libxo xml` (FreeBSD) will give the output in XML, which is unambiguous as well. multipart/form-data (RFC2388) is used to embed binary data in a text format, by using a byte sequence which doesn't appear in the data.

Binary formats present their own set of issues, but "accidentally unparseable" is more common in text-based formats (or ad-hoc text output).

[1] https://jdebp.eu/FGA/mail-mbox-formats.html

jstimpfle · on March 3, 2017

Thanks!

It's true that filenames with whitespace or newlines are bad for interoperability ("make" is another example). There are three simple options: escaping filenames, making filenames NUL-terminated or declare such filenames as invalid. The latter way seems to have won for practical reasons, and it's a pity that "safe filenames" were never standardized (but C-identifier plus extension should be safe everywhere).

Mbox is definitely broken (for example body lines that start with "From" are changed to "> From"). I don't think it is ambiguous today (all software I know interprets "From " at the beginning of a line as a new mail), but it clearly was not much designed at all. It still has some precious properties which is why it's still in use today. For example, appending a new email (Mail server) is very fast. Crude interactive text search works also very well in practice, although automation can't really be done without a library.

Email is complex data (not line- or record-oriented), so various storage formats achieving various tradeoffs are absolutely justified.

> Binary formats present their own set of issues, but "accidentally unparseable" is more common in text-based formats.

It's true, especially with formats from the 70s where the maxime was "be liberal in what you accept", and where some file formats weren't really designed at all.

On the other hand, "accidentally unextendable" (for example, fixed-width integers) and "accidental data loss" is much more common in binary formats.

swhipple · on Feb 22, 2017

See here [1], under "Why make this change?".

[1] https://www.nczonline.net/blog/2016/10/the-ecmascript-2016-c...

nailer · on Feb 22, 2017

Ah that makes sense - by the time the parser sees a 'use strict' it's too late to apply strict mode to the default arguments. Thanks!

swhipple · on Feb 21, 2017

I don't quite understand the real world use-case Veriexec is designed to solve.

1) Prevent tampering by making part of the system immutable? The fingerprint isn't necessary; unconditionally prevent modification to the relevant files instead.

2) Prevent tampering by using trusted files? Normally this should be done by having a set of trusted keys, not hardcoded hashes. That way you can still securely upgrade the system.

3) Accessing files from a remote untrusted filesystem? This doesn't seem to work either; see the caveats section in veriexec(9).

Am I missing something here?

swhipple · on Feb 5, 2017

> HTML is a string of characters (syntax). The DOM is a data structure (semantics). [...] S-expressions are a data structure, different from the DOM, but S-expression syntax is a syntax.

I believe this is where the confusion is coming from. When you parse HTML syntax, you get a data structure; this is the same as when you read sexpr syntax, you also get a data structure. Both these data structures are different from the DOM tree.

Try this example:

    <pre>
      <span>one
      </span>
      <br>
      <span>two</span>
      <br />
    </pre>

Can CL-WHO generate HTML that matches that? (i.e. feed both into a tool like BeautifulSoup and produce the same data structure?)

Outside of CL-WHO and Hiccup-type libraries, you can of course use S-exprs to represent the same data structure. Here's a hypothetical S-expr syntax that might produce the same data structure:

    ((pre)
      "\n  " (span) "one\n  " (/span)
      "\n  " (br)
      "\n  " (span) "two" (/span)
      "\n  " (br/) "\n"
     (/pre))

Which is what I believe JimDabell meant by:

> you can't represent all valid HTML documents as S-expressions, at least not in the convenient way people assume

lisper · on Feb 5, 2017

> Both these data structures are different from the DOM tree.

In the case of S-expressions that is true. In the case of HTML it may or may not be true. It depends on how the HTML parser is implemented. There is a "natural" mapping of HTML onto a parse tree that is different from the DOM, but that is not part of the standard (AFAIK).

> Can CL-WHO generate HTML that matches that?

Yes, though native Common Lisp does not provide c-like string escapes so putting in newlines is a little awkward. You could, of course, bring in a string interpolation library, but here's how you can do it without that:

    ? (defun nl () (who (fmt "~%")))     ; NL = NewLine
    NL
    ? (defun nli () (who (fmt "~%  ")))  ; NLI = NewLine + Indent
    NLI
    ? (princ (html (:pre (nli) (:span "one" (nli)) (nli) (:br (nli) (:span "two") (nl)))))
    
     <pre>
       <span>one
       </span>
       <br>
       <span>two</span>
     </br></pre>

Or you could do this:

    (html (:pre "
      <span>one
      </span>
      <br>
      <span>two</span>
      <br />
    "))

which looks like cheating but is actually closer to the spirit of the original.

The PRE tag is really weird because it actually changes the way things inside it are parsed. You can actually implement that in Lisp too via reader macros. CL-WHO doesn't support that out of the box, but it's not hard.

I can't imagine anyone actually wanting to do that, though. The PRE tag is for presenting pre-formatted text without changing its appearance, so embedding other tags inside it is kinda perverse. [EDIT: I was wrong about this. See below.]

kazinator · on Feb 5, 2017

There are uses for pre with tags embedded.

pre provides the simplified line breaking and usually a monospaced font. However, tags are available to do whatever else.

A major example is that the Vim editor uses pre for formatting syntax colored code to HTML (when you do that with :TOhtml).

The output is a pre block containing various span elements which are styled with CSS.

BTW where in the HTML spec does it say that the interior of pre is parsed differently?

If we are parsing HTML (to Lisp objects or whatever), we should preserve the exact whitespace. The reverse generation should regurgitate the original whitespace.

If we take the license to eliminate newlines, then we ruin pre. The fix is simply not to do that.

lisper · on Feb 6, 2017

> where in the HTML spec does it say that the interior of pre is parsed differently?

I was wrong about that. I had a vague memory of putting HTML inside a PRE tag once and having it come out as if it were escaped, but apparently I hallucinated that.

> A major example is that the Vim editor uses pre for formatting syntax colored code to HTML (when you do that with :TOhtml).

OK, I stand corrected on that too.

> If we are parsing HTML (to Lisp objects or whatever), we should preserve the exact whitespace. The reverse generation should regurgitate the original whitespace. > If we take the license to eliminate newlines, then we ruin pre. The fix is simply not to do that.

Right.

Actually, I just realized that I mis-read the example. I saw <br /> and thought it was </br>. (Maybe the OP edited it?) In any case, the example now reads:

    <pre>
      <span>one
      </span>
      <br>
      <span>two</span>
      <br />
    </pre>

And you can render that in sexpr syntax as:

    (:pre "
      " (:span "one
      ") "
      " (:br) "
      " (:span "two") "
      " (:br) "
    ")

This is a particularly bad example to demonstrate here because the whitespace in the code plays badly with the whitespace in the HN markup. But I tried running this code and it does work. Here is the output copied-and-pasted verbatim from my listener:

    <pre>
      <span>one
      </span>
      <br />
      <span>two</span>
      <br />
    </pre>

Note that both BR tags are rendered as <br />.

swhipple · on Feb 6, 2017

It was <br> and <br /> for my example (</br> isn't a valid tag). The point that I was getting at was that <br> and <br /> self-closing tag are represented differently (<tag>, <tag />, and <tag></tag> are all different) in a parsed SGML data structure (though they both are equivalent in the HTML DOM tree in the browser).

This is why you would need separate tags to emit them properly with an S-expr syntax (tag), (tag/), and (tag)(/tag) in my example.

lisper · on Feb 6, 2017

You can do this:

    <tag> ==> (:tag)
    <tag/> ==> (:tag nil)
    <tag></tag> ==> (:tag "")

Using (:tag/) is a bad idea because that would screw up attributes.

CL-WHO doesn't support this, but that would be easy to change if it ever actually mattered to anyone.

swhipple · on Feb 6, 2017

> In HTML, <tag/> and <tag></tag> are equivalent

In HTML, <script></script> is valid. <script /> is invalid. <br /> is valid. <br></br> is invalid. So they are represented differently.

> Using (:tag/) is a bad idea because that would screw up attributes.

For my example?

    ((:tag/ :attr "value"))               => <tag attr="value" />
    ((:tag  :attr "value") "..." (:/tag)) => <tag attr="value">...</tag>

> You actually can distinguish between those if you really want to. It's just a matter of picking a convention.

That sounds like it could work. So a leading `nil' would be treated as a special case (not a child node):

    (:pre "
      " (:span "one
      ") "
      " (:br) "
      " (:span "two") "
      " (:br nil) "
    ")

lisper · on Feb 6, 2017

> <script /> is invalid. <br /> is valid. <br></br> is invalid.

OK, then the best way to handle that is to let the HTML-renderer know that different tags need to be rendered differently if they're empty. Are there any cases where you would ever want to distinguish between the various kinds of empty tags?

    ((:tag/ :attr "value"))               => <tag attr="value" />
    ((:tag  :attr "value") "..." (:/tag)) => <tag attr="value">...</tag>

No, that's not what you want. Let's start with this general form:

((:tag attr value ...) content ...) => <tag attr=value ...> content ... </tag>

Let's assume we have no attributes so I don't have to keep typing those. Then we have:

((:tag) content ...) => <tag> content ... </tag>

In this case (no attributes) we can unambiguously remove the parens around (:tag) and get:

(:tag content ...) => <tag> content ... </tag>

Now if we have no content we get:

(:tag) => <tag></tag>

All this is still completely regular, no special cases. But now if we write (:br) we get <br></br> which is not what we want. So we need to tell the renderer that some empty tags get rendered one way, and other empty tags get rendered another way. CL-WHO does this.

Notice that we have not actually typed any / characters. This is important. The role played by / in HTML is played by the close-paren in sexpr syntax. If we re-introduce the / into our new syntax we will have a hopeless mess.

> So a leading `nil' would be treated as a special case

That is exactly right. If (and this is a big if) we want to be able to write something equivalent to both <tag/> and <tag></tag> in the same document we have to be able to distinguish between those two things in the markup somehow. I just looked this up and the distinction that HTML makes between <tag /> and <tag></tag> is that the former content is EMPTY while the latter content is "" (i.e. the empty string). So really the Right Thing would be:

(:tag) => <tag />

(:tag "") => <tag></tag>

That will work, but now we have to remember to add an empty string in some situations, e.g.:

((:script :src "...") "")

Personally I would find this annoying, so I would choose to go with the lookup table.

swhipple · on Jan 27, 2017

There's currently a feature freeze and plan to switch to WebExtension after the necessary APIs are added to Firefox: https://github.com/akhodakivskiy/VimFx/issues/860

swhipple · on Jan 20, 2017

That's interesting -- I was wondering in which cases typedef changes the parse tree, and came across a few [1]:

    a (b);      /* function call or declaration */
    a * b;      /* multiplication or declaration */
    f((a) * b); /* multiplication or deref and cast */

> With one further change, namely deleting the production typedef-name: identifier and making typedef-name a terminal symbol, this grammar is acceptable to the YACC parser-generator.

[1] http://eli.thegreenplace.net/2007/11/24/the-context-sensitiv...

int_19h · on Jan 21, 2017

С++ takes it all the way to 11 with templates. Here's a program that is parsed differently depending on whether pointers are 32-bit or 64-bit:

    template<size_t N = sizeof(void*)> struct a;

    template<> struct a<4> {
        enum { b };
    };

    template<> struct a<8> {
        template<int> struct b {};
    };

    enum { c, d };

    int main() {
        a<>::b<c>d;
        d;
    }

Depending on which instantiation is used, the first line of main is either a variable declaration, or two operators < and > applied in sequence.

This is especially fun to deal with for C++ IDEs that support semantic highlighting (i.e. typenames are in a different color etc). If I remember correctly, the first one that could handle this right was VS 2012 - it only took 14 years after ISO C++ standard was released...