oobey's comments | Hacker News

Our software uses QR codes for inventory tracking. They're absolutely wonderful - you can store loads of data in a format that's very easily parsed by handheld scanning devices. So much data that you can completely do away with having to call out to a central database for item information, as you do with bar codes.
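To make that concrete, the payload is just a small self-describing record rather than a bare ID. A rough sketch in C (the field names and layout are made up for illustration, not our actual format):

    /* Pack a full inventory record into the QR payload itself, so the
     * scanner app needs no database round trip. Fields are hypothetical. */
    #include <stdio.h>

    int main(void) {
        const char *sku  = "WIDGET-0042";
        const char *desc = "M6 hex bolt, 30 mm";
        int qty          = 500;
        const char *bin  = "A3-17";

        char payload[256];
        snprintf(payload, sizeof payload,
                 "sku=%s|desc=%s|qty=%d|bin=%s", sku, desc, qty, bin);

        /* Whole record fits in the symbol; e.g. pipe to `qrencode -o label.png` */
        printf("%s\n", payload);
        return 0;
    }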

QR codes are great, and I never understood why they're treated as such a joke.


I think they're great too, but they don't hold more info (per unit area) than other types of symbology, for example datamatrix. If you really want to stuff tons of data in the symbol itself, you need PDF417. That can be scanned by a specialized line scanner (admittedly not convenient for phone cameras).

QR codes often encode URLs/URIs, and the application typically hits up a database upon scanning them, just like any other inventory barcode.


It is not clear whether these statements are true.

* QR codes mandate a minimum of 15% area used for error correction and a maximum of 60%.

* Data Matrix codes mandate a minimum of 0% area used for error correction and a maximum of 50%.

* The amount of overhead in QR codes (finders, timing, alignment) is probably in the same ballpark as Data Matrix codes (black lines, checkered lines).

* PDF417 codes demand low vertical resolution (e.g. 15 thick steps) but high horizontal resolution (e.g. 200 thin steps). They are hard to scan on a phone camera. Although this thick-thin design is appropriate for line scanners, the asymmetry in horizontal-vertical information density leads to poor information density overall and higher susceptibility to damage and unreadable barcodes. Square primitive blocks make sense because they are more spatially balanced.


Datamatrix uses ECC200 almost always. In practice, especially for phone-reader applications, the limiting factor for data transmission is not the symbology choice (datamatrix/QR) but the camera itself.

That said, dense 2D barcodes read by dedicated imagers in controlled lighting scenarios are, as far as I've seen, always datamatrix. Not sure if that's merely historical or if there's a technical reason for that.


We were just in the trough of disillusionment.

https://en.wikipedia.org/wiki/Hype_cycle


So, how should kids be taught? Since learning styles are a myth, would it be okay to skip in-person lessons entirely and just move everyone to individual text-based book learning?

After all, it looks like the idea that "someone might learn better in person" or "by discussing things with peers" is complete bunk, and one method should be sufficient for everyone.


You are thinking too broadly about learning styles. Things like practice/hands-on learning are one good way of teaching, for example.

It's not learning styles as in different styles and methods of teaching and learning that have been debunked. It's Learning Styles™ that have been debunked (i.e. auditory, visual, kinesthetic).


I think you're confusing learning with memorization and/or indoctrination? If your lessons are only lecture, discussion, and reading then you can go ahead and skip them and replace them with nothing [1].

It's bunk that some students need lectures and other students need books; these are only communication tools, and students cannot learn knowledge from them.

[1] past the point where learning how to read/listen/discuss is itself the goal of the lesson


Amen! I've been saying this for years. Do you know your MBTI type by any chance?


Yes, it's ISTJ.


Would he have made those same comments to resist going from 16 to 32 bit?

Hell, why not stick with 8 bit? We can just optimize everything to work on that, right?


More is always better. Four wheels is better than two; cars should have eight wheels.

Look at GUID partition tables. With MBR, we had to hobble along with only a byte to identify partition types. Now we have 128 bits. We can finally support more filesystem kinds than there are stars in the observable universe, all in one system installation, and the bootloader just has to look at the GUID. A 32 bit "fourcc" partition label clearly wouldn't have been enough.


GUID partition tables mean we don't need to coordinate identifiers. That's the point.


But in fact, we don't need to coordinate identifiers. Clashes in partition IDs are of no practical consequence.

A 32 bit "fourCC" would be more than adequate. It could even be constrained just to readable characters, like LNXF (Linux Filesystem) and LNXS (Linux Swap).
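For illustration, a fourCC is nothing more than four printable bytes packed into one 32 bit integer. A minimal sketch in C (LNXF/LNXS are the made-up codes from above, not anything standardized):

    /* A "fourCC" partition type: four printable ASCII bytes in a uint32_t. */
    #include <stdint.h>
    #include <stdio.h>

    #define FOURCC(a, b, c, d) \
        ((uint32_t)(uint8_t)(a)         | ((uint32_t)(uint8_t)(b) << 8) | \
         ((uint32_t)(uint8_t)(c) << 16) | ((uint32_t)(uint8_t)(d) << 24))

    enum {
        PART_LNXF = FOURCC('L', 'N', 'X', 'F'),   /* hypothetical "Linux Filesystem" */
        PART_LNXS = FOURCC('L', 'N', 'X', 'S')    /* hypothetical "Linux Swap" */
    };

    int main(void) {
        printf("LNXF = 0x%08X\n", (unsigned)PART_LNXF);   /* 0x46584E4C */
        return 0;
    }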

If another OS happens to use LNXF for something, and you have that OS in the same darn system, it doesn't matter. You just don't have that "foreign" LNXF in your /etc/fstab, and likewise it doesn't have the Linux ones in its equivalent of /etc/fstab.

The only thing that needs a clash-free label is the EFI boot partition, so the boot firmware can unambiguously identify all these partitions on all attached devices and offer them as boot options.


Why have them at all then?

If the argument you want to make is that they are of no consequence, then you need to answer why your proposal of still doing something is warranted.


Because we can usefully assign distinct values in a context like "Windows plus Linux box" in which we don't care about some exotic file system that was once used on a DEC VAX or whatever.


So why not use a single byte? How many machines do you know which have more than 255 filesystems on the one disk?

Why not a nibble and just use the other 4 bits for flags?

The point of this line of questioning is that you're quibbling over bytes which definitely don't matter in any modern context, at the expense of masses of extra management complexity and day-to-day problems when people want to stand up new systems.

With GPT, if I want to make a new filesystem type for some application, I just generate a GUID and it will not collide, with no need to coordinate with anyone.
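And "generate a GUID" really is the whole process. Roughly (a minimal sketch in C reading /dev/urandom, equivalent to just running uuidgen):

    /* Generate a random version-4 GUID for a new partition type. With 122
     * random bits, colliding with anyone else's type GUID is not a practical
     * concern. Assumes a POSIX system with /dev/urandom. */
    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint8_t b[16];
        FILE *f = fopen("/dev/urandom", "rb");
        if (!f || fread(b, 1, sizeof b, f) != sizeof b) return 1;
        fclose(f);

        b[6] = (b[6] & 0x0F) | 0x40;   /* version 4 (random) */
        b[8] = (b[8] & 0x3F) | 0x80;   /* RFC 4122 variant */

        printf("%02X%02X%02X%02X-%02X%02X-%02X%02X-%02X%02X-"
               "%02X%02X%02X%02X%02X%02X\n",
               b[0], b[1], b[2], b[3], b[4], b[5], b[6], b[7],
               b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15]);
        return 0;
    }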


Indeed; one byte works for me. Four is a reasonable political compromise between "one byte works for me" and "oh my, what about clashes?"

16 bytes is an obvious example of the "second-system effect" described by Fred Brooks in The Mythical Man-Month.

The fdisk utility now reduces the GUIDs to one-byte codes that stand in for them. For instance, I remember that 29 is Linux RAID (previously FD). Will 29 always be Linux RAID everywhere? Probably not.

> With GPT, if I want to make a new filesystem type for some application ...

Four bytes could have an ample reserved range for local use by hobbyists.

Broad recognition of the code only matters if the application is very widely deployed.


No. With 8 bit you had to execute multiple instructions to add two numbers. Same with 16 bit. This problem went away with 32 bit. Adding more bits beyond 32 does not bring proportional benefits because the numbers we deal with fit in 32 bit.


"the numbers we deal with fit in 32 bit"

Except when they don't. Everyone already forgot tweet number 2147483648? :) https://techcrunch.com/2009/06/12/all-hell-may-break-loose-o...
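A tiny illustration (hypothetical code, not Twitter's) of why that particular ID hurts anyone who stored tweet IDs in a signed 32-bit integer:

    /* 2147483647 is the largest signed 32-bit value; the very next ID no
     * longer fits, and on typical two's-complement systems it truncates to
     * a large negative number. */
    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        int32_t id = INT32_MAX;            /* 2147483647: last ID that fits */
        int64_t next = (int64_t)id + 1;    /* 2147483648 */
        printf("as int64: %" PRId64 ", stuffed into int32: %" PRId32 "\n",
               next, (int32_t)(uint32_t)next);   /* -> -2147483648 */
        return 0;
    }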


> No. With 8 bit you had to execute multiple instructions to add two numbers. Same with 16 bit.

Wrong (for x86-16 vs. x86-32). Just use an operand-size override prefix (0x66) with your 16 bit real mode ALU (in this case 'add') instruction to make it a 32 bit ALU instruction. Works from the 80386 on, where the 32 bit registers were introduced.


I agree, most numbers we deal with fit in 32 bits, with the exception of double precision floating point and indexes for really large data sets. As Moore's law seems to be ending, perhaps there's a sweet spot at 48 bits for both integer and FP.

The one thing I found absurd with RISC-V is the 128-bit variant. Most 64-bit processors today don't even support a full 64-bit virtual address space, do they?


I think he touched on this a little in a follow-up he did: https://blogs.msdn.microsoft.com/ricom/2016/01/04/64-bit-vis...

My interpretation of the shift from < 32 bits into 32 bits is: before, we had to do crazy things to algorithms to make them fit in those address spaces. When we transitioned to 32 bits, we didn't have to do that anymore.

So the question might be: are there any surprising workarounds in the code because you're only dealing with 32 bits, where if you had 64 bits you could write some more elegant solution?


It's not exactly what you're asking for, but you'll run into a big one in about two decades.

The only other example I can think of is the general "problem" of large databases. There's just a lot more paging and churn that has to happen in a 32-bit address space. Many NoSQL databases in particular have a memory model of mmap'ing an entire database, which runs into a hard limit on a 32-bit address space.


No, because the steps are exponential. You can sit down and type your way through 64k. If this was an argument by induction, then 1 bit would have been fine too.


You could continue this argument out to 128 or 256 bits. Where it starts to fall down is when you map the size of those address spaces back to the data types people work with.

In a 16-bit address space (64K), you hit the 16-bit limit _all the time_. Even a moderately sized text document will be bigger than 64K... and that's before considering images, videos, large data sets, etc.

32-bit takes you out to 4GB, which is much more likely to hold a typical working set, so the argument to go to 64-bit is much less pressing.

"why not stick with 8 bit?"

This conversation is about the size of the address space, not the machine word size. To my knowledge, there were no serious machines of any sort that were limited to an 8-bit address space. (Maybe something homebrew or embedded.) The closest I can think of is the 6502's preference for putting values in the zero page (which was 256 bytes).


> In a 16-bit address space (64K), you hit the 16-bit limit _all the time_.

Depends on the way 16 bit is implemented. For example, x86-16 uses segmented memory, enabling addressing of a little more than 1 MiB of memory (including the High Memory Area). The Z180 uses, as far as I know, an MMU (but I'm not completely sure). Another approach that is/was in common use is bank switching.

Depending on the kind of algorithm you use, this can make the coding much more complicated, or it can be no problem at all, because the scheme used to address more than 2^16 bytes of memory fits the algorithm quite naturally.

One interesting hack I read about for coding in real mode (x86-16) is to use some clever sharing of bits between the segment register value and the segment index (a rough sketch of the arithmetic follows the two schemes below):

- One scheme is to consider the value of the used segment register as a pointer to a 16-byte block of memory and use the segment index to address the specific byte in this block (with an option to increase the index "a little bit" if you want to go further)

- Another scheme is to (mostly) use only the 4 highest bits of the segment register (and zero all the other ones).
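The arithmetic behind both schemes, as a rough sketch in C (it only models the address calculation; real code would of course be 16-bit real-mode assembly or compiler-generated far/huge pointers):

    /* Real-mode x86 forms the linear address as segment * 16 + offset.
     * Scheme 1 above is essentially the classic "huge pointer" trick:
     * keep the offset in 0..15 and let the segment carry the address. */
    #include <stdint.h>
    #include <stdio.h>

    static uint32_t linear(uint16_t seg, uint16_t off) {
        return ((uint32_t)seg << 4) + off;       /* segment * 16 + offset */
    }

    static void normalize(uint16_t *seg, uint16_t *off) {
        uint32_t lin = linear(*seg, *off);
        *seg = (uint16_t)(lin >> 4);             /* points at a 16-byte paragraph */
        *off = (uint16_t)(lin & 0xF);            /* byte within that paragraph */
    }

    int main(void) {
        uint16_t seg = 0x1234, off = 0xABCD;
        printf("linear     = 0x%05X\n", (unsigned)linear(seg, off));  /* 0x1CF0D */
        normalize(&seg, &off);
        printf("normalized = %04X:%04X\n", (unsigned)seg, (unsigned)off);  /* 1CF0:000D */
        return 0;
    }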


"One scheme is to consider the value of the used segment register as a pointer to a 16 byte block of memory and use the segment index to adress the specific byte in this block (with an option to increase the index "a little bit" if you want to go further)"

This only works in real mode, where the segment is shifted and directly added to the offset. In protected mode, it goes through a selector table. This can be made to work too, but it requires tiled allocations of segments with known delta between each segment. This is what __ahincr was about, if you remember it from the Win16 days.


> This only works in real mode, where the segment is shifted and directly added to the offset. In protected mode, it goes through a selector table.

Of course this is true. You wrote further above:

> In a 16-bit address space (64K), you hit the 16-bit limit _all the time_.

I wanted to point out that whether this is a problem or not depends a lot on the concrete 16-bit architecture.


Are you thinking of a specific concrete 16-bit architecture where it's easy to access data objects larger than 64K?

I'd be interested to hear of one... I really can't think of any. Probably the most capable architecture I'm familiar with that has a native 16-bit pointer type is the 80286, which provides 24-bit physical addressing, virtualization, protection, etc. Even then, at least on Windows, key local heaps within the OS were confined to 64K, the default text editor was confined to a segment... writing image processing tools (which I did) required special handling for all but the smallest scale workloads. These all added to developer workload and reduced system capacity in unfortunate ways.

I get what you're saying that there are exceptions and hacks that make it possible to work within these limitations, but my point is that you have to care about them on 16-bit, and most of the time you really don't on 32-bit. My thesis for why that is goes back to what I was saying initially about the size of the core data types people tend to manipulate.

(And this goes back to the reason I posted in the first place, which was to explain the relative difference in motivation between the 16->32 switch and the 32->64 switch.)


> Are you thinking of a specific concrete 16-bit architecture where it's easy to access data objects larger than 64K?

Your question implies that you want one piece of flat memory > 64 KiB. With this you have already made a very strong implicit assumption about the data layout and the kind of algorithm that you want to use. My point rather is: Consider the capabilities that the 16b machine has and try to fit your data representation and algorithms around them, instead of whining about lack of machine capabilities. One will often find a solution using clever tricks that one would not have considered otherwise, which will often turn out to be surprisingly elegant and much better than the "naive" solution.

This way of programming is of course nothing for the kind of programmer who wants to write an inelegant but working program in a short amount of time, I know. :-)


> Consider the capabilities that the 16b machine ... instead of whining about lack of machine capabilities. ... the kind of programmer who wants to write an inelegant but working program in a short amount of time,

I do hear you, but one of the things I love about virtually all modern hardware is just how much capability it puts within the reach of totally naive and quick development strategies. If I can inelegantly solve two or three problems in the time it takes me to elegantly solve one, then that strikes me as a net win (at least for the people that need problems solved more than code written).

I don't mind pushing the hardware and searching for elegance, but I'd rather be forced into it by the necessities of the problem I'm trying to solve.


I agree, I think this would help reduce a lot of clutter, and I think it is a feature that should be standard in comment systems.

