
@d.zivertas has joined the channel

Has there been any discussion of extending Racket’s set of Unicode property-inspecting procedures? For example, there are already procedures for inspecting the General Category of a character, but there are many other properties that are not covered. Although there is the <https://docs.racket-lang.org/unicode/index.html?q=unicode|Unicode Chars> package, I think that this functionality has a place in the stdlib, since property analysis is a critical part of text-processing algorithms and Unicode-aware parsing libraries.
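For context, here is a minimal sketch of what `racket/base` already reports about a character. The helper name `describe-char` is made up for illustration; only `char-general-category`, `char-alphabetic?`, `char-numeric?`, and `char-whitespace?` are existing procedures. Properties outside this kind of list are what the proposal would have to add.

```racket
#lang racket/base

;; Hypothetical helper: gather the Unicode facts that racket/base
;; already exposes for a single character.
(define (describe-char c)
  (list (char-general-category c)   ; General Category as a symbol
        (char-alphabetic? c)
        (char-numeric? c)
        (char-whitespace? c)))

(describe-char #\a)     ; => '(ll #t #f #f)
(describe-char #\1)     ; => '(nd #f #t #f)
(describe-char #\space) ; => '(zs #f #f #t)
```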

I don’t think there’s been much discussion of it, but that seems like a good idea

I’ll open a feature request so there’s a standing place for discussion.

My lang currently has a serious issue: it has no I/O. The contracts needed to mediate I/O don’t exist yet, but I think that a basic suite of procedures for working with ports would be a good start.

I think that my lang ought to have a port type that is dedicated to UTF-8 text, since part of Racket’s port API seems to assume (but not enforce) that a given port is text-only.

That’s not true

I’m sorry if I misunderstood; I got this impression from what the docs say in <https://docs.racket-lang.org/reference/linecol.html?q=port-count-lines!|the section on counting lines and columns>.

Well, I guess it depends on what you mean, exactly. You can mix byte- and character-based functions on the same port, though it’s fairly rare that you would do so.

(Whether or not that’s a good design is a separate matter.)

I suppose that I should simplify my question: should my lang have dedicated text-only ports?

I think eventually you’ll need some way of drawing either binary data or character data from the same ultimate source at its given position. It’s not the norm, but it’s pretty important. In Racket, you just switch which functions you’re using. In some languages, you might layer a character stream on top of a byte stream. Or you might have something like a file descriptor (which has a current position) and be able to use it as either a source of bytes or a source of characters.
But if you separate binary i/o and text i/o completely, without any way of switching “modes,” I think that limits you eventually.
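A small sketch of the "just switch which functions you're using" point, assuming an in-memory port that holds a two-byte binary header followed by UTF-8 text (the data layout is invented for the example):

```racket
#lang racket/base

;; Two raw bytes followed by UTF-8 encoded text, all in one port.
(define in
  (open-input-bytes
   (bytes-append (bytes 1 2) (string->bytes/utf-8 "λx"))))

;; Byte-based reads consume the binary part...
(define header (list (read-byte in) (read-byte in)))

;; ...and character-based reads continue from the same position,
;; decoding UTF-8 as they go.
(define text (list (read-char in) (read-char in)))

header ; => '(1 2)
text   ; => '(#\λ #\x)
```

Both kinds of read share the port's single current position, which is what makes the mixing possible.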

The procedure port-count-lines! cannot be “turned off”, which I think means that at least the position counting is stuck in text mode for the rest of the file.

Yeah, I guess that’s true. But it also looks like, when you turn it on, you don’t get errors when you read individual bytes, even when they don’t constitute valid UTF-8 characters. And the column count still seems to go up.
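Roughly the experiment I mean, as a sketch: an in-memory port with one line of valid text followed by a byte (255) that can never occur in valid UTF-8. The location after the stray byte is printed rather than asserted, since I'd rather observe it than guess.

```racket
#lang racket/base

;; "ab\n" is valid text; byte 255 never appears in valid UTF-8.
(define in (open-input-bytes (bytes-append #"ab\n" (bytes 255) #"c")))
(port-count-lines! in)

;; Consume the first line with a character-based reader.
(define first-line (read-line in))

;; Location after the newline, as a list: (line column position).
(define after-line
  (call-with-values (lambda () (port-next-location in)) list))

;; Read the stray byte directly; read-byte does no UTF-8 decoding,
;; so no error is raised here.
(define stray (read-byte in))

;; Location after the raw byte.
(define after-stray
  (call-with-values (lambda () (port-next-location in)) list))

(printf "~a -> ~a\n" after-line after-stray)
```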

I could make a custom port type that allows turning off UTF-8 line+column counting.
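A very rough sketch of that idea, under heavy assumptions: this is not a real Racket port (a real one would presumably go through `make-input-port`), just a hypothetical wrapper struct, `text-tracker`, that keeps its own line and column counters and has a switch to pause them. All of the names here are invented.

```racket
#lang racket/base

;; Hypothetical wrapper: an input port plus our own counters and a
;; switch that pauses counting.
(struct text-tracker
  (in [counting? #:mutable] [line #:mutable] [col #:mutable]))

(define (make-text-tracker in)
  (text-tracker in #t 1 0))

;; Read one character; update the counters only while counting is on.
(define (tracker-read-char t)
  (define c (read-char (text-tracker-in t)))
  (when (and (char? c) (text-tracker-counting? t))
    (cond
      [(char=? c #\newline)
       (set-text-tracker-line! t (add1 (text-tracker-line t)))
       (set-text-tracker-col! t 0)]
      [else
       (set-text-tracker-col! t (add1 (text-tracker-col t)))]))
  c)

;; Read one raw byte; never touches the counters.
(define (tracker-read-byte t)
  (read-byte (text-tracker-in t)))

;; Usage: count through "a\n", then pause counting before reading "b".
(define t (make-text-tracker (open-input-bytes #"a\nb")))
(tracker-read-char t)
(tracker-read-char t)
(set-text-tracker-counting?! t #f)
(tracker-read-char t)
(list (text-tracker-line t) (text-tracker-col t)) ; => '(2 0)
```

The trade-off is that existing procedures such as `port-next-location` know nothing about the wrapper, so everything in the lang would have to go through the tracker's own readers.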