
Currently, according to https://docs.racket-lang.org/reference/characters.html and my tests, we do not support surrogate pairs in Unicode. Why is this? Has nobody implemented support yet, does the underlying Unicode library we use not support them, or was it a key decision in Racket’s design?

So I cannot for example create a string in Racket with U+1D400 MATHEMATICAL BOLD CAPITAL A

which uses the surrogate pair #\uD835 #\uDC00

the reader chokes at #\ud835

𝐀 - it seems that slack supports them. :slightly_smiling_face:

@pocmatos Surrogate pairs are fundamentally a part of the UTF-16 encoding scheme; the code points reserved for them (U+D800 through U+DFFF) are not valid Unicode scalar values, so they never stand for a character on their own. Racket does not use UTF-16, so there is no reason to use surrogate pairs; use the actual code point the surrogate pair would encode instead, in this case #\U1D400.

Racket is correct to reject #\uD835 because U+D835 is not a valid character.
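For reference, the mapping between a code point and its UTF-16 surrogate pair is plain arithmetic. A minimal sketch in Python (not Racket; the function names `to_surrogates` and `from_surrogates` are made up for illustration):

```python
def to_surrogates(code_point):
    """Split a code point above U+FFFF into a (high, low) UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF need surrogates")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def from_surrogates(high, low):
    """Recombine a (high, low) surrogate pair into the code point it encodes."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogates(0x1D400)
print(hex(high), hex(low))                 # 0xd835 0xdc00
print(hex(from_surrogates(high, low)))     # 0x1d400
```

So #\uD835 #\uDC00 and #\U1D400 name the same character; only the second is a code point that can stand on its own.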

If you need to read or write UTF-16 encoded strings, use bytes-open-converter: https://docs.racket-lang.org/reference/bytestrings.html#%28def._%28%28quote._~23~25kernel%29._bytes-open-converter%29%29 But a Racket character, in the char? sense, always represents exactly one Unicode code point, and a surrogate is only half of the UTF-16 encoding of one.
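To show that surrogates only appear once a string is encoded, here is a rough Python analogue (this is not Racket's bytes-open-converter API; Python's `str.encode` plays the converter's role here, and the variable names are illustrative):

```python
s = "\U0001D400"  # MATHEMATICAL BOLD CAPITAL A, a single code point

# Encoding is where surrogates appear: UTF-16 needs two 16-bit code units.
utf16 = s.encode("utf-16-be")
code_units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]

# UTF-8 encodes the same code point as four bytes, with no surrogates involved.
utf8 = s.encode("utf-8")

print(len(s))                              # 1
print([hex(u) for u in code_units])        # ['0xd835', '0xdc00']
print(utf8.hex())                          # f09d9080
print(utf16.decode("utf-16-be") == s)      # True
```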

Ah, of course, Racket uses UTF-8 - my bad. :slightly_smiling_face: Trying to fix a JSC Unicode bug tempted me to try it in Racket, but I forgot the encodings were different.

Racket actually uses UCS-4/UTF-32 internally, but that’s an implementation detail; it could use UTF-16 and still preserve the current interface (though there would be no reason to do so).

Which libraries do we use to support these?

We don’t; Racket ships its own support for Unicode, generated from the official Unicode data files.

That also means Racket’s support for operations on strings is fairly limited. There is no way to do string normalization or to calculate the number of discrete, renderable glyphs in a string, for example. You only get code points.

Oh! :slightly_smiling_face: Interesting - certainly something that could be improved if people wanted. But generating stuff directly from the Unicode data files is cool.

I think ICU bindings are usually shipped as separate packages in most ecosystems, mostly because there’s a lot of API complexity there that most code doesn’t need, anyway. But I don’t think any such bindings currently exist for Racket.

Thanks.

@lexi.lambda Racket does have normalization procedures, see string-normalize-nfc et al

@samth I think there is
https://github.com/eu90h/racket-github-api
I’ve collected the other OAuth efforts:
https://github.com/racket/racket/wiki/Web-Development#auth-tools
JWT is in there too because it is used in OpenID Connect
@soegaard2

That library is for using the GitHub API; I’m talking about using GitHub OAuth login as a login service for another site


@samth Thanks, I didn’t know about those (or perhaps forgot about them). I think the broader point is still relevant, though: no sophisticated locale handling, no collation, no support for accessing grapheme clusters, etc.

Yes, I agree
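To make the code point / normalization / grapheme distinction concrete, a small Python sketch (Python's `unicodedata.normalize` stands in for what string-normalize-nfc does in Racket; the variable names are illustrative):

```python
import unicodedata

# "e" followed by COMBINING ACUTE ACCENT: two code points, one visible character.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed))           # 2
print(len(composed))             # 1
print(composed == "\u00e9")      # True

# NFC cannot always collapse a cluster: "q" with dot above and dot below
# has no precomposed form, so it stays three code points after normalization.
cluster = unicodedata.normalize("NFC", "q\u0307\u0323")
print(len(cluster))              # 3
```

Counting what a reader would call "one character" in the last case needs grapheme cluster segmentation, which normalization alone does not provide.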

What would be the best way to integrate “Log in with Github”?
Option 1: Every user creates a standard account with name+password. Once logged in, the user can link a GitHub account (by signing in with GitHub). We now have both a racket-stories username and a GitHub username. If the user logs out, they can log in later with GitHub directly.
Option 2: A user can create an account by logging in with GitHub. The GitHub username will also become their racket-stories username. Problem: What if another non-GitHub user already has the username in question?
Option 3: ?

Option 3: A user can create an account by logging in with Github. Their password will stay the same by default and they will have to pick a new username. The new username can be the same as the Github username and will only be given to them if it’s not already taken.

There is a good reason many services just use email addresses as user names: uniqueness.

The best way to handle user accounts is to separate three distinct concepts that are often muddled: user id, authentication, and display name. Use a synthetic, randomly generated, immutable id for user identity, and let users pick whatever string they want for their display name and change it whenever. Associate one or more authentication methods with each user (it could be an email + password combo, an OAuth provider like Google or GitHub, or something else) and let authenticated users alter those at will.
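A minimal in-memory sketch of that separation, in Python rather than Racket. Every name here (`AuthMethod`, `User`, `UserStore`, the provider strings) is hypothetical, and a real site would back this with a database and real credential checks:

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AuthMethod:
    provider: str   # e.g. "password" or "github" (illustrative values)
    subject: str    # provider-specific identifier, e.g. an email or GitHub login


@dataclass
class User:
    display_name: str                                   # free-form, changeable
    id: str = field(default_factory=lambda: uuid.uuid4().hex)  # synthetic; never reassigned
    auth_methods: set = field(default_factory=set)


class UserStore:
    def __init__(self):
        self._users = {}      # user id -> User
        self._by_auth = {}    # AuthMethod -> user id

    def create(self, display_name, auth):
        if auth in self._by_auth:
            raise ValueError("authentication method already in use")
        user = User(display_name)
        self._users[user.id] = user
        self.link(user.id, auth)
        return user

    def link(self, user_id, auth):
        owner = self._by_auth.get(auth)
        if owner is not None and owner != user_id:
            raise ValueError("authentication method belongs to another user")
        self._by_auth[auth] = user_id
        self._users[user_id].auth_methods.add(auth)

    def find_by_auth(self, auth):
        user_id = self._by_auth.get(auth)
        return self._users.get(user_id) if user_id is not None else None


store = UserStore()
alice = store.create("alice", AuthMethod("password", "alice@example.com"))
store.link(alice.id, AuthMethod("github", "alice-gh"))
alice.display_name = "Alice the Great"   # identity is unaffected by renames
print(store.find_by_auth(AuthMethod("github", "alice-gh")) is alice)   # True
```

Because uniqueness lives in the id and in the (provider, subject) pairs, two users can share a display name, which sidesteps the username clash from Option 2.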