wanpeebaw
2020-7-18 08:57:59

I was solving an exercise on http://exercism.io\|exercism.io. One of the problem involving string matching after a normalize process and I came up with following code snippet. (define (anagrams-for word lst) (define (match? _) (if (string=? word _) #f (equal? (normalize word) (normalize _)))) (filter match? lst)) I wonder if there is any performance difference if I extract (normalize word) in the match? function to a definition like following. (define (anagrams-for word lst) (define normalize-word (normalize word)) (define (match? _) (if (string=? word _) #f (equal? normalize-word (normalize _)))) (filter match? lst))


jcoo092
2020-7-18 09:07:28

~_Generally speaking_ (there might be some Racket specifics I am unaware of), the cost of making the extra function call is so minuscule these days as to be completely irrelevant. After all, CPUs are designed these days bearing in mind that the function is usually the core unit of abstraction used in most modern programming. People would have to have super-duper-optimised code for the overhead of a function call to be an issue - it’s almost guaranteed that something else will be a MUCH bigger bottleneck. Plus, it is always possible that the compiler could inline the function, and thus the two end up being completely identical.~

~Ultimately, though, you would have to perform suitable measurements to identify what difference, if any, there is. My suspicion is that to get any clear result where the difference isn’t potentially inside the margin of error you would probably have to run the functions repeatedly an unrealistic number of times.~

~So, my advice is, if an extra function call is the only difference, just make it happen if you think it will significantly improve the code~ :slightly_smiling_face: ~Only worry about the overhead of a function call once you have identified that it is the bottleneck. Though, in this specific example, I probably wouldn’t bother since I don’t feel it really makes much of an improvement (that’s a subjective aspect, though).~

The above was a rant unrelated the actual question asked. Nothing to see here :sweat_smile:


soegaard2
2020-7-18 09:10:10

Since match? is called for each element in the list lst the cost is proportional to the length of lst. I would use the last version - at least if I don’t know all the lists are short.


jcoo092
2020-7-18 09:13:02

Oh, whoops, I misread the code. I had thought that the same function call was happening every single time in the second example :sweat_smile:


jcoo092
2020-7-18 09:14:33

Yeah, now that I see that normalize-word is a fixed variable in the second one, then I definitely would want to go with the second version. Especially if I’m expecting a longer list.


sorawee
2020-7-18 09:37:52

FWIW, it’s unusual to use a variable named _. People usually use it to indicate that it’s a wildcard (that is you don’t care to use its value). If you want to use its value, I would suggest to name it properly.


wanpeebaw
2020-7-18 11:49:47

@soegaard2 So the optimizer will not find that this part is an invariant to match?, and then optimize it, right?


sorawee
2020-7-18 15:49:27

It is unlikely that the optimizer will be able to lift (normalize word) up automatically. One problem is this lifting would not be sound if normalize is not pure, and it’s very hard (and probably even undecidable) to figure out if a function is pure or not. Though, of course, it’s possible to come up with various heuristics to try to find out if a function is pure.


wanpeebaw
2020-7-18 16:28:12

I want to mimic an anonymous parameter, just couldn’t find a more suitable character for this purpose.


wanpeebaw
2020-7-18 16:29:14

So it could be optimize if the optimizer knows that normalize is a pure function? The actual function definition is here. I think it’s a pure function. (define (normalize str) (sort (string->list (string-downcase str)) char<?))


sorawee
2020-7-18 16:30:12

Well, it’s a prerequisite, but there’s also a lot of other factors. I just mentioned this to show that it’s not straightforward at all


sorawee
2020-7-18 16:36:30

I don’t know what “anonymous parameter” means.


kellysmith12.21
2020-7-18 18:49:11

Out of curiosity: what would be a reasonable use-case for doing macro-related things at a phase level higher than 1?


soegaard2
2020-7-18 18:51:50

If the implementation of your macro (syntax transformer) uses somer other macro (say cond, case or match), then that syntax transformer is run at phase 2.


soegaard2
2020-7-18 18:52:45

A more direct example: If you need to define a series of similar macros, then you can use a phase 2 macro to create phase 1 macros.


notjack
2020-7-18 22:12:56

syntax classes can sometimes require that


notjack
2020-7-18 22:13:17

like if you want to make a macro that automates defining a bunch of similar syntax classes


badkins
2020-7-19 00:18:26

@wanpeebaw you may be interested in the http://exercism.io\|exercism.io examples in my https://github.com/lojic/LearningRacket\|LearningRacket github repo