rjnw
2018-6-29 12:26:15

llvm 8700 and augur 100


ccshan
2018-6-29 16:40:40

Wow, we’re fast. So, time to debug (compare with Haskell backend) for correctness?


samth
2018-6-29 19:19:53

is our likelihood better or worse in that plot?


rjnw
2018-6-29 19:36:11

I think ours is worse. Based on the direction of the augur line, I am assuming likelihood is supposed to improve.


ccshan
2018-6-29 20:12:22

Yes, we want likelihood to improve over time.


ccshan
2018-6-29 20:38:42

So, time to debug (compare with Haskell backend) for correctness?


rjnw
2018-6-29 20:55:56

I have been comparing with the Haskell backend all day; so far, no errors


ccshan
2018-6-29 20:57:28

Does our likelihood change over time at all? What does it look like when plotted alone? It looks completely flat and horizontal from here…



rjnw
2018-6-29 21:00:28

rkt has our likelihood


rjnw
2018-6-29 21:02:11

[uploaded a plot]

ccshan
2018-6-29 21:16:46

So have you gotten to the point where the probabilities (i.e., array input to categorical) computed by the Haskell backend match those computed by the LLVM backend?


rjnw
2018-6-29 21:18:53

yes, I found something. I am running the trial now to see.


carette
2018-6-29 21:29:30

I have finally updated the version on the arXiv to fix the issues that had been reported to us. Pushed the tweaks to the ppaml repo as well.


carette
2018-6-29 21:29:59

Of course, we’ll probably undo some of those for POPL submission… but that’s ok.


rjnw
2018-6-29 22:29:40

Okay, after looking at the compiler all day, I found out that the issue was in the outer loop: I was not iterating over all the words, only going through the first 20. Now I changed it to the size of the words array, which is 2,435,579. A single update of ours is ~0.1 seconds, so a sweep will be around 67 hours (2,435,579 × 0.1 s ≈ 243,558 s ≈ 67.7 h). How should I proceed? (augur’s one sweep is 50 seconds)


ccshan
2018-6-29 22:37:21

Can you run our updates for a few minutes (for either the first few thousand words or a spread-out subset of the words) and see if the log likelihood improves over those minutes?


ccshan
2018-6-29 22:37:56

Can you check the augurv2 generated code to see if it ever subtracts from computed counts as it updates?


rjnw
2018-6-29 22:45:46

[uploaded a code snippet]

rjnw
2018-6-29 22:46:24

I only see one loop updating over the words. There is no inner loop the way we have one.


ccshan
2018-6-29 22:47:58

I’d expect augurv2 generated code for updating theta and phi (which we integrate out and they do not) to contain loops.


ccshan
2018-6-29 22:48:39

But I’m just wondering whether any statistic is subtracted from when an element of z is updated.


rjnw
2018-6-29 22:49:36

they only update theta and phi once and then do the full sweep


ccshan
2018-6-29 22:50:30

I agree. (And they update each element of phi.)


rjnw
2018-6-29 22:50:32

so they have three linear loops over 2M; we have a nested loop, 2M × 2M

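A minimal sketch, in Haskell, of the two sweep structures being contrasted here; the names (sweepUncollapsed, sweepCollapsed, and the resample* stubs) are illustrative, not the actual generated code:

type Theta = [[Double]]  -- per-document topic weights
type Phi   = [[Double]]  -- per-topic word weights
type Z     = [Int]       -- one topic assignment per corpus token (length N)

-- Stubs standing in for the actual conditional samplers.
resampleTheta :: Z -> Theta
resampleTheta = undefined

resamplePhi :: Z -> Phi
resamplePhi = undefined

resampleZ :: Theta -> Phi -> Int -> Int
resampleZ = undefined

resampleZCollapsed :: Z -> Int -> Int
resampleZCollapsed = undefined

-- Uncollapsed (AugurV2 style): resample theta and phi, then make one
-- linear pass over z; three O(N) loops per sweep.
sweepUncollapsed :: Z -> Z
sweepUncollapsed z =
  let theta = resampleTheta z
      phi   = resamplePhi z
  in  map (resampleZ theta phi) z

-- Collapsed (theta and phi integrated out): each single-token update
-- rescans all N tokens to recompute the count statistics, so one sweep
-- costs O(N * N); this is the 2M x 2M nested loop mentioned above.
sweepCollapsed :: Z -> Z
sweepCollapsed z = [ resampleZCollapsed z n | n <- [0 .. length z - 1] ]

(Standard collapsed LDA samplers avoid the quadratic cost by maintaining the counts incrementally, subtracting the current token's contribution before resampling it; that is what the question about subtraction above is getting at.)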

rjnw
2018-6-29 22:52:33

So, a question: in LDA, z is supposed to be the size of the corpus, right? Not the number of unique words, which is around 60,000?


rjnw
2018-6-29 22:52:43

and same with word_prior


ccshan
2018-6-29 22:53:13

word_prior is supposed to be as long as the number of unique words. z is supposed to be as long as the size of the corpus.

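To make the shapes concrete, using the model's variable names and the sizes quoted in this conversation (the layout below is an illustration, not the actual source):

z           : length ~2,435,579   (one topic assignment per corpus token)
w, doc      : length ~2,435,579   (word id and document id per token)
word_prior  : length ~60,000      (one pseudo-count per unique word)
topic_prior : length = number of topics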


rjnw
2018-6-29 22:53:57

okay, I was using both word_prior and z with the size of the corpus


ccshan
2018-6-29 22:56:51

Ok well you should definitely shorten word_prior


rjnw
2018-6-29 22:57:42

Trying again with the correct numbers. I was using the number of words before, but then I changed it to the size of the whole corpus while trying to figure out the issue.


rjnw
2018-6-29 22:58:31

but our complexity is still 2M × 2M


ccshan
2018-6-29 22:59:53

I mean, a few hours is better than 67 hours. And a few minutes is better than a few hours. I’d still try the per-update (as opposed to per-sweep) plot for a run of a few minutes.


ccshan
2018-6-29 23:00:29

My next idea was going to be not integrating out theta and phi. Maybe we can generate such a Gibbs sampler quickly…


rjnw
2018-6-29 23:02:23

Okay I am going to see if we can plot per update


ccshan
2018-6-29 23:04:01

I hope the log-likelihood computation is reasonably fast?


rjnw
2018-6-29 23:07:08

I am going to take a snapshot every 1000th update


ccshan
2018-6-29 23:07:34

ok that sounds like every 10 seconds to me


rjnw
2018-6-29 23:07:46

[uploaded a plot]

rjnw
2018-6-29 23:08:02

it’s still at the same level compared to augur


ccshan
2018-6-29 23:08:03

Woohoo it goes up!


ccshan
2018-6-29 23:09:59

I don’t see that we’re not competitive with AugurV2. I’d keep it running for 90 minutes, as I see you did for AugurV2.


ccshan
2018-6-29 23:12:30

(ps the x axis should start at 0 seconds)


rjnw
2018-6-30 00:25:37

[uploaded a plot]

ccshan
2018-6-30 00:50:30

Hmm, has 40000 seconds already passed? Anyway, I worked out the Gibbs updates for not integrating out theta and phi. https://github.com/rjnw/hakaru-benchmarks/commit/2948ec7a61dda373792c97d04f1bb911bb7a6720

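For reference, the standard Gibbs conditionals for LDA when theta and phi are not integrated out (textbook form; the commit above may differ in details):

theta[d] | z  ~  Dirichlet(topic_prior + c[d]),  where c[d][k] = #{n : doc[n] = d, z[n] = k}
phi[k]   | z  ~  Dirichlet(word_prior + v[k]),   where v[k][i] = #{n : z[n] = k, w[n] = i}
z[n] | theta, phi  ~  Categorical with weight theta[doc[n]][k] * phi[k][w[n]] for topic k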

rjnw
2018-6-30 01:40:59

there is something wrong with those axes


rjnw
2018-6-30 01:45:42

[uploaded a plot]

rjnw
2018-6-30 01:46:52

I only went until the 100,000th word, so if I do a full sweep, our first point will be higher than augur’s


rjnw
2018-6-30 01:48:05

this is with a snapshot every 1000 words. I will run it with every 5000 now


ccshan
2018-6-30 01:48:27

Wait, why do you say “if I do a full sweep our first point will be higher than augur”? I don’t see that in this plot.


rjnw
2018-6-30 01:51:28

In the plot for z, I only ran until 100,000, whereas a full sweep is ~2.4M, and the starting point of the augur line is after its first full sweep. So if we extrapolate our line 20 times, it should at least be higher than augur’s starting point.


rjnw
2018-6-30 01:52:28

@ccshan, for LDA without integrating out theta and phi: I calculate theta and phi once and then run a full sweep using _Z?


ccshan
2018-6-30 01:53:40

By “plot for z” do you mean “plot for llvm”? Ok, if our line is straight…



rjnw
2018-6-30 01:54:39

oh yes plot for llvm.


ccshan
2018-6-30 01:56:56

To be clear, to calculate theta and phi, you should feed the current z to _ThetaPhi. Then to sweep through z, the new theta and phi would be fed to _Z. Note that _Z is so simple that you might need to adjust unsample to handle the lack of weight, or just not use unsample.

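A minimal sketch of that alternation in Haskell, assuming hypothetical wrappers thetaPhiStep and zSweep around the compiled _ThetaPhi and _Z programs:

type Theta = [[Double]]
type Phi   = [[Double]]
type Z     = [Int]

-- Hypothetical wrapper around the compiled _ThetaPhi program:
-- resamples theta and phi given the current z.
thetaPhiStep :: Z -> (Theta, Phi)
thetaPhiStep = undefined

-- Hypothetical wrapper around the compiled _Z program:
-- sweeps z given the fresh theta and phi.
zSweep :: Theta -> Phi -> Z -> Z
zSweep = undefined

-- One Gibbs iteration: feed the current z to _ThetaPhi, then feed the
-- resulting theta and phi to _Z.
gibbsIter :: Z -> Z
gibbsIter z =
  let (theta, phi) = thetaPhiStep z
  in  zSweep theta phi z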

rjnw
2018-6-30 01:59:03

Okay, let me try.


rjnw
2018-6-30 02:03:03
../../hkbin/hk-maple -p sexpression --timelimit=600 ../hksrc/LdaGibbs_ThetaPhi.hk > LdaGibbs_ThetaPhi.hkr
hk-maple: primCoerceFrom@Literal: negative HInt -1
CallStack (from HasCallStack):
  error, called at haskell/Language/Hakaru/Syntax/AST.hs:178:24 in hakaru-0.6.0-AbnzUW5EnqjBuQmLn3LJbU:Language.Hakaru.Syntax.AST
make: *** [Makefile:4: LdaGibbs_ThetaPhi.hkr] Error 1

ccshan
2018-6-30 02:05:45

Would you please call hk-maple with --debug?


rjnw
2018-6-30 02:08:43

[uploaded a file: LdaGibbs_ThetaPhi --debug output]

rjnw
2018-6-30 02:09:06

shared in group as a snippet


ccshan
2018-6-30 02:11:06

@ccshan commented on @rjnw’s file <https://racket.slack.com/files/U6602H150/FBG2XMUEL/ldagibbs_thetaphi_—debug.m|LdaGibbs_ThetaPhi —debug>: It looks like you haven’t removed value from NewSLO/Interface.mpl yet, but I don’t think that’s causing the problem.


rjnw
2018-6-30 02:13:11

Oh yeah I forgot, doing it now


rjnw
2018-6-30 02:15:12
Sent to Maple:
use Hakaru, NewSLO in timelimit(600, RoundTrip(lam(`topic_prior`, HArray(HReal(Bound(`>=`,0))), lam(`word_prior`, HArray(HReal(Bound(`>=`,0))), lam(`numDocs`, HInt(Bound(`>=`,0)), lam(`w`, HArray(HInt(Bound(`>=`,0))), lam(`doc`, HArray(HInt(Bound(`>=`,0))), lam(`z`, HArray(HInt(Bound(`>=`,0))), Msum(Weight((Product(Product(Product((`j` + idx(`word_prior`, `iB`)), `j`=0..(Sum(case(And((`iB` = idx(`w`, `dL`)), (`d` = idx(`z`, `dL`))), Branches(Branch(PDatum(true, PInl(PDone)), 1), Branch(PDatum(false, PInr(PInl(PDone))), 0))), `dL`=0..(size(`w`))-1))-1), `iB`=0..(size(`word_prior`))-1), `d`=0..(size(`topic_prior`))-1) * Product(Product(Product((`j` + idx(`topic_prior`, `i12`)), `j`=0..(Sum(case(And((`d` = idx(`doc`, `dL`)), (`i12` = idx(`z`, `dL`))), Branches(Branch(PDatum(true, PInl(PDone)), 1), Branch(PDatum(false, PInr(PInl(PDone))), 0))), `dL`=0..(size(`w`))-1))-1), `i12`=0..(size(`topic_prior`))-1), `d`=0..(`numDocs`)-1) * 1/(Product(Product((`i12` + Sum(idx(`topic_prior`, `dL`), `dL`=0..(size(`topic_prior`))-1)), `i12`=0..(Sum(case((`d` = idx(`doc`, `dL`)), Branches(Branch(PDatum(true, PInl(PDone)), 1), Branch(PDatum(false, PInr(PInl(PDone))), 0))), `dL`=0..(size(`w`))-1))-1), `d`=0..(`numDocs`)-1)) * 1/(Product(Product((`iB` + Sum(idx(`word_prior`, `dL`), `dL`=0..(size(`word_prior`))-1)), `iB`=0..(Sum(case((`d` = idx(`z`, `dL`)), Branches(Branch(PDatum(true, PInl(PDone)), 1), Branch(PDatum(false, PInr(PInl(PDone))), 0))), `dL`=0..(size(`w`))-1))-1), `d`=0..(size(`topic_prior`))-1))), hk-maple: primCoerceFrom@Literal: negative HInt -1
CallStack (from HasCallStack):
  error, called at haskell/Language/Hakaru/Syntax/AST.hs:178:24 in hakaru-0.6.0-AbnzUW5EnqjBuQmLn3LJbU:Language.Hakaru.Syntax.AST

without value


rjnw
2018-6-30 03:29:42

augur has 5 sweeps, hakaru barely one full sweep; data points at every 10,000th update. words-size: 353160, num-docs: 3431, num-words: 6907, num-topics: 50


rjnw
2018-6-30 03:30:31

I can run nips too, but its size is similar to 20newsgroups


rjnw
2018-6-30 03:39:03

correction: hakaru did almost one full sweep; the last snapshot is at the 350,000th update


ccshan
2018-6-30 03:50:00

Wow, cool. So you switched to a smaller data set and maybe integrating out theta and phi makes sense for this dataset because AugurV2 may be surpassed in log-likelihood by Hakaru LLVM in 7 minutes. Do I understand correctly? (I’m still debugging the “negative HInt –1” problem you encountered, and making progress.)


rjnw
2018-6-30 03:53:04

Yes, also if we compare accuracy per sweep then Hakaru is a lot better from the first sweep.


rjnw
2018-6-30 03:53:33

How long should I run this to see the lines intersect?


ccshan
2018-6-30 03:54:01

If you’ve run Hakaru LLVM for 400–500 seconds then the same number of seconds seems a good duration for AugurV2 :slightly_smiling_face:


rjnw
2018-6-30 05:10:23

[uploaded a plot]

ccshan
2018-6-30 05:11:01

Wait what? Is the x axis seconds or sweeps or…?


rjnw
2018-6-30 05:11:08

seconds


rjnw
2018-6-30 05:12:06

the snapshots are taken differently though: hakaru every 10,000 updates plus every sweep, and augur every sweep


ccshan
2018-6-30 05:14:21

So, I just pushed a change to the typechecker (blush). Please pull and try on _ThetaPhi and _Z?


rjnw
2018-6-30 05:15:56

I will take a look at it tomorrow, what do you think of the above graph?


ccshan
2018-6-30 05:17:10

Oh this is 50-kos?


ccshan
2018-6-30 05:17:19

I mean this is the kos–50 dataset?


rjnw
2018-6-30 05:17:27

yes


ccshan
2018-6-30 05:17:43

Ah that makes sense. I was confused by our winning :stuck_out_tongue:


ccshan
2018-6-30 05:18:04

Winning or not, I think it’s definitely informative and should be included and explained in the paper.


rjnw
2018-6-30 05:18:11

well, in this one we were able to do a couple of sweeps


rjnw
2018-6-30 05:18:18

I would say around 3 or 4


rjnw
2018-6-30 05:35:30

@ccshan LdaGibbs_ThetaPhi’s compilation works now. The only issue is that there is stuff in there which I haven’t implemented in LLVM yet. :expressionless:


ccshan
2018-6-30 05:35:47

Really? Like what operation? (Note that weights don’t matter here)


rjnw
2018-6-30 05:37:22

plate


ccshan
2018-6-30 05:38:07

Oh!


rjnw
2018-6-30 05:38:51

but its implementation shouldn’t be much effort; most of the stuff is already there


ccshan
2018-6-30 05:39:20

Well, I hope you can overwrite a pre-allocated block of memory


rjnw
2018-6-30 05:40:45

That’s something I will figure out. We already do that with most of the memory. I don’t think we do allocation after compilation in any of the benchmarks we have right now.


rjnw
2018-6-30 05:43:28

What is summarize in here https://github.com/rjnw/hakaru-benchmarks/blob/master/testcode/hssrc/LdaGibbs_ThetaPhi.hs ? I don’t see it in LogFloatPrelude


ccshan
2018-6-30 05:46:23

That shouldn’t be there… summarize is supposed to be executed by hk-maple; it means the histogram optimization
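
A small illustration of what the histogram optimization buys, assuming list-based counts; countNaive, histogram, and numTopics are illustrative names. Instead of recomputing a sum like Sum(case(k = idx(z, dL), 1, 0), dL = 0..N-1) separately for each k (the pattern visible in the Maple dump above), a single pass builds all the counts at once:

import Data.Array (accumArray, elems)

-- Naive: one O(N) scan per topic k, so O(K * N) overall.
countNaive :: Int -> [Int] -> Int
countNaive k z = length (filter (== k) z)

-- Histogram: a single O(N) pass builds the counts for all K topics.
histogram :: Int -> [Int] -> [Int]
histogram numTopics z =
  elems (accumArray (+) 0 (0, numTopics - 1) [ (k, 1) | k <- z ])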