rjnw
2018-6-29 12:26:15

llvm 8700 and augur 100


ccshan
2018-6-29 16:40:40

Wow, we’re fast. So, time to debug (compare with Haskell backend) for correctness?


samth
2018-6-29 19:19:53

is our likelihood better or worse in that plot?


rjnw
2018-6-29 19:36:11

I think ours is worse. Based on the direction of the augur line, I am assuming likelihood is supposed to improve.


ccshan
2018-6-29 20:12:22

Yes, we want likelihood to improve over time.


ccshan
2018-6-29 20:38:42

So, time to debug (compare with Haskell backend) for correctness?


rjnw
2018-6-29 20:55:56

I have been comparing with the Haskell backend all day; so far, no errors


ccshan
2018-6-29 20:57:28

Does our likelihood change over time at all? What does it look like when plotted alone? It looks completely flat and horizontal from here…



rjnw
2018-6-29 21:00:28

rkt has our likelihood


rjnw
2018-6-29 21:02:11

[uploaded a plot]

ccshan
2018-6-29 21:16:46

So have you gotten to the point where the probabilities (i.e., array input to categorical) computed by the Haskell backend match those computed by the LLVM backend?


rjnw
2018-6-29 21:18:53

yes, I found something. I am running the trial now to see.


carette
2018-6-29 21:29:30

I have finally updated the version on the arXiv to fix the issues that had been reported to us. Pushed the tweaks to the ppaml repo as well.


carette
2018-6-29 21:29:59

Of course, we’ll probably undo some of those for POPL submission… but that’s ok.


rjnw
2018-6-29 22:29:40

Okay, after looking at the compiler all day, I found out that the issue was in the outer loop: I was not iterating over all the words, only going through the first 20. Now I changed it to the size of the words array, which is 2,435,579. A single update of ours is ~0.1 seconds, so a sweep will be around 67 hours (2,435,579 × 0.1 s ≈ 243,558 s ≈ 67.7 h). How should I proceed? (augur’s one sweep is 50 seconds)


ccshan
2018-6-29 22:37:21

Can you run our updates for a few minutes (for either the first few thousand words or a spread-out subset of the words) and see if the log likelihood improves over those minutes?


ccshan
2018-6-29 22:37:56

Can you check the augurv2 generated code to see if it ever subtracts from computed counts as it updates?


rjnw
2018-6-29 22:45:46

[uploaded a code snippet]

rjnw
2018-6-29 22:46:24

I only see one loop updating over the words. There is no inner loop the way we have one.


ccshan
2018-6-29 22:47:58

I’d expect augurv2 generated code for updating theta and phi (which we integrate out and they do not) to contain loops.


ccshan
2018-6-29 22:48:39

But I’m just wondering whether any statistic is subtracted from when an element of z is updated.


rjnw
2018-6-29 22:49:36

they only update theta and phi once and then do the full sweep


ccshan
2018-6-29 22:50:30

I agree. (And they update each element of phi.)


rjnw
2018-6-29 22:50:32

so they have three linear loops over 2M; we have a nested loop, 2M × 2M

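A minimal sketch, in Haskell, of the two sweep structures being contrasted here; the names (sweepUncollapsed, sweepCollapsed, and the resample* stubs) are illustrative, not the actual generated code:

type Theta = [[Double]]  -- per-document topic weights
type Phi   = [[Double]]  -- per-topic word weights
type Z     = [Int]       -- one topic assignment per corpus token (length N)

-- Stubs standing in for the actual conditional samplers.
resampleTheta :: Z -> Theta
resampleTheta = undefined

resamplePhi :: Z -> Phi
resamplePhi = undefined

resampleZ :: Theta -> Phi -> Int -> Int
resampleZ = undefined

resampleZCollapsed :: Z -> Int -> Int
resampleZCollapsed = undefined

-- Uncollapsed (AugurV2 style): resample theta and phi, then make one
-- linear pass over z; three O(N) loops per sweep.
sweepUncollapsed :: Z -> Z
sweepUncollapsed z =
  let theta = resampleTheta z
      phi   = resamplePhi z
  in  map (resampleZ theta phi) z

-- Collapsed (theta and phi integrated out): each single-token update
-- rescans all N tokens to recompute the count statistics, so one sweep
-- costs O(N * N); this is the 2M x 2M nested loop mentioned above.
sweepCollapsed :: Z -> Z
sweepCollapsed z = [ resampleZCollapsed z n | n <- [0 .. length z - 1] ]

(Standard collapsed LDA samplers avoid the quadratic cost by maintaining the counts incrementally, subtracting the current token's contribution before resampling it; that is what the question about subtraction above is getting at.)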

rjnw
2018-6-29 22:52:33

So, a question: in LDA, z is supposed to be the size of the corpus, right? Not the number of unique words, which is around 60,000?


rjnw
2018-6-29 22:52:43

and same with word_prior


ccshan
2018-6-29 22:53:13

word_prior is supposed to be as long as the number of unique words. z is supposed to be as long as the size of the corpus.

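To make the shapes concrete, using the model's variable names and the sizes quoted in this conversation (the layout below is an illustration, not the actual source):

z           : length ~2,435,579   (one topic assignment per corpus token)
w, doc      : length ~2,435,579   (word id and document id per token)
word_prior  : length ~60,000      (one pseudo-count per unique word)
topic_prior : length = number of topics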


rjnw
2018-6-29 22:53:57

okay, I was using both word_prior and z with the size of the corpus


ccshan
2018-6-29 22:56:51

Ok well you should definitely shorten word_prior


rjnw
2018-6-29 22:57:42

Trying again with the correct numbers. I was using the number of words before, but then I changed it to the size of the whole corpus while trying to figure out the issue.


rjnw
2018-6-29 22:58:31

but our complexity is still 2M × 2M


ccshan
2018-6-29 22:59:53

I mean, a few hours is better than 67 hours. And a few minutes is better than a few hours. I’d still try the per-update (as opposed to per-sweep) plot for a run of a few minutes.


ccshan
2018-6-29 23:00:29

My next idea was going to be not integrating out theta and phi. Maybe we can generate such a Gibbs sampler quickly…


rjnw
2018-6-29 23:02:23

Okay I am going to see if we can plot per update


ccshan
2018-6-29 23:04:01

I hope the log-likelihood computation is reasonably fast?


rjnw
2018-6-29 23:07:08

I am going to take a snapshot every 1000th update


ccshan
2018-6-29 23:07:34

ok that sounds like every 10 seconds to me


rjnw
2018-6-29 23:07:46

[uploaded a plot]

rjnw
2018-6-29 23:08:02

it’s still at the same level compared to augur


ccshan
2018-6-29 23:08:03

Woohoo it goes up!


ccshan
2018-6-29 23:09:59

I don’t see that we’re not competitive with AugurV2. I’d keep it running for 90 minutes, as I see you did for AugurV2.


ccshan
2018-6-29 23:12:30

(ps the x axis should start at 0 seconds)


rjnw
2018-6-30 00:25:37

[uploaded a plot]

ccshan
2018-6-30 00:50:30

Hmm, has 40000 seconds already passed? Anyway, I worked out the Gibbs updates for not integrating out theta and phi. https://github.com/rjnw/hakaru-benchmarks/commit/2948ec7a61dda373792c97d04f1bb911bb7a6720

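For reference, the standard Gibbs conditionals for LDA when theta and phi are not integrated out (textbook form; the commit above may differ in details):

theta[d] | z  ~  Dirichlet(topic_prior + c[d]),  where c[d][k] = #{n : doc[n] = d, z[n] = k}
phi[k]   | z  ~  Dirichlet(word_prior + v[k]),   where v[k][i] = #{n : z[n] = k, w[n] = i}
z[n] | theta, phi  ~  Categorical with weight theta[doc[n]][k] * phi[k][w[n]] for topic k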

rjnw
2018-6-30 01:40:59

there is something wrong with those axes


rjnw
2018-6-30 01:45:42

[uploaded a plot]

rjnw
2018-6-30 01:46:52

I only went until the 100,000th word, so if I do a full sweep, our first point will be higher than augur’s


rjnw
2018-6-30 01:48:05

this is with a snapshot every 1000 words. I will run it with every 5000 now


ccshan
2018-6-30 01:48:27

Wait, why do you say “if I do a full sweep our first point will be higher than augur”? I don’t see that in this plot.


rjnw
2018-6-30 01:51:28

In the plot for z, I only ran until 100,000, whereas a full sweep is ~2.4M, and the starting point of the augur line is after its first full sweep. So if we extrapolate our line 20 times, it should at least be higher than augur’s starting point.


rjnw
2018-6-30 01:52:28

@ccshan, for LDA without integrating out theta and phi: I calculate theta and phi once and then run a full sweep using _Z?


ccshan
2018-6-30 01:53:40

By “plot for z” do you mean “plot for llvm”? Ok, if our line is straight…



rjnw
2018-6-30 01:54:39

oh yes plot for llvm.


ccshan
2018-6-30 01:56:56

To be clear, to calculate theta and phi, you should feed the current z to _ThetaPhi. Then to sweep through z, the new theta and phi would be fed to _Z. Note that _Z is so simple that you might need to adjust unsample to handle the lack of weight, or just not use unsample.

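A minimal sketch of that alternation in Haskell, assuming hypothetical wrappers thetaPhiStep and zSweep around the compiled _ThetaPhi and _Z programs:

type Theta = [[Double]]
type Phi   = [[Double]]
type Z     = [Int]

-- Hypothetical wrapper around the compiled _ThetaPhi program:
-- resamples theta and phi given the current z.
thetaPhiStep :: Z -> (Theta, Phi)
thetaPhiStep = undefined

-- Hypothetical wrapper around the compiled _Z program:
-- sweeps z given the fresh theta and phi.
zSweep :: Theta -> Phi -> Z -> Z
zSweep = undefined

-- One Gibbs iteration: feed the current z to _ThetaPhi, then feed the
-- resulting theta and phi to _Z.
gibbsIter :: Z -> Z
gibbsIter z =
  let (theta, phi) = thetaPhiStep z
  in  zSweep theta phi z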

rjnw
2018-6-30 01:59:03

Okay, let me try.


rjnw
2018-6-30 02:03:03
../../hkbin/hk-maple -p sexpression --timelimit=600 ../hksrc/LdaGibbs_ThetaPhi.hk > LdaGibbs_ThetaPhi.hkr
hk-maple: primCoerceFrom@Literal: negative HInt -1
CallStack (from HasCallStack):
  error, called at haskell/Language/Hakaru/Syntax/AST.hs:178:24 in hakaru-0.6.0-AbnzUW5EnqjBuQmLn3LJbU:Language.Hakaru.Syntax.AST
make: *** [Makefile:4: LdaGibbs_ThetaPhi.hkr] Error 1

ccshan
2018-6-30 02:05:45

Would you please call hk-maple with --debug?


rjnw
2018-6-30 02:08:43

[uploaded a file: LdaGibbs_ThetaPhi --debug output]

rjnw
2018-6-30 02:09:06

shared in group as a snippet


ccshan
2018-6-30 02:11:06

@ccshan commented on @rjnw’s file <https://racket.slack.com/files/U6602H150/FBG2XMUEL/ldagibbs_thetaphi_—debug.m|LdaGibbs_ThetaPhi —debug>: It looks like you haven’t removed value from NewSLO/Interface.mpl yet, but I don’t think that’s causing the problem.


rjnw
2018-6-30 02:13:11

Oh yeah I forgot, doing it now


rjnw
2018-6-30 02:15:12
Sent to Maple:
use Hakaru, NewSLO in timelimit(600, RoundTrip(lam(`topic_prior`, HArray(HReal(Bound(`>=`,0))), lam(`word_prior`, HArray(HReal(Bound(`>=`,0))), lam(`numDocs`, HInt(Bound(`>=`,0)), lam(`w`, HArray(HInt(Bound(`>=`,0))), lam(`doc`, HArray(HInt(Bound(`>=`,0))), lam(`z`, HArray(HInt(Bound(`>=`,0))), Msum(Weight((Product(Product(Product((`j` + idx(`word_prior`, `iB`)), `j`=0..(Sum(case(And((`iB` = idx(`w`, `dL`)), (`d` = idx(`z`, `dL`))), Branches(Branch(PDatum(true, PInl(PDone)), 1), Branch(PDatum(false, PInr(PInl(PDone))), 0))), `dL`=0..(size(`w`))-1))-1), `iB`=0..(size(`word_prior`))-1), `d`=0..(size(`topic_prior`))-1) * Product(Product(Product((`j` + idx(`topic_prior`, `i12`)), `j`=0..(Sum(case(And((`d` = idx(`doc`, `dL`)), (`i12` = idx(`z`, `dL`))), Branches(Branch(PDatum(true, PInl(PDone)), 1), Branch(PDatum(false, PInr(PInl(PDone))), 0))), `dL`=0..(size(`w`))-1))-1), `i12`=0..(size(`topic_prior`))-1), `d`=0..(`numDocs`)-1) * 1/(Product(Product((`i12` + Sum(idx(`topic_prior`, `dL`), `dL`=0..(size(`topic_prior`))-1)), `i12`=0..(Sum(case((`d` = idx(`doc`, `dL`)), Branches(Branch(PDatum(true, PInl(PDone)), 1), Branch(PDatum(false, PInr(PInl(PDone))), 0))), `dL`=0..(size(`w`))-1))-1), `d`=0..(`numDocs`)-1)) * 1/(Product(Product((`iB` + Sum(idx(`word_prior`, `dL`), `dL`=0..(size(`word_prior`))-1)), `iB`=0..(Sum(case((`d` = idx(`z`, `dL`)), Branches(Branch(PDatum(true, PInl(PDone)), 1), Branch(PDatum(false, PInr(PInl(PDone))), 0))), `dL`=0..(size(`w`))-1))-1), `d`=0..(size(`topic_prior`))-1))), hk-maple: primCoerceFrom@Literal: negative HInt -1
CallStack (from HasCallStack):
  error, called at haskell/Language/Hakaru/Syntax/AST.hs:178:24 in hakaru-0.6.0-AbnzUW5EnqjBuQmLn3LJbU:Language.Hakaru.Syntax.AST

without value


rjnw
2018-6-30 03:29:42

augur has 5 sweeps, hakaru barely one full sweep; data points at every 10,000th update. words-size: 353160, num-docs: 3431, num-words: 6907, num-topics: 50


rjnw
2018-6-30 03:30:31

I can run nips too, but its size is similar to 20newsgroups


rjnw
2018-6-30 03:39:03

correction: hakaru did almost one full sweep; the last snapshot is at the 350,000th update


ccshan
2018-6-30 03:50:00

Wow, cool. So you switched to a smaller data set and maybe integrating out theta and phi makes sense for this dataset because AugurV2 may be surpassed in log-likelihood by Hakaru LLVM in 7 minutes. Do I understand correctly? (I’m still debugging the “negative HInt –1” problem you encountered, and making progress.)


rjnw
2018-6-30 03:53:04

Yes, also if we compare accuracy per sweep then Hakaru is a lot better from the first sweep.


rjnw
2018-6-30 03:53:33

How long should I run this to see the lines intersect?


ccshan
2018-6-30 03:54:01

If you’ve run Hakaru LLVM for 400–500 seconds then the same number of seconds seems a good duration for AugurV2 :slightly_smiling_face:


rjnw
2018-6-30 05:10:23

[uploaded a plot]

ccshan
2018-6-30 05:11:01

Wait what? Is the x axis seconds or sweeps or…?


rjnw
2018-6-30 05:11:08

seconds


rjnw
2018-6-30 05:12:06

the snapshots are taken differently though: hakaru every 10,000 updates plus every sweep, and augur every sweep


ccshan
2018-6-30 05:14:21

So, I just pushed a change to the typechecker (blush). Please pull and try on _ThetaPhi and _Z?


rjnw
2018-6-30 05:15:56

I will take a look at it tomorrow, what do you think of the above graph?


ccshan
2018-6-30 05:17:10

Oh this is 50-kos?


ccshan
2018-6-30 05:17:19

I mean this is the kos–50 dataset?


rjnw
2018-6-30 05:17:27

yes


ccshan
2018-6-30 05:17:43

Ah that makes sense. I was confused by our winning :stuck_out_tongue:


ccshan
2018-6-30 05:18:04

Winning or not, I think it’s definitely informative and should be included and explained in the paper.


rjnw
2018-6-30 05:18:11

well, in this one we were able to do a couple of sweeps


rjnw
2018-6-30 05:18:18

I would say around 3 or 4


rjnw
2018-6-30 05:35:30

@ccshan LdaGibbs_ThetaPhi’s compilation works now. The only issue is that there is stuff in there which I haven’t implemented in LLVM yet. :expressionless:


ccshan
2018-6-30 05:35:47

Really? Like what operation? (Note that weights don’t matter here)


rjnw
2018-6-30 05:37:22

plate


ccshan
2018-6-30 05:38:07

Oh!


rjnw
2018-6-30 05:38:51

but its implementation shouldn’t be much effort; most of the stuff is already there


ccshan
2018-6-30 05:39:20

Well, I hope you can overwrite a pre-allocated block of memory


rjnw
2018-6-30 05:40:45

That’s something I will figure out. We already do that with most of the memory. I don’t think we do allocation after compilation in any of the benchmarks we have right now.


rjnw
2018-6-30 05:43:28

What is summarize in here https://github.com/rjnw/hakaru-benchmarks/blob/master/testcode/hssrc/LdaGibbs_ThetaPhi.hs ? I don’t see it in LogFloatPrelude


ccshan
2018-6-30 05:46:23

That shouldn’t be there… summarize is supposed to be executed by hk-maple; it means the histogram optimization
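
A small illustration of what the histogram optimization buys, assuming list-based counts; countNaive, histogram, and numTopics are illustrative names. Instead of recomputing a sum like Sum(case(k = idx(z, dL), 1, 0), dL = 0..N-1) separately for each k (the pattern visible in the Maple dump above), a single pass builds all the counts at once:

import Data.Array (accumArray, elems)

-- Naive: one O(N) scan per topic k, so O(K * N) overall.
countNaive :: Int -> [Int] -> Int
countNaive k z = length (filter (== k) z)

-- Histogram: a single O(N) pass builds the counts for all K topics.
histogram :: Int -> [Int] -> [Int]
histogram numTopics z =
  elems (accumArray (+) 0 (0, numTopics - 1) [ (k, 1) | k <- z ])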