
llvm 8700 and augur 100

Wow, we’re fast. So, time to debug (compare with Haskell backend) for correctness?

is our likelihood better or worse in that plot?

I think it’s worse. Based on the direction of augur, I am assuming it’s improving.

Yes, we want likelihood to improve over time.

So, time to debug (compare with Haskell backend) for correctness?

I have been comparing with Haskell all day; so far, no errors

Does our likelihood change over time at all? What does it look like when plotted alone? It looks so flat horizontal from here…


The rkt file has our likelihood


So have you gotten to the point where the probabilities (i.e., array input to categorical) computed by the Haskell backend match those computed by the LLVM backend?

yes, I found something. I am running the trial now to see.

I have finally updated the version on the arXiv to fix the issues that had been reported to us. Pushed the tweaks to the ppaml repo as well.

Of course, we’ll probably undo some of those for POPL submission… but that’s ok.

Okay, after looking at the compiler all day I found out that the issue was in the outer loop. I was not iterating over all the words; it was only going through the first 20. But now I have changed it to the size of the words array, which is 2435579. Our single update is ~0.1 seconds, so a sweep will be around 67 hours. How should I proceed? (augur’s one sweep is 50 seconds)
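
(Sanity check on that estimate: 2,435,579 updates × ~0.1 s per update ≈ 243,600 s ≈ 67.7 hours, so the ~67 hour figure is consistent with the measured per-update time.)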

Can you run our updates for a few minutes (for either the first few thousand words or a spread-out subset of the words) and see if the log likelihood improves over those minutes?

Can you check the augurv2 generated code to see if it ever subtracts from computed counts as it updates?


I only see one loop for updating over words. There is no internal loop the way we have.

I’d expect augurv2 generated code for updating theta and phi (which we integrate out and they do not) to contain loops.

But I’m just wondering whether any statistic is subtracted from when an element of z is updated.

they only update theta and phi once and then do the full sweep

I agree. (And they update each element of phi.)

so they have three linear loops of 2M; we have a nested loop, 2M*2M
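
To make the count-subtraction question concrete, here is a minimal sketch in Haskell of the usual collapsed-Gibbs bookkeeping for LDA. It is not the AugurV2 generated code, and the names (resampleZ, docTopic, topicWord, topicTotal, sampleCategorical) are made up for illustration. The point is that the counts involving the current z[n] are subtracted, the conditional is formed from the cached counts alone, and the counts are added back for the new topic, so a sweep stays linear in the corpus size instead of recounting with a nested loop.

import Control.Monad (forM)
import qualified Data.Vector.Unboxed.Mutable as M
import System.Random (randomRIO)

-- Naive categorical sampler over unnormalized weights (illustration only).
sampleCategorical :: [Double] -> IO Int
sampleCategorical ws = do
  u <- randomRIO (0, sum ws)
  return (length (takeWhile (< u) (scanl1 (+) ws)))

resampleZ
  :: Double -> Double -> Int    -- alpha, beta, vocabulary size
  -> M.IOVector Int             -- docTopic:   numDocs   x numTopics counts, flattened
  -> M.IOVector Int             -- topicWord:  numTopics x vocab     counts, flattened
  -> M.IOVector Int             -- topicTotal: numTopics counts
  -> Int -> Int -> Int -> Int   -- numTopics, document d, word w, old topic assignment
  -> IO Int
resampleZ alpha beta vocab docTopic topicWord topicTotal numTopics d w old = do
  -- subtract the current assignment from the cached statistics
  M.modify docTopic   (subtract 1) (d   * numTopics + old)
  M.modify topicWord  (subtract 1) (old * vocab     + w)
  M.modify topicTotal (subtract 1) old
  -- conditional for this word from the cached counts (no pass over the corpus)
  probs <- forM [0 .. numTopics - 1] $ \k -> do
    ndk <- M.read docTopic   (d * numTopics + k)
    nkw <- M.read topicWord  (k * vocab     + w)
    nk  <- M.read topicTotal k
    return ((fromIntegral ndk + alpha) * (fromIntegral nkw + beta)
            / (fromIntegral nk + beta * fromIntegral vocab))
  new <- sampleCategorical probs
  -- add the new assignment back into the statistics
  M.modify docTopic   (+ 1) (d   * numTopics + new)
  M.modify topicWord  (+ 1) (new * vocab     + w)
  M.modify topicTotal (+ 1) new
  return new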

So a question: in LDA, z is supposed to be of word size, right? Not total words, which is around 60000?

and same with word_prior

word_prior is supposed to be as long as the number of unique words. z is supposed to be as long as the size of the corpus.
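
A small sketch of those intended shapes; the record type itself is just an illustration, but the field names follow the model’s parameters.

import qualified Data.Vector as V

-- Illustrative record only; field names follow the model's parameters.
data LdaState = LdaState
  { topicPrior :: V.Vector Double  -- length = number of topics
  , wordPrior  :: V.Vector Double  -- length = number of unique words (the vocabulary)
  , numDocs    :: Int
  , w          :: V.Vector Int     -- length = corpus size (one entry per word occurrence)
  , doc        :: V.Vector Int     -- length = corpus size; document of each occurrence
  , z          :: V.Vector Int     -- length = corpus size; topic of each occurrence
  }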


okay, I was using both word_prior and z as the size of the corpus

Ok well you should definitely shorten word_prior

Trying again with the correct numbers. I was using the number of words before, but I changed it to the size of the whole corpus while trying to figure out the issue.

but our complexity is still 2M*2M

I mean, a few hours is better than 67hrs. And a few min is better than a few hours. I’d still try the per-update (as opposed to per-sweep) plot for a run of a few min.

My next idea was going to be not integrating out theta and phi. Maybe we can generate such a Gibbs sampler quickly…

Okay I am going to see if we can plot per update

I hope the log-likelihood computation is reasonably fast?

I am going to take a snapshot every 1000th update

ok that sounds like every 10 seconds to me


it’s still at the same level compared to augur

Woohoo it goes up!

I don’t see that we’re not competitive with AugurV2. I’d keep it running for 90 minutes, as I see you did with AugurV2.

(ps the x axis should start at 0 seconds)


Hmm, has 40000 seconds already passed? Anyway, I worked out the Gibbs updates for not integrating out theta and phi. https://github.com/rjnw/hakaru-benchmarks/commit/2948ec7a61dda373792c97d04f1bb911bb7a6720

there is something wrong with those axes


I only went up to 100,000 words, so if I do a full sweep our first point will be higher than augur’s

this is with a snapshot every 1000 words. I will run it for 5000 now

Wait, why do you say “if I do a full sweep our first point will be higher than augur”? I don’t see that in this plot.

In the plot for z I only ran until 100,000, whereas a full sweep is ~2.4M, and the starting point of the augur line is after the first full sweep. So if we extrapolate our line ~20 times, it should at least be higher than augur’s starting point.
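
(Rough arithmetic behind the extrapolation: ~2.4M / 100,000 ≈ 24, so scaling the line up by roughly 20-odd times corresponds to one full sweep.)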

@ccshan for LDA without integrating out theta and phi: I calculate theta and phi once and then run a full sweep using _Z?

By “plot for z” do you mean “plot for llvm”? Ok, if our line is straight…

Yes. Note that I just pushed an update to _Z: https://github.com/rjnw/hakaru-benchmarks/commit/e6d9c20093d69d1cf3b52a3181d6021416a576bc

oh yes plot for llvm.

To be clear, to calculate theta and phi, you should feed the current z to _ThetaPhi. Then to sweep through z, the new theta and phi would be fed to _Z. Note that _Z is so simple that you might need to adjust unsample to handle the lack of weight, or just not use unsample.
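
A minimal sketch of that alternation, assuming the compiled _ThetaPhi and _Z programs can be wrapped as the two functions below; the names and types here are stand-ins, not the actual generated interface. Each iteration feeds the current z to _ThetaPhi, then feeds the resulting theta and phi to _Z for a full sweep.

import qualified Data.Vector as V

type Z        = V.Vector Int
type ThetaPhi = (V.Vector (V.Vector Double), V.Vector (V.Vector Double))

-- Hypothetical wrappers around the compiled programs; illustration only.
gibbsSweeps
  :: (Z -> IO ThetaPhi)       -- _ThetaPhi: resample theta and phi given the current z
  -> (ThetaPhi -> Z -> IO Z)  -- _Z: one full sweep over z given theta and phi
  -> Int -> Z -> IO Z
gibbsSweeps sampleThetaPhi sampleZ numSweeps z0 = go numSweeps z0
  where
    go 0 z = return z
    go n z = do
      thetaPhi <- sampleThetaPhi z     -- feed current z to _ThetaPhi
      z'       <- sampleZ thetaPhi z   -- feed new theta and phi (with z) to _Z
      go (n - 1) z'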

Okay, let me try.

../../hkbin/hk-maple -p sexpression --timelimit=600 ../hksrc/LdaGibbs_ThetaPhi.hk > LdaGibbs_ThetaPhi.hkr
hk-maple: primCoerceFrom@Literal: negative HInt -1
CallStack (from HasCallStack):
error, called at haskell/Language/Hakaru/Syntax/AST.hs:178:24 in hakaru-0.6.0-AbnzUW5EnqjBuQmLn3LJbU:Language.Hakaru.Syntax.AST
make: *** [Makefile:4: LdaGibbs_ThetaPhi.hkr] Error 1

Would you please call hk-maple with --debug?


shared in group as a snippet


Oh yeah I forgot, doing it now

Sent to Maple:
use Hakaru, NewSLO in timelimit(600, RoundTrip(lam(`topic_prior`, HArray(HReal(Bound(`>=`,0))), lam(`word_prior`, HArray(HReal(Bound(`>=`,0))), lam(`numDocs`, HInt(Bound(`>=`,0)), lam(`w`, HArray(HInt(Bound(`>=`,0))), lam(`doc`, HArray(HInt(Bound(`>=`,0))), lam(`z`, HArray(HInt(Bound(`>=`,0))), Msum(Weight((Product(Product(Product((`j` + idx(`word_prior`, `iB`)), `j`=0..(Sum(case(And((`iB` = idx(`w`, `dL`)), (`d` = idx(`z`, `dL`))), Branches(Branch(PDatum(true, PInl(PDone)), 1), Branch(PDatum(false, PInr(PInl(PDone))), 0))), `dL`=0..(size(`w`))-1))-1), `iB`=0..(size(`word_prior`))-1), `d`=0..(size(`topic_prior`))-1) * Product(Product(Product((`j` + idx(`topic_prior`, `i12`)), `j`=0..(Sum(case(And((`d` = idx(`doc`, `dL`)), (`i12` = idx(`z`, `dL`))), Branches(Branch(PDatum(true, PInl(PDone)), 1), Branch(PDatum(false, PInr(PInl(PDone))), 0))), `dL`=0..(size(`w`))-1))-1), `i12`=0..(size(`topic_prior`))-1), `d`=0..(`numDocs`)-1) * 1/(Product(Product((`i12` + Sum(idx(`topic_prior`, `dL`), `dL`=0..(size(`topic_prior`))-1)), `i12`=0..(Sum(case((`d` = idx(`doc`, `dL`)), Branches(Branch(PDatum(true, PInl(PDone)), 1), Branch(PDatum(false, PInr(PInl(PDone))), 0))), `dL`=0..(size(`w`))-1))-1), `d`=0..(`numDocs`)-1)) * 1/(Product(Product((`iB` + Sum(idx(`word_prior`, `dL`), `dL`=0..(size(`word_prior`))-1)), `iB`=0..(Sum(case((`d` = idx(`z`, `dL`)), Branches(Branch(PDatum(true, PInl(PDone)), 1), Branch(PDatum(false, PInr(PInl(PDone))), 0))), `dL`=0..(size(`w`))-1))-1), `d`=0..(size(`topic_prior`))-1))), hk-maple: primCoerceFrom@Literal: negative HInt -1
CallStack (from HasCallStack):
error, called at haskell/Language/Hakaru/Syntax/AST.hs:178:24 in hakaru-0.6.0-AbnzUW5EnqjBuQmLn3LJbU:Language.Hakaru.Syntax.AST
without value

augur has 5 sweeps, hakaru barely one full sweep, data points every 10,000 updates. words-size: 353160, num-docs: 3431, num-words: 6907, num-topics: 50

I can run nips too, but its size is similar to 20newsgroup

correction: hakaru did almost one full sweep; the last snapshot is at the 350,000th update

Wow, cool. So you switched to a smaller data set and maybe integrating out theta and phi makes sense for this dataset because AugurV2 may be surpassed in log-likelihood by Hakaru LLVM in 7 minutes. Do I understand correctly? (I’m still debugging the “negative HInt -1” problem you encountered, and making progress.)

Yes, also if we compare accuracy per sweep then Hakaru is a lot better from the first sweep.

How long should I run this to see the lines intersect?

If you’ve run Hakaru LLVM for 400–500 seconds then the same number of seconds seems a good duration for AugurV2 :slightly_smiling_face:


Wait what? Is the x axis seconds or sweeps or…?

seconds

the snapshots are taken differently though: hakaru every 10,000 updates plus every sweep, and augur every sweep

So, I just pushed a change to the typechecker (blush). Please pull and try on _ThetaPhi and _Z?

I will take a look at it tomorrow, what do you think of the above graph?

Oh this is 50-kos?

I mean this is the kos-50 dataset?

yes

Ah that makes sense. I was confused by our winning :stuck_out_tongue:

Winning or not, I think it’s definitely informative and should be included and explained in the paper.

well, in this one we were able to do a couple of sweeps

I would say around 3 or 4

@ccshan LdaGibbs_ThetaPhi’s compilation works now. The only issue is that there is stuff in there which I haven’t implemented in llvm yet. :expressionless:

Really? Like what operation? (Note that weights don’t matter here)

plate

Oh!

but its implementation shouldn’t be much effort; most of the stuff is already there. like for

Well, I hope you can overwrite a pre-allocated block of memory

That’s something I will figure out. We already do that with most of the memory. I don’t think we do allocation after compilation in any of the benchmarks we have right now.
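
For what it’s worth, a minimal sketch of that pre-allocated-buffer pattern; the names here are made up, not the generated code’s. Theta and phi get one allocation up front, and every sweep just overwrites the same flattened buffers in place.

import Control.Monad (forM_)
import qualified Data.Vector.Unboxed.Mutable as M

-- Allocate the buffers once, before sampling starts.
allocThetaPhi :: Int -> Int -> Int -> IO (M.IOVector Double, M.IOVector Double)
allocThetaPhi numDocs numTopics vocab = do
  theta <- M.new (numDocs   * numTopics)  -- flattened numDocs x numTopics
  phi   <- M.new (numTopics * vocab)      -- flattened numTopics x vocab
  return (theta, phi)

-- Each sweep overwrites the same buffers instead of allocating fresh arrays.
refillThetaPhi :: (Int -> IO Double) -> (Int -> IO Double)
               -> M.IOVector Double -> M.IOVector Double -> IO ()
refillThetaPhi drawTheta drawPhi theta phi = do
  forM_ [0 .. M.length theta - 1] $ \i -> M.write theta i =<< drawTheta i
  forM_ [0 .. M.length phi   - 1] $ \i -> M.write phi   i =<< drawPhi i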

What is summarize in here https://github.com/rjnw/hakaru-benchmarks/blob/master/testcode/hssrc/LdaGibbs_ThetaPhi.hs ? I don’t see it in LogFloatPrelude

That shouldn’t be there… summarize is supposed to be executed by hk-maple; it means the histogram optimization
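
Roughly what the histogram optimization buys, as a hedged illustration (this is not what hk-maple actually emits): the model keeps terms that count, for each (topic, word) pair, how often that word occurs with that topic assignment. Instead of rescanning the corpus for every pair, a single pass can build the whole table of counts.

import qualified Data.Vector.Unboxed as V

-- Without the histogram: one (topic, word) count means a scan over the whole corpus.
countNaive :: V.Vector Int -> V.Vector Int -> Int -> Int -> Int
countNaive w z topic word =
  V.length (V.filter id (V.zipWith (\wi zi -> wi == word && zi == topic) w z))

-- With the histogram: a single pass builds the counts for every (topic, word) pair.
countsAll :: Int -> Int -> V.Vector Int -> V.Vector Int -> V.Vector Int
countsAll numTopics vocab w z =
  V.accumulate (+) (V.replicate (numTopics * vocab) 0)
                   (V.zipWith (\wi zi -> (zi * vocab + wi, 1)) w z)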
@ccshan commented on @rjnw’s file <https://racket.slack.com/files/U6602H150/FBG2XMUEL/ldagibbs_thetaphi_—debug.m|LdaGibbs_ThetaPhi —debug>: It looks like you haven’t removed value from NewSLO/Interface.mpl yet, but I don’t think that’s causing the problem.