
hakaru has 5 sweeps, augur ~190

and that’s just snapshotting per sweep for hk, right?

No, every hundred updates plus every sweep.

just surprised that we have such a flat line

Our accuracy is 0.82xx (identical to the second decimal). Time (s) and accuracy:

time (s)    accuracy
41.88495    0.822325
83.79035    0.823525
125.68235   0.823000
167.59390   0.823050
209.45150   0.822550

I will verify whether we are measuring the accuracy correctly. I made some changes this time to compare with AugurV2.

Hmm, weird that AugurV2 is not as accurate as JAGS?

@rjnw if you have the AugurV2 generated code handy, would you please share it with me so I can take a second look?

NaiveBayes? The one in the benchmark repository is the one I am using for evaluation. https://github.com/rjnw/hakaru-benchmarks/blob/04b8c466581f9d776efed2ef77a076e68a5338a3/runners/augur/nb.py

I mean the generated C code for NB

Oh, yeah.

I am also in office if you want to discuss in person.

Thanks, and would you please double-check that the sample logging and accuracy computation handle the z1/z2 split correctly?

I’m coming in soon

Yeah I am looking into it.

Sorry, that was old C code. I changed the working directory recently. Here is the correct one.

deleting the older post.

Ah ok

The accuracy we calculate for AugurV2 is correct. It is around 0.49; I verified it in Python itself, where we split z into z1 and z2.
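A minimal sketch of the kind of held-out accuracy check described here, assuming (this is not from the chat) that `z` holds one predicted topic label per document, `labels` the ground truth, and the first `n_train` documents are the supervised z1 portion; the function name and layout are hypothetical.

```python
import numpy as np

def heldout_accuracy(z, labels, n_train):
    """Accuracy over the held-out z2 portion only.

    z       : predicted topic label per document (z1 followed by z2)
    labels  : ground-truth topic label per document
    n_train : number of supervised (z1) documents at the front
    """
    z = np.asarray(z)
    labels = np.asarray(labels)
    z2 = z[n_train:]          # sampled labels for held-out documents
    truth = labels[n_train:]  # ground truth for the same documents
    return float(np.mean(z2 == truth))

# Toy example: 3 supervised docs, 4 held out, 3 of 4 predicted correctly.
print(heldout_accuracy([0, 1, 2, 0, 1, 1, 2],
                       [0, 1, 2, 0, 1, 2, 2],
                       n_train=3))  # 0.75
```

The key point is that supervised z1 documents must be excluded before averaging, otherwise they inflate the score.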

@carette Were you able to replicate the Maple connection issue in the Docker image I sent some time ago?

@rjnw I have not had a chance to try yet. Working on a paper with a Friday deadline.

I guess this is also becoming urgent?

@carette well, the POPL deadline is on the 11th

so yes, fixing this soon is necessary for a fix to be useful

Also it would be nice to have some correct numbers for simplification time.

Ok, I’ll try.

@rjnw In Dan’s reply an hour ago, I feel he asked about several issues that are unlikely to explain the big accuracy gap. But before I reply, can you confirm that you’re holding out 10% in AugurV2 (rather than 1%) and still see accuracy of 48–49%?
And you should set JAGS n.adapt to 0, and re-measure JAGS accuracy and startup times if necessary. This doesn’t explain the big accuracy gap that remains after a couple hundred AugurV2 sweeps (which is what you ran with 10% hold-out, right?). Initialization differences might.

We checked 10% in your office, when the total number of documents was 2000

I can re-measure JAGS with n.adapt set to 0.

Oh! But the ~80% accuracy was achieved by JAGS and Hakaru using the full 20-newsgroups data set, with 19997 documents. It’s quite possible that using the full 20-newsgroups data set and holding out 10% of its topics, like with JAGS and Hakaru, would let AugurV2 achieve ~80% accuracy!
Yes, please remeasure jags with n.adapt=0

What do you mean by using the full 20-newsgroups data set? In AugurV2's model there are two document arrays: z1 is supervised and has 19997−2000 documents, and z2 is sampled for the 2000 held-out ones.
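The split being described can be sketched as follows; the index-array names are hypothetical, and the fixed seed is only for reproducibility of the sketch.

```python
import numpy as np

# Split the full 20-newsgroups corpus into supervised (z1) and
# held-out (z2) documents, matching the counts mentioned above.
n_total = 19997    # documents in the full corpus
n_heldout = 2000   # documents whose labels (z2) are sampled

rng = np.random.default_rng(0)
perm = rng.permutation(n_total)
heldout_idx = perm[:n_heldout]      # 2000 docs: labels sampled (z2)
supervised_idx = perm[n_heldout:]   # 19997 - 2000 docs: labels observed (z1)

print(len(supervised_idx))  # 17997
```

The two index sets are disjoint, so no document's observed label leaks into the held-out evaluation.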

Ok, that sounds correct. I misunderstood when you said “the total number of documents was 2000”.

Oh sorry I meant “total number of heldout documents was 2000”

Jags just finished running, here is what I have:

sweep   time      accuracy
1       21546 s   0.7798
2       22055 s   0.8109

This is similar to what we had before, but now with n.adapt=0.