Excellent question! I started out reading a little of this and a little of that, mainly in works by digital humanists themselves. That works fine for a long time.

For instance, the user instructions for Wordhoard at Northwestern are very helpful. I haven't used the tool itself, but browsing the instructions taught me a lot. E.g., what's "Dunning's log likelihood"?

http://wordhoard.northwestern.edu/userman/index.html
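To give a rough sense of what that statistic does: it scores how surprising a word's frequency in one corpus is, compared with another corpus. Here is a minimal sketch of the two-term form often used for corpus comparison. I'm assuming that form for illustration (I haven't checked that Wordhoard computes exactly this), and the function and parameter names are my own.

```python
import math


def log_likelihood(count_a, total_a, count_b, total_b):
    """Score one word's frequency difference between two corpora.

    count_a, count_b: occurrences of the word in corpus A and corpus B.
    total_a, total_b: total number of words in each corpus.
    """
    # Rate we would expect if the word were equally common in both corpora.
    combined_rate = (count_a + count_b) / (total_a + total_b)
    expected_a = total_a * combined_rate
    expected_b = total_b * combined_rate

    score = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        # A zero count contributes nothing (and log(0) is undefined).
        if observed > 0:
            score += observed * math.log(observed / expected)
    return 2 * score
```

A word used at the same rate in both corpora scores zero, and the score grows as the two rates pull apart. Note that it doesn't tell you which corpus overuses the word; you have to compare the raw rates for that.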

But eventually I realized that there was a whole different *discipline* out there that I was getting only indirectly, in bits and pieces. I think, sooner or later, we need to face that reality and go direct to the source. You're doing that by putting Blei on your list, so congrats.

On the other hand, this is kind of a crazy thing to ask a grad student to do. You're specializing in one part of Discipline A (history) -- you can't seriously add "(plus everything in Discipline B)" to your fields list!

But what can I say? This is a crazy enterprise.

So, if you're looking for an overview of data science / machine learning, I would honestly just pick up a textbook. Let's face it: this is a whole different discipline. Don't read the whole book, necessarily. Browse for parts that are relevant to what you want to do. Online, I often find myself consulting this one:

http://nlp.stanford.edu/IR-book/

It'll explain things like naive Bayes document classification.
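If it helps to see the idea before reading the chapter, here is a toy sketch of naive Bayes classification with add-one smoothing. It's my own simplified illustration, not code from that book; the function names and the four tiny "documents" are made up.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """Count words per label. labeled_docs is a list of (label, text) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for label, text in labeled_docs:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        # Start from how common the label is among the training documents.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # words never seen in training are ignored
            # Add-one smoothing keeps unseen label/word pairs from zeroing out.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy data, purely for illustration.
training = [
    ("sermon", "grace faith sin grace"),
    ("sermon", "faith prayer grace"),
    ("letter", "ship cargo price"),
    ("letter", "price market ship"),
]
model = train(training)
guess = classify("grace and faith", model)
```

The "naive" part is that each word is treated as independent evidence, which is obviously false for real prose but tends to work better than you'd expect.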

But my current favorite source is _Data Mining_ by Witten, Frank, and Hall. It's pretty readable -- more readable than that Stanford text -- and it focuses on practical questions. A lot of it may be more ambitious than you can actually do at this stage, but it's important to know what's possible. E.g., different kinds of exploratory clustering that might help you recognize groups of similar documents.
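For a feel of what "groups of similar documents" can mean in practice, here is a deliberately crude sketch: compare word counts with cosine similarity, then lump together any documents that are linked by a similarity above some threshold. This is a much simpler stand-in for the clustering methods the book covers, not one of its algorithms, and the names and the 0.3 threshold are arbitrary choices of mine.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two word-count Counters."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_similar(texts, threshold=0.3):
    """Group document indices that are linked by similarity >= threshold."""
    vectors = [Counter(text.lower().split()) for text in texts]
    groups = []  # each group is a sorted list of document indices
    for i, vec in enumerate(vectors):
        # Find every existing group with at least one close-enough member.
        matches = [
            group for group in groups
            if any(cosine(vec, vectors[j]) >= threshold for j in group)
        ]
        # Merge the new document with all of those groups.
        merged = [i]
        for group in matches:
            merged.extend(group)
            groups.remove(group)
        groups.append(sorted(merged))
    return groups


# Invented toy data, purely for illustration.
texts = [
    "grace faith sin",
    "faith grace prayer",
    "ship cargo price",
    "price market ship",
]
groups = group_similar(texts)
```

On real texts you would want to drop very common words first, since "the" and "of" make everything look similar to everything else.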

I'm constantly running into mathematical notation I don't understand. I think Steve Ramsay is about to come out with a book on mathematical notation for humanists. When he does, we should all buy it. Until then, honestly, I have two strategies: a) start by ignoring the math and just try to understand the text and/or pseudocode and then b) if it turns out I really need to understand the math, Wikipedia!

Godspeed. This stuff is not easy, but I think it's a thrilling challenge.