Part 1 For each text build a vector of numerical entries as follows:
1. We assume that the dictionary for the entire collection contains d terms.
2. We assume that the total number of documents in the collection is n.
3. Entry i (i=1…d) of the vector for document j (j=1…n) is the product of three factors, L_ij G_i N_j, where
  - L_ij is the number of times term i appears in document j,
  - G_i=n/n_i, where n_i is the number of documents containing term i,
  - N_j is the normalizing factor, chosen so that the entries of document j sum to 1, i.e. N_j (∑_i L_ij G_i) = 1
    (for details see Chisholm and Kolda, "New term weighting formulas for the vector space method in information retrieval",
    Report ORNL/TM-13756, Computer Science and Mathematics Division, Oak Ridge National Laboratory)
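
The weighting in Part 1 can be sketched in a few lines of Python. This is only an illustration, not a required implementation: the function name build_weight_vectors and the assumption that each document is already split into a list of tokens are hypothetical choices made for the example.

```python
from collections import Counter


def build_weight_vectors(docs):
    """docs is a list of n token lists (tokenization is assumed done already).

    Returns (terms, vectors), where vectors[j][i] = L_ij * G_i * N_j.
    """
    n = len(docs)
    terms = sorted({t for doc in docs for t in doc})  # dictionary of d terms
    index = {t: i for i, t in enumerate(terms)}
    d = len(terms)

    # n_i: number of documents that contain term i
    doc_freq = [0] * d
    for doc in docs:
        for t in set(doc):
            doc_freq[index[t]] += 1

    # G_i = n / n_i
    G = [n / doc_freq[i] for i in range(d)]

    vectors = []
    for doc in docs:
        raw = [0.0] * d
        for t, count in Counter(doc).items():  # count is L_ij
            raw[index[t]] = count * G[index[t]]
        total = sum(raw)
        N_j = 1.0 / total if total > 0 else 0.0  # empty document: all zeros
        vectors.append([x * N_j for x in raw])
    return terms, vectors


terms, vectors = build_weight_vectors([["a", "b", "a"], ["b", "c"], ["c"]])
```

With this normalization the entries of every non-empty document vector add up to 1.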
Part 2 You will see that the vectors in Part 1 are very sparse, that is, the vast majority of their entries are 0. To save memory, store them in the Harwell-Boeing sparse matrix format.
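
The Harwell-Boeing format stores a matrix column by column using three arrays: the nonzero values, the row index of each value, and a pointer to the start of each column. The sketch below builds only those three arrays and does not write the Harwell-Boeing file header. It assumes the document vectors are the columns of a d x n matrix, and it uses 0-based indices for convenience, whereas Harwell-Boeing files use 1-based indices. The function name to_compressed_columns is hypothetical.

```python
def to_compressed_columns(vectors):
    """vectors is a list of n dense document vectors (the matrix columns).

    Returns (values, row_idx, col_ptr) in compressed column form, 0-based.
    """
    values, row_idx, col_ptr = [], [], [0]
    for col in vectors:
        for i, x in enumerate(col):
            if x != 0:
                values.append(x)   # nonzero entry
                row_idx.append(i)  # its row (term) index
        col_ptr.append(len(values))  # where the next column starts
    return values, row_idx, col_ptr


values, row_idx, col_ptr = to_compressed_columns(
    [[0.8, 0.2, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]]
)
```

The nonzeros of column j are values[col_ptr[j]:col_ptr[j + 1]], with matching row indices in row_idx.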
Part 3 Report the size of both matrices, that is, the dense matrix from Part 1 and the sparse matrix from Part 2.
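
One possible way to compare the two sizes is to count the stored numbers. The sketch below assumes 8-byte floating-point values and 4-byte integer indices; both widths, the function name storage_bytes, and the sample figures are assumptions made for illustration, not values from the assignment.

```python
def storage_bytes(d, n, nnz, float_bytes=8, int_bytes=4):
    """Rough storage estimate for a d x n matrix with nnz nonzero entries."""
    dense = d * n * float_bytes
    # compressed column form: nnz values, nnz row indices, n + 1 column pointers
    sparse = nnz * float_bytes + nnz * int_bytes + (n + 1) * int_bytes
    return dense, sparse


# Made-up sizes: 1000 terms, 100 documents, 5000 nonzero entries.
dense, sparse = storage_bytes(d=1000, n=100, nnz=5000)
```

For very small or nearly full matrices the sparse form can be the larger of the two, since each nonzero also carries an index.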