- Part 1
For each text, build a vector of numerical entries as follows:
  - We assume that the dictionary for the entire collection contains d terms.
  - We assume that the total number of documents in the collection is n.
  - The entry i = 1…d of the vector for document j = 1…n is computed as
    a product of three terms, L_ij G_i N_j
where
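
Whatever definitions are used for the three factors, the product structure can be sketched as below. The factor definitions in this sketch are assumptions made only for illustration, not the ones required by the assignment: L_ij is taken to be the raw count of term i in document j, G_i is log(n / df_i) with df_i the number of documents containing term i, and N_j is the reciprocal Euclidean norm of column j. The function name `weight_matrix` is also hypothetical.

```python
import math
from collections import Counter


def weight_matrix(docs):
    """Return (terms, A) with A[i][j] = L_ij * G_i * N_j.

    docs is a list of n token lists. All three factor definitions
    below are illustrative assumptions, not taken from the assignment.
    """
    n = len(docs)
    terms = sorted({t for doc in docs for t in doc})  # dictionary of d terms
    d = len(terms)
    counts = [Counter(doc) for doc in docs]

    # Assumed local factor L_ij: raw count of term i in document j.
    L = [[counts[j][t] for j in range(n)] for t in terms]

    # Assumed global factor G_i: log(n / df_i).
    G = [math.log(n / sum(1 for x in row if x > 0)) for row in L]

    # Assumed normalisation N_j: 1 / Euclidean norm of column j of L_ij * G_i.
    N = []
    for j in range(n):
        norm = math.sqrt(sum((L[i][j] * G[i]) ** 2 for i in range(d)))
        N.append(1.0 / norm if norm > 0 else 0.0)

    A = [[L[i][j] * G[i] * N[j] for j in range(n)] for i in range(d)]
    return terms, A
```

Calling `weight_matrix([["a", "b", "a"], ["b", "c"]])` gives a 3 x 2 matrix in which the row for "b" is all zeros, because a term that occurs in every document gets G_i = log(1) = 0 under the assumed global factor.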
- Part 2
You will see that the vectors in Part 1 are very sparse; that is, the vast
majority of entries are 0. To save memory, store them in the
Harwell-Boeing sparse matrix format.
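
The Harwell-Boeing format stores a matrix column by column using three arrays: column pointers, row indices, and the nonzero values. A minimal sketch of building those arrays from a dense d x n matrix might look like this. The function name `to_compressed_columns` is hypothetical, the matrix is assumed to be a list of rows, and the sketch uses 0-based indices and does not write the Harwell-Boeing file header (actual Harwell-Boeing files use 1-based indices).

```python
def to_compressed_columns(A):
    """Convert a dense d x n matrix (list of rows) to compressed column arrays.

    Returns (col_ptr, row_ind, values) with 0-based indices:
    the nonzeros of column j are values[col_ptr[j]:col_ptr[j + 1]],
    and row_ind holds their row numbers.
    """
    d = len(A)
    n = len(A[0]) if d else 0
    col_ptr, row_ind, values = [0], [], []
    for j in range(n):
        for i in range(d):
            if A[i][j] != 0:
                row_ind.append(i)
                values.append(A[i][j])
        # After finishing column j, record where column j + 1 starts.
        col_ptr.append(len(values))
    return col_ptr, row_ind, values
```

For example, the 3 x 2 matrix `[[1.0, 0.0], [0.0, 0.0], [0.0, 2.0]]` is stored as `col_ptr = [0, 1, 2]`, `row_ind = [0, 2]`, `values = [1.0, 2.0]`.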
- Part 3
Report the size of both matrices.
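
One way to compare the two sizes is to count how many numbers each representation stores. The sketch below assumes that "both matrices" means the dense d x n matrix from Part 1 and its compressed-column form from Part 2; the function name `storage_counts` and the parameter `nnz` (number of nonzero entries) are hypothetical.

```python
def storage_counts(d, n, nnz):
    """Count stored numbers for a dense d x n matrix and its compressed-column form.

    The compressed-column form keeps nnz values, nnz row indices,
    and n + 1 column pointers.
    """
    dense = d * n
    sparse = 2 * nnz + (n + 1)
    return dense, sparse
```

With d = 10000, n = 1000 and nnz = 50000, this gives 10000000 stored numbers for the dense matrix against 101001 for the sparse one. Converting the counts to bytes depends on the chosen number types, which the assignment text does not specify.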