Monitoring Distributed Data Streams Through Node Clustering
PaperID: 1161
 
Authors: Maria Barouti, Daniel Keren, Jacob Kogan, and Yaakov Malinovsky
 
Keywords: Applications of Clustering, Mining text documents, Text Mining
Reviewer 1's opinion
 
Ranking Criteria
Name 	Rank
Appropriateness to the Conference: 	Weak Accept
Originality: 	Marginal
Technical Strength: 	Weak Accept
Presentation: 	Weak Reject
Overall Evaluation: 	Marginal
 

Comments:
In this paper, the authors present a strategy to decrease the communication overhead of monitoring data streams in distributed systems.
In particular, they propose to apply a clustering algorithm to the nodes in order to introduce an intermediate control step before communicating
the violation of a threshold value to the central node.
Contrary to classical clustering algorithms, the proposed algorithm has an objective function
that aims to maximize the dissimilarity within each cluster.
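
The intuition behind such an objective can be illustrated with a minimal sketch. It is not the authors' algorithm: the drift vectors below are made-up numbers and the helper mean_drift_norm is a hypothetical name, chosen only to show that averaging dissimilar vectors keeps the cluster mean closer to its reference point than averaging similar ones.

```python
import numpy as np

def mean_drift_norm(drifts):
    """Return the Euclidean norm of the average of the given drift vectors."""
    return float(np.linalg.norm(np.mean(drifts, axis=0)))

# Hypothetical drift vectors (change of each node's local vector since the
# last synchronization) for four nodes; the values are illustrative only.
drifts = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [-1.0, 0.0],
    [-0.9, -0.1],
])

# Cluster of similar nodes: both drift in the same direction, so the
# cluster average moves almost as far as each individual node.
similar = mean_drift_norm(drifts[[0, 1]])

# Cluster of dissimilar nodes: the drifts point in opposite directions,
# so they cancel in the cluster average.
dissimilar = mean_drift_norm(drifts[[0, 2]])

print(similar, dissimilar)
```

Under these assumed values, the cluster of dissimilar nodes has a smaller average drift, so it would be less likely to trigger communication with the central node.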

Although the main idea seems promising, the paper has some issues:
- All the sections are full of equations, some of which refer to results obtained in other works. On the other hand, intuitive explanations of
some of the presented concepts are lacking. For this reason, the whole paper is difficult to follow.

- Section 2. The application in Text Mining does not appear well connected to the rest of the paper. Moreover, equation (4), which should represent a measure
of the information gain of a feature, is not clear. In particular, it is not clear how the feature is represented in the formula, since the sum runs over
the values x11, x12, x21 and x22, which do not depend on the given feature.

- Section 2. The meaning of the relevance label r is not clear.

- Section 3. The proposed node clustering approach appears to be motivated by the results reported in Table 1. However, the authors only state
"The results immediately suggest to cluster nodes to further reduce communication load", without giving any explanation of the (possible) intuitions
that these results suggest.

- Section 4. In Equation (8), the sum should run over the values {1,...,k} instead of the values {1,...,n}. Moreover, the reported algorithm to
compute the history vectors hn(tj) is not clear. What do the authors mean by "for t increasing from tj to tj+1"? (It seems to be a single iteration.)
Moreover, in the same algorithm, a factor of 1/2 appears without any comment on its effect. Why do the authors set this factor to 1/2 and not to another value?

- Section 4. The proposed clustering algorithm seems to work in a bottom-up agglomerative fashion. However, the large number of formulas makes such a simple approach
appear much more complicated than it is. The description should be greatly simplified.

- Section 5. The authors apply their clustering approach in a scenario with only 10 nodes. I think that the advantages of a clustering approach
should be evaluated in a scenario with a much larger number of nodes. Moreover, the authors do not report the number of clusters identified in the
experiments.
Reviewer 2's opinion
 
Ranking Criteria
Name 	Rank
Appropriateness to the Conference: 	Strong Accept
Originality: 	Weak Accept
Technical Strength: 	Weak Accept
Presentation: 	Strong Accept
Overall Evaluation: 	Weak Accept
 

Comments:
The paper presents an alternative approach to monitoring data streams in a distributed system. This approach combines system theory techniques and clustering.
The difference with respect to other clustering algorithms, which look for similar data, is that monitoring requires clusters of dissimilar vectors that cancel each other as much as possible.
I am not an expert on this topic, but the paper seems correct, it is clearly written, and the explanations of the methods appear to be clear and complete enough.
My doubts are due to the fact that the number of experiments and the comparison with the state of the art seem quite limited.
Reviewer 3's opinion
 
Ranking Criteria
Name 	Rank
Appropriateness to the Conference: 	Weak Accept
Originality: 	Weak Accept
Technical Strength: 	Weak Accept
Presentation: 	Weak Accept
Overall Evaluation: 	Weak Accept
 

Comments:
The paper presents a new technique based on systems theory and clustering for distributed data stream monitoring. The paper is novel and technically correct. The simulation results could be strengthened by a better presentation of the main benefits of the new approach. A better validation of the new method, comparing it with a larger number of competing approaches, would improve the paper significantly.
