Monitoring Distributed Data Streams Through Node Clustering
PaperID: 1161
Authors: Maria Barouti, Daniel Keren, Jacob Kogan, and Yaakov Malinovsky
Keywords: Applications of Clustering, Mining text documents, Text Mining

Reviewer 1's opinion

Ranking Criteria
Appropriateness to the Conference: Weak Accept
Originality: Marginal
Technical Strength: Weak Accept
Presentation: Weak Reject
Overall Evaluation: Marginal

Comments:
In this paper, the authors present a strategy to decrease the communication overhead of monitoring data streams in distributed systems. In particular, they propose applying a clustering algorithm to the nodes in order to introduce an intermediate control step before communicating a threshold violation to the central node. Unlike classical clustering algorithms, the proposed algorithm has an objective function that aims to maximize dissimilarity within each cluster. Although the main idea seems promising, the paper has some issues:
- All sections are full of equations, some referring to results obtained in other works, while intuitive explanations of several concepts are lacking. As a result, the whole paper is difficult to follow.
- Section 2. The application to text mining is not well connected to the rest of the paper. Moreover, equation (4), which should represent a measure of the information gain of a feature, is not clear. In particular, it is not clear how the feature is represented in the formula, since the sum runs over the values x11, x12, x21 and x22, which are independent of the given feature.
- Section 2. The meaning of the relevance label r is not clear.
- Section 3. The proposed node clustering approach appears to be motivated by the results reported in Table 1.
However, the authors only state "The results immediately suggest to cluster nodes to further reduce communication load", without explaining the (possible) intuitions behind such results.
- Section 4. In Equation (8), the sum should run over the values {1,...,k} instead of {1,...,n}. Moreover, the reported algorithm for computing the history vectors hn(tj) is not clear. What do the authors mean by "for t increasing from tj to tj+1"? (It seems to be a single iteration.) In the same algorithm, a factor of 1/2 appears without any comment on its effect. Why do the authors set this factor to 1/2 and not to another value?
- Section 4. The proposed clustering algorithm seems to work in a bottom-up agglomerative fashion. However, the large number of formulas makes such a simple approach appear much more complicated than it is. The description should be greatly simplified.
- Section 5. The authors apply their clustering approach in a scenario with only 10 nodes. I think that the advantages of a clustering approach should be evaluated in a scenario with a much larger number of nodes. Moreover, the authors do not report the number of clusters identified in the experiments.

Reviewer 2's opinion

Ranking Criteria
Appropriateness to the Conference: Strong Accept
Originality: Weak Accept
Technical Strength: Weak Accept
Presentation: Strong Accept
Overall Evaluation: Weak Accept

Comments:
The paper presents an alternative approach to monitoring data streams in a distributed system. This approach combines systems theory techniques and clustering. Unlike other clustering algorithms, which look for similar data, monitoring requires clusters of dissimilar vectors that cancel each other as much as possible. I am not an expert on this topic, but the paper seems correct, it is clearly written, and the explanation of the methods appears clear and complete enough.
My doubts are due to the fact that the number of experiments and the comparison with the state of the art seem quite limited.

Reviewer 3's opinion

Ranking Criteria
Appropriateness to the Conference: Weak Accept
Originality: Weak Accept
Technical Strength: Weak Accept
Presentation: Weak Accept
Overall Evaluation: Weak Accept

Comments:
The paper presents a new technique based on systems theory and clustering for distributed data stream monitoring. The paper is novel and technically correct. The simulation results could be strengthened by a better presentation of the main benefits of the new approach. Validating the new method against a larger number of competing approaches would improve the paper significantly.
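
Illustrative note: the reviews describe the core idea as grouping nodes whose vectors are dissimilar, so that they cancel each other within a cluster, and Reviewer 1 reads the algorithm as bottom-up agglomerative. The sketch below is only a toy illustration of that intuition, not the paper's algorithm or its objective function. The function names, the per-node "deviation" vectors, the cost (norm of the cluster mean), and the `max_size` cap are all assumptions made for the example.

```python
import math
from itertools import combinations


def mean_norm(vectors):
    """Euclidean norm of the average of a list of equal-length vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / n for i in range(dim)]
    return math.sqrt(sum(x * x for x in mean))


def cancelling_clusters(deviations, max_size=2):
    """Greedy bottom-up grouping (hypothetical, for illustration only).

    Each node starts in its own cluster. At every step, merge the pair of
    clusters whose merged mean deviation shrinks the most relative to the
    worse of the two, and stop when no merge gives a positive gain or the
    size cap would be exceeded.
    """
    clusters = [[i] for i in range(len(deviations))]

    def cost(cluster):
        return mean_norm([deviations[i] for i in cluster])

    while True:
        best_gain, best_pair = 0.0, None
        for a, b in combinations(range(len(clusters)), 2):
            if len(clusters[a]) + len(clusters[b]) > max_size:
                continue
            merged_cost = cost(clusters[a] + clusters[b])
            gain = max(cost(clusters[a]), cost(clusters[b])) - merged_cost
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:
            break
        a, b = best_pair
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(clusters)


# Four nodes whose deviations come in opposite pairs.
example = [[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.0, -2.0]]
print(cancelling_clusters(example))  # [[0, 1], [2, 3]]
```

In this toy input, each resulting cluster has a mean deviation of zero, so under the assumed setup neither cluster would need to report to the central node even though every individual node deviates.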