Zhang Jiangwei

School of Computing, National University of Singapore

Jiangwei is a Ph.D student in the School of Computing (SOC) at National University of Singapore(NUS). I am currently under the supervision of Prof. Y.C. Tay since 2014. My research focus is to create benchmarks from empirical data sets, with a particulat focus on social network data.

I was in the double degree problem (Applied Mathematics and Computer Science) and graduated with first class hornos for both degrees (2009-2013). I did my final year project with Prof. Frank Stephan on Automata theories and did my Undergraduate Research Opportunities Programe(UROP) with Prof. Lee Wee Sun on gaming with an AI bot.





Publications


Graph Scaling
Local Features
Synthetic generation

Paper

Dscaler: Synthetically Scaling A Given Relational Database

The Dataset Scaling Problem (DSP) defined in previous work states: Given an empirical set of relational tables D and a scale factor s, generate a database state D' that is similar to D but s times its size. A DSP solution is useful for application development (s < 1), scalability testing (s > 1) and anonymization (s = 1). Current solutions assume all table sizes scale by the same ratio s.

However, a real database tends to have tables that grow at different rates. This paper therefore considers non-uniform scaling (nuDSP), a DSP generalization where, instead of a single scale factor s, tables can scale by different factors.

Dscaler is the first solution for nuDSP. It follows previous work in achieving similarity by reproducing correlation among the primary and foreign keys. However, it introduces the concept of a correlation database that captures fine-grained, per-tuple correlation.

Experiments with well-known real and synthetic datasets D show that Dscaler produces D' with greater similarity to D than state-of-the-art techniques. Here, similarity is measured by number of tuples, frequency distribution of foreign key references, and multi-join aggregate queries.


J.W. Zhang and Y.C. Tay. DSCALER: Synthetically Scaling A Given Relational Database. PVLDB 9, 14 (Sept. 2016), 1671--1682.

Graph Scaling
Local&Global Features
Social Networks

Paper Source

Gscaler: Synthetically Scaling A Given Graph

Enterprises and researchers often have datasets that can be represented as graphs (e.g. social networks). The owner of a large graph may want to scale it down to a smaller version, e.g. for application development. On the other hand, the owner of a small graph may want to scale it up to a larger version, e.g. to test system scalability. This paper investigates the Graph Scaling Problem (GSP):

Given a directed graph G and positive integers n' and m' , generate a similar directed graph G' with n' nodes and m' edges.

This paper presents a graph scaling algorithm Gscaler for GSP. Analogous to DNA shotgun sequencing, Gscaler, decomposes G into small pieces, scales them, then uses the scaled pieces to construct G'. This construction is based on the indegree/outdegree correlation of nodes and edges. Extensive tests with real graphs show that Gscaler is scalable and, for many graph properties, it generates a G' that has greater similarity to G than other state-of-the-art solutions, like Stochastic Kronecker Graph and UpSizeR.


J.W. Zhang and Y.C. Tay. GSCALER: Synthetically Scaling A Given Graph. Proc. 19th Int. Conf. Extending Database Technology (EDBT), Bordeaux, France (March 2016), 53--64.





Graph Scaling
Graph measurement
Online Computation

Demo Paper

GscalerCloud: A Web-Based Graph Scaling Service

Enterprises and researchers often have datasets that can be represented as graphs (e.g. social networks). The owner of a large graph may want to scale it down to a similar but smaller version, e.g. for application development. On the other hand, the owner of a small graph may want to scale it up to a similar but larger version, e.g. to test system scalability. Gscaler is a recently developed tool for such scaling.

This demonstration presents GscalerCloud, a web-based service for access to Gscaler. A user can specify, via a browser, an example or empirical graph G, and a target size for the scaled version G'. The demonstration has two stages. In Stage I, the visitor can experiment with small graphs and check the similarity between input G and scaled G'. In Stage II, the visitor can test Gscaler with large graphs, and check for similarity using aggregate metrics and statistical distributions generated by GscalerCloud’s tools.

J.W. Zhang, A. Mal and Y.C. Tay. GscalerCloud: A Web-Based Graph Scaling Service ( Demo Website). Proc. 33rd IEEE Int. Conf. Data Engineering (ICDE), San Diego, USA (Apr. 2017),1391--1392.


Traffic Application
Cloud Platform
Map-Reduce

Paper

Smart Traffic Cloud: An Infrastructure for Traffic Applications

With rapid development of sensor technologies and wireless network infrastructure, research and development of traffic related applications, such as real time traffic map and on-demand travel route recommendation have attracted much more attentions than ever before. Both archived and real-time data involved in these applications could potentially be very big, depending on the number of deployed sensors. Emerging Cloud infrastructure can elastically handle such big data and conveniently providing nearly unlimited computing and storage resources to hosted applications, to carry out analysis not only for long-term planning and decision making, but also analytics for near real-time decision support. In this paper, we propose Smart Traffic Cloud, a software infrastructure to enable traffic data acquisition, and manage, analyze and present the results in a flexible, scalable and secure manner using a Cloud platform. The proposed infrastructure handles distributed and parallel data management and analysis using ontology database and the popular Map-Reduce framework. We have prototyped the infrastructure in a commercial Cloud platform and we developed a real-time traffic condition map using data collected from commuters’ mobile phones.
Wenqiang Wang, Xiaoming Zhang, J.W. Zhang, Hock Beng Lim: Smart Traffic Cloud: An Infrastructure for Traffic Applications. ICPADS 2012: 822-827


Automata Theory
Regular Language
Context Free Language

Report

The XX Problem

Many-one reductions are often used to compare the complexity of sets. To prove A is at most as complicated as B, one would reduce a set A to a set B via a reduction function f, that is the fol- lowing condition holds: ∀x such that, x ∈ A ⇐⇒ f(x) ∈ B. Generally, this approach requires the function f itself is quite easy to determine. In this project, the reverse question is investigated. Given a natural function f, for each set A in the class, is there another set B in the same class such that ∀x,x∈A ⇐⇒ f(x)∈B?

This study is motivated from the following problem where f(x) is the concatenation of x with it- self,thatisoftheformxx. IfAisregular,istherearegularsetBsuchthat∀x, x∈A ⇐⇒ xx∈B? The project wants to study this question not only for this concrete mapping but also for some other natural mapping. Moreover, the strategies to find the set B for a given set A is investigated as well. Furthermore, the corresponding question should also be investigated for the class of context-free languages and other natural classes from the theory of formal languages.

This problem is still an open problem!!!


Contact Me

Please feel free to contact me.

13 Computing Dr
Singapore
117417

Email: a0054808@u.nus.edu