Yuqiang Guan (关玉强), Ph.D.

Senior software engineer at Google


 

Contact

 

Work Phone: 1 (310) 310-6105

Email: yguan@google.com (office), yuqiang_guan@yahoo.com (personal)

Curriculum Vitae: http://www.GuanYuqiang.net/guan_cv.pdf

 

Work Experience

Mar. 2008 to Now Senior software engineer at Google
Software engineer at Google
Area: Online ads relevance. Our team is responsible for using machine learning techniques to build a mathematical model that understands the concepts in any given text. The model is trained offline and served in real time. The concepts generated by this model are widely used in many AI and machine learning systems at Google. My work has resulted in increases of several percent in ads revenue and user click-through rate.
June 2006 to Feb. 2008 Software engineer at Microsoft
Area: Web search quality. As a member of the data mining group in the Live (now called Bing) search team, my work involved building an efficient search log pipeline for all Microsoft search traffic, detecting anomalies in its throughput, and mining useful information from the logs. I also analyzed user retention for search campaigns and worked on instant answers for certain types of search queries, such as questions.
May 2005 to Aug. 2005 Software intern at Syncata (a ProQuest company)
Area: Information extraction and text mining. We extracted patterns from customer complaints and technician responses to automatically formulate solutions for fixing cars.

 

Professional Activities

 

Referee for

Pattern Recognition

Computational Intelligence in Bioinformatics (book)

IEEE Transactions on Pattern Analysis and Machine Intelligence

IEEE Transactions on Neural Networks

SIAM Journal on Matrix Analysis and Applications

International Journal on Document Analysis and Recognition

Program committee member for

IEEE Data Mining Conference 2006

Reviewer for

IEEE Data Mining Conferences (ICDM 2003-05)

SIAM Data Mining Conferences (SDM 2002-06)

ACM KDD Conferences (KDD 2004-05)

Workshop on Clustering High-Dimensional Data and its Applications (at SDM 2002-04 & ICDM 2003)

Data Mining and Knowledge Discovery Journal

ACM Computing Surveys

IEEE Transactions on Knowledge and Data Engineering

IEEE Transactions on Pattern Analysis and Machine Intelligence

Book chapters of a data mining textbook


Education

 

Jan. 2000 - May 2006
 

Ph.D. in Computer Science, The University of Texas at Austin
Dissertation title: "Large-Scale Clustering: Algorithms and Applications"
Advisor: Inderjit Dhillon

Sept. 1997 - Dec. 1999 

Ph.D. student in Computer Science, Western Michigan University

Sept. 1992 - July 1997 

B.S. in Computer Science, University of Science and Technology of China

 

Research Interests


Large-scale data mining, machine learning, pattern recognition, information retrieval, bioinformatics, scientific computing and graph theory.
 

Developed a fast, high-quality multilevel kernel-based graph clustering algorithm.

Obtained new theoretical connections between spectral clustering and weighted kernel k-means.

Investigated the minimum residue co-clustering algorithms on gene-expression data in bioinformatics.

Proposed a time- and memory-efficient technique for preprocessing and clustering large document collections.

Proposed a local search strategy to improve clustering results.

Wrote Gmeans software and Graclus software. Co-author of Co-cluster software.

Studied profile minimization on triangulated triangles and integral computation using Quasi-Monte Carlo Methods.

 

Papers
 

Book Chapters

Clustering with Entropy-like k-means Algorithms

M. Teboulle, P. Berkhin, I. Dhillon, Y. Guan and J. Kogan

Book chapter in Grouping Multidimensional Data: Recent Advances in Clustering, J. Kogan, C. Nicholas, M. Teboulle (Eds.), pages 127-160, Springer, 2006.

Efficient Clustering of Very Large Document Collections

I. Dhillon, J. Fan and Y. Guan

Invited book chapter in Data Mining for Scientific and Engineering Applications, R. L. Grossman, C. Kamath, P. Kegelmeyer, V. Kumar, R. R. Namburu (Eds.), pages 357-381, Kluwer, 2001.

Conference Papers

 

A Fast Kernel-based Multilevel Algorithm for Graph Clustering

I. Dhillon, Y. Guan and B. Kulis

Proceedings of The 11th ACM SIGKDD, Chicago, IL, Aug. 21 - 24, 2005.

Kernel k-means, Spectral Clustering and Normalized Cuts

I. Dhillon, Y. Guan and B. Kulis

Proceedings of The 10th ACM SIGKDD, Seattle, WA, August 22-25, 2004.

Minimum Sum-Squared Residue Co-clustering of Gene Expression Data

H. Cho, I. Dhillon, Y. Guan and S. Sra

Proceedings of The 4th SIAM Data Mining Conference, Lake Buena Vista, Florida, April 22-24, 2004.

Information Theoretic Clustering of Sparse Co-Occurrence Data

I. Dhillon and Y. Guan

Proceedings of The 3rd IEEE International Conference on Data Mining, Melbourne, Florida, November 19 - 22, 2003.

Iterative Clustering of High Dimensional Text Data Augmented by Local Search

I. Dhillon, Y. Guan and J. Kogan

Proceedings of The 2nd IEEE Data Mining Conference, Maebashi TERRSA, Maebashi City, Japan, December 9 - 12, 2002.

Resource Allocation for Clusters

E. de Doncker, L. Cucos and Y. Guan

Proceedings of the High Performance Computing Symposium, pp. 122-125, 2001.

Distributed Quasi-Monte Carlo Methods in a Heterogeneous Environment

E. de Doncker, R. Zanny, M. Ciobanu and Y. Guan

Proceedings of the IPDPS Heterogeneous Computing Workshop 2000, pp. 200-206, 2000.

Asynchronous Quasi-Monte Carlo Methods

E. de Doncker, R. Zanny, M. Ciobanu and Y. Guan

Proceedings of the High Performance Computing Symposium, pp. 130-135, 2000.

On Diameter D Edge Deletion Problems

Y. Guan and K. Williams

Presentation at the 30th Conference on Combinatorics, Graph Theory, and Computing, Boca Raton, FL, March 1998.

 

Journal Papers

 

Weighted Graph Cuts without Eigenvectors: A Multilevel Approach

I. Dhillon, Y. Guan and B. Kulis

IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Vol. 29, No. 11, pp. 1944-1957, 2007.

Error Bounds for the Integration of Singular Functions using Equidistributed Sequences

E. de Doncker and Y. Guan

Journal of Complexity, Vol. 19, No. 3, pp. 259-271, Elsevier Science, April 2003.

Profile Minimization on Triangulated Triangles

Y. Guan and K. Williams

Discrete Mathematics, Vol. 260C, pp. 69-76, Elsevier Science, Jan. 2003.

 

Technical Reports

 

A Unified View of Kernel k-means, Spectral Clustering and Graph Cuts

I. Dhillon, Y. Guan and B. Kulis

UTCS Technical Report #TR-04-25, 2004.

Information Theoretic Clustering of Sparse Co-Occurrence Data

I. Dhillon and Y. Guan

UTCS Technical Report #TR-03-39, Sept., 2003. Also appeared in the 3rd SIAM International Conference on Data Mining (Workshop on Clustering High-Dimensional Data and its Applications), 2003.

Refining Clusters in High-dimensional Text Data

I. Dhillon, Y. Guan, and J. Kogan

UTCS Technical Report #TR-02-03, Jan., 2002. Also appeared in the 2nd SIAM International Conference on Data Mining (Workshop on Clustering High-Dimensional Data and its Applications), April, 2002.


Software
 

Graclus is fast graph clustering software that computes normalized cut and ratio association for a given undirected graph without any eigenvector computation. This is possible because of the mathematical equivalence between general cut or association objectives (including normalized cut and ratio association) and the weighted kernel k-means objective. One important implication of this equivalence is that a k-means-type iterative algorithm can be run to minimize general cut or association objectives. Therefore, unlike spectral methods, our algorithm entirely avoids time-consuming eigenvector computation. We have embedded the weighted kernel k-means algorithm in a multilevel framework to develop very fast software for graph clustering.
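The equivalence described above can be illustrated with a minimal sketch in Python. This is not Graclus code: it is a flat (single-level) batch weighted kernel k-means, without the multilevel framework or any local search. The function names, the toy graph, and the `sigma` parameter are assumptions made for illustration. For the normalized cut objective, node weights are the degrees and the kernel is taken as K = sigma * D^-1 + D^-1 A D^-1, where A is the adjacency matrix and D the diagonal degree matrix. The toy run uses sigma = 0, for which convergence is not guaranteed in general; a larger diagonal shift makes the kernel positive semidefinite.

```python
def normalized_cut_kernel(adj, sigma=0.0):
    """Build K = sigma * D^-1 + D^-1 A D^-1 (assumes no isolated nodes)."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    K = [[adj[i][j] / (deg[i] * deg[j]) + (sigma / deg[i] if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return K, deg


def weighted_kernel_kmeans(K, w, labels, k, max_iter=100):
    """Batch weighted kernel k-means; returns the final label list."""
    n = len(K)
    labels = list(labels)
    for _ in range(max_iter):
        members = [[j for j in range(n) if labels[j] == c] for c in range(k)]
        mass = [sum(w[j] for j in m) for m in members]
        # Squared norm of each cluster's weighted mean in feature space.
        centroid_norm = [
            sum(w[j] * w[l] * K[j][l] for j in m for l in m) / (s * s) if m else 0.0
            for m, s in zip(members, mass)
        ]

        def dist(i, c):
            # Squared feature-space distance from point i to the mean of cluster c.
            if not members[c]:
                return float("inf")
            cross = sum(w[j] * K[i][j] for j in members[c]) / mass[c]
            return K[i][i] - 2.0 * cross + centroid_norm[c]

        new_labels = [min(range(k), key=lambda c: dist(i, c)) for i in range(n)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels


def normalized_cut(adj, labels, k):
    """Sum over clusters of (edges leaving the cluster) / (total degree inside)."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    total = 0.0
    for c in range(k):
        inside = [i for i in range(n) if labels[i] == c]
        vol = sum(deg[i] for i in inside)
        cut = sum(adj[i][j] for i in inside for j in range(n) if labels[j] != c)
        if vol:
            total += cut / vol
    return total


# Toy graph: two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for i, j in edges:
    adj[i][j] = adj[j][i] = 1

K, deg = normalized_cut_kernel(adj)
start = [0, 0, 1, 1, 1, 1]  # node 2 starts in the wrong cluster
final = weighted_kernel_kmeans(K, deg, start, k=2)
print(final, normalized_cut(adj, start, 2), normalized_cut(adj, final, 2))
```

On this toy graph the iteration moves node 2 back to its triangle, and the normalized cut drops from 0.7 to 2/7, with no eigenvector computed at any point.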

Co-cluster is a C++ program that implements three co-clustering algorithms: an information-theoretic co-clustering algorithm and two types of minimum sum-squared residue co-clustering algorithms. In our implementation, all the algorithms have the ping-pong structure, i.e., a batch algorithm followed by a corresponding chain of first variations. Each algorithm also has five variations, based on the order in which the row or column centroids are updated.

Gmeans contains spherical k-means, information-theoretic clustering, diametric clustering, and Euclidean k-means algorithms, with 6 different initializations and local search.
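Of the algorithms listed above, spherical k-means is the simplest to illustrate. The sketch below is a minimal Python version, not the software's own code: document vectors are scaled to unit length, each document is assigned to the centroid with the highest cosine similarity, and each centroid is recomputed as the normalized sum of its members. The function names, the toy term-count vectors, and the choice of seeding the centroids from two of the documents are assumptions for illustration; the local search step and the other initializations are omitted.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def spherical_kmeans(docs, seeds, max_iter=100):
    """Cluster unit-normalized vectors by cosine similarity; returns (labels, centroids)."""
    docs = [normalize(d) for d in docs]
    centroids = [normalize(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        # Assignment step: for unit vectors, cosine similarity is the dot product.
        new_labels = [
            max(range(len(centroids)),
                key=lambda c: sum(a * b for a, b in zip(d, centroids[c])))
            for d in docs
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centroid becomes the normalized sum of its members.
        for c in range(len(centroids)):
            members = [d for d, l in zip(docs, labels) if l == c]
            if members:
                centroids[c] = normalize([sum(col) for col in zip(*members)])
    return labels, centroids


# Hypothetical term-count vectors over a three-word vocabulary.
docs = [[3, 0, 1], [2, 1, 0], [0, 4, 1], [1, 5, 0]]
labels, centroids = spherical_kmeans(docs, seeds=[docs[0], docs[2]])
print(labels)
```

Because all vectors lie on the unit sphere, document length does not affect the assignment, which is why this variant is commonly used for text data.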

  • Hierarchical spherical k-means (in C++)

A tool for efficiently and hierarchically clustering large-scale document collections.

 

Teaching Experience

I have been a TA for various courses, graduate and undergraduate. My TA duties involve grading, holding office hours, leading discussion sessions, working out solutions for homework and exams, helping professors edit lecture notes, maintaining class web pages, etc.

Scientific Computing (Spring and Fall 2004)

System Modeling & Scientific Computing (Spring 2003)

Advanced Programming (Fall 2002)

Elements of Software Design (Spring 2002)

Large Scale Data Mining (Spring and Fall 2001)

Abstract Data Type (Spring and Fall 2000)

 

Honors and Awards
 

Excellence in Research Award recipient (2000)

College-wide "Outstanding Graduate Research Award" recipient (1999)

Invited to join UPE (Upsilon Pi Epsilon) Honor Society (1999)

Invited to join ICA (the Institute of Combinatorics and its Applications) (1999)

 


Computer Skills
 

Languages: C/C++/C#, Perl, SQL, Java, MPI, HTML

OS: Unix/Linux, Windows

Tools: MapReduce, Perforce, BigTable, LaTeX, MATLAB, Microsoft Office, Google Apps

 


References

Upon request.


Citizenship and Visa status

China and EAD.