Program

July 9, Morning
Welcome speeches from USTC officials
8:30~9:00

Make Multipoint and Multimedia Communication over IP Network a Reality
Dr. Weiping Li

9:00~9:55

Coding with side information: Who cares?
Dr. Zixiang Xiong

10:00~10:55

Media 2.0 – the New Media Revolution?
Dr. Shipeng Li

11:00~11:55
July 9, Afternoon

Mobile Multimedia and Handheld Digital TV: Is It for Real?
Prof. Chang Wen Chen

14:00~14:55

Multimedia Security: Opportunities and Challenges
Dr. Qibin Sun

15:00~15:55

Image & Video Compression beyond Signal Processing
Dr. Feng Wu

16:00~16:55
Panel I
17:00~17:40
July 10, Morning

Search-Based Web Image Annotation
Dr. Mingjing Li

8:30~9:25

Digital Effects for Visual Media
Prof. Xiaoou Tang

9:30~10:25

Mining Content and Context for Semantic Multimedia
Dr. Jiebo Luo

10:30~11:25
July 10, Afternoon
Several Key Issues in Similarity Search from Large Image Databases
Dr. Qi Tian
14:00~14:55

Bayesian Tensor Approach for 3D Face Modelling
Dr. Dacheng Tao

15:00~15:55

Photo2Search: a large scale mobile image search system
Dr. Xing Xie

16:00~16:55
Panel II
17:00~17:40

*There is a 5-minute coffee break between every two talks.
Note: The working language of the workshop is English.

 

Presentation Overview


Dr. Weiping Li

Make Multipoint and Multimedia Communication over IP Network a Reality

Multipoint and multimedia communication has been a challenging technical problem for a long time. Before the dramatic growth of the Internet and the steady increase in computing power, this problem was mainly an academic topic. Considering the difficulty of making a conference call and the many failed attempts at video phones, we can appreciate how hard it is to bring multipoint and multimedia communication into the practical world. Now, the fundamental technology in communication and computing has reached a level that allows meaningful development of multipoint and multimedia communication systems for real-world applications. However, this does not mean that there are no more challenges in such development. This presentation addresses some of the major challenges and discusses some of the solutions. It concludes with thoughts on some interesting research problems that arise from practical applications.


Prof. Chang Wen Chen

Mobile Multimedia and Handheld Digital TV: Is It for Real?

This talk will first review recent technology trends in mobile multimedia and digital TV, especially the changing landscape and the paradigm shift in digital video that may impact consumers worldwide, at home and on the road. Then, the talk will examine what the challenging characteristics of mobile digital video will mean for technology advancement and the potential implications for emerging applications in our contemporary mobile lifestyles. As a prime example of mobile multimedia applications, mobile IPTV (Internet Protocol TV) will be examined in more detail. In particular, DVB-H as a standard for mobile IPTV will be analyzed, and the major enhancement components of DVB-H over DVB-T will be discussed. This standard, which originated in Europe, has made its way to both Asia and North America and is expected to have a significant influence on the consumer electronics industry in the US. Finally, technical challenges and research opportunities for IPTV, mobile IPTV, DVB-H, and emerging mobile multimedia applications will be identified.

Prof. Zixiang Xiong
Coding with side information: Who cares?

We live in a networked world, where side information is ubiquitous. As electrical engineers and information scientists, we care about side information so that we can take advantage of it in improving network communications. In this talk, we will review the information-theoretic foundation of coding with side information, examine recent developments in limit-approaching code designs, and highlight applications of coding with side information, especially those relating to distributed video coding, image data hiding, and MIMO/cooperative multimedia communications.


Dr. Shipeng Li
Media 2.0 – the New Media Revolution?

With the rapid development of the Web 2.0 concept and its applications, many unprecedented web-based multimedia applications are emerging today, and they pose many new challenges in multimedia research. In this talk, I first summarize the common features of the new wave of multimedia applications, which I call Media 2.0. I use 5 D's to describe the Media 2.0 principles, namely: Democratized media life cycle; Data-driven media value chain; Decoupled media system; Decomposed media contents; and Decentralized media business model. Then I explain what the implications of Media 2.0 for multimedia research are and how we should choose research topics that could make a big impact. Finally, I use example research projects from MSRA, ranging from media codecs and media systems to media search and media-related advertisement, to demonstrate the ideas I have talked about. I hope these ideas and principles will inspire the audience to come up with new Media 2.0 research topics and applications in the future.


Dr. Jiebo Luo
Mining Content and Context for Semantic Multimedia

Lower-cost devices and growing communication infrastructure have led to an explosion in the creation, archival, distribution and consumption of multimedia (images, videos, music, and text). Much of the recent research has been focused on semantic multimedia understanding. However, limited by the state of the art, many existing systems have taken a low-level approach and fallen short of higher-level interpretation. Current research has started to emphasize ways of bridging the "semantic gap" between humans and computers by integrating content recognition, human perception, physical models, context models, and multi-modal sensor fusion.

In particular, context is critical in the human recognition process, where humans make extensive use of environmental knowledge to facilitate object and scene recognition (e.g., where are the pedestrians? Most likely along sidewalks). Likewise, context can be used to improve the performance of automated systems. In this talk, we present a unified perspective on exploiting a broad array of context information in order to improve semantic scene content understanding. This includes spatial context (relationships between regions in the same scene), temporal context (elapsed time between pictures), imaging context (camera sensor metadata about scene properties, such as exposure time and subject distance), as well as geo and social context. For the first time, a tremendous amount of contextual information is being recorded by the various devices we use and is ready to facilitate semantics-driven multimedia applications.


Dr. Mingjing Li
Search-Based Web Image Annotation
Keyword-based search is the most natural way to search for images on the web. However, it is still a challenge to automatically index images with their semantic descriptions because image understanding is an unsolved problem. To overcome this difficulty, commercial image search engines index web images using the textual information in the hosting web pages, assuming that such information is somewhat relevant to the semantic content of the embedded images. Although this approach is quite useful, the indexing is not accurate enough because such textual information is very noisy. In this talk, I will present two algorithms for automatic web image annotation. One is a graph-learning-based keyword propagation method; the other is a bipartite graph reinforcement model. Both methods utilize the rich information on the web via search to help web image annotation.

Prof. Xiaoou Tang
Digital Effects for Visual Media
In general, it is not difficult to build a demo in computer vision research. However, it is very difficult to build a vision system that works in real life. As a result, the impact of computer vision on our daily life is fairly limited compared to other research areas, such as computer graphics, multimedia communication, and networking. This is a great challenge to all vision researchers. In this talk, I will discuss some vision research topics we are working on at Microsoft Research Asia. In particular, I would like to demonstrate how we strive to develop computer vision technology that makes an impact on our daily life. Specifically, I will show demos on the following subjects: how to use face detection and recognition technology to annotate and organize photo albums; how to add digital effects to an image through image editing; and how to add digital effects to a live video stream during an MSN online video chat session.

Dr. Feng Wu
Image & Video Compression beyond Signal Processing
This talk first gives an overview of the research on visual signal processing from the viewpoints of signal processing and computer vision, respectively. It also gives a brief analysis and comparison of these two categories of research, covering their differences and commonalities. Since the early 1980s, some attempts have been reported to combine these two categories of technologies for image and video compression. But in the past two decades, the mainstream coding schemes have mostly adopted the same signal-processing-based framework, from prediction and transform to entropy coding. In this talk, I will introduce several pioneering research efforts on image and video compression at Microsoft Research Asia, including image inpainting, new image representation and directional transform.

Dr. Qibin Sun

Multimedia Security: Opportunities and Challenges

In this talk, I will give an overview of multimedia security, including digital rights management, encryption and access control, authentication and forensics, and digital watermarking. In particular, I will focus on the related applications, technical challenges and current international standardization status.


Dr. Qi Tian

Several Key Issues in Similarity Search from Large Image Databases

Multimedia information retrieval (MIR) has become a diverse field after decades of active research. As the starting point of research on content-based multimedia retrieval, content-based image retrieval (CBIR) is still, and will remain, an important problem, and large image archives present very valuable applications with high impact in areas such as biometrics, medical archives, remote sensing archives, and biological applications.

This talk is concerned with three open issues in similarity search from large image databases: (i) an appropriate distance metric for similarity estimation, (ii) learning from high-dimensional data with a small sample set, and (iii) the semantic gap between low-level visual features and high-level semantic concepts.

To address these open issues, we propose (i) a dynamic framework for distance learning and feature selection; (ii) an adaptive subspace learning framework for robust and accurate modeling of image data with small samples; and (iii) a semantic manifold learning framework for bridging the semantic gap. Our proposed methods were tested on large image datasets and showed superior results compared to other state-of-the-art methods. Though we tested the proposed algorithms on image databases, it should be noted that the proposed approaches are generic and can be applied to many other similarity search applications, such as biometrics-based person recognition, and to non-computer-vision applications, such as gene-expression-based microarray analysis.

Dr. Dacheng Tao
Bayesian Tensor Approach for 3D Face Modeling
It is important to find an effective way to model a collection of three-dimensional faces, which is helpful for various applications, especially expression-driven ones, e.g., expression generation, expression retargeting, and expression synthesis. A collection of three-dimensional faces in point cloud format, with different identities and expressions, naturally forms a collection of second-order tensors. These second-order tensors have only one modality for identity representation and another modality for expression representation. Moreover, the number of second-order tensors is equal to three times the number of vertices used for three-dimensional face modelling.

Bayesian data modelling provides a natural way to analyze data, and it has been applied successfully in areas ranging from computer vision to machine learning. However, it works only for vector data. Therefore, there is a gap between the data representation and the tool for data analysis. Aiming to bridge the gap between vectors and tensors in conventional statistical tasks, e.g., automatic model selection for modelling a collection of three-dimensional faces here, this paper proposes a decoupled probabilistic algorithm named Bayesian tensor analysis (BTA). BTA automatically selects a suitable model for tensor data, as demonstrated by empirical studies. Based on BTA, the original large second-order tensors are compressed into small second-order tensors. Empirical studies justify its advantages.

Dr. Xing Xie
Photo2Search: a large scale mobile image search system

Current mobile search engines are mostly mobile versions of their desktop-based counterparts and use text-based query input. However, mobile information needs are not always well suited to keyword descriptors. Instead of the current flat query modes, camera phones can support much richer queries, using images as well as text. Therefore, we believe it is important to develop a mobile search service that allows users to search for relevant information on the Web via pictures taken on a mobile phone. In this presentation, we will show our progress on developing a large-scale mobile image search system capable of matching an image query against millions of photos in a database.