We present a system that bridges the perceptual user interface paradigm and web applications, allowing a web application to be controlled through hand gestures. The system exemplifies a general interaction architecture that enables multi-modal interaction with arbitrary, unmodified web applications and thereby opens a large number of real-world applications to multi-modal interaction. In addition, we demonstrate how knowledge about the user interface provides a powerful constraint for pattern analysis. Initial evaluation results for the approach are reported for an image-viewing web application.
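
One way to picture how user interface knowledge can constrain pattern analysis is the minimal sketch below. It is an illustration under stated assumptions, not the method used in the system: the gesture names, the UI state names, and the helpers `constrain_scores` and `best_gesture` are all hypothetical. The idea shown is simply that recognizer scores for gestures the current interface state cannot act on are discarded, and the remaining scores are renormalized before a decision is made.

```python
from typing import Dict, Optional, Set

# Hypothetical mapping from UI state to the gestures that state can act on.
# Neither the state names nor the gesture names come from the original system.
UI_STATE_GESTURES: Dict[str, Set[str]] = {
    "thumbnail_grid": {"point", "select"},
    "single_image": {"swipe_left", "swipe_right", "zoom"},
}


def constrain_scores(scores: Dict[str, float], allowed: Set[str]) -> Dict[str, float]:
    """Drop gestures the current UI state cannot use, then renormalize."""
    kept = {g: s for g, s in scores.items() if g in allowed and s > 0.0}
    total = sum(kept.values())
    if total == 0.0:
        return {}  # nothing plausible in this UI state
    return {g: s / total for g, s in kept.items()}


def best_gesture(scores: Dict[str, float], allowed: Set[str]) -> Optional[str]:
    """Return the most likely gesture among those the UI state allows."""
    constrained = constrain_scores(scores, allowed)
    if not constrained:
        return None
    return max(constrained, key=constrained.get)


if __name__ == "__main__":
    # Assumed raw recognizer output: "swipe_left" wins without constraints.
    raw = {"point": 0.2, "swipe_left": 0.5, "select": 0.3}
    print(best_gesture(raw, UI_STATE_GESTURES["thumbnail_grid"]))
```

In this sketch, the unconstrained recognizer would pick a gesture that has no meaning in the thumbnail view; restricting the candidates to what the interface can handle changes the decision to one of the valid gestures.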