The goal of the Kinetics dataset is to help the computer vision and machine learning communities advance models for video understanding. Given this large human action classification dataset, it may be possible to learn powerful video representations that transfer to different video tasks.
The Kinetics-700-2020 dataset will be used for this challenge. Kinetics-700-2020 is a large-scale, high-quality dataset of YouTube video URLs covering a diverse range of human-focused actions. It is an approximate superset of Kinetics-400 (released in 2017), Kinetics-600 (released in 2018), and Kinetics-700 (released in 2019).
The dataset consists of approximately 650,000 video clips, and covers 700 human action classes with at least 700 video clips for each action class. Each clip lasts around 10 seconds and is labeled with a single class. All of the clips have been through multiple rounds of human annotation, and each is taken from a unique YouTube video. The actions cover a broad range of classes including human-object interactions such as playing instruments, as well as human-human interactions such as shaking hands and hugging.
More information about how to download the Kinetics dataset is available here.
1. Possible to use ImageNet checkpoints?
We allow fine-tuning from public ImageNet checkpoints for the supervised track, but a link to the specific checkpoint should be provided with each submission.
2. Possible to use optical flow?
Optical flow can be used as long as the flow method is not trained on external datasets. Synthetic datasets are the exception and are allowed.
3. Can we train on test data without labels (e.g. transductive)?
No.
4. Can we use semantic class label information?
Yes, for the supervised track.
5. Will there be special tracks for methods using fewer FLOPs / small models or just RGB vs RGB+Audio in the self-supervised track?
We will ask participants to provide the total number of model parameters and the modalities used and plan to create special mentions for those doing well in each setting, but not specific tracks.
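As a rough illustration of how the total parameter count might be computed for reporting, here is a minimal sketch. It assumes the model's parameters are available as a mapping from name to tensor shape. The function name `count_parameters` and the toy layer shapes are hypothetical and not part of the challenge rules.

```python
from math import prod


def count_parameters(shapes):
    """Sum the number of scalar values over all parameter shapes."""
    return sum(prod(shape) for shape in shapes.values())


# Hypothetical toy model: one 3D convolution and a 700-way classifier.
example_shapes = {
    "conv1.weight": (64, 3, 3, 7, 7),
    "conv1.bias": (64,),
    "fc.weight": (700, 64),
    "fc.bias": (700,),
}

total = count_parameters(example_shapes)
print(f"total parameters: {total:,}")
```

Most deep learning frameworks expose parameter shapes directly, so the same sum can usually be taken over the framework's own parameter list.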