Posts by Collection

arblogs

Augmented Reality and User Perspective Rendering

5 minute read

Published:

Until last week, the AR application that we developed consisted of 2D face detection running in parallel with AR object rendering. However, the two separate applications did not interact because of communication issues between Unity and Android.

User Perspective Rendering approaches

3 minute read

Published:

The app that we were required to build had to be highly responsive, and it was sensitive to even small movements of the head. The movements of the augmented object were highly irregular, so I made them regular so that the motion is perceptible and more concrete.
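The excerpt does not say which technique was used to regularize the motion. As a minimal sketch of one common option, an exponential moving average over the tracked positions could look like the following; the function name `smooth_positions` and the `alpha` parameter are hypothetical and not taken from the original app.

```python
from typing import Iterable, List, Optional, Tuple

Point = Tuple[float, float]


def smooth_positions(points: Iterable[Point], alpha: float = 0.3) -> List[Point]:
    """Smooth a stream of 2D positions with an exponential moving average.

    alpha close to 1 follows the raw input closely (responsive but jittery);
    alpha close to 0 moves slowly (stable but laggy).
    This is an assumed technique, not the method from the post.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed: List[Point] = []
    prev: Optional[Point] = None
    for x, y in points:
        if prev is None:
            # The first sample has no history, so it passes through unchanged.
            prev = (x, y)
        else:
            prev = (
                alpha * x + (1.0 - alpha) * prev[0],
                alpha * y + (1.0 - alpha) * prev[1],
            )
        smoothed.append(prev)
    return smoothed
```

Calling `smooth_positions(raw_head_positions, alpha=0.5)` returns one smoothed point per input point, so the result can drive the rendered object directly in place of the raw detections.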

ards

ARDS Daily Updates

less than 1 minute read

Published:

2022 August 29-30

  • Completed CITI and HIPAA Certifications

cerebralEdemaDetection

Cerebral Edema Detection Daily Updates

2 minute read

Published:

2022 August 30

  • Completed CITI and HIPAA Certifications
  • Working on https://johnmuschelli.com/process_head_ct/example/# with lung images
  • The ichseg package is not working for stripping. Trying to find an alternative.
  • Could not find any brain CT image dataset in DICOM format

covid19

cs7641ml

dsa

Rotate Array - Medium

1 minute read

Published:

Question

  • https://leetcode.com/problems/rotate-array/
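The post's own solution is not shown in this excerpt. One well-known way to rotate an array to the right by `k` steps in place is the three-reversal trick, sketched below; the function name `rotate` is chosen here for illustration.

```python
from typing import List


def rotate(nums: List[int], k: int) -> None:
    """Rotate nums to the right by k steps, in place, using three reversals."""
    n = len(nums)
    if n == 0:
        return
    k %= n  # rotating by n steps is a no-op

    def reverse(lo: int, hi: int) -> None:
        # Reverse nums[lo..hi] (inclusive) in place.
        while lo < hi:
            nums[lo], nums[hi] = nums[hi], nums[lo]
            lo += 1
            hi -= 1

    reverse(0, n - 1)  # reverse the whole array
    reverse(0, k - 1)  # restore order of the first k elements
    reverse(k, n - 1)  # restore order of the remaining elements
```

This runs in O(n) time with O(1) extra space, which is why it is often preferred over building a rotated copy.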

H-Index - Medium - L

less than 1 minute read

Published:

Question

  • https://leetcode.com/problems/h-index/
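The excerpt only links to the problem, so the following is a sketch of one standard approach rather than the post's solution: sort the citation counts in descending order and find the largest rank `i` such that the `i`-th paper has at least `i` citations. The function name `h_index` is chosen here for illustration.

```python
from typing import List


def h_index(citations: List[int]) -> int:
    """Return the largest h such that at least h papers have >= h citations."""
    h = 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count >= rank:
            h = rank
        else:
            # Counts only decrease from here, so no larger h is possible.
            break
    return h
```

Sorting makes this O(n log n); a counting-based variant can reach O(n), at the cost of a little more code.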

hatespeechblogs

Annotation Idea Guidelines

2 minute read

Published:

With the emergence of a variety of social media platforms and the freedom to express one’s thoughts, there is, sadly, a lot of hateful content available on social media. Some platforms, like Twitter, filter out posts that involve abusive and highly provocative language. However, Gab is a platform where freedom of speech is retained, so hate content can easily be found there. It therefore becomes important to analyze the data, posts, and comments. Hate speech detection thus plays an important role in identifying any kind of trend, troll, threat, etc.:

Annotation Guidelines Refinement

2 minute read

Published:

To concretely define the approach for classification, it is necessary to identify the best architecture and techniques so as to beat the state of the art. I therefore explored the literature on the newest approaches and read the following papers:

icarcnr

Literature Survey on Adversarial Attacks and their defense

9 minute read

Published:

With deep learning proving impactful, with high accuracy and precision, in so many applications, it is important to ensure its safety and security against adversarial attacks. It has been observed that deep neural networks are susceptible to adversarial attacks even in the form of small perturbations that are not perceptible to humans. My literature survey on this topic consists of the following papers and their details:

Nazi Element Classification

1 minute read

Published:

Last week’s task was to build a dataset for Nazism element detection. I was provided with the positive examples and had to generate the negative ones. The initial dataset consisted of around 2800 images in total, belonging to various categories such as:

  • 88_heil_hitler
  • nazi_eagle
  • nazi_swastikas
  • blut_und_ehre
  • nazi_flags
  • nazi_tattoo
  • crossed_grenade_emblem_nazism
  • nazi_parade
  • schwarze_sonne
  • hitler_salute
  • nazi_party
  • sieg_heil
  • meine_ehre_heisst_treue
  • nazi_propaganda
  • ss_death’s_head
  • nazi_bolts
  • nazi_rally
  • ss_iron_crosses

Semi-supervised learning using variational autoencoders results

less than 1 minute read

Published:

The model, trained on a dataset of 5714 samples and 36 classes, didn’t perform well. I tried approaches based on unlabeled data as well as fully labeled data; however, the accuracy still remained low. The graphs are as follows:

multimodal

Self-Supervised Multimodal Opinion Summarization

1 minute read

Published:

Opinion summarization is the task of automatically generating summaries from multiple documents containing users’ thoughts on businesses or products. This summarization of users’ opinions can provide information that helps other users with their decision-making on consumption.

Multi-Modal Supplementary-Complementary Summarization using Multi-Objective Optimization

2 minute read

Published:

When dealing with multi-modal information retrieval tasks, the extent to which a particular modality contributes to the final output might differ from other modalities. Amongst the modalities, there is often a preferable mode of representation based on its significance and ability to fulfill the task. We denote these preferred modalities as key modalities or central modalities (referred to as central modalities from here onwards). The other modalities assist the central modalities in fulfilling the desired task and are known as adjacent modalities. The adjacent modalities can enhance the user experience by either supplementing or complementing the information represented via the central modality. When these adjacent modalities reinforce the facts and ideas presented in the central modality, the enhancement is known as supplementary enhancement. On the other hand, when these adjacent modalities complete the central modality by providing additional or alternate information that is relevant, albeit not covered by the central modality, the enhancement is known as complementary enhancement.

Multimodal Sentence Summarization via Multimodal Selective Encoding

1 minute read

Published:

Li et al. proposed a hierarchical attention model for the multimodal sentence summarization task, but the image is not involved in the process of text encoding. It will be easier for the decoder to generate an accurate summary if the encoder can filter out trivial information when encoding the input sentence. Based on this idea, the paper proposes a multimodal selective mechanism that aims to select the highlights from the input text using visual signals; the decoder then generates the summary using the filtered encoding information. Concretely, an encoder reads the input text and generates the hidden representations. Then, multimodal selective gates measure the relevance between the input words and the image to construct the selected hidden representation. Finally, a decoder generates the summary using the selected hidden representation.
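The excerpt does not give the paper's exact gate formulation, so the following is only a simplified illustration of the idea: each word's hidden vector is scaled by a gate in (0, 1) computed from its relevance to an image vector. The dot-product relevance score, the scalar gate, and the names `selective_gate`, `hidden_states`, and `image_vec` are assumptions made for this sketch, not the paper's parameterization.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def selective_gate(
    hidden_states: Sequence[Sequence[float]],
    image_vec: Sequence[float],
) -> List[List[float]]:
    """Scale each word's hidden vector by its relevance to the image.

    Assumption: relevance is a plain dot product and the gate is a single
    scalar per word; a trained model would use learned projections instead.
    """
    selected: List[List[float]] = []
    for h in hidden_states:
        if len(h) != len(image_vec):
            raise ValueError("hidden state and image vector must have the same size")
        relevance = sum(a * b for a, b in zip(h, image_vec))
        gate = sigmoid(relevance)  # near 1 keeps the word, near 0 suppresses it
        selected.append([gate * a for a in h])
    return selected
```

The decoder would then attend over the returned vectors instead of the raw hidden states, so words unrelated to the image contribute less to the summary.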

Multistage Fusion with Forget Gate for Multimodal Summarization in Open-Domain Videos

1 minute read

Published:

Multimodal summarization for open-domain videos is an emerging task that aims to generate a summary from multisource information (video, audio, transcript). Despite the success of recent multiencoder-decoder frameworks on this task, existing methods lack fine-grained multimodality interactions between multisource inputs. Besides, unlike other multimodal tasks, this task has longer multimodal sequences with more redundancy and noise. To address these two issues, the paper proposes a multistage fusion network with a fusion forget gate module, which builds upon this approach by modeling fine-grained interactions between the multisource modalities through a multistep fusion schema and controlling the flow of redundant information between multimodal long sequences via a forgetting module.

VMSMO: Learning to Generate Multimodal Summary for Video-based News Articles

1 minute read

Published:

In real-world applications, the input is usually a video consisting of hundreds of frames. Consequently, the temporal dependency in a video cannot simply be modeled by static encoding methods. Hence, this work proposes Video-based Multimodal Summarization with Multimodal Output (VMSMO), which selects a cover frame from the news video and generates a textual summary of the news article at the same time.

Multi-modal Summarization for Video-containing Documents

1 minute read

Published:

Existing models suffer from the following drawbacks:

  • Most existing applications extract visual information from the accompanying images, but they ignore related videos. The paper contends that videos contain abundant content and have temporal characteristics, with events represented chronologically, which are crucial for text summarization.
  • Although attention mechanisms and early fusion are used extensively, they introduce noise because they are unsuitable for multi-modal data without alignment, which is characterized by a large gap that requires intensive communication.
  • Various multi-modal summarization works have focused on a single task, such as text or video summarization with added information from other modalities. The paper observes that both summarization tasks share the same target of refining original long materials, and as such they can be performed jointly due to common characteristics.

Convolutional Hierarchical Attention Network for Query-Focused Video Summarization

2 minute read

Published:

There are three differences between query-focused video summarization and generic video summarization:

  • Firstly, the video summary needs to take the subjectivity of users into account, as different user queries may receive different video summaries.
  • Secondly, trained video summarizers cannot meet all the users’ preferences, and performance evaluation often measures temporal overlap, which makes it hard to capture the semantic similarity between summaries and original videos.
  • Thirdly, the textual query will bring additional semantic information to the task.

Aspect-Aware Multimodal Summarization for Chinese E-Commerce Products

1 minute read

Published:

Commercial product advertisements, as a critical component of marketing management in e-commerce platforms, aim to attract consumers’ interests and arouse consumers’ desires to purchase the products. However, most product advertisements are so miscellaneous and tedious that the consumers cannot be expected to be patient enough to carefully read through them.

Multimodal Summarization of Complex Sentences

1 minute read

Published:

This paper introduces ROCMMS, a system that automatically converts existing text to multimodal summaries (MMS) that capture the meaning of a complex sentence in a diagram containing pictures and simplified text related by structure extracted from the original sentence.

CLIP: Connecting Text and Images

less than 1 minute read

Published:

The network introduced in this paper is trained on a wide variety of images with a wide variety of natural language supervision that’s abundantly available on the internet. By design, the network can be instructed in natural language to perform a great variety of classification benchmarks, without directly optimizing for the benchmark’s performance.
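Being "instructed in natural language" amounts, in practice, to zero-shot classification: the image embedding is compared with the embeddings of text prompts, one per candidate label, and the closest prompt wins. The sketch below shows only that comparison step. The embeddings are made-up toy vectors (real ones would come from CLIP's image and text encoders, which are not shown), and the function name `zero_shot_classify` and the `logit_scale` default are assumptions for illustration.

```python
import math
from typing import Dict, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero vector has no direction")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def zero_shot_classify(
    image_emb: Sequence[float],
    prompt_embs: Dict[str, Sequence[float]],
    logit_scale: float = 100.0,  # assumed value; the trained model learns its own scale
) -> Tuple[str, Dict[str, float]]:
    """Pick the text prompt whose embedding is most similar to the image."""
    logits = {p: logit_scale * cosine(image_emb, e) for p, e in prompt_embs.items()}
    top = max(logits.values())
    # Softmax with the maximum subtracted for numerical stability.
    exps = {p: math.exp(v - top) for p, v in logits.items()}
    total = sum(exps.values())
    probs = {p: v / total for p, v in exps.items()}
    best = max(probs, key=probs.get)
    return best, probs


# Toy embeddings, invented for this example.
prompts = {
    "a photo of a dog": [0.9, 0.1],
    "a photo of a cat": [0.1, 0.9],
}
label, probabilities = zero_shot_classify([1.0, 0.0], prompts)
```

Changing the task only requires changing the prompt strings, which is what allows the same network to be pointed at a new classification benchmark without retraining.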

nvs

Novel View Synthesis for human drawn sketches

1 minute read

Published:

Until now, I have researched many papers and worked with autoencoders to transform Canny edge images from one viewpoint to another. This approach, however, doesn’t work on doodle-like, highly abstracted sketches, which are what humans more commonly draw. We tried CycleGAN to bring the required abstraction to the Canny images; however, that approach didn’t give any fruitful results.

papers

portfolio

Hate Speech Degree Detection on English Data - Blog Post 1

Published:

Hate Speech Degree Detection on English Data

With the emergence of a variety of social media platforms and the freedom to express one’s thoughts, there is, sadly, a lot of hateful content available on social media. Some platforms, like Twitter, filter out posts that involve abusive and highly provocative language. However, Gab is a platform where freedom of speech is retained, so hate content can easily be found there. It therefore becomes important to analyze the data, posts, and comments. Hate speech detection thus plays an important role in identifying any kind of trend, troll, threat, etc.:

publications

talks

Talk 1 on Relevant Topic in Your Field

Published in UC San Francisco, Department of Testing, 2012

This is a description of your talk, which is a markdown file that can be all markdown-ified like any other post. Yay markdown!

teaching

Teaching experience 1

Published in University 1, Department, 2014

This is a description of a teaching experience. You can use markdown like any other post.

Teaching experience 2

Published in University 1, Department, 2015

This is a description of a teaching experience. You can use markdown like any other post.