• See more news on LinkedIn


  • We are happy to announce that we will soon be sharing our deep learning, transformer, and large language model based NLP models on Hugging Face.
    linkedin.com/posts/bigdata-lab


  • Another impressive research project focusing on the medical field was presented successfully during poster presentations by our senior students Günsu Bilge D., Can Karatepe, and Sinem Ceren Kontaş. We congratulate them on the outstanding work done.

    In the project titled "Information Extraction from Free-Text Radiology Reports Using Named Entity Recognition and Relation Extraction", our team aimed to address the lack of standardized structure in radiology reports, which hinders their accessibility and efficient use. This limitation negatively affects decision-making and patient care, so a solution that enhances interoperability and enables advanced analytics is essential.

    In this research, we adopted a multi-faceted approach using deep learning and large language models to enhance information extraction from Turkish free-text radiology reports. We fine-tuned a BERT model and integrated it into DyGIE++ for named entity recognition (NER), improving understanding of the Turkish language within the radiology domain. Additionally, we trained a Seq2Seq model to simplify complex jargon in reports for better patient comprehension. Leveraging GPT-3.5, we augmented the dataset with multiple simplified versions of each report. The results showed promising improvements in information extraction and patient engagement. Our research lays the groundwork for further advancements in medical AI and NLP in the Turkish radiology domain.

    We would like to express our appreciation to the advisory radiologist group from Ankara Bilkent City Hospital, led by Assoc. Prof. Ural Koc, and to radiology residents Ceren Aydin, Muhammet Batuhan Gokhan, and Ali Bahadir Ozdemir for their valuable input and support throughout this project. Their collaboration in providing a set of radiology reports and assisting with the annotation work was vital to the success of our research.

    Lastly, we extend our special thanks to NLP researchers Abubakar Ahmad Abdullahi and Gıyaseddin Bayrak from our lab for their significant contributions and assistance in the project. Their expertise and guidance were invaluable in achieving the impressive results we have obtained.

    Stay tuned for more innovative and transformative updates from our lab in the field of NLP and AI. If you have any questions or are curious about our project, please don't hesitate to reach out!
    linkedin.com/posts/bigdata-lab


  • Exciting research underway in the field of medical natural language processing! 🩺 Our project focuses on Named Entity Recognition (NER) in Turkish-language radiology reports. The goal is to develop and fine-tune language models tailored specifically for NER in this domain.

    NER plays a crucial role in identifying and classifying named entities like people, places, and medical terms. In our research project, we are dedicated to enhancing NER in Turkish-language radiology reports. To achieve this, we will fine-tune and test various language models, including BERT, GPT-2, GPT-3.5, and GPT-4, using a substantial corpus of Turkish radiology reports.

    By refining these language models, we aim to improve their performance on NER and explore the potential of advanced models for simplifying Turkish radiology reports. This research has the power to advance natural language processing in the medical domain, benefiting Turkish-speaking populations and ultimately leading to enhanced patient care and outcomes.

    Stay tuned for updates on our groundbreaking research in medical NLP for Turkish-language radiology reports!
    linkedin.com/posts/bigdata-lab


  • Exciting developments in the legal field! 📚 We are thrilled to showcase the outstanding work of our senior students Osman Erikci, Esin Belen, and Ahmed Hami Orak in their poster presentation. They are tackling the challenges within the legal system by leveraging cutting-edge technologies. In Turkey, the caseload in 2020 was 671 cases per judge, highlighting the need for #AI-based #legaltechnology solutions to support judges' decisions. That's why our talented senior students have embarked on a groundbreaking project focused on creating a knowledge graph to help navigate court decisions and related information and to pave the way for future advancements.

    A knowledge graph is a semantic graph structure that enables us to visualize and understand the connections and similarities between different fields of cases. By analyzing a dataset of 13,990 thesis documents in JSON format, containing valuable information like keywords, authors, and text, our senior students are utilizing the keywords field to construct the knowledge graph.

    Taking word frequencies into account, they cleaned and extracted keywords using regexes and string-replace functions. For text classification and representation learning, they used fastText, an efficient library capable of handling large datasets and out-of-vocabulary words. Additionally, they used a version of fastText fine-tuned at @BIGDaTA_Lab, which helped them better represent legal terms.
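
    As a rough illustration of the keyword-based graph construction described above, the sketch below cleans a `keywords` field with regexes and links keywords that appear in the same document. The record layout, the cleaning rules, the Turkish-aware lowercasing helper, and the choice of co-occurrence as the edge definition are all assumptions made for illustration; the post does not specify how the students built their graph.

```python
import json
import re
from collections import Counter
from itertools import combinations

# Hypothetical records; the real dataset's schema is not given in the post.
RAW = """[
  {"keywords": "Ceza Hukuku; Tazminat ,  İş Hukuku", "authors": ["A"], "text": "..."},
  {"keywords": "tazminat;iş hukuku", "authors": ["B"], "text": "..."},
  {"keywords": "Ceza hukuku", "authors": ["C"], "text": "..."}
]"""


def turkish_lower(text):
    """Lowercase with Turkish dotted/dotless I handled explicitly."""
    return text.replace("İ", "i").replace("I", "ı").lower()


def clean_keywords(field):
    """Split a raw keywords string on ';' or ',' and normalise each keyword."""
    parts = re.split(r"[;,]", field)
    cleaned = {turkish_lower(re.sub(r"\s+", " ", p).strip()) for p in parts}
    cleaned.discard("")
    return sorted(cleaned)


def build_graph(docs):
    """Count how often each pair of keywords co-occurs in one document."""
    edges = Counter()
    for doc in docs:
        keywords = clean_keywords(doc.get("keywords", ""))
        # keywords are sorted, so each pair has one canonical orientation
        for a, b in combinations(keywords, 2):
            edges[(a, b)] += 1
    return edges


docs = json.loads(RAW)
graph = build_graph(docs)
for (a, b), weight in graph.most_common():
    print(f"{a} -- {b} (weight {weight})")
```

    Edge weights here are simple co-occurrence counts; a fuller knowledge graph would likely add typed nodes and relations, which this sketch does not attempt.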

    Another essential tool in their project is BERT, a powerful language representation model based on the transformer architecture. With its contextualized word embeddings, BERT achieves state-of-the-art performance in various NLP tasks. They employed bert-extractive-summarizer 0.10.1 to summarize case law texts, enhancing efficiency and accessibility.

    We are proud of their dedication and innovative approach, which has the potential to speed up the intensive searches legal professionals carry out and thus make a significant impact on the legal system.
    linkedin.com/posts/bigdata-lab


  • Exciting news from our research lab! We are proud to share the successful poster presentation by our senior students Reyta Gül MURAN, Fatih Akgündüz, and Yüksel Ağagişi. Congratulations to the team on their outstanding work on the project titled "Specialized Question and Answer System for Turkish Law using NLP Algorithms."

    In the legal field, research can be an arduous task, demanding valuable time and resources from lawyers. That's why our project aims to revolutionize legal research by developing a specialized question and answer system tailored specifically for Turkish Law. Our @BIGDaTA_Lab's cutting-edge system employs advanced Natural Language Processing (NLP) algorithms to provide legal professionals with a simplified, cost-effective, and easily accessible tool for prompt and efficient legal inquiries.

    Throughout the project, our team explored various models, and the Bert2Bert model consistently outperformed the T5 model in terms of speed and response times across all iterations. With a higher F1 score, the Bert2Bert model strikes a remarkable balance between precision and recall, ensuring accurate and reliable answers for legal queries. Meanwhile, the T5 model demonstrated superior precision and recall.

    This project not only contributes to the development and implementation of question-answering systems in the Turkish legal domain but also lays the foundation for efficient analysis of Turkish legal texts by legal professionals and researchers. Our efforts are propelling advancements in Turkish NLP and AI research, revolutionizing the way legal research is conducted.

    We are excited about the future prospects of our project. Our next step is to publish a comprehensive paper. We are also committed to improving the speed and performance of our web application by refining our model and implementing the necessary enhancements. To accomplish this, we will expand our dataset, aiming to make it a pioneering and extensive resource for Turkish legal question answering.

    We would like to extend our heartfelt gratitude to Cihan Erdoğanyılmaz for leading the Q&A dataset preparation. His expertise and guidance have ensured the quality of our data while navigating the intricacies of legal regulations. We also want to express our sincere appreciation to our Master's student Batuhan Özdöl for his significant contributions to modelling throughout the project and to Rafah Alomar for providing valuable support during the data preparation phase.

    Stay tuned for more updates on our remarkable journey as we continue to innovate and transform the landscape of legal research in Turkish Law. If you're interested in learning more about our project or wish to collaborate, please don't hesitate to reach out.
    linkedin.com/posts/bigdata-lab


  • Wonderful poster presentation from our legal document retrieval / #searchengine team: Mehmet Selman Baysan, Fatih Satı, and @merve hazal. Congratulations, team 🎉. Their project, "Semantic Search Engine in Legal Domain", focuses on improving Turkish legal information retrieval.

    The project addresses the inefficiency of general-purpose search engines and keyword-based searches in the legal domain. It aims to enhance search efficiency and improve the quality of results by developing a domain-specific search engine for law.

    The project uses several AI models, including #fastText, #BERT, and several #sentencetransformer models, to improve the retrieval process and semantic understanding. By leveraging semantic search techniques, the engine can identify both exact matches and results that are semantically related to the query. The experimentation phase compared the performance of various algorithms, highlighting their strengths and weaknesses.

    Future plans include expanding the collection of jurisprudential documents, implementing a legislation search engine, enhancing the user interface, turning the project into a market-ready product, deploying a serverless search engine on cloud platforms, and reaching a wider user base for feedback that can improve the #SBERT models and overall performance. The team is excited about the possibilities and grateful for the support received, with more updates to come. Anyone interested in learning more about the project is encouraged to reach out.
    linkedin.com/posts/bigdata-lab


  • 📢 Introducing our project: "Automatic/Semiautomatic Text Generation in Law"! 📜 ⚖ by Emine Çığ, Özgür Taylan, and Berkin P. Special thanks to Cihan Erdoğanyılmaz for his valuable guidance and domain expertise.

    Problem:
    With the increasing number of lawsuits, lawyers face a daunting challenge of preparing petitions and legal documents that often follow similar formats but vary in case descriptions. This repetitive and time-consuming process demands significant effort.

    Solution:
    We are proud to present a novel petition generation tool designed specifically for lawyers as a decision support system. Our tool harnesses the power of language models to streamline the process.

    Methodology:
    * Semantic Search:
    Our approach utilizes the SentenceTransformer model to calculate embeddings of statements in the dataset and the user's input statement. By employing cosine similarity, we identify the top 5 semantically similar petitions from the dataset based on the user's input. This enables us to recommend relevant petitions to the user effectively.
    * Transfer Learning using Transformers:
    To enhance the performance of our tool, we leveraged transfer learning techniques. We split the dataset into training, validation, and testing sets, employing the google/mt5-small model as our base model. Through fine-tuning and training for 19 epochs, we obtained impressive results.
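
    The recommendation step in the Semantic Search item above (embed, score by cosine similarity, keep the top 5) can be sketched as follows. In the project, the vectors would come from a SentenceTransformer model; here small hand-written vectors stand in for those embeddings so the example runs without downloading a model. The function names and toy data are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def top_k_similar(query_vec, corpus_vecs, k=5):
    """Return (index, score) pairs for the k most similar corpus vectors."""
    scored = [
        (cosine_similarity(query_vec, vec), idx)
        for idx, vec in enumerate(corpus_vecs)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(idx, score) for score, idx in scored[:k]]


# Toy stand-ins for petition embeddings (assumption: real ones are
# produced by a sentence-embedding model and have hundreds of dimensions).
petition_vecs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.7, 0.7, 0.0],
    [0.5, 0.0, 0.5],
    [-1.0, 0.0, 0.0],
]
query_vec = [1.0, 0.05, 0.0]  # stand-in for the user's embedded statement

results = top_k_similar(query_vec, petition_vecs, k=5)
for idx, score in results:
    print(f"petition {idx}: similarity {score:.3f}")
```

    A full sort is fine for a small petition collection; for a large one, an approximate nearest-neighbour index would be the usual substitute.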

    Experiments & Results:
    Through rigorous experimentation, we achieved a ROUGE-1 score of 0.49, indicating the successful retrieval of similar petitions using our semantic search methodology.
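
    For readers unfamiliar with the metric, ROUGE-1 measures unigram overlap between a candidate text and a reference text. The sketch below computes it with plain whitespace tokenisation, which is an assumption: standard ROUGE implementations apply their own tokenisation and optional stemming, and the post does not say whether the reported 0.49 is precision, recall, or F1. The example sentences are made up.

```python
from collections import Counter


def rouge1(reference, candidate):
    """Unigram-overlap precision, recall, and F1 between two texts."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection clips each word's count to the smaller side
    overlap = sum((ref_counts & cand_counts).values())
    ref_total = sum(ref_counts.values())
    cand_total = sum(cand_counts.values())
    recall = overlap / ref_total if ref_total else 0.0
    precision = overlap / cand_total if cand_total else 0.0
    if precision + recall == 0.0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


scores = rouge1(
    reference="the tenant did not pay the rent",
    candidate="the tenant has not paid rent",
)
print({name: round(value, 3) for name, value in scores.items()})
```

    In this toy pair, four unigrams overlap ("the", "tenant", "not", "rent"), giving a recall of 4/7 and a precision of 4/6.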

    Key Features:
    At the conclusion of this project, we developed a Petition Template Generation Tool with two essential features:
    * Petition Description Generation Module: This module generates descriptive content based on the client's statement, facilitating the preparation of accurate and tailored petitions.
    * Recommendation Module: By leveraging our semantic search capabilities, this module retrieves a collection of petitions similar to the client's statement, providing valuable references and insights.

    We're thrilled to contribute to the legal field by providing AI-based decision support and simplifying the process of text generation in law. Stay tuned for more updates on this exciting project!
    linkedin.com/posts/bigdata-lab


  • We are excited to share the results of our recent study in the medical domain on using deep learning to detect brain hemorrhage in CT reports. Brain hemorrhage is a serious, potentially life-threatening condition, and early detection and treatment are essential for improving patient outcomes. In our study, we trained a deep learning classifier to detect brain hemorrhage from CT reports, using a large dataset of Turkish radiology reports and fine-tuning the classifier on domain-specific data. The classifier detected brain hemorrhage with an accuracy of 72%. These results suggest that deep learning can improve the detection of brain hemorrhage in CT reports, which could lead to earlier diagnosis and treatment and, in turn, better patient outcomes.

    Reference: Bayrak, G., Toprak, M. Ş., Ganiz, M. C., Kodaz, H., & Koç, U. (2022). Deep learning-based brain hemorrhage detection in CT reports. In Challenges of Trustable AI and Added-Value on Health (pp. 866-867). IOS Press. DOI: 10.3233/SHTI220609
    Link: lnkd.in/d3D4DGab
    linkedin.com/posts/bigdata-lab


  • We are excited to share the results of our recent study on a simple data augmentation method to improve the performance of named entity recognition (NER) models in the medical domain. Named entity recognition is the task of identifying and classifying named entities in text, such as people, organizations, and locations. NER is a critical task in many biomedical applications, such as clinical text mining and drug discovery.

    In our study, we applied a simple data augmentation method called Easy Data Augmentation (EDA) to a medical NER dataset. EDA consists of four basic methods: synonym replacement, random insertion, random deletion, and random swap. We found that EDA significantly improved the performance of NER models on the medical dataset. In particular, we achieved an F1 score of 93.4%, which is a significant improvement over the baseline F1 score of 88.2%. Our results suggest that EDA is a simple and effective method for improving the performance of NER models in the medical domain. We believe that this work will contribute to the development of more accurate and reliable biomedical NER models.

    Reference: Issifu, A. M., & Ganiz, M. C. (2021). A simple data augmentation method to improve the performance of named entity recognition models in medical domain. In 6th International Conference on Computer Science and Engineering (UBMK) (pp. 763-768). IEEE. DOI: 10.1109/UBMK52708.2021.9558986
    Link: lnkd.in/d8-EZY6Z
    linkedin.com/posts/bigdata-lab
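
    The four EDA operations named in the post above can be sketched on a token list as follows. This is a minimal illustration, not the setup used in the paper: the synonym table is a made-up toy, the function names are hypothetical, and the sketch ignores NER labels, which a real pipeline would have to keep aligned with the tokens after each operation.

```python
import random

# Toy synonym table (assumption); a real setup would use a thesaurus
# or a domain-specific resource.
SYNONYMS = {"pain": ["ache"], "severe": ["intense"]}


def synonym_replacement(tokens, n, rng):
    """Replace up to n tokens that have a known synonym."""
    out = list(tokens)
    candidates = [i for i, tok in enumerate(out) if tok in SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(SYNONYMS[out[i]])
    return out


def random_insertion(tokens, n, rng):
    """Insert n synonyms of randomly chosen tokens at random positions."""
    out = list(tokens)
    pool = [tok for tok in tokens if tok in SYNONYMS]
    if not pool:
        return out
    for _ in range(n):
        synonym = rng.choice(SYNONYMS[rng.choice(pool)])
        out.insert(rng.randint(0, len(out)), synonym)
    return out


def random_swap(tokens, n, rng):
    """Swap two randomly chosen positions, n times."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens, p, rng):
    """Drop each token with probability p, always keeping at least one."""
    if not tokens:
        return []
    kept = [tok for tok in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


rng = random.Random(0)  # fixed seed so runs are repeatable
sentence = "patient reports severe chest pain".split()
print(synonym_replacement(sentence, 2, rng))
print(random_insertion(sentence, 1, rng))
print(random_swap(sentence, 1, rng))
print(random_deletion(sentence, 0.2, rng))
```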