Berat Kurar-Barakat

I hold a PhD in Computer Science from Ben-Gurion University, where my advisor was Prof. Jihad El-Sana. I worked at his Visual Media Lab for five years on historical document image analysis using digital image processing and machine learning methods.

I am presently a postdoctoral fellow in the Blavatnik School of Computer Science at Tel-Aviv University, where I am hosted by Prof. Nachum Dershowitz. Our research focuses on document image analysis of the Dead Sea Scrolls.



Datasets and Results
The Qumran Segmentation Dataset (QSD) benchmarks the segmentation of ink and parchment regions in Dead Sea Scroll fragment images. The dataset contains images of 20 fragments, including full-color images, first-band and last-band multispectral images, and normalized last-band images. All images are cropped to focus on the fragment and rice paper, excluding color bars, rulers, and plate numbers. Pixel values range from 0–255 for JPEGs and 0–65535 for TIFFs.

Explore the segmentation results or read the paper.
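Because the JPEG and TIFF images use different pixel ranges, they usually need to be brought onto a common scale before they are compared or fed to the same model. The following is a minimal sketch of that step, assuming the JPEGs are 8-bit and the TIFFs are 16-bit, as the ranges above suggest. The function names `max_pixel_value` and `normalize` are hypothetical and are not part of the QSD code.

```python
from typing import Iterable, List


def max_pixel_value(bit_depth: int) -> int:
    """Return the largest value an unsigned pixel of this bit depth can hold."""
    if bit_depth <= 0:
        raise ValueError("bit_depth must be positive")
    return (1 << bit_depth) - 1


def normalize(pixels: Iterable[int], bit_depth: int) -> List[float]:
    """Scale raw pixel values to floats in [0, 1].

    Assumption: bit_depth is 8 for JPEG images and 16 for TIFF images.
    """
    top = max_pixel_value(bit_depth)
    return [p / top for p in pixels]


# Hypothetical sample values, not taken from the dataset.
jpeg_pixels = [0, 51, 255]
tiff_pixels = [0, 13107, 65535]

print(normalize(jpeg_pixels, 8))
print(normalize(tiff_pixels, 16))
```

After this scaling, both image types share the same [0, 1] range, so a single threshold or model input layer can be applied to either format.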

VML-HTR dataset: The Handwritten Text Recognition dataset consists of five Arabic manuscripts annotated at the subword and text line levels. Each manuscript contains 27000-35000 subwords and 1400-2000 text lines. Further statistics are available here. The line-level annotation is generated automatically from the subword-level annotation of the VML-HD dataset. The ground truth is available in PAGE-XML format.
VML-HP dataset: The Hebrew Paleography dataset consists of 537 document page images labeled with 15 script sub-types. The ground truth was created manually at the page level by a Hebrew paleographer. In addition, we propose a patch generation tool for extracting patches that contain an approximately equal number of text lines regardless of variation in font size. The VML-HP dataset contains a train set and two test sets. The first is a typical test set, and the second is a blind test set for evaluating algorithms in a more challenging setting.
HHD dataset: The Hebrew Handwritten dataset consists of natural handwritten Hebrew character images extracted from pangram paragraphs. Hence the dataset contains 26 classes that are balanced in their number of samples. The train set contains 3965 samples and the test set contains 1134 samples.
VML-AHTE dataset: The Arabic Handwritten Text Line Extraction dataset is a natural handwritten benchmark dataset for text lines with crowded diacritics and touching and overlapping characters. It is fully labeled at the line level by native Arabic speakers. The dataset contains 20 training pages and 10 test pages. Every document image has a corresponding ground truth in the form of pixel labels and PAGE-XML.
The Pinkas dataset is a public historical document image dataset. It is the first dataset of medieval handwritten Hebrew and is fully labeled at the word, line, and page levels by an expert in historical Hebrew manuscripts.
VML-MOC: The Multiply Oriented and Curved handwritten text line dataset is a natural handwritten benchmark dataset for heavily skewed and curved text lines. These text lines are side notes that scholars added to the page margins over the years, each time with a different orientation or sometimes in an extremely curvy form due to space constraints. The dataset contains 20 training pages and 10 test pages. Every document image has a corresponding ground truth in the form of pixel labels and PAGE-XML.
The Challenging text line dataset contains 30 pages from two different manuscripts. It is written in Arabic and contains 2732 text lines, a considerable number of which are multi-directed, multi-skewed, or curved. The ground truth, in which text lines were labeled manually with line masks, is also available in the dataset.
The Complex layout dataset contains 32 document images from two manuscripts scanned at a private library in the Old City of Jerusalem, as well as other samples collected from the Islamic manuscripts digitization project at Leipzig University Library.
Publications
Segmentation of Ink and Parchment in Dead Sea Scroll Fragments
Berat Kurar-Barakat and Nachum Dershowitz
Submitted to 2025 ICDAR - IJDAR Journal Track
[code] [QSD]
Computational Paleography of Medieval Hebrew Scripts
Berat Kurar-Barakat, Daria Vasyutinsky-Shapira, Sharva Gogawale, Mohammad Suliman and Nachum Dershowitz
2024 Computational Humanities Research Conference (CHR2024)
[poster] [slides] [code]
MiDRASH -- A Project for Computational Analysis of Medieval Hebrew Manuscripts
Daria Vasyutinsky-Shapira, Berat Kurar-Barakat, Sharva Gogawale, Mohammad Suliman and Nachum Dershowitz
2024 Eurographics Workshop on Graphics and Cultural Heritage (GCH2024)
[poster] [code]
Computational Tools for Dead Sea Scrolls
Berat Kurar-Barakat and Nachum Dershowitz
Preprint
[code]
Computational Qumranic Paleography
Berat Kurar-Barakat and Nachum Dershowitz
2024 ICDAR Workshop on Computational Paleography (IWCP2024)
[slides] [code]
Automatic Clustering of Hebrew Manuscripts
Daria Vasyutinsky-Shapira, Berat Kurar-Barakat, Mohammad Suliman, Sharva Gogawale and Nachum Dershowitz
2024 Digital Research Infrastructure for the Arts and Humanities (DARIAH2024)
[slides] [code]
Clustering Ashkenazi Manuscript
Berat Kurar-Barakat, Mohammad Suliman, Sharva Gogawale, Daria Vasyutinsky-Shapira and Nachum Dershowitz
2024 Digital Humanities Conference (DH2024)
[poster] [code]
Transcending Traditional Paleography through Computational Analysis
Berat Kurar-Barakat, Daria Vasyutinsky-Shapira, Sharva Gogawale, Mohammad Suliman and Nachum Dershowitz
2024 Israel Data Science and AI Initiative Conference (IDSAI2024)
[poster] [code]
Prediction of Paleographical Features in Ashkenazi Square Script for Identifying Subclusters Within the Style
Daria Vasyutinsky-Shapira, Berat Kurar-Barakat, Mohammad Suliman, Sharva Gogawale and Nachum Dershowitz
2024 Association for Jewish Studies (AJS2024)
[code]
NetLay: Layout Classification Dataset for Enhancing Layout Analysis
Sharva Gogawale, Luigi Bambaci, Berat Kurar-Barakat, Daria Vasyutinsky-Shapira, Daniel Stökl Ben Ezra and Nachum Dershowitz
2024 Magazén International Journal for Digital and Public Humanities
[code]
Segmenting Dead Sea Scroll Fragments for a Scientific Image Set
Bronson Brown-deVost, Berat Kurar-Barakat and Nachum Dershowitz
2023 Preprint
[code]
Text Line Extraction in Historical Documents Using Mask R-CNN
Ahmad Droby, Berat Kurar-Barakat, Reem Alaasam, Boraq Madi, Irina Rabaev, Jihad El-Sana
2022 Signals Journal, MDPI
Digital Hebrew Paleography: Script Types and Modes
Ahmad Droby, Daria Vasyutinsky Shapira, Irina Rabaev, Berat Kurar-Barakat, Jihad El-Sana
2022 Journal of Imaging, MDPI
Hard and Soft Labeling for Hebrew Paleography: A Case Study
Ahmad Droby, Irina Rabaev, Daria Vasyutinsky Shapira, Berat Kurar-Barakat, Jihad El-Sana
2022 13th IAPR International Workshop on Document Analysis Systems (DAS)
Is a deep learning algorithm effective for the classification of medieval Hebrew scripts?
Daria Vasyutinsky Shapira, Irina Rabaev, Ahmad Droby, Berat Kurar-Barakat and Jihad El-Sana
Jewish Studies in the Digital Age (DH2022)
[code]
Unsupervised learning of text line segmentation by differentiating coarse patterns
Berat Kurar-Barakat, Ahmad Droby, Raid Saabni, and Jihad El-Sana
2021 International Conference on Document Analysis and Recognition (ICDAR)
[slides] [code]
VML-HP: Hebrew paleography dataset
Ahmad Droby, Berat Kurar-Barakat, Daria Vasyutinsky Shapira, Irina Rabaev, Jihad El-Sana
2021 International Conference on Document Analysis and Recognition (ICDAR)
[code]
Learning-Free Text Line Segmentation for Historical Handwritten Documents
Berat Kurar-Barakat, Rafi Cohen, Ahmad Droby, Irina Rabaev and Jihad El-Sana
2021 Applied Sciences Journal, MDPI
[code]
Deep learning for paleographic analysis of medieval Hebrew manuscripts: a DH team collaboration experience
Daria Vasyutinsky, Irina Rabaev, Berat Kurar-Barakat, Ahmad Droby, and Jihad El-Sana
2020 Twin Talks: Understanding and Facilitating Collaboration in Digital Humanities
[slides] [code]
Unsupervised deep learning for text line segmentation
Berat Kurar-Barakat, Ahmad Droby, Rym Alasam, Boraq Madi, Irina Rabaev, Raed Shammes and Jihad El-Sana
2020 25th International Conference on Pattern Recognition (ICPR)
[code] [poster]
Text line extraction using fully convolutional network and energy minimization
Berat Kurar-Barakat, Ahmad Droby, Rym Alasam, Boraq Madi, Irina Rabaev, and Jihad El-Sana
2020 2nd International Workshop on Pattern Recognition for Cultural Heritage (PatReCH)
[slides] [code]
Unsupervised deep learning for handwritten page segmentation
Ahmad Droby, Berat Kurar-Barakat, Boraq Madi, Rym Alasam, and Jihad El-Sana
2020 17th International Conference on Frontiers in Handwriting Recognition (ICFHR)
[code]
The HHD dataset
Irina Rabaev, Berat Kurar-Barakat, Alexander Churkin, and Jihad El-Sana
2020 17th International Conference on Frontiers in Handwriting Recognition (ICFHR)
[code] [poster]
The pinkas dataset
Berat Kurar-Barakat, Irina Rabaev, and Jihad El-Sana
2019 International Conference on Document Analysis and Recognition (ICDAR)
[poster] [code]
VML-MOC: Segmenting a multiply oriented and curved handwritten text lines dataset
Berat Kurar-Barakat, Rafi Cohen, Irina Rabaev, and Jihad El-Sana
2019 3rd International Workshop on Arabic and derived Script Analysis and Recognition (ASAR)
[slides] [code]
Layout analysis on challenging historical Arabic manuscripts using siamese network
Reem Alaasam, Berat Kurar-Barakat, and Jihad El-Sana
2019 International Conference on Document Analysis and Recognition (ICDAR)
[poster] [code]
Text Line Segmentation for Challenging Handwritten Document Images using Fully Convolutional Network
Berat Kurar-Barakat, Ahmad Droby, Majeed Kassis, and Jihad El-Sana
2018 16th International Conference on Frontiers in Handwriting Recognition (ICFHR)
[poster] [code]
Word Spotting Using Convolutional Siamese Network
Berat Kurar-Barakat, Reem Alaasam, and Jihad El-Sana
2018 13th IAPR International Workshop on Document Analysis Systems (DAS)
[poster] [pitch] [code]
Binarization Free Layout Analysis for Arabic Historical Documents Using Fully Convolutional Networks
Berat Kurar-Barakat and Jihad El-Sana
2018 2nd International Workshop on Arabic Script Analysis and Recognition (ASAR)
[slides] [code]
Case Study: Fine Writing Style Classification Using Siamese Neural Network
Alaa Abdalhaleem, Berat Kurar-Barakat, and Jihad El-Sana
2018 2nd International Workshop on Arabic Script Analysis and Recognition (ASAR)
[slides]
Synthesizing versus Augmentation for Arabic Word Recognition with Convolutional Neural Networks
Reem Alaasam, Berat Kurar-Barakat, and Jihad El-Sana
2018 2nd International Workshop on Arabic Script Analysis and Recognition (ASAR)
Experiment study on utilizing convolutional neural networks to recognize historical Arabic handwritten text
Reem Alaasam, Berat Kurar-Barakat, and Jihad El-Sana
2017 1st International Workshop on Arabic Script Analysis and Recognition (ASAR)

Academic writings
Progress Report and Future Plans for Postdoctoral Research
Berat Kurar-Barakat and Nachum Dershowitz
11.07.2024
Midrash Ashkenazi Square script clustering progress report
Berat Kurar-Barakat, Sharva Gogawale, Mohammad Suliman, Daria Vasyutinsky-Shapira and Nachum Dershowitz
29.01.2024
Midrash Ashkenazi Square script clustering progress report
Berat Kurar-Barakat, Sharva Gogawale, Mohammad Suliman, Daria Vasyutinsky-Shapira and Nachum Dershowitz
12.12.2023
Low level tasks for Digital Humanities
Berat Kurar-Barakat and Nachum Dershowitz
11.11.2023
Fragment segmentation and Kraken letter detection enhancement
Berat Kurar-Barakat and Nachum Dershowitz
24.07.2023
Computational Qumranic Paleography
Berat Kurar-Barakat and Nachum Dershowitz
11.07.2023
Computational Qumranic Paleography
Berat Kurar-Barakat and Nachum Dershowitz
21.06.2023
Letter-level paleography for DSS
Berat Kurar-Barakat and Nachum Dershowitz
20.06.2023
eScriptorium comparison
Berat Kurar-Barakat and Nachum Dershowitz
12.09.2022
Improving Kraken's character segmentation using Energy Minimization
Berat Kurar-Barakat and Nachum Dershowitz
01.08.2022
[code]
A thousand word images are worth a word
Berat Kurar-Barakat, Tan Lu, and Ann Dooms
25.11.2021

Honors
ICFHR 2018 Competition on Recognition of Historical Arabic Scientific Manuscripts
Winner Team of Page Segmentation Track
Berat Kurar-Barakat, Ahmad Droby, and Jihad El-Sana
[results] [code]
ASAR 2018 Layout Analysis Competition
Winner Team of Classification Track
Ahmad Droby, Berat Kurar-Barakat, and Jihad El-Sana
[results] [code]