Hans Uszkoreit

Hans Uszkoreit

Hans Uszkoreit is a German computational linguist. Hans Uszkoreit studied Linguistics and Computer Science at Technische Universität Berlin and the University of Texas at Austin. While he was studying in Austin, he also worked as a research associate in a large machine translation project at the Linguistics Research Center. After he received his Ph.D. in linguistics from the University of Texas, he worked as a computer scientist at the Artificial Intelligence Center and was affiliated with the Center for the Study of Language and Information at Stanford University. Nowadays, he is teaching as a professor of Computational Linguistics at Saarland University. Moreover, he serves as a Scientific Director at the German Research Center for Artificial Intelligence (DFKI) where he heads the DFKI Language Technology Lab. == Life and career == Hans Uszkoreit, a native of East Berlin, was actively involved in a group of young individuals who opposed the East Germany regime. His protesting against the 1968 invasion of Czechoslovakia led to his expulsion from high school and subsequent imprisonment for a period of fifteen months on charges of subversive agitation. Realizing that continuing his education in East Germany was not feasible, Uszkoreit made the decision to escape to West Berlin. There, he completed his high school education and pursued a degree in Linguistics and Computer Science at Technische Universität Berlin. During his time as a student, he worked part-time as an editor and writer for Zitty, a city magazine, which he co-founded. In 1977, Uszkoreit was granted a Fulbright Grant to further his studies at the University of Texas at Austin. During his time in Austin, he concurrently served as a research associate in a significant machine translation project. Subsequently, he received a second Fulbright grant, which enabled him to pursue a Ph.D. program in linguistics. In 1984, he successfully completed his doctoral studies, earning a Ph.D. in linguistics. Between 1982 and 1986, Uszkoreit held the position of a computer scientist at the Artificial Intelligence Center of SRI International in Menlo Park, California. In 1988, he created the Department of Computational Linguistics and Phonetics at Saarland University. In 1989 he was elected head of the Language Technology Lab at DFKI. In 2012, Uszkoreit's achievements in the domain of relation extraction led to his receipt of a Google Faculty Research Award, acknowledging the substantial progress made by Uszkoreit and his team in advancing the field. In 2013, Uszkoreit, in collaboration with Feiyu Xu and Roberto Navigli, was granted an additional Google Research Award, which provided support for a targeted project within Google's Language Understanding Program, focusing on the augmentation of language comprehension and analysis. == Personal life == He is father of a son Jakob Uszkoreit, machine learning researcher scientist, an author of the landmark paper "Attention Is All You Need", and daughter Lena Uszkoreit. == Awards == 2002 Elected Member of the European Academy of Sciences 2012 Google Faculty Research Award 2013 Google Focused Research Award

Trigram

Trigrams are a special case of the n-gram, where n is 3. They are often used in natural language processing for performing statistical analysis of texts and in cryptography for control and use of ciphers and codes. See results of analysis of "Letter Frequencies in the English Language". == Frequency == Context is very important, varying analysis rankings and percentages are easily derived by drawing from different sample sizes, different authors; or different document types: poetry, science-fiction, technology documentation; and writing levels: stories for children versus adults, military orders, and recipes. Typical cryptanalytic frequency analysis finds that the 16 most common character-level trigrams in English are: Because encrypted messages sent by telegraph often omit punctuation and spaces, cryptographic frequency analysis of such messages includes trigrams that straddle word boundaries. This causes trigrams such as "edt" to occur frequently, even though it may never occur in any one word of those messages. == Examples == The sentence "the quick red fox jumps over the lazy brown dog" has the following word-level trigrams: the quick red quick red fox red fox jumps fox jumps over jumps over the over the lazy the lazy brown lazy brown dog And the word-level trigram "the quick red" has the following character-level trigrams (where an underscore "_" marks a space): the he_ e_q _qu qui uic ick ck_ k_r _re red

Ziad Obermeyer

Ziad Obermeyer (Arabic: زياد أوبرماير) is a Lebanese American physician and researcher whose work focuses on machine learning, health policy, and clinical decision-making in medicine. He is the Blue Cross of California Distinguished Associate Professor at the UC Berkeley School of Public Health, a Chan Zuckerberg Biohub investigator, and a research associate at the National Bureau of Economic Research. He is known for his research on racial bias in health care algorithms and the use of artificial intelligence in health care. == Early life and education == Obermeyer was born in Beirut, Lebanon, and raised in Cambridge, Massachusetts. He earned a Bachelor of Arts degree from Harvard College and a Master of Philosophy (M.Phil.) in History and Science from the University of Cambridge. He received his Doctor of Medicine (M.D.) from Harvard Medical School in 2008. Before pursuing medicine, Obermeyer worked as a consultant at McKinsey & Company, advising pharmaceutical and global health clients in New Jersey, Geneva, and Tokyo. After completing his medical degree, he trained as an emergency physician at Mass General Brigham (MGB) in Boston, Massachusetts. He later continued practicing emergency medicine at the Fort Defiance Indian Hospital on the Navajo Nation in Arizona. == Academic career == Obermeyer served as an Assistant Professor at Harvard Medical School from 2014 to 2020. In 2020, he joined the University of California, Berkeley as an Associate Professor and the Blue Cross of California Distinguished Professor at the School of Public Health. == Research focus == === Algorithmic racial bias in healthcare === In 2019, Obermeyer and economist Sendhil Mullainathan examined a commercial healthcare algorithm by UnitedHealth Group, used in hospitals and by insurers to identify patients with complex health needs. The study found that the algorithm underestimated the health needs of Black patients compared to white patients with similar conditions and that reformulating it would reduce racial bias. In 2020, Obermeyer analyzed an algorithm used to allocate CARE Act relief funding to hospitals. The study identified allocation patterns that favored hospitals with higher revenues over hospitals serving larger numbers of COVID-19 patients who are predominantly Black. === Clinical decision-making === In 2021, Obermeyer and colleagues examined physician decision-making in cardiac care using machine learning models. The study found that physicians misdiagnose cases when they rely on symptoms representative of a heart attack, such as chest pain, over other symptoms. === Pain assessment === Obermeyer developed a deep learning approach to investigate the severity of osteoarthritis in underserved communities. == Policy and regulatory work == Following the publication of the 2019 algorithmic racial bias study, the New York Department of Financial Services and Department of Health launched an investigation into UnitedHealth Group's algorithm, requesting that the company cease using it, citing discriminatory business practices. Also related to this study, in December 2019, Democratic Senators Cory Booker and Ron Wyden released letters to the Federal Trade Commission and Centers for Medicare and Medicaid Services asking to investigate potential discrimination in decision-making algorithms against marginalized communities in healthcare. The senators also wrote to major healthcare companies, including Aetna and Blue Cross Blue Shield, about their internal safeguards against racial bias in their technology. In 2021, Obermeyer and colleagues at the University of Chicago Booth School of Business released the Algorithmic Bias Playbook, a resource for policymakers and technical teams working in healthcare on how to measure and mitigate algorithmic racial bias. Obermeyer testified before the U.S. Senate Financial Committee in February 2024 on artificial intelligence in healthcare, recommending transparency requirements for AI developers and independent algorithm evaluations. In December 2025, he testified before the United States House Committee on Oversight and Government Reform on the role of AI in affordable healthcare and the impact of its integration on the workforce. == Organizations == In 2021, Obermeyer cofounded Nightingale Open Science, a non-profit that creates new medical imaging datasets available for research, and Dandelion Health, a health data analytics company. In June 2023, the company launched a program to audit and evaluate the performance of algorithms to identify potential racial, ethnic, and geographic bias, funded by the Gordon and Betty Moore Foundation and the SCAN Foundation. Dandelion Health partnered with the American Heart Association in 2025 to power an AI assessment lab for cardiovascular algorithms. Obermeyer is a founding faculty member of the University of California, Berkeley–University of California, San Francisco joint program in computational precision health. == Recognition == TIME magazine named Obermeyer one of the 100 most influential people in artificial intelligence in 2023. He has served as a Chan Zuckerberg Biohub Investigator since 2022, and as a Research Associate at the National Bureau of Economic Research since 2023. He was designated an Emerging Leader by the National Academy of Medicine in 2020. Obermeyer's racial bias study received the Willard G. Manning Memorial Award for the Best Research in Health Econometrics from the American Society of Health Economists (ASHEcon) in 2021 and the Responsible Business Education Award from the Financial Times in 2022.

Optical braille recognition

Optical braille recognition is technology to capture and process images of braille characters into natural language characters. It is used to convert braille documents for people who cannot read them into text, and for preservation and reproduction of the documents. == History == In 1984, a group of researchers at the Delft University of Technology designed a braille reading tablet, in which a reading head with photosensitive cells was moved along set of rulers to capture braille text line-by-line. In 1988, a group of French researchers at the Lille University of Science and Technology developed an algorithm, called Lectobraille, which converted braille documents into plain text. The system photographed the braille text with a low-resolution CCD camera, and used spatial filtering techniques, median filtering, erosion, and dilation to extract the braille. The braille characters were then converted to natural language using adaptive recognition. The Lectobraille technique had an error rate of 1%, and took an average processing time of seven seconds per line. In 1993, a group of researchers from the Katholieke Universiteit Leuven developed a system to recognize braille that had been scanned with a commercially available scanner. The system, however, was unable to handle deformities in the braille grid, so well-formed braille documents were required. In 1999, a group at the Hong Kong Polytechnic University implemented an optical braille recognition technique using edge detection to translate braille into English or Chinese text. In 2001, Murray and Dais created a handheld recognition system, that scanned small sections of a document at once. Because of the small area scanned at once, grid deformation was less of an issue, and a simpler, more efficient algorithm was employed. In 2003, Morgavi and Morando designed a system to recognize braille characters using artificial neural networks. This system was noted for its ability to handle image degradation more successfully than other approaches. == Challenges == Many of the challenges to successfully processing braille text arise from the nature of braille documents. Braille is generally printed on solid-color paper, with no ink to produce contrast between the raised characters and the background paper. However, imperfections in the page can appear in a scan or image of the page. Many documents are printed inter-point, meaning they are double-sided. As such, the depressions of the braille of one side appear interlaid with the protruding braille of the other side. == Techniques == Some optical braille recognition techniques attempt to use oblique lighting and a camera to reveal the shadows of the depressions and protrusions of the braille. Others make use of commercially available document scanners.

Weighted automaton

In theoretical computer science and formal language theory, a weighted automaton or weighted finite-state machine is a generalization of a finite-state machine in which the edges have weights, for example real numbers or integers. Finite-state machines are only capable of answering decision problems; they take as input a string and produce a Boolean output, i.e. either "accept" or "reject". In contrast, weighted automata produce a quantitative output, for example a count of how many answers are possible on a given input string, or a probability of how likely the input string is according to a probability distribution. They are one of the simplest studied models of quantitative automata. The definition of a weighted automaton is generally given over an arbitrary semiring R {\displaystyle R} , an abstract set with an addition operation + {\displaystyle +} and a multiplication operation × {\displaystyle \times } . The automaton consists of a finite set of states, a finite input alphabet of characters Σ {\displaystyle \Sigma } and edges which are labeled with both a character in Σ {\displaystyle \Sigma } and a weight in R {\displaystyle R} . The weight of any path in the automaton is defined to be the product of weights along the path, and the weight of a string is the sum of the weights of all paths which are labeled with that string. The weighted automaton thus defines a function from Σ ∗ {\displaystyle \Sigma ^{}} to R {\displaystyle R} . Weighted automata generalize deterministic finite automata (DFAs) and nondeterministic finite automata (NFAs), which correspond to weighted automata over the Boolean semiring, where addition is logical disjunction and multiplication is logical conjunction. In the DFA case, there is only one accepting path for any input string, so disjunction is not applied. When the weights are real numbers and the outgoing weights for each state add to one, weighted automata can be considered a probabilistic model and are also known as probabilistic automata. These machines define a probability distribution over all strings, and are related to other probabilistic models such as Markov decision processes and Markov chains. Weighted automata have applications in natural language processing where they are used to assign weights to words and sentences, as well as in image compression. They were first introduced by Marcel-Paul Schützenberger in his 1961 paper On the definition of a family of automata. Since their introduction, many extensions have been proposed, for example nested weighted automata, cost register automata, and weighted finite-state transducers. Researchers have studied weighted automata from the perspective of learning a machine from its input-output behavior (see computational learning theory) and studying decidability questions. == Definition == A commutative semiring (or rig) is a set R equipped with two distinguished elements 0 ≠ 1 {\displaystyle 0\neq 1} and addition and multiplication operations ⊕ {\displaystyle \oplus } and ⊗ {\displaystyle \otimes } such that ⊕ {\displaystyle \oplus } is commutative and associative with identity 0 {\displaystyle 0} , ⊗ {\displaystyle \otimes } is commutative and associative with identity 1 {\displaystyle 1} , ⊗ {\displaystyle \otimes } distributes over ⊕ {\displaystyle \oplus } , and 0 is an absorbing element for ⊗ {\displaystyle \otimes } . A weighted automaton over R {\displaystyle R} is a tuple A = ( Q , Σ , Δ , I , F ) {\displaystyle {\mathcal {A}}=(Q,\Sigma ,\Delta ,I,F)} where: Q {\displaystyle Q} is a finite set of states. Σ {\displaystyle \Sigma } is a finite alphabet. Δ ⊆ Q × Σ × R × Q {\displaystyle \Delta \subseteq Q\times \Sigma \times R\times Q} is a finite set of transitions ( q , σ , w , q ′ ) {\displaystyle (q,\sigma ,w,q')} , where σ {\displaystyle \sigma } is called a character and w {\displaystyle w} is called a weight. I : Q → R {\displaystyle I:Q\to R} is an initial weight function. F : Q → R {\displaystyle F:Q\to R} is a final weight function. A path on input w ∈ Σ ∗ {\displaystyle w\in \Sigma ^{}} is a finite path in the graph, where the concatenation of the character labels equals w {\displaystyle w} . The weight of the path q 0 , q 1 , … , q n {\displaystyle q_{0},q_{1},\ldots ,q_{n}} is the product ( ⊗ {\displaystyle \otimes } ) of the weights along the path, additionally multiplied by the initial and final weights I ( q 0 ) ⊗ F ( q n ) {\displaystyle I(q_{0})\otimes F(q_{n})} . The weight of the word w {\displaystyle w} is the sum ( ⊕ {\displaystyle \oplus } ) of the weights of all paths on input w {\displaystyle w} (or 0 if there are no accepting paths). In this way the machine defines a function [ [ A ] ] : Σ ∗ → R {\displaystyle [\![{\mathcal {A}}]\!]:\Sigma ^{}\to R} . == Ambiguity and determinism == Since Δ {\displaystyle \Delta } is a set of transitions, weighted automata allow multiple transitions (or paths) on a single input string. Therefore a weighted automaton can be considered analogous to a nondeterministic finite automaton (NFA). As is the case with NFAs, restrictions of weighted automata are considered that correspond to the concepts of deterministic finite automaton and unambiguous finite automaton (deterministic weighted automata and unambiguous weighted automata, respectively). First, a preliminary definition: the underlying NFA of A {\displaystyle {\mathcal {A}}} is an NFA formed by removing all transitions with weight 0 {\displaystyle 0} and then erasing all of the weights on the transitions Δ {\displaystyle \Delta } , so that the new transition set lies in Q × Σ × Q {\displaystyle Q\times \Sigma \times Q} . The initial states and final states are the set of states q {\displaystyle q} such that I ( q ) ≠ 0 {\displaystyle I(q)\neq 0} and F ( q ) ≠ 0 {\displaystyle F(q)\neq 0} , respectively. A weighted automaton is deterministic if the underlying NFA is deterministic and unambiguous if the underlying NFA is unambiguous. Every deterministic weighted automaton is unambiguous. In both the deterministic and unambiguous cases, there is always at most one accepting path, so the ⊕ {\displaystyle \oplus } operation is never applied and can be omitted from the definition. == Variations == The requirement that there is a zero element for ⊕ {\displaystyle \oplus } is sometimes omitted; in this case the machine defines a partial function from Σ ∗ {\displaystyle \Sigma ^{}} to R {\displaystyle R} rather than a total function. It is possible to extend the definition to allow epsilon transitions ( q , ϵ , w , q ′ ) {\displaystyle (q,\epsilon ,w,q')} , where ϵ {\displaystyle \epsilon } is the empty string. In this case, one must then require that there are no cycles of epsilon transitions. This does not increase the expressiveness of weighted automata. If epsilon transitions are allowed, the initial weights and final weights can be replaced by initial and final sets of states without loss of expressiveness. Some authors omit the initial and final weight functions I {\displaystyle I} and F {\displaystyle F} . Instead, I {\displaystyle I} and F {\displaystyle F} are replaced by a set of initial and final states. If epsilon transitions are not present, this technically decreases expressiveness as it forces [ [ A ] ] ( ε ) {\displaystyle [\![{\mathcal {A}}]\!](\varepsilon )} to depend only on the number of states that are both initial and final. The transition function can be given as a matrix Δ σ ∈ R Q × Q {\displaystyle \Delta _{\sigma }\in R^{Q\times Q}} with entries in R {\displaystyle R} for each σ {\displaystyle \sigma } , rather than a set of transitions. The entry of the matrix at ( q , q ′ ) {\displaystyle (q,q')} is the sum of all transitions labeled ( q , σ , q ′ ) {\displaystyle (q,\sigma ,q')} . Some authors restrict to specific semirings, such as N {\displaystyle \mathbb {N} } or Z {\displaystyle \mathbb {Z} } , particularly when studying decidability results.

Self-management (computer science)

Self-management is the process by which computer systems manage their own operation without human intervention. Self-management technologies are expected to pervade the next generation of network management systems. The growing complexity of modern networked computer systems is a limiting factor in their expansion. The increasing heterogeneity of corporate computer systems, the inclusion of mobile computing devices, and the combination of different networking technologies like WLAN, cellular phone networks, and mobile ad hoc networks make the conventional, manual management difficult, time-consuming, and error-prone. More recently, self-management has been suggested as a solution to increasing complexity in cloud computing. An industrial initiative towards realizing self-management is the Autonomic Computing Initiative (ACI) started by IBM in 2001. The ACI defines the following four functional areas: Self-configuration Auto-configuration of components Self-healing Automatic discovery, and correction of faults; automatically applying all necessary actions to bring system back to normal operation Self-optimization Automatic monitoring and control of resources to ensure the optimal functioning with respect to the defined requirements Self-protection Proactive identification and protection from arbitrary attacks

Georgetown–IBM experiment

The Georgetown–IBM experiment was an influential demonstration of machine translation, which was performed on January 7, 1954. Developed jointly by Georgetown University and IBM, the experiment involved completely automatic translation of more than sixty Russian sentences into English. == Background == Conceived and performed primarily in order to attract governmental and public interest and funding by showing the possibilities of machine translation, it was by no means a fully featured system: It had only six grammar rules and 250 lexical items in its vocabulary (of stems and endings). Words in the vocabulary were in the fields of politics, law, mathematics, chemistry, metallurgy, communications and military affairs. Vocabulary was punched onto punch cards. This complete dictionary was never fully shown (only the extended one from Garvin's article). Apart from general topics, the system was specialized in the domain of organic chemistry. The translation was carried out using an IBM 701 mainframe computer (launched in April 1953). The Georgetown-IBM experiment is the best-known result of the MIT conference in June 1952 to which all active researchers in the machine translation field were invited. At the conference, Duncan Harkin from US Department of Defense suggested that his department would finance a new machine translation project. Jerome Weisner supported the idea and offered finance from the Research Laboratory of Electronics at MIT. Leon Dostert had been invited to the project for his previous experience with the automatic correction of translations (back then 'mechanical translation'); his interpretation system had a strong impact on the Nuremberg War Crimes Tribunal. The linguistics part of the demonstration was carried out for the most part by linguist Paul Garvin who had also good knowledge of Russian. Over 60 Romanized Russian statements from a wide range of political, legal, mathematical, and scientific topics were entered into the machine by a computer operator who knew no Russian, and the resulting English translations appeared on a printer. The sentences to be translated were carefully selected. Many operations for the demonstration were fitted to specific words and sentences. In addition, there was no relational or sentence analysis which could recognize the sentence structure. The approach was mostly 'lexicographical' based on a dictionary where a specific word had a connection with specific rules and steps. == Algorithm == The algorithm first translates Russian words into numerical codes, then performs the following case-analysis on each numerical code to choose between possible English word translations, reorder the English words, or omit some English words. The flowchart of the algorithm is reproduced in (see Table 1 for the 6 rules). == Translation examples == How it analyzes Vyelyichyina ugla opryedyelyayetsya otnoshyenyiyem dlyini dugi k radyiusu (figure 2 of ). == Reception == Well publicized by journalists and perceived as a success, the experiment did encourage governments to invest in computational linguistics. The authors claimed that within three or five years, machine translation could well be a solved problem. However, the real progress was much slower, and after the ALPAC report in 1966, which found that the ten years of long research had failed to fulfill the expectations, funding was reduced dramatically. The demonstration was given widespread coverage in the foreign press, but only a small fraction of journalists drew attention to previous machine translation attempts.