Using Gaia DR2 data to determine the distances of young star clusters & their distribution in the galactic plane

Figure 1. The very young star cluster NGC 654 in Cassiopeia. (Courtesy DSS2/Aladin Sky Atlas)

Young star clusters

Young star clusters are of particular interest because star formation has taken place relatively recently there. They are all within a few degrees of the galactic plane and, being young, haven’t had enough time to drift far from the site of their formation. It is believed that most of them are in the spiral arms of the galaxy and so their locations provide evidence of the shape and positions of these arms.

Until now, it has been difficult to determine the distances of young clusters. It is possible to use parallax measurements to determine their distances directly up to about one kiloparsec (kpc), but beyond that distance, parallaxes from Earth-based observatories are very uncertain.
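The way uncertainty grows with distance can be illustrated with the standard parallax relation: the distance in parsecs is 1,000 divided by the parallax in milliarcseconds (mas). The Python sketch below applies that relation to a star at 200pc and a star at 1kpc. The measurement error of 0.5mas is an assumed, illustrative figure rather than a value taken from this study, and the function names are hypothetical.

```python
def distance_pc(parallax_mas):
    """Distance in parsecs from a parallax in milliarcseconds (d = 1000 / parallax)."""
    if parallax_mas <= 0:
        raise ValueError("a simple inversion needs a positive parallax")
    return 1000.0 / parallax_mas


def distance_range_pc(parallax_mas, error_mas):
    """Near and far distances implied by the parallax plus or minus one error bar."""
    near = distance_pc(parallax_mas + error_mas)
    far = distance_pc(parallax_mas - error_mas)  # raises if error >= parallax
    return near, far


# Assumed measurement error of 0.5 mas, chosen only for illustration.
ASSUMED_ERROR_MAS = 0.5

for true_distance_pc in (200.0, 1000.0):
    parallax_mas = 1000.0 / true_distance_pc
    near, far = distance_range_pc(parallax_mas, ASSUMED_ERROR_MAS)
    print(f"{true_distance_pc:6.0f} pc: parallax {parallax_mas:.1f} mas, "
          f"range {near:.0f} to {far:.0f} pc")
```

With the same error bar, the nearer star's distance is constrained to within roughly ten per cent, while the star at 1kpc could lie anywhere between about 670 and 2,000pc.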

Indirect methods have been used to determine the distances of clusters. Photometric measurements of the brightness and colours of the cluster members enable colour magnitude diagrams to be prepared, as well as measurements of the reddening and extinction of starlight as it passes through interstellar dust on its way to us.

Spectroscopic observations of brighter cluster members provide a means of estimating the luminosity of a star based on its spectral type and the profiles of its spectral lines. Giant and supergiant stars have lower surface gravity than dwarfs and their spectral lines are narrower as a result. The mass-luminosity relationship allows the absolute magnitude of a star to be estimated and this, in combination with the degree of reddening/extinction and the apparent magnitude, makes it possible to estimate the distance of the star. This is an uncertain process because extinction is caused in part by unevenly distributed dust in the galactic plane. Vallée (2017) draws attention to the high concentration of dust on the inner edges of spiral arms, which can be seen clearly on most face-on images of spiral galaxies.1 As well as this unevenness, there is often dust local to the cluster, left over from recent star formation there.2 For example, published distances for the young cluster NGC 654, shown in Figure 1, range from 1,250 to 2,900pc.3,4
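The sensitivity of this indirect method to extinction can be shown with the standard distance modulus relation, m − M = 5 log10(d) − 5 + A, where m is the apparent magnitude, M the absolute magnitude, d the distance in parsecs and A the extinction in magnitudes. The Python sketch below is a minimal illustration. The star's magnitudes and the two extinction values are invented for the example and do not describe any star in this study, and the function name is hypothetical.

```python
import math


def photometric_distance_pc(apparent_mag, absolute_mag, extinction_mag=0.0):
    """Distance in parsecs from the distance modulus, corrected for extinction.

    Rearranges m - M = 5 log10(d) - 5 + A to give d.
    """
    return 10 ** ((apparent_mag - absolute_mag - extinction_mag + 5.0) / 5.0)


# Invented example star: apparent magnitude 12.0, absolute magnitude -2.0.
for extinction in (2.5, 3.0):
    d = photometric_distance_pc(12.0, -2.0, extinction)
    print(f"extinction {extinction:.1f} mag -> distance {d:.0f} pc")

# Difference in distance modulus between the two extreme published
# distances for NGC 654 quoted in the text (1,250 and 2,900 pc).
spread_mag = 5.0 * math.log10(2900.0 / 1250.0)
print(f"NGC 654 spread in distance modulus: {spread_mag:.2f} mag")
```

For this invented star, a change of half a magnitude in the assumed extinction moves the derived distance by about a fifth. The spread in the published distances of NGC 654 corresponds to about 1.8 magnitudes in distance modulus.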

The Gaia space observatory gives us the opportunity to determine distances of stars directly using parallax measurements. In this paper fifty clusters have had their distances computed from Gaia data, allowing us to see, with some confidence, how they are distributed in the plane of the Milky Way, up to distances of around 5kpc.

The Gaia space observatory

Gaia was launched at the end of 2013 and was placed at the L2 Lagrangian point, about 1.5 million kilometres from Earth. DR2, the latest data release, contains data on over 1.3 billion stars, down to magnitude 20.5.5,6

DR2 is publicly available and retrieving subsets of the data is possible so that amateur astronomers, as well as professional ones, can undertake studies of position, parallax, proper motion and photometry of the objects in its database.

DR2 has a number of known issues, and the Gaia Archive website specifies workarounds that mitigate them for the data being analysed. More data releases are planned and these should correct the issues. DR2 can be viewed as a table of over a billion rows and a hundred columns – ‘big data’ – which requires the use of a number of distinct operations to home in on the rows that represent the stars of interest.
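As a rough illustration of what 'homing in' on rows can look like, the Python sketch below applies successive filters to a handful of rows. It is not the procedure used in this study. The column names (parallax, parallax_error, phot_g_mean_mag) are those of the DR2 source table, but the four rows are invented toy data and the two thresholds are arbitrary assumptions.

```python
# Invented toy rows; real DR2 rows have around a hundred columns.
rows = [
    {"source_id": 1, "parallax": 0.42, "parallax_error": 0.03, "phot_g_mean_mag": 13.1},
    {"source_id": 2, "parallax": -0.10, "parallax_error": 0.30, "phot_g_mean_mag": 19.4},
    {"source_id": 3, "parallax": 0.40, "parallax_error": 0.25, "phot_g_mean_mag": 18.7},
    {"source_id": 4, "parallax": 2.10, "parallax_error": 0.05, "phot_g_mean_mag": 14.0},
]


def select_rows(rows, max_g_mag=18.0, min_parallax_snr=5.0):
    """Keep rows that pass each filter in turn (thresholds are assumptions)."""
    kept = []
    for row in rows:
        # Step 1: discard missing or non-positive parallaxes.
        if row["parallax"] is None or row["parallax"] <= 0:
            continue
        # Step 2: discard faint stars, whose measurements are noisier.
        if row["phot_g_mean_mag"] > max_g_mag:
            continue
        # Step 3: discard parallaxes that are small compared to their error.
        if row["parallax"] / row["parallax_error"] < min_parallax_snr:
            continue
        kept.append(row)
    return kept


selected = select_rows(rows)
print([row["source_id"] for row in selected])
```

Each step removes rows on a single criterion, so the effect of any one filter can be checked on its own before the next is applied.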

Aims of the project

The first aim of the project is to develop an empirical, objective method of determining the distances of young star clusters. This is of interest because Gaia allows parallaxes of stars to be measured at distances of 10kpc: a tenfold increase on what is possible using telescopes based on Earth. DR2 provides an unprecedented opportunity to determine more accurate distances for these clusters.

The second aim is to investigate the distribution of young clusters in the plane of the Milky Way.

The ‘cut’

In this study a sample of 75 clusters younger than 40 million years is identified. The sample comprises NGC and IC clusters in that age range, drawn from the very extensive DAML02 catalogue.7 There are many more clusters in the catalogue, but their number had to be limited to keep the interactive processing of the data with spreadsheets manageable. As it stands, almost 200 separate spreadsheets and several hundred hours of interaction were required to complete the analysis of these clusters.

Other databases consulted during this project include WEBDA, VizieR and SIMBAD (bit.ly/3svlHtm).8,9,10

These databases have been compiled from several catalogues and, where distances are quoted, only SIMBAD gives references to published sources. For the purposes of comparison, the distances found in this study are contrasted with those in VizieR and, in the few cases where VizieR does not list a distance, the WEBDA value is used. SIMBAD lacks distance measurements for many of the clusters in it. Where it does contain distances, these are often from more than one source and the values quoted can vary enormously; for example the three published distances for NGC 7419 are 2,000, 4,100 and 18,713pc.

In all cases, the same size of field is used to retrieve data. This is a circle of radius 5arcmin. Clusters do, of course, vary in angular size but it is hard to ascertain where the boundary lies. Stars with the same properties as members of the cluster are often found far outside the central part of the cluster that we distinguish with our eyes. For example NGC 654 has an apparent radius of about three arcminutes but members have been found out to a radius of 10 arcminutes.11 There may also be tidal tails that are not apparent in images of clusters.12

A consequence of using this fixed size field is that some far-flung members of bigger clusters will escape consideration, but as it is just the distances of clusters that are of interest in this project we can determine that from the parallaxes of those stars in the central region, where members are most plentiful. An associated, more serious problem is that if the field is very large compared to the extent of the cluster, the foreground and background stars may dominate and it becomes hard to distinguish the cluster members. In this study it has been found that for each of the 75 clusters the number of stars in the 5arcmin radius is typically between 40 and 800. Processing these, even with the aid of spreadsheets, has taken several hundreds of hours. A 5arcmin radius is a reasonable compromise. Another advantage of using a fixed size field is that there is one less potential source of uncertainty to be taken into account in the comparison of the results. Tailoring the radius of the field to the size of each cluster would, at this point, involve subjective judgement, contrary to the aims of this project.

Retrieving data from the Gaia Archive

The Gaia Archive is accessed online. Its user interface is shown in Figure 2.

Figure 2. The user interface to the Gaia Archive, which contains Data Release 2 (DR2).
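A retrieval of the kind described here, all sources within a 5arcmin radius of a cluster centre, can be expressed in ADQL, the query language accepted by the archive. The Python sketch below only builds the query text and does not contact the archive. The table gaiadr2.gaia_source and the listed columns exist in DR2, but the choice of columns, the function name and the approximate coordinates used for NGC 654 are assumptions made for illustration, not details taken from this study.

```python
def cone_search_adql(ra_deg, dec_deg, radius_arcmin=5.0):
    """Build an ADQL cone search on the DR2 source table.

    ra_deg and dec_deg are the ICRS coordinates of the field centre in
    degrees; the radius is converted from arcminutes to degrees, the
    unit ADQL expects.
    """
    radius_deg = radius_arcmin / 60.0
    return (
        "SELECT source_id, ra, dec, parallax, parallax_error, "
        "pmra, pmdec, phot_g_mean_mag, bp_rp "
        "FROM gaiadr2.gaia_source "
        "WHERE 1 = CONTAINS("
        "POINT('ICRS', ra, dec), "
        f"CIRCLE('ICRS', {ra_deg:.5f}, {dec_deg:.5f}, {radius_deg:.5f}))"
    )


# Approximate centre of NGC 654 (RA 01h 44m, Dec +61d 53m); illustrative only.
query = cone_search_adql(26.0, 61.883)
print(query)
```

Keeping the radius as a parameter with a default of 5arcmin mirrors the fixed field size adopted for every cluster, while still allowing a different radius to be tried.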

